The Frontend Engineer’s Guide to Data-Intensive Applications
What I Learned from Part 1: Foundations of Data Systems
A practical summary of “Designing Data-Intensive Applications” Part 1, translated for frontend engineers building production-grade, real-time experiences.
Introduction: Why This “Backend Book” Changed How I Build Frontend Apps
When I picked up Martin Kleppmann’s “Designing Data-Intensive Applications,” I expected a deep dive into databases, distributed systems, and backend architecture. What I didn’t expect was a complete paradigm shift in how I think about frontend development.
The revelation: Modern web applications—especially real-time, data-heavy frontend experiences—face the exact same fundamental challenges as backend distributed systems. We’re just solving them in a different context.
Part 1 of the book, “Foundations of Data Systems,” covers four critical chapters that every engineer should understand, regardless of whether they work on frontend, backend, or full-stack. Here’s what I learned and why it matters for building production-grade frontend applications.
Chapter 1: Reliable, Scalable, and Maintainable Applications
The Three Pillars
Kleppmann introduces three core concerns for any software system:
1. Reliability - “Continuing to work correctly, even when things go wrong”
- Handling hardware faults, software bugs, and human errors
- Graceful degradation and fault tolerance
2. Scalability - “Reasonable ways of dealing with growth”
- Handling increased data volume, traffic, and complexity
- Performance characteristics as load increases
3. Maintainability - “Many people working productively over time”
- Operability, simplicity, and evolvability
- Adapting to new use cases
The Frontend Translation
These aren’t just backend concerns. Let me show you:
Reliability in Frontend:
// Network failures → optimistic updates, retry logic (sketch below)
// Race conditions → request deduplication
// Partial failures → error boundaries, fallback UI
// State corruption → immutability, validation
// Preventing faults → ESLint, TypeScript strict mode
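To make the retry bullet concrete, here's a minimal sketch of a fetch wrapper with exponential backoff. The helper name and defaults are mine, for illustration only:

// A hypothetical retry helper - not a library API
async function fetchWithRetry(url, retries = 3, delayMs = 500) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.json();
    } catch (err) {
      if (attempt === retries) throw err; // Out of retries - surface the fault
      // Exponential backoff: wait longer after each failure
      await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
    }
  }
}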
Scalability in Frontend:
// Bundle size → tree shaking, compression, code splitting
// State complexity → normalized state, memoization
// Real-time updates → efficient diffing, virtualization (sketch below)
// Asset optimization → CDN, caching strategies
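And a bare-bones sketch of the virtualization bullet; real apps would delegate this to a library like react-window, and the component and props here are hypothetical (scroll handling is elided):

// Only mount the rows currently in view - O(viewport), not O(items)
function VirtualList({ items, rowHeight, viewportHeight, scrollTop }) {
  const start = Math.floor(scrollTop / rowHeight);
  const count = Math.ceil(viewportHeight / rowHeight) + 1;
  return (
    <div style={{ height: items.length * rowHeight, position: "relative" }}>
      {items.slice(start, start + count).map((item, i) => (
        <div key={item.id} style={{ position: "absolute", top: (start + i) * rowHeight }}>
          {item.label}
        </div>
      ))}
    </div>
  );
}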
Maintainability in Frontend:
// Code quality → ESLint, Prettier
// Type safety → TypeScript, Zod
// Component architecture → composition patterns
// Testing → unit, integration, E2E
// Documentation → Storybook, TSDoc
Key Insight: You’re a Data System Designer
When you build a modern frontend app, you’re composing multiple data systems:
- State managers (Redux, Zustand) - your “database”
- React Query / SWR - your async state and caching layer
- IndexedDB / LocalStorage - persistent storage
- WebSockets - message queues for real-time data
- Service Workers - background processing
Sound familiar? This is exactly the composite data system pattern Kleppmann describes. You’re not just an application developer—you’re a data system designer.
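For example, here's a minimal sketch of two of those layers composing: a WebSocket message stream feeding the React Query cache. It assumes a queryClient instance is in scope and a hypothetical { entity, id, data } message shape:

// The WebSocket (message queue) writes straight into the cache (database)
const ws = new WebSocket("wss://api.example.com/updates");
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data); // Assumed shape: { entity, id, data }
  queryClient.setQueryData([msg.entity, msg.id], msg.data);
};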
Chapter 2: Data Models and Query Languages
The Three Data Models
Chapter 2 explores how we structure and query data:
1. Relational Model (SQL)
- Data in tables
- Normalized - no duplication
- Requires joins for relationships
2. Document Model (NoSQL/JSON)
- Self-contained documents
- Better locality - everything in one place
- Natural for tree structures
3. Graph Model
- Nodes and edges
- Perfect for interconnected data
- Excels at relationship queries
The LinkedIn Profile Example
The book uses a brilliant example: representing a LinkedIn profile. Frontend engineers face the same modeling decision every day:
// Document style (nested)
const user = {
  id: 251,
  name: "Bill Gates",
  posts: [
    { id: 1, title: "Climate Change", comments: [...] }
  ]
}

// Relational style (normalized)
const state = {
  users: { 251: { id: 251, name: "Bill Gates" } },
  posts: { 1: { id: 1, userId: 251, title: "..." } },
  comments: { ... }
}

// Graph style (connections)
const graph = {
  nodes: { users: {...}, posts: {...} },
  edges: [
    { from: 'user:251', to: 'post:1', type: 'authored' },
    { from: 'user:301', to: 'post:1', type: 'liked' }
  ]
}
The Normalization Tradeoff
The book introduces a critical concept: normalization vs. duplication
Using IDs (normalized):
- No duplication, easy to update, consistency
- Requires lookups/joins
Storing data directly (denormalized):
- Better locality, one query
- Update overhead, inconsistency risk
This is EXACTLY the decision you face when structuring state in Redux or any similar state-management tool (a sketch of the update tradeoff follows this list):
// When to normalize?
// Frequent updates to individual entities
// Shared references (same user in multiple places)
// Many-to-many relationships
// When to nest?
// Read-heavy, display focus
// Tree structures
// Self-contained data
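To see the tradeoff in action, here's a sketch of the same rename in both shapes from the LinkedIn example above. The authorId/authorName fields on comments are hypothetical, assuming the nested shape denormalizes the author's name:

// Nested: an update must hunt down every embedded copy
user.posts.forEach((post) => {
  post.comments
    .filter((comment) => comment.authorId === 251)
    .forEach((comment) => { comment.authorName = "William Gates"; });
});

// Normalized: one write, every reference sees it
state.users[251].name = "William Gates";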
GraphQL and the Graph Model
GraphQL isn’t just a query language—it’s a graph traversal system:
query {
  user(id: "1") {
    friends {
      # Traverse edge
      posts {
        # Traverse another edge
        likedBy {
          # Complex graph query
          name
        }
      }
    }
  }
}
Apollo Client’s normalized cache is literally a graph database in your frontend!
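If you've never peeked inside, Apollo's InMemoryCache stores flat, ID-keyed records with references between them. A simplified sketch of that internal shape:

// Roughly what Apollo's normalized cache looks like under the hood
const apolloCache = {
  "User:251": { __typename: "User", id: "251", name: "Bill Gates" },
  "Post:1": {
    __typename: "Post",
    id: "1",
    title: "Climate Change",
    author: { __ref: "User:251" }, // An edge, stored as a reference
  },
};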
Key Insight: State Management IS Data Modeling
Every time you design a Redux store, reach for the reducer pattern, or decide how to structure component state, you’re choosing a data model. Understanding relational vs. document vs. graph models helps you make that choice consciously, not by accident. Even when you’re working against an external API or backend, you face the same considerations about how to structure the data.
Chapter 3: Storage and Retrieval
How Databases Actually Work
Chapter 3 demystifies storage engines with a brilliantly simple example:
# The world's simplest database
db_set () {
  echo "$1,$2" >> database
}

db_get () {
  grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
Write performance: O(1) - appending is fast!
Read performance: O(n) - it scans the entire file. Terrible!
This illustrates the fundamental tradeoff in storage systems.
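Translated to JavaScript (my own direct port, not from the book), the same toy database looks like this:

// An append-only log: O(1) writes, O(n) reads
const log = [];
const dbSet = (key, value) => log.push([key, value]); // O(1) append
const dbGet = (key) => {
  // O(n) scan of the whole log; the last write wins
  let result;
  for (const [k, v] of log) if (k === key) result = v;
  return result;
};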
The Write vs. Read Tradeoff
Indexes speed up reads but slow down writes.
- Index = Additional metadata to help locate data
- Every write must update both data AND indexes
- You choose indexes based on query patterns
Sound familiar?
// Frontend parallel: Memoization
const ExpensiveComponent = ({ data }) => {
  // Without memoization (no index) - recompute on every render:
  // const processed = expensiveOperation(data);

  // With memoization (adding an index) - compute once, cache the result
  const processed = useMemo(() => expensiveOperation(data), [data]);
  // Tradeoff: more memory for faster reads
};
Storage Engine Patterns in Frontend
1. Append-Only Logs
The action history behind Redux DevTools, the reducer pattern, or a WebSocket message stream is literally an append-only log!
const actions = [
  { type: "SET_USER", payload: { id: 1, name: "Alice" } },
  { type: "SET_USER", payload: { id: 1, name: "Alice Smith" } }, // Append!
  { type: "INCREMENT_COUNTER", payload: 42 },
];
// Time-travel works because we can replay the log
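And replay is just a reduce over a prefix of that log (assuming a reducer and initialState are in scope):

// Time-travel to step n: replay the first n actions
const stateAt = (n) => actions.slice(0, n).reduce(reducer, initialState);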
2. Hash Indexes
JavaScript Maps are hash indexes:
// O(1) lookup - exactly like database hash index
const index = new Map([
  ["user:1", { offset: 0, length: 50 }],
  ["user:2", { offset: 50, length: 45 }],
]);
const location = index.get("user:1"); // O(1)!
React’s component keys? Hash indexes for reconciliation.
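A quick illustration: the key gives React an identity to match on, so reconciliation pairs old and new children like a hash lookup instead of comparing by position (the todo shape here is assumed):

// Each key acts like a hash-index entry: children are matched by identity
function TodoList({ todos }) {
  return (
    <ul>
      {todos.map((todo) => (
        <li key={todo.id}>{todo.text}</li>
      ))}
    </ul>
  );
}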
3. SSTables (Sorted String Tables)
Chrome implements IndexedDB on top of LevelDB, which uses SSTables! This is why IndexedDB supports efficient range queries:
// Range query - only possible with sorted data (using the idb wrapper)
const recentPosts = await db.getAllFromIndex(
  "posts",
  "timestamp",
  IDBKeyRange.bound(new Date("2025-01-01"), new Date("2025-12-31")),
);
// Binary search on a sorted index - O(log n) instead of O(n)!
4. Segmentation and Compaction
Code splitting is segmentation:
// Instead of one giant bundle
// main.js (2MB)
// Split into segments
const Dashboard = lazy(() => import("./Dashboard")); // 500KB
const Admin = lazy(() => import("./Admin")); // 300KB
// Load segments on demand - just like storage engines!
Why Sequential Writes Matter
The book emphasizes: “Sequential writes are much faster than random writes.”
This applies to frontend too:
// SLOW: one transaction per write - each await commits separately
for (let i = 0; i < 1000; i++) {
  await db.put("store", { id: i, data: Math.random() });
}

// FAST: sequential writes batched into a single transaction (idb wrapper)
const tx = db.transaction("store", "readwrite");
for (let i = 0; i < 1000; i++) {
  tx.store.put({ id: i, data: Math.random() });
}
await tx.done;
Key Insight: Understanding Storage Explains Performance
When you understand how storage engines work, you understand:
- Why immutability simplifies things (append-only is crash-safe)
- Why batch operations are faster (sequential writes)
- Why indexes help some queries but not others (tradeoffs)
- Why Redux DevTools can time-travel (replay append-only log)
Chapter 4: Encoding and Evolution
The Core Challenge: Multiple Versions Running Simultaneously
Modern systems must handle different versions of code running at the same time:
- Backend: Rolling upgrades across servers
- Frontend: Users with old app versions still cached
The requirement: Backward and forward compatibility
Backward compatibility: New code can read old data.
Forward compatibility: Old code can read new data.
The Problems with RPC (Remote Procedure Calls)
Kleppmann explains why trying to make network calls look like local function calls is fundamentally flawed:
Network requests are different from local calls:
- Unpredictable (can fail, timeout)
- Variable latency (milliseconds to seconds)
- May execute multiple times if retried
- Must encode everything as bytes
- Different programming languages
This is your daily reality as a frontend engineer:
// Treating the network like a local function (BAD)
function getUser(id) {
  return fetch(`/api/users/${id}`).then((r) => r.json());
  // What if it times out? Network down? Server error?
}

// Acknowledging network reality (GOOD) - e.g. with TanStack Query
function useUser(id) {
  return useQuery({
    queryKey: ["user", id],
    queryFn: ({ signal }) => fetchUser(id, { signal }), // Abortable - assumes fetchUser forwards the signal
    retry: 3, // The network may fail
    // Failures surface in the returned `error` state - handle them in the UI
  });
}
REST vs. GraphQL
REST: Doesn’t hide that it’s a network protocol (good!)
GraphQL: Learned from RPC’s mistakes. Strongly typed, client-controlled, and it acknowledges network reality.
Message Passing: The Better Abstraction
Message brokers (RabbitMQ, Kafka) provide asynchronous communication:
- Buffer messages if recipient unavailable
- Automatic redelivery on failure
- Decouple sender from recipient
- One message to many recipients
Frontend equivalents:
// WebSockets = Message broker
const ws = new WebSocket('wss://api.example.com')
ws.onmessage = (event) => {
  const update = JSON.parse(event.data)
  dispatch({ type: update.type, payload: update.data })
}

// BroadcastChannel = Pub/Sub between tabs
const channel = new BroadcastChannel('updates')
channel.postMessage({ type: 'USER_UPDATE', data: {...} })

// Service Workers = Message broker between tabs
navigator.serviceWorker.controller.postMessage({...})
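Note that a raw WebSocket gives you none of a broker's guarantees on its own. Here's a tiny hypothetical outbox that adds broker-like buffering to the ws connection above (reconnect and backoff logic are elided):

// Buffer messages while disconnected, flush when the socket opens
const outbox = [];
function send(message) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify(message));
  } else {
    outbox.push(message); // Buffer, like a broker would
  }
}
ws.addEventListener("open", () => {
  while (outbox.length) ws.send(JSON.stringify(outbox.shift()));
});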
Rolling Upgrades in Frontend
You face the same challenge as backend services:
The scenario:
- User hasn’t refreshed in 3 days (v1.0)
- You deployed v1.1 yesterday
- User’s old code hits new API
- Must maintain backward compatibility!
The solution:
// API v1
interface User {
  id: string;
  name: string;
}

// API v2 - Adding a field safely
interface User {
  id: string;
  name: string;
  email?: string; // Optional! Old clients don't break
}

// Backend handles both versions (sketched as a Hono-style handler)
app.post("/api/users", async (c) => {
  const body = await c.req.json();
  const user = {
    id: generateId(),
    name: body.name,
    email: body.email ?? null, // Gracefully handle the missing field
  };
  db.insert(user);
  return c.json(user);
});
Key Insight: Frontend Apps Are Distributed Systems
Modern web apps face the same distributed systems challenges:
- Network unpredictability
- Multiple versions running simultaneously
- Need for backward/forward compatibility
- Async communication patterns
Understanding these fundamentals makes you a better frontend engineer.
The Bigger Picture: How It All Connects
Part 1 builds a foundation by progressing through layers:
Chapter 1: What are we trying to achieve?
- Reliable, scalable, maintainable systems
Chapter 2: How do we structure data?
- Choose the right model for your use case
Chapter 3: How do we store and retrieve data?
- Understand the tradeoffs between write and read performance
Chapter 4: How do data and code evolve over time?
- Design for compatibility and change
The Frontend Journey
As a frontend engineer, this foundation transforms how you think about:
Architecture Decisions:
- Normalized vs. nested state (Chapter 2)
- When to add indexes/memoization or caching (Chapter 3)
- API versioning strategies (Chapter 4)
Performance Optimization:
- Bundle splitting = segmentation (Chapter 3)
- Batch operations = sequential writes (Chapter 3)
- Optimistic updates = async messaging (Chapter 4)
Reliability Patterns:
- Error boundaries = fault tolerance (Chapter 1)
- Retry logic = handling network unpredictability (Chapter 4)
- Immutability = crash safety (Chapter 3)
Practical Takeaways: Decision Frameworks
When to Normalize State?
Normalize when:
- Frequent updates to individual entities
- Shared references (same data in multiple places)
- Many-to-many relationships
Keep nested when:
- Read-heavy, display-focused
- Tree structures (one-to-many)
- Self-contained data
When to Add Indexes (Memoization)?
Add indexes (memoize) when:
- Expensive computations
- Frequently accessed
- Stable inputs (high cache hit rate)
Skip indexes when:
- Simple, fast operations
- Constantly changing inputs
- Memory constrained
When to Use Message Passing vs. Request/Response?
Use message passing (WebSockets) when:
- Real-time updates
- Server-initiated updates
- Multiple subscribers
- Fire-and-forget communication
Use request/response (REST/GraphQL) when:
- Client-initiated queries
- Need immediate response
- One-off interactions (a single question, a single answer)
- Cacheable data
How to Handle API Evolution?
The Golden Rules:
- Add fields as optional (backward compatible; see the Zod sketch below)
- Never remove required fields (breaks old clients)
- Version your APIs (URL, header, or content negotiation)
- Use feature detection when possible
- Test with old clients before deploying
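Since this article already leans on Zod for type safety, here's a sketch of the first two rules as a schema. The schema names are mine, not a published API:

import { z } from "zod";

// v2 schema still accepts v1 payloads: the new field is optional (rule 1)
const UserSchema = z.object({
  id: z.string(),
  name: z.string(),
  email: z.string().email().optional(), // Added in v2 - old data still parses
});

// Tolerate fields we don't know about yet, so old code can read new data
const TolerantUserSchema = UserSchema.passthrough();

// Old v1 payload: parses fine
TolerantUserSchema.parse({ id: "1", name: "Ada" });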
Key Patterns to Apply Immediately
1. The Append-Only Log Pattern
// Use for: Undo/redo, time-travel debugging, audit trails
const eventLog = [];

function dispatch(action) {
  eventLog.push(action); // Append only!
  currentState = reducer(currentState, action);
}

function undo() {
  // Replay all but the last action
  const newLog = eventLog.slice(0, -1);
  return newLog.reduce(reducer, initialState);
}
2. The Normalized Cache Pattern
// Use for: Shared data, frequent updates
const cache = {
  users: { 1: {...}, 2: {...} },
  posts: { 1: { userId: 1, ... }, 2: {...} }
}

// Update the user once, reflected everywhere
cache.users[1].name = "New Name"
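A hypothetical read path shows the shared reference paying off:

// Every view reads through the same record - one update, zero stale copies
const postAuthorName = cache.users[cache.posts[1].userId].name; // "New Name"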
3. The Optimistic Update Pattern
// Use for: Variable network latency
const { mutate } = useMutation({
  mutationFn: likePost,
  onMutate: async (postId) => {
    // Snapshot the current value, then update immediately
    const previous = queryClient.getQueryData(["post", postId]);
    queryClient.setQueryData(["post", postId], (old) => ({
      ...old,
      liked: true,
    }));
    return { previous }; // Passed to onError as `context`
  },
  onError: (err, postId, context) => {
    // Rollback on failure
    queryClient.setQueryData(["post", postId], context.previous);
  },
});
4. The Compatibility Pattern
// Use for: API evolution
interface Config {
  version: number;
  features: string[];
  // Add new fields as optional
  newFeature?: boolean;
}

function handleConfig(config: Config) {
  // Feature detection
  if (config.features?.includes("real-time")) {
    enableWebSocket();
  }
  // Gracefully handle new fields
  if (config.newFeature !== undefined) {
    useNewFeature(config.newFeature);
  }
}
Questions to Ask When Designing Systems
Reliability
- What happens when the network fails?
- How do we handle partial failures?
- What’s our error recovery strategy?
- How do we prevent data corruption?
Scalability
- How does performance change with more data?
- What’s our bundle size strategy?
- How do we handle growing state complexity?
- Where are the bottlenecks?
Maintainability
- Can new team members understand this?
- How do we test this effectively?
- How easy is it to change?
- What’s our documentation strategy?
Data Modeling
- Is this data naturally nested or interconnected?
- How often do we update vs. read?
- Do we have many-to-many relationships?
- What are the query patterns?
Storage
- What’s the read/write ratio?
- Do we need range queries?
- How important is locality?
- What’s the data volume?
Evolution
- How do we version this?
- What’s our compatibility strategy?
- How do we handle rolling upgrades?
- Can old clients still work?
Conclusion: The Foundation That Changes Everything
Part 1 of “Designing Data-Intensive Applications” isn’t just about databases and backend systems. It’s about fundamental principles that apply to any system that stores, processes, and communicates data.
As frontend engineers building production-grade, real-time experiences, we face the same challenges:
- Reliability: Networks fail, users make mistakes, bugs happen
- Scalability: Apps grow in complexity, data volume, and traffic
- Maintainability: Teams change, requirements evolve, systems must adapt
Understanding these foundations transforms how you:
- Design state management (Chapter 2: data models)
- Optimize performance (Chapter 3: storage patterns)
- Handle network communication (Chapter 4: distributed systems)
- Build for change (All chapters: evolution and compatibility)
What’s Next?
Part 2 of the book dives into Distributed Data:
- Chapter 5: Replication
- Chapter 6: Partitioning
- Chapter 7: Transactions
- Chapter 8: The Trouble with Distributed Systems
- Chapter 9: Consistency and Consensus
These chapters become even more relevant as you build:
- Real-time collaborative features
- Offline-first applications
- Multi-region systems
- Complex state synchronization
The Mindset Shift
The biggest takeaway from Part 1? Stop thinking of frontend and backend as separate worlds. They’re both data systems facing the same fundamental challenges, just at different scales and contexts.
When you understand these foundations, you don’t just write code—you design systems. And that’s what separates good engineers from great ones.
Resources for Further Learning
For deeper understanding:
- Martin Kleppmann’s blog
- Jepsen analyses - distributed systems testing
- The Morning Paper - research paper summaries
For frontend-specific applications:
- Redux Toolkit Entity Adapter - normalized state
- TanStack Query - async state management
- Zustand - lightweight state
- IndexedDB API - browser storage
This summary is based on “Designing Data-Intensive Applications” by Martin Kleppmann. All concepts and examples are translated and adapted for frontend engineering context. For the full treatment and complete understanding, I highly recommend reading the original book.
Happy building! 🚀