The Frontend Engineer’s Guide to Data-Intensive Applications
What I Learned from Part 1: Foundations of Data Systems
A practical summary of “Designing Data-Intensive Applications” Part 1, translated for frontend engineers building production-grade, real-time experiences.
Introduction: Why This “Backend Book” Changed How I Build Frontend Apps
When I picked up Martin Kleppmann’s “Designing Data-Intensive Applications,” I expected a deep dive into databases, distributed systems, and backend architecture. What I didn’t expect was a complete paradigm shift in how I think about frontend development.
The revelation: Modern web applications—especially real-time, data-heavy frontend experiences—face the exact same fundamental challenges as backend distributed systems. We’re just solving them in a different context.
Part 1 of the book, “Foundations of Data Systems,” covers four critical chapters that every engineer should understand, regardless of whether they work on frontend, backend, or full-stack. Here’s what I learned and why it matters for building production-grade frontend applications.
Chapter 1: Reliable, Scalable, and Maintainable Applications
The Three Pillars
Kleppmann introduces three core concerns for any software system:
1. Reliability - “Continuing to work correctly, even when things go wrong”
- Handling hardware faults, software bugs, and human errors
- Graceful degradation and fault tolerance
2. Scalability - “Reasonable ways of dealing with growth”
- Handling increased data volume, traffic, and complexity
- Performance characteristics as load increases
3. Maintainability - “Many people working productively over time”
- Operability, simplicity, and evolvability
- Adapting to new use cases
The Frontend Translation
These aren’t just backend concerns. Let me show you:
Reliability in Frontend:

```ts
// Network failures → optimistic updates, retry logic
// Race conditions → request deduplication
// Partial failures → error boundaries, fallback UI
// State corruption → immutability, validation
// Preventing faults → ESLint, TypeScript strict mode
```

Scalability in Frontend:

```ts
// Bundle size → tree shaking, compression, code splitting
// State complexity → normalized state, memoization
// Real-time updates → efficient diffing, virtualization
// Asset optimization → CDN, caching strategies
```

Maintainability in Frontend:

```ts
// Code quality → ESLint, Prettier
// Type safety → TypeScript, Zod
// Component architecture → composition patterns
// Testing → unit, integration, E2E
// Documentation → Storybook, TSDoc
```

Key Insight: You’re a Data System Designer
When you build a modern frontend app, you’re composing multiple data systems:
- State managers (Redux, Zustand) - your “database”
- React Query / SWR - your async state and caching layer
- IndexedDB / LocalStorage - persistent storage
- WebSockets - message queues for real-time data
- Service Workers - background processing
Sound familiar? This is exactly the composite data system pattern Kleppmann describes. You’re not just an application developer—you’re a data system designer.
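To make the composition concrete, here is a minimal sketch of the layering idea, a hot in-memory cache in front of a slower persistent layer, which is the role React Query plays in front of your API. All names (`Store`, `withCache`, the `Map`-backed "persistent" layer) are invented for illustration:

```typescript
// A tiny sketch of composing data systems: a fast cache layered over a
// slower source of truth. The "slow" layer stands in for IndexedDB or an API.
type Store = {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
};

function withCache(slow: Store): Store {
  const hot = new Map<string, string>(); // fast in-memory layer
  return {
    get(key) {
      if (hot.has(key)) return hot.get(key); // cache hit: skip the slow layer
      const value = slow.get(key);           // cache miss: fall through
      if (value !== undefined) hot.set(key, value); // populate the cache
      return value;
    },
    set(key, value) {
      slow.set(key, value); // write through to the source of truth
      hot.set(key, value);  // keep the cache coherent
    },
  };
}

// "Persistent" layer backed by a plain Map (stand-in for IndexedDB)
const backing = new Map<string, string>();
const slowStore: Store = {
  get: (k) => backing.get(k),
  set: (k, v) => { backing.set(k, v); },
};

const store = withCache(slowStore);
store.set("user:1", "Alice");
console.log(store.get("user:1")); // "Alice" — served from the hot cache
```

Every state-management library you use is some variation of this layering; the tradeoffs (staleness, invalidation, memory) are the same ones Kleppmann describes for backend caches.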
Chapter 2: Data Models and Query Languages
The Three Data Models
Chapter 2 explores how we structure and query data:
1. Relational Model (SQL)
- Data in tables
- Normalized - no duplication
- Requires joins for relationships
2. Document Model (NoSQL/JSON)
- Self-contained documents
- Better locality - everything in one place
- Natural for tree structures
3. Graph Model
- Nodes and edges
- Perfect for interconnected data
- Excels at relationship queries
The LinkedIn Profile Example
The book uses a brilliant example: representing a LinkedIn profile. You face this exact decision in frontend work daily:
```ts
// Document style (nested)
const user = {
  id: 251,
  name: "Bill Gates",
  posts: [
    { id: 1, title: "Climate Change", comments: [...] }
  ]
}
```

```ts
// Relational style (normalized)
const state = {
  users: { 251: { id: 251, name: "Bill Gates" } },
  posts: { 1: { id: 1, userId: 251, title: "..." } },
  comments: { ... }
}
```

```ts
// Graph style (connections)
const graph = {
  nodes: { users: {...}, posts: {...} },
  edges: [
    { from: 'user:251', to: 'post:1', type: 'authored' },
    { from: 'user:301', to: 'post:1', type: 'liked' }
  ]
}
```

The Normalization Tradeoff
The book introduces a critical concept: normalization vs. duplication
Using IDs (normalized):
- No duplication, easy to update, consistency
- Requires lookups/joins
Storing data directly (denormalized):
- Better locality, one query
- Update overhead, inconsistency risk
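The tradeoff fits in a few lines. In this sketch (the shapes are hypothetical, not from the book), renaming a user is a single write in normalized state, but a scan over every embedded copy in denormalized state:

```typescript
// Normalized: one entity table; posts reference the author by id
const normalized = {
  users: { "251": { id: 251, name: "Bill Gates" } },
  posts: { "1": { id: 1, authorId: 251, title: "Climate Change" } },
};
// One update — every reference stays consistent automatically
normalized.users["251"].name = "William Gates";

// Denormalized: author data copied into each post for read locality
const denormalized = {
  posts: [
    { id: 1, author: { id: 251, name: "Bill Gates" }, title: "Climate Change" },
    { id: 2, author: { id: 251, name: "Bill Gates" }, title: "Philanthropy" },
  ],
};
// Must touch every copy — miss one and the copies silently drift apart
for (const post of denormalized.posts) {
  if (post.author.id === 251) post.author.name = "William Gates";
}
```

Reads are the mirror image: the denormalized shape renders a post in one lookup, while the normalized shape needs a join-like second lookup for the author.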
This is EXACTLY the decision you face with Redux state structure or any similar state management tool:
```ts
// When to normalize?
// Frequent updates to individual entities
// Shared references (same user in multiple places)
// Many-to-many relationships

// When to nest?
// Read-heavy, display focus
// Tree structures
// Self-contained data
```

GraphQL and the Graph Model
GraphQL isn’t just a query language—it’s a graph traversal system:
```graphql
query {
  user(id: "1") {
    friends {      # Traverse edge
      posts {      # Traverse another edge
        likedBy {  # Complex graph query
          name
        }
      }
    }
  }
}
```

Apollo Client’s normalized cache is literally a graph database in your frontend!
Key Insight: State Management IS Data Modeling
Every time you design a Redux store, use the reducer state pattern, or decide how to structure component state, you’re choosing a data model. Understanding relational vs. document vs. graph models helps you make that choice consciously, not by accident. Even when you work against an external API or backend, you face the same considerations about how to structure the data.
Chapter 3: Storage and Retrieval
How Databases Actually Work
Chapter 3 demystifies storage engines with a brilliantly simple example:
```bash
# The world's simplest database
db_set () {
  echo "$1,$2" >> database
}

db_get () {
  grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
```

Write performance: O(1) - append is fast! Read performance: O(n) - scan the entire file. Terrible!
This illustrates the fundamental tradeoff in storage systems.
The Write vs. Read Tradeoff
Indexes speed up reads but slow down writes.
- Index = Additional metadata to help locate data
- Every write must update both data AND indexes
- You choose indexes based on query patterns
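A minimal sketch of that tradeoff (invented names, not a real storage engine): an append-only record list with a `Map` index by id. Every write pays for the index; every read benefits from it.

```typescript
// Append-only data with a hash index: writes cost two updates, reads become O(1)
type Row = { id: string; value: string };

const rows: Row[] = [];                  // the data itself, append-only
const byId = new Map<string, number>();  // index: id -> offset into rows

function write(row: Row): void {
  rows.push(row);                   // update the data...
  byId.set(row.id, rows.length - 1); // ...AND the index (the write overhead)
}

function read(id: string): Row | undefined {
  const offset = byId.get(id); // O(1) via the index — no full scan
  return offset === undefined ? undefined : rows[offset];
}

write({ id: "user:1", value: "Alice" });
write({ id: "user:1", value: "Alice Smith" }); // newer offset wins in the index
console.log(read("user:1")?.value); // "Alice Smith"
```

Note the index only accelerates lookups by `id`; a query by `value` would still scan every row, which is exactly why databases make you choose indexes based on query patterns.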
Sound familiar?
```ts
// Frontend parallel: Memoization
const ExpensiveComponent = ({ data }) => {
  // Without memoization (no index) - recompute every render:
  // const processed = expensiveOperation(data);

  // With memoization (adding an index) - compute once, cache the result
  const processed = useMemo(() => expensiveOperation(data), [data]);
  // Tradeoff: More memory for faster reads
};
```

Storage Engine Patterns in Frontend
1. Append-Only Logs
The Redux DevTools action history, the reducer pattern, and a WebSocket client’s message stream are all literally append-only logs:
```ts
const actions = [
  { type: "SET_USER", payload: { id: 1, name: "Alice" } },
  { type: "SET_USER", payload: { id: 1, name: "Alice Smith" } }, // Append!
  { type: "INCREMENT_COUNTER", payload: 42 },
];

// Time-travel works because we can replay the log
```

2. Hash Indexes
JavaScript Maps are hash indexes:
```ts
// O(1) lookup - exactly like a database hash index
const index = new Map([
  ["user:1", { offset: 0, length: 50 }],
  ["user:2", { offset: 50, length: 45 }],
]);

const location = index.get("user:1"); // O(1)!
```

React’s component keys? Hash indexes for reconciliation.
3. SSTables (Sorted String Tables)
IndexedDB is built on LevelDB, which uses SSTables! This is why IndexedDB supports range queries:
```ts
// Range query - only possible with sorted data
const recentPosts = await db.getAllFromIndex(
  "posts",
  "timestamp",
  IDBKeyRange.bound(new Date("2025-01-01"), new Date("2025-12-31")),
);

// Binary search on a sorted index - O(log n) instead of O(n)!
```

4. Segmentation and Compaction
Code splitting is segmentation:
```ts
// Instead of one giant bundle
// main.js (2MB)

// Split into segments
const Dashboard = lazy(() => import("./Dashboard")); // 500KB
const Admin = lazy(() => import("./Admin")); // 300KB

// Load segments on demand - just like storage engines!
```

Why Sequential Writes Matter
The book emphasizes: “Sequential writes are much faster than random writes.”
This applies to frontend too:
```ts
// SLOW: Random writes
for (let i = 0; i < 1000; i++) {
  await db.put("store", { id: i, data: Math.random() });
}

// FAST: Sequential write (batch)
const batch = [];
for (let i = 0; i < 1000; i++) {
  batch.push({ id: i, data: Math.random() });
}
await db.bulkAdd("store", batch);
```

Key Insight: Understanding Storage Explains Performance
When you understand how storage engines work, you understand:
- Why immutability simplifies things (append-only is crash-safe)
- Why batch operations are faster (sequential writes)
- Why indexes help some queries but not others (tradeoffs)
- Why Redux DevTools can time-travel (replay append-only log)
Chapter 4: Encoding and Evolution
The Core Challenge: Multiple Versions Running Simultaneously
Modern systems must handle different versions of code running at the same time:
- Backend: Rolling upgrades across servers
- Frontend: Users with old app versions still cached
The requirement: Backward and forward compatibility
Backward compatibility: New code can read old data.
Forward compatibility: Old code can read new data.
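Both directions can be sketched with plain JSON readers. The field names and reader functions here are illustrative, not from the book:

```typescript
// v1 payload (old data):  { id, name }
// v2 payload (new data):  { id, name, email }

// Backward compatibility: NEW code reads OLD data by defaulting missing fields
function readUserV2(json: string) {
  const raw = JSON.parse(json);
  return { id: raw.id, name: raw.name, email: raw.email ?? null };
}

// Forward compatibility: OLD code reads NEW data by ignoring unknown fields
function readUserV1(json: string) {
  const raw = JSON.parse(json);
  return { id: raw.id, name: raw.name }; // silently drops email — and must be
  // careful not to strip it if it ever writes the record back
}

console.log(readUserV2('{"id":1,"name":"Alice"}').email); // null
console.log(readUserV1('{"id":1,"name":"Alice","email":"a@x.io"}').name); // "Alice"
```

The `??` default and the ignore-unknown-fields habit are the JSON equivalents of the optional-field rules that schema formats like Protocol Buffers enforce mechanically.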
The Problems with RPC (Remote Procedure Calls)
Kleppmann explains why trying to make network calls look like local function calls is fundamentally flawed:
Network requests are different from local calls:
- Unpredictable (can fail, timeout)
- Variable latency (milliseconds to seconds)
- May execute multiple times if retried
- Must encode everything as bytes
- Different programming languages
This is your daily reality as a frontend engineer:
```ts
// Treating the network like a local function (BAD)
function getUser(id) {
  return fetch(`/api/users/${id}`).then((r) => r.json());
  // What if timeout? Network down? Server error?
}

// Acknowledging network reality (GOOD)
function useUser(id) {
  return useQuery({
    queryKey: ["user", id],
    queryFn: () => fetchUser(id),
    retry: 3, // Network may fail
    // Timeouts: pass an AbortSignal to fetch inside fetchUser
    onError: handleError, // Handle gracefully
  });
}
```

REST vs. GraphQL
REST: Doesn’t hide that it’s a network protocol (good!) GraphQL: Learned from RPC’s mistakes—strongly typed, client-controlled, acknowledges network reality
Message Passing: The Better Abstraction
Message brokers (RabbitMQ, Kafka) provide asynchronous communication:
- Buffer messages if recipient unavailable
- Automatic redelivery on failure
- Decouple sender from recipient
- One message to many recipients
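The core trick can be sketched in a few lines. This is a toy class with invented names, not a real broker API: buffer while the recipient is away, redeliver when it subscribes.

```typescript
// A toy message broker: decouples sender from recipient in time
type Handler = (msg: string) => void;

class TinyBroker {
  private buffer: string[] = [];
  private handler: Handler | null = null;

  publish(msg: string): void {
    if (this.handler) this.handler(msg); // recipient online: deliver now
    else this.buffer.push(msg);          // recipient offline: hold the message
  }

  subscribe(handler: Handler): void {
    this.handler = handler;
    for (const msg of this.buffer) handler(msg); // drain the backlog on connect
    this.buffer = [];
  }
}

const broker = new TinyBroker();
broker.publish("update:1"); // nobody listening yet — buffered
broker.publish("update:2");
const received: string[] = [];
broker.subscribe((m) => received.push(m));
console.log(received); // ["update:1", "update:2"]
```

This is the same pattern behind offline queues in service workers: the sender never needs to know whether the recipient is currently reachable.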
Frontend equivalents:
```ts
// WebSockets = Message broker
const ws = new WebSocket('wss://api.example.com')
ws.onmessage = (event) => {
  const update = JSON.parse(event.data)
  dispatch({ type: update.type, payload: update.data })
}

// BroadcastChannel = Pub/Sub between tabs
const channel = new BroadcastChannel('updates')
channel.postMessage({ type: 'USER_UPDATE', data: {...} })

// Service Workers = Message broker between tabs
navigator.serviceWorker.controller.postMessage({...})
```

Rolling Upgrades in Frontend
You face the same challenge as backend services:
The scenario:
- User hasn’t refreshed in 3 days (v1.0)
- You deployed v1.1 yesterday
- User’s old code hits new API
- Must maintain backward compatibility!
The solution:
```ts
// API v1
interface User {
  id: string;
  name: string;
}

// API v2 - Adding a field safely
interface User {
  id: string;
  name: string;
  email?: string; // Optional! Old clients don't break
}

// Backend handles both versions
app.post("/api/users", async (c) => {
  const body = await c.req.json();

  const user = {
    id: generateId(),
    name: body.name,
    email: body.email || null, // Gracefully handle the missing field
  };

  db.insert(user);
  return c.json(user);
});
```

Key Insight: Frontend Apps Are Distributed Systems
Modern web apps face the same distributed systems challenges:
- Network unpredictability
- Multiple versions running simultaneously
- Need for backward/forward compatibility
- Async communication patterns
Understanding these fundamentals makes you a better frontend engineer.
The Bigger Picture: How It All Connects
Part 1 builds a foundation by progressing through layers:
Chapter 1: What are we trying to achieve?
- Reliable, scalable, maintainable systems
Chapter 2: How do we structure data?
- Choose the right model for your use case
Chapter 3: How do we store and retrieve data?
- Understand the tradeoffs between write and read performance
Chapter 4: How do data and code evolve over time?
- Design for compatibility and change
The Frontend Journey
As a frontend engineer, this foundation transforms how you think about:
Architecture Decisions:
- Normalized vs. nested state (Chapter 2)
- When to add indexes/memoization or caching (Chapter 3)
- API versioning strategies (Chapter 4)
Performance Optimization:
- Bundle splitting = segmentation (Chapter 3)
- Batch operations = sequential writes (Chapter 3)
- Optimistic updates = async messaging (Chapter 4)
Reliability Patterns:
- Error boundaries = fault tolerance (Chapter 1)
- Retry logic = handling network unpredictability (Chapter 4)
- Immutability = crash safety (Chapter 3)
Practical Takeaways: Decision Frameworks
When to Normalize State?
Normalize when:
- Frequent updates to individual entities
- Shared references (same data in multiple places)
- Many-to-many relationships
Don’ts Keep nested when:
- Read-heavy, display-focused
- Tree structures (one-to-many)
- Self-contained data
When to Add Indexes (Memoization)?
Add indexes when:
- Expensive computations
- Frequently accessed
- Stable inputs (high cache hit rate)
Don’ts Skip indexes when:
- Simple, fast operations
- Constantly changing inputs
- Memory constrained
When to Use Message Passing vs. Request/Response?
Use message passing (WebSockets) when:
- Real-time updates
- Server-initiated updates
- Multiple subscribers
- Fire-and-forget communication
Use request/response (REST/GraphQL) when:
- Client-initiated queries
- Need immediate response
- Request/response pattern
- Cacheable data
How to Handle API Evolution?
The Golden Rules:
- Add fields as optional (backward compatible)
- Never remove required fields (breaks old clients)
- Version your APIs (URL, header, or content negotiation)
- Use feature detection when possible
- Test with old clients before deploying
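A minimal sketch of the client side of these rules, using a hand-rolled guard (a library like Zod typically plays this role; the `UserV2` shape and `parseUser` helper are invented for illustration):

```typescript
// Runtime validation that tolerates both old and new server responses
interface UserV2 {
  id: string;
  name: string;
  email?: string; // added as optional — old servers may omit it
}

function parseUser(raw: unknown): UserV2 {
  const obj = raw as Record<string, unknown>;
  if (typeof obj?.id !== "string" || typeof obj?.name !== "string") {
    // Required fields: both old and new payloads must have these
    throw new Error("Invalid user payload: missing required fields");
  }
  return {
    id: obj.id,
    name: obj.name,
    // Optional field: absent from old responses, validated when present
    email: typeof obj.email === "string" ? obj.email : undefined,
  };
}

console.log(parseUser({ id: "1", name: "Alice" })); // works without email
```

Validating at the boundary like this means a v1 response and a v2 response both produce a well-typed object, so the rest of the app never has to know which server version it talked to.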
Key Patterns to Apply Immediately
1. The Append-Only Log Pattern
```ts
// Use for: Undo/redo, time-travel debugging, audit trails
const eventLog = [];

function dispatch(action) {
  eventLog.push(action); // Append only!
  currentState = reducer(currentState, action);
}

function undo() {
  // Replay all but the last action
  const newLog = eventLog.slice(0, -1);
  return newLog.reduce(reducer, initialState);
}
```

2. The Normalized Cache Pattern
```ts
// Use for: Shared data, frequent updates
const cache = {
  users: { 1: {...}, 2: {...} },
  posts: { 1: { userId: 1, ... }, 2: {...} }
}

// Update the user once, reflected everywhere
cache.users[1].name = "New Name"
```

3. The Optimistic Update Pattern
```ts
// Use for: Variable network latency
const { mutate } = useMutation({
  mutationFn: likePost,
  onMutate: async (postId) => {
    // Snapshot the current value so we can roll back
    const previous = queryClient.getQueryData(["post", postId]);
    // Update immediately
    queryClient.setQueryData(["post", postId], (old) => ({
      ...old,
      liked: true,
    }));
    return { previous };
  },
  onError: (err, postId, context) => {
    // Rollback on failure
    queryClient.setQueryData(["post", postId], context.previous);
  },
});
```

4. The Compatibility Pattern
```ts
// Use for: API evolution
interface Config {
  version: number;
  features: string[];
  // Add new fields as optional
  newFeature?: boolean;
}

function handleConfig(config: Config) {
  // Feature detection
  if (config.features?.includes("real-time")) {
    enableWebSocket();
  }

  // Gracefully handle new fields
  if (config.newFeature !== undefined) {
    useNewFeature(config.newFeature);
  }
}
```

Questions to Ask When Designing Systems
Reliability
- What happens when the network fails?
- How do we handle partial failures?
- What’s our error recovery strategy?
- How do we prevent data corruption?
Scalability
- How does performance change with more data?
- What’s our bundle size strategy?
- How do we handle growing state complexity?
- Where are the bottlenecks?
Maintainability
- Can new team members understand this?
- How do we test this effectively?
- How easy is it to change?
- What’s our documentation strategy?
Data Modeling
- Is this data naturally nested or interconnected?
- How often do we update vs. read?
- Do we have many-to-many relationships?
- What are the query patterns?
Storage
- What’s the read/write ratio?
- Do we need range queries?
- How important is locality?
- What’s the data volume?
Evolution
- How do we version this?
- What’s our compatibility strategy?
- How do we handle rolling upgrades?
- Can old clients still work?
Conclusion: The Foundation That Changes Everything
Part 1 of “Designing Data-Intensive Applications” isn’t just about databases and backend systems. It’s about fundamental principles that apply to any system that stores, processes, and communicates data.
As frontend engineers building production-grade, real-time experiences, we face the same challenges:
- Reliability: Networks fail, users make mistakes, bugs happen
- Scalability: Apps grow in complexity, data volume, and traffic
- Maintainability: Teams change, requirements evolve, systems must adapt
Understanding these foundations transforms how you:
- Design state management (Chapter 2: data models)
- Optimize performance (Chapter 3: storage patterns)
- Handle network communication (Chapter 4: distributed systems)
- Build for change (All chapters: evolution and compatibility)
What’s Next?
Part 2 of the book dives into Distributed Data:
- Chapter 5: Replication
- Chapter 6: Partitioning
- Chapter 7: Transactions
- Chapter 8: The Trouble with Distributed Systems
- Chapter 9: Consistency and Consensus
These chapters become even more relevant as you build:
- Real-time collaborative features
- Offline-first applications
- Multi-region systems
- Complex state synchronization
The Mindset Shift
The biggest takeaway from Part 1? Stop thinking of frontend and backend as separate worlds. They’re both data systems facing the same fundamental challenges, just at different scales and contexts.
When you understand these foundations, you don’t just write code—you design systems. And that’s what separates good engineers from great ones.
Resources for Further Learning
For deeper understanding:
- Martin Kleppmann’s blog
- Jepsen analyses - distributed systems testing
- The Morning Paper - research paper summaries
For frontend-specific applications:
- Redux Toolkit Entity Adapter - normalized state
- TanStack Query - async state management
- Zustand - lightweight state
- IndexedDB API - browser storage
For building real-time systems:
- WebSocket API
- PartyKit - PartyServer, PartySocket, Y-PartyServer, partysub, partysync, partywhen
- Server-Sent Events
- BroadcastChannel API
This summary is based on “Designing Data-Intensive Applications” by Martin Kleppmann. All concepts and examples are translated and adapted for frontend engineering context. For the full treatment and complete understanding, I highly recommend reading the original book.
Happy building! 🚀
Update Changelog
November 5, 2025 - Enhanced code presentation with Expressive Code
What Changed:
- Added Expressive Code frames: All code blocks now render with syntax-highlighted frames featuring file name tabs where applicable (e.g., expensive-component.ts)
- Improved code readability: TypeScript, bash, and GraphQL code blocks now display with better syntax highlighting using dual themes (Dracula for dark mode, GitHub Light for light mode)
- Enhanced visual hierarchy: Code examples throughout all four chapters now have clearer visual separation from surrounding text
- Better language identification: All code blocks now explicitly specify their language for accurate syntax highlighting
Why These Changes:
These improvements make the technical content more accessible and easier to follow. The enhanced code presentation helps readers quickly identify code examples and understand the concepts being illustrated. The dual-theme support ensures readability in both light and dark modes, improving the overall reading experience for this technical deep-dive.
November 6, 2025 - Added PartyKit resources
What Changed:
- Added PartyKit resources: Added PartyKit resources to the resources section