Photo by [Amsterdam City Archives](https://unsplash.com/@amsterdamcityarchives) on [Unsplash](https://unsplash.com)

The Frontend Engineer’s Guide to Data-Intensive Applications

What I Learned from Part 1: Foundations of Data Systems

A practical summary of “Designing Data-Intensive Applications” Part 1, translated for frontend engineers building production-grade, real-time experiences.


Introduction: Why This “Backend Book” Changed How I Build Frontend Apps

When I picked up Martin Kleppmann’s “Designing Data-Intensive Applications,” I expected a deep dive into databases, distributed systems, and backend architecture. What I didn’t expect was a complete paradigm shift in how I think about frontend development.

The revelation: Modern web applications—especially real-time, data-heavy frontend experiences—face the exact same fundamental challenges as backend distributed systems. We’re just solving them in a different context.

Part 1 of the book, “Foundations of Data Systems,” covers four critical chapters that every engineer should understand, regardless of whether they work on frontend, backend, or full-stack. Here’s what I learned and why it matters for building production-grade frontend applications.


Chapter 1: Reliable, Scalable, and Maintainable Applications

The Three Pillars

Kleppmann introduces three core concerns for any software system:

1. Reliability - “Continuing to work correctly, even when things go wrong”

  • Handling hardware faults, software bugs, and human errors
  • Graceful degradation and fault tolerance

2. Scalability - “Reasonable ways of dealing with growth”

  • Handling increased data volume, traffic, and complexity
  • Performance characteristics as load increases

3. Maintainability - “Many people working productively over time”

  • Operability, simplicity, and evolvability
  • Adapting to new use cases

The Frontend Translation

These aren’t just backend concerns. Let me show you:

Reliability in Frontend:

// Network failures → optimistic updates, retry logic
// Race conditions → request deduplication
// Partial failures → error boundaries, fallback UI
// State corruption → immutability, validation
// Preventing faults → ESLint, TypeScript strict mode
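Those comments can be made concrete. Here's a minimal sketch of two of the patterns: retry with exponential backoff for network faults, and a fallback value for graceful degradation. The names `withRetry` and `withFallback`, the delays, and the retry counts are illustrative choices, not from the book.

```javascript
// Retry with exponential backoff: 100ms, 200ms, 400ms, ...
async function withRetry(fetchFn, { retries = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fetchFn();
    } catch (err) {
      if (attempt === retries) throw err; // out of retries: surface the fault
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Graceful degradation: fall back to cached/placeholder data instead of crashing.
async function withFallback(promise, fallback) {
  try {
    return await promise;
  } catch {
    return fallback;
  }
}
```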

Scalability in Frontend:

// Bundle size → tree shaking, compression, code splitting
// State complexity → normalized state, memoization
// Real-time updates → efficient diffing, virtualization
// Asset optimization → CDN, caching strategies

Maintainability in Frontend:

// Code quality → ESLint, Prettier
// Type safety → TypeScript, Zod
// Component architecture → composition patterns
// Testing → unit, integration, E2E
// Documentation → Storybook, TSDoc

Key Insight: You’re a Data System Designer

When you build a modern frontend app, you’re composing multiple data systems:

  • State managers (Redux, Zustand) - your “database”
  • React Query / SWR - your async state and caching layer
  • IndexedDB / LocalStorage - persistent storage
  • WebSockets - message queues for real-time data
  • Service Workers - background processing

Sound familiar? This is exactly the composite data system pattern Kleppmann describes. You’re not just an application developer—you’re a data system designer.


Chapter 2: Data Models and Query Languages

The Three Data Models

Chapter 2 explores how we structure and query data:

1. Relational Model (SQL)

  • Data in tables
  • Normalized - no duplication
  • Requires joins for relationships

2. Document Model (NoSQL/JSON)

  • Self-contained documents
  • Better locality - everything in one place
  • Natural for tree structures

3. Graph Model

  • Nodes and edges
  • Perfect for interconnected data
  • Excels at relationship queries

The LinkedIn Profile Example

The book uses a brilliant example: representing a LinkedIn profile. Frontend engineers face this same modeling decision daily:

// Document style (nested)
const user = {
  id: 251,
  name: "Bill Gates",
  posts: [
    { id: 1, title: "Climate Change", comments: [...] }
  ]
}

// Relational style (normalized)
const state = {
  users: { 251: { id: 251, name: "Bill Gates" } },
  posts: { 1: { id: 1, userId: 251, title: "..." } },
  comments: { ... }
}

// Graph style (connections)
const graph = {
  nodes: { users: {...}, posts: {...} },
  edges: [
    { from: 'user:251', to: 'post:1', type: 'authored' },
    { from: 'user:301', to: 'post:1', type: 'liked' }
  ]
}

The Normalization Tradeoff

The book introduces a critical concept: normalization vs. duplication

Using IDs (normalized):

  • No duplication, easy to update, consistency
  • Requires lookups/joins

Storing data directly (denormalized):

  • Better locality, one query
  • Update overhead, inconsistency risk

This is EXACTLY the decision you face when structuring Redux state (or state in any similar state management tool):

// When to normalize?
// Frequent updates to individual entities
// Shared references (same user in multiple places)
// Many-to-many relationships

// When to nest?
// Read-heavy, display focus
// Tree structures
// Self-contained data
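To make the normalized option concrete, here's a sketch of turning nested document-style posts into normalized entity tables, the way libraries like normalizr do. The function name and entity shapes are illustrative.

```javascript
// Flatten nested posts into lookup tables keyed by id -
// each entity stored once, relationships expressed via ids.
function normalizePosts(posts) {
  const entities = { users: {}, posts: {}, comments: {} };
  for (const post of posts) {
    entities.users[post.author.id] = post.author;
    for (const comment of post.comments ?? []) {
      entities.comments[comment.id] = { ...comment, postId: post.id };
    }
    entities.posts[post.id] = {
      id: post.id,
      title: post.title,
      authorId: post.author.id, // reference instead of duplication
      commentIds: (post.comments ?? []).map((c) => c.id),
    };
  }
  return entities;
}
```

Now updating a user's name touches one place, no matter how many posts reference them.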

GraphQL and the Graph Model

GraphQL isn’t just a query language—it’s a graph traversal system:

query {
  user(id: "1") {
    friends {
      # Traverse edge
      posts {
        # Traverse another edge
        likedBy {
          # Complex graph query
          name
        }
      }
    }
  }
}

Apollo Client’s normalized cache is literally a graph database in your frontend!

Key Insight: State Management IS Data Modeling

Every time you design a Redux store, use the reducer state pattern, or decide how to structure component state, you’re choosing a data model. Understanding relational vs. document vs. graph models helps you make that choice consciously, not by accident. Even in situations where you are working with an external API or backend, you face similar considerations about how to structure the data.


Chapter 3: Storage and Retrieval

How Databases Actually Work

Chapter 3 demystifies storage engines with a brilliantly simple example:

# The world's simplest database
db_set () {
  echo "$1,$2" >> database
}

db_get () {
  grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}

Write performance: O(1) - append is fast!
Read performance: O(n) - scan the entire file. Terrible!

This illustrates the fundamental tradeoff in storage systems.

The Write vs. Read Tradeoff

Indexes speed up reads but slow down writes.

  • Index = Additional metadata to help locate data
  • Every write must update both data AND indexes
  • You choose indexes based on query patterns
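A tiny sketch of that tradeoff: a store that keeps data in an array and a Map index side by side. Every write now does double the work, but reads become O(1). The `IndexedStore` name and shape are my own illustration.

```javascript
class IndexedStore {
  constructor() {
    this.rows = [];        // the data itself
    this.byId = new Map(); // the "index": id -> array position
  }
  insert(row) {
    // Every write must update both the data AND the index
    this.byId.set(row.id, this.rows.length);
    this.rows.push(row);
  }
  get(id) {
    // O(1) via the index, instead of scanning this.rows
    const pos = this.byId.get(id);
    return pos === undefined ? undefined : this.rows[pos];
  }
}
```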

Sound familiar?

// Frontend parallel: Memoization
const ExpensiveComponent = ({ data }) => {
  // Without memoization (no index) - recompute on every render
  const processedEveryRender = expensiveOperation(data);

  // With memoization (adding an index) - compute once, cache the result
  const processedMemoized = useMemo(() => expensiveOperation(data), [data]);
  // Tradeoff: more memory for faster reads
};

Storage Engine Patterns in Frontend

1. Append-Only Logs

Redux DevTools, the reducer pattern, and even a WebSocket message stream are all built on append-only logs!

const actions = [
  { type: "SET_USER", payload: { id: 1, name: "Alice" } },
  { type: "SET_USER", payload: { id: 1, name: "Alice Smith" } }, // Append!
  { type: "INCREMENT_COUNTER", payload: 42 },
];

// Time-travel works because we can replay the log
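Replay really is just a `reduce` over the log. A minimal sketch with a counter reducer (the reducer and action names are illustrative, in the same style as the log above):

```javascript
// A reducer in the same style as the action log above.
function reducer(state = { count: 0 }, action) {
  switch (action.type) {
    case "INCREMENT_COUNTER":
      return { count: state.count + action.payload };
    default:
      return state;
  }
}

const log = [
  { type: "INCREMENT_COUNTER", payload: 1 },
  { type: "INCREMENT_COUNTER", payload: 41 },
];

// Replaying the whole log reproduces the latest state...
const latest = log.reduce(reducer, undefined);

// ...and replaying a prefix "time-travels" to any earlier point.
const earlier = log.slice(0, 1).reduce(reducer, undefined);
```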

2. Hash Indexes

JavaScript Maps are hash indexes:

// O(1) lookup - exactly like database hash index
const index = new Map([
  ["user:1", { offset: 0, length: 50 }],
  ["user:2", { offset: 50, length: 45 }],
]);

const location = index.get("user:1"); // O(1)!

React’s component keys? Hash indexes for reconciliation.

3. SSTables (Sorted String Tables)

Chrome’s IndexedDB implementation is built on LevelDB, which uses SSTables! This is one reason IndexedDB supports range queries:

// Range query - only possible with sorted data
// (using the `idb` wrapper library's getAllFromIndex)
const recentPosts = await db.getAllFromIndex(
  "posts",
  "timestamp",
  IDBKeyRange.bound(new Date("2025-01-01"), new Date("2025-12-31")),
);

// Binary search on sorted index - O(log n) instead of O(n)!
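The sorted-index idea is easy to demonstrate in plain JavaScript: binary-search the two boundaries, then take a contiguous slice, which is the same shape as an SSTable range scan. The `lowerBound` and `rangeQuery` names are my own, and this sketch assumes integer timestamps for simplicity.

```javascript
// First index whose value is >= target, via binary search: O(log n).
function lowerBound(sorted, target) {
  let lo = 0, hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Inclusive range query over sorted timestamps:
// two binary searches, then one contiguous slice.
function rangeQuery(sortedTimestamps, from, to) {
  return sortedTimestamps.slice(
    lowerBound(sortedTimestamps, from),
    lowerBound(sortedTimestamps, to + 1),
  );
}
```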

4. Segmentation and Compaction

Code splitting is segmentation:

// Instead of one giant bundle
// main.js (2MB)

// Split into segments
const Dashboard = lazy(() => import("./Dashboard")); // 500KB
const Admin = lazy(() => import("./Admin")); // 300KB

// Load segments on demand - just like storage engines!

Why Sequential Writes Matter

The book emphasizes: “Sequential writes are much faster than random writes.”

This applies to frontend too:

// SLOW: one transaction per write (idb-style API)
for (let i = 0; i < 1000; i++) {
  await db.put("store", { id: i, data: Math.random() });
}

// FAST: batch all writes into a single transaction
const tx = db.transaction("store", "readwrite");
for (let i = 0; i < 1000; i++) {
  tx.store.put({ id: i, data: Math.random() });
}
await tx.done;

Key Insight: Understanding Storage Explains Performance

When you understand how storage engines work, you understand:

  • Why immutability simplifies things (append-only is crash-safe)
  • Why batch operations are faster (sequential writes)
  • Why indexes help some queries but not others (tradeoffs)
  • Why Redux DevTools can time-travel (replay append-only log)

Chapter 4: Encoding and Evolution

The Core Challenge: Multiple Versions Running Simultaneously

Modern systems must handle different versions of code running at the same time:

  • Backend: Rolling upgrades across servers
  • Frontend: Users with old app versions still cached

The requirement: Backward and forward compatibility

Backward compatibility: new code can read old data.
Forward compatibility: old code can read new data.
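In code, both directions usually come down to how you decode records. A minimal sketch (the `decodeUser` name and field set are illustrative): default missing fields for backward compatibility, and ignore unknown fields for forward compatibility.

```javascript
// Backward compatibility: new code tolerates old data (no `email` field yet).
// Forward compatibility: old code tolerates new data (unknown fields ignored).
function decodeUser(raw) {
  return {
    id: raw.id,
    name: raw.name,
    email: raw.email ?? null, // field added in v2; default for v1 records
    // any other fields in `raw` are simply not copied, never rejected
  };
}
```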

The Problems with RPC (Remote Procedure Calls)

Kleppmann explains why trying to make network calls look like local function calls is fundamentally flawed:

Network requests are different from local calls:

  • Unpredictable (can fail, timeout)
  • Variable latency (milliseconds to seconds)
  • May execute multiple times if retried
  • Must encode everything as bytes
  • Different programming languages

This is your daily reality as a frontend engineer:

// Treating network like local function (BAD)
function getUser(id) {
  return fetch(`/api/users/${id}`).then((r) => r.json());
  // What if timeout? Network down? Server error?
}

// Acknowledging network reality (GOOD) - TanStack Query style
function useUser(id) {
  return useQuery({
    queryKey: ["user", id],
    queryFn: ({ signal }) => fetchUser(id, { signal }), // cancellable; may time out
    retry: 3, // network may fail
    // errors surface via the hook's `error` state - handle them in the UI
  });
}

REST vs. GraphQL

REST: doesn’t hide that it’s a network protocol (good!)
GraphQL: learned from RPC’s mistakes - strongly typed, client-controlled, acknowledges network reality

Message Passing: The Better Abstraction

Message brokers (RabbitMQ, Kafka) provide asynchronous communication:

  • Buffer messages if recipient unavailable
  • Automatic redelivery on failure
  • Decouple sender from recipient
  • One message to many recipients
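Those broker behaviors fit in a few lines. This is a toy in-memory sketch (the `TinyBroker` name is my own), not how RabbitMQ or Kafka are implemented, but it shows buffering, redelivery, and decoupling:

```javascript
class TinyBroker {
  constructor() {
    this.subscribers = [];
    this.buffer = [];
  }
  publish(message) {
    if (this.subscribers.length === 0) {
      this.buffer.push(message); // recipient unavailable: buffer, don't drop
    } else {
      for (const fn of this.subscribers) fn(message); // fan-out to all subscribers
    }
  }
  subscribe(fn) {
    this.subscribers.push(fn);
    for (const message of this.buffer) fn(message); // deliver the buffered backlog
    this.buffer = [];
  }
}
```

The sender never knows (or cares) whether anyone is listening yet - that’s the decoupling.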

Frontend equivalents:

// WebSockets = Message broker
const ws = new WebSocket('wss://api.example.com')
ws.onmessage = (event) => {
  const update = JSON.parse(event.data)
  dispatch({ type: update.type, payload: update.data })
}

// BroadcastChannel = Pub/Sub between tabs
const channel = new BroadcastChannel('updates')
channel.postMessage({ type: 'USER_UPDATE', data: {...} })

// Service Workers = Message broker between tabs
navigator.serviceWorker.controller.postMessage({...})

Rolling Upgrades in Frontend

You face the same challenge as backend services:

The scenario:

  • User hasn’t refreshed in 3 days (v1.0)
  • You deployed v1.1 yesterday
  • User’s old code hits new API
  • Must maintain backward compatibility!

The solution:

// API v1
interface User {
  id: string;
  name: string;
}

// API v2 - Adding field safely
interface User {
  id: string;
  name: string;
  email?: string; // Optional! Old clients don't break
}

// Backend handles both versions (Hono-style handler)
app.post("/api/users", async (c) => {
  const body = await c.req.json();

  const user = {
    id: generateId(),
    name: body.name,
    email: body.email || null, // gracefully handle the missing field
  };

  db.insert(user);
  return c.json(user);
});

Key Insight: Frontend Apps Are Distributed Systems

Modern web apps face the same distributed systems challenges:

  • Network unpredictability
  • Multiple versions running simultaneously
  • Need for backward/forward compatibility
  • Async communication patterns

Understanding these fundamentals makes you a better frontend engineer.


The Bigger Picture: How It All Connects

Part 1 builds a foundation by progressing through layers:

Chapter 1: What are we trying to achieve?

  • Reliable, scalable, maintainable systems

Chapter 2: How do we structure data?

  • Choose the right model for your use case

Chapter 3: How do we store and retrieve data?

  • Understand the tradeoffs between write and read performance

Chapter 4: How do data and code evolve over time?

  • Design for compatibility and change

The Frontend Journey

As a frontend engineer, this foundation transforms how you think about:

Architecture Decisions:

  • Normalized vs. nested state (Chapter 2)
  • When to add indexes/memoization or caching (Chapter 3)
  • API versioning strategies (Chapter 4)

Performance Optimization:

  • Bundle splitting = segmentation (Chapter 3)
  • Batch operations = sequential writes (Chapter 3)
  • Optimistic updates = async messaging (Chapter 4)

Reliability Patterns:

  • Error boundaries = fault tolerance (Chapter 1)
  • Retry logic = handling network unpredictability (Chapter 4)
  • Immutability = crash safety (Chapter 3)

Practical Takeaways: Decision Frameworks

When to Normalize State?

Normalize when:

  • Frequent updates to individual entities
  • Shared references (same data in multiple places)
  • Many-to-many relationships

Keep nested when:

  • Read-heavy, display-focused
  • Tree structures (one-to-many)
  • Self-contained data

When to Add Indexes (Memoization)?

Add indexes when:

  • Expensive computations
  • Frequently accessed
  • Stable inputs (high cache hit rate)

Skip indexes when:

  • Simple, fast operations
  • Constantly changing inputs
  • Memory constrained

When to Use Message Passing vs. Request/Response?

Use message passing (WebSockets) when:

  • Real-time updates
  • Server-initiated updates
  • Multiple subscribers
  • Fire-and-forget communication

Use request/response (REST/GraphQL) when:

  • Client-initiated queries
  • Need an immediate response
  • One-off operations
  • Cacheable data

How to Handle API Evolution?

The Golden Rules:

  1. Add fields as optional (backward compatible)
  2. Never remove required fields (breaks old clients)
  3. Version your APIs (URL, header, or content negotiation)
  4. Use feature detection when possible
  5. Test with old clients before deploying

Key Patterns to Apply Immediately

1. The Append-Only Log Pattern

// Use for: Undo/redo, time-travel debugging, audit trails
const eventLog = [];

function dispatch(action) {
  eventLog.push(action); // Append only!
  currentState = reducer(currentState, action);
}

function undo() {
  eventLog.pop(); // drop the most recent action
  // Replay the remaining log from the initial state
  currentState = eventLog.reduce(reducer, initialState);
  return currentState;
}

2. The Normalized Cache Pattern

// Use for: Shared data, frequent updates
const cache = {
  users: { 1: {...}, 2: {...} },
  posts: { 1: { userId: 1, ...}, 2: {...} }
}

// Update user once, reflected everywhere
cache.users[1].name = "New Name"

3. The Optimistic Update Pattern

// Use for: Variable network latency
const { mutate } = useMutation({
  mutationFn: likePost,
  onMutate: async (postId) => {
    // Snapshot the current value so we can roll back
    const previous = queryClient.getQueryData(["post", postId]);
    // Update immediately
    queryClient.setQueryData(["post", postId], (old) => ({
      ...old,
      liked: true,
    }));
    return { previous }; // becomes `context` in onError
  },
  onError: (err, postId, context) => {
    // Rollback on failure
    queryClient.setQueryData(["post", postId], context.previous);
  },
});

4. The Compatibility Pattern

// Use for: API evolution
interface Config {
  version: number;
  features: string[];
  // Add new fields as optional
  newFeature?: boolean;
}

function handleConfig(config: Config) {
  // Feature detection
  if (config.features?.includes("real-time")) {
    enableWebSocket();
  }

  // Gracefully handle new fields
  if (config.newFeature !== undefined) {
    enableNewFeature(config.newFeature); // plain function, safe to call conditionally
  }
}

Questions to Ask When Designing Systems

Reliability

  • What happens when the network fails?
  • How do we handle partial failures?
  • What’s our error recovery strategy?
  • How do we prevent data corruption?

Scalability

  • How does performance change with more data?
  • What’s our bundle size strategy?
  • How do we handle growing state complexity?
  • Where are the bottlenecks?

Maintainability

  • Can new team members understand this?
  • How do we test this effectively?
  • How easy is it to change?
  • What’s our documentation strategy?

Data Modeling

  • Is this data naturally nested or interconnected?
  • How often do we update vs. read?
  • Do we have many-to-many relationships?
  • What are the query patterns?

Storage

  • What’s the read/write ratio?
  • Do we need range queries?
  • How important is locality?
  • What’s the data volume?

Evolution

  • How do we version this?
  • What’s our compatibility strategy?
  • How do we handle rolling upgrades?
  • Can old clients still work?

Conclusion: The Foundation That Changes Everything

Part 1 of “Designing Data-Intensive Applications” isn’t just about databases and backend systems. It’s about fundamental principles that apply to any system that stores, processes, and communicates data.

As frontend engineers building production-grade, real-time experiences, we face the same challenges:

  • Reliability: Networks fail, users make mistakes, bugs happen
  • Scalability: Apps grow in complexity, data volume, and traffic
  • Maintainability: Teams change, requirements evolve, systems must adapt

Understanding these foundations transforms how you:

  • Design state management (Chapter 2: data models)
  • Optimize performance (Chapter 3: storage patterns)
  • Handle network communication (Chapter 4: distributed systems)
  • Build for change (All chapters: evolution and compatibility)

What’s Next?

Part 2 of the book dives into Distributed Data:

  • Chapter 5: Replication
  • Chapter 6: Partitioning
  • Chapter 7: Transactions
  • Chapter 8: The Trouble with Distributed Systems
  • Chapter 9: Consistency and Consensus

These chapters become even more relevant as you build:

  • Real-time collaborative features
  • Offline-first applications
  • Multi-region systems
  • Complex state synchronization

The Mindset Shift

The biggest takeaway from Part 1? Stop thinking of frontend and backend as separate worlds. They’re both data systems facing the same fundamental challenges, just at different scales and contexts.

When you understand these foundations, you don’t just write code—you design systems. And that’s what separates good engineers from great ones.



This summary is based on “Designing Data-Intensive Applications” by Martin Kleppmann. All concepts and examples are translated and adapted for frontend engineering context. For the full treatment and complete understanding, I highly recommend reading the original book.

Happy building! 🚀