SYSTEM DESIGN · RESEARCH REFERENCE

18 Case Studies for
Software Engineers

Research papers, architecture deep-dives, and academic resources for every major system design topic — plus distributed database systems.

18
Case Studies
80+
Research Papers
19
Topic Areas
01
URL Shortener
Hashing, consistent hashing, key-value storage at scale
02
Meta Serverless (11.5M calls/sec)
Serverless architectures, FaaS, cold-start optimization
03
Google Docs
OT, CRDTs, collaborative editing, real-time sync
04
Apache Kafka
Distributed log, pub-sub, stream processing
05
Stock Exchange
Order matching engines, ultra-low latency, financial systems
06
Uber Driver Matching
Geospatial indexing, S2 cells, 1M req/sec
07
Reddit
Vote aggregation, hot rankings, feed generation at scale
08
Tinder
Geo-based matching, swipe mechanics, recommendation systems
09
Spotify
CDN for audio, recommendation ML, Backstage architecture
10
WhatsApp
Erlang/OTP, end-to-end encryption, 50B messages/day
11
Bluesky
AT Protocol, federated social, decentralized identity
12
Slack
WebSockets, channel fan-out, search at scale
13
Stripe Idempotent API
Exactly-once payments, idempotency keys, distributed transactions
14
ChatGPT
Transformer inference, token streaming, GPU orchestration
15
Twitter Timeline
Fan-out on write vs read, celebrity problem, caching
16
Amazon S3
Object storage, erasure coding, 11 nines durability
17
YouTube & MySQL (2.49B users)
Vitess, sharding, video transcoding pipeline
18
Scale to 10M Users on AWS
Auto-scaling, CDN, load balancing, database tiers
Distributed DB Systems
GFS, Dynamo, Spanner, Bigtable, Raft, CRDT, consensus
01

How URL Shortener Works

Hashing Key-Value Store Consistent Hashing Redirection

System Overview

A URL shortener converts long URLs into short slugs (6–8 characters), stores the mapping in a fast key-value store, and issues an HTTP 301/302 redirect on lookup. Core challenges include generating unique short codes at high write throughput, serving lookups globally at low latency (sub-10ms), and handling read/write ratios of roughly 100:1 at scale.

Encoding
Base62 (a-z, A-Z, 0-9)
Short Code Length
7 chars = 3.5 trillion URLs
Redirect Type
301 (permanent) vs 302
Storage
Cassandra / DynamoDB / Redis
ID Generation
Snowflake / ZooKeeper range
Write Rate
~100M new URLs/day (bit.ly)

Key Design Decisions

301 vs 302 Redirect: 301 (permanent) is cached by browsers, reducing load on servers but breaking analytics. 302 (temporary) routes every request through your servers, enabling click tracking but increasing server load.

Approach 1 — Counter-based: Use a global distributed counter (ZooKeeper, Redis INCR), convert to Base62. Simple but requires coordination.
Approach 2 — Hash-based: Hash the long URL with MD5/SHA-256 and take the first 7 characters. Collision probability grows with volume (the birthday problem), so collisions must be detected and resolved, e.g., by re-hashing with a salt.
Approach 3 — Snowflake ID: 64-bit ID from datacenter + machine + timestamp + sequence. Decentralized, collision-free, sortable.
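
A minimal sketch of the counter-based approach: assuming a unique integer ID is already issued by a counter or Snowflake-style generator, Base62 encoding turns it into a short slug (helper names are illustrative, not from any particular shortener).

    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # Base62

    def encode_base62(n: int) -> str:
        """Convert a non-negative integer ID into a Base62 short code."""
        if n == 0:
            return ALPHABET[0]
        out = []
        while n > 0:
            n, rem = divmod(n, 62)
            out.append(ALPHABET[rem])
        return "".join(reversed(out))

    def decode_base62(code: str) -> int:
        """Invert encode_base62 so a short code maps back to its ID."""
        n = 0
        for ch in code:
            n = n * 62 + ALPHABET.index(ch)
        return n

    # Example: ID 1_000_000_007 becomes a 6-character slug and round-trips cleanly.
    slug = encode_base62(1_000_000_007)
    assert decode_base62(slug) == 1_000_000_007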

Research Papers & Key Resources

Consistent Hashing and Random Trees
Karger et al. · MIT · STOC 1997
Foundation for how to distribute URL keys across nodes with minimal remapping when nodes join/leave. Directly applicable to sharding URL stores. Introduced the concept of virtual nodes on a ring.
→ ACM DL
Chord: A Scalable Peer-to-peer Lookup Service
Stoica et al. · MIT · SIGCOMM 2001
Distributed hash table used to map URL keys to storage nodes. O(log N) lookups. Heavily cited in distributed key-value design for URL shorteners.
→ Free PDF
Dynamo: Amazon's Highly Available Key-Value Store
DeCandia et al. · Amazon · SOSP 2007
The blueprint for key-value stores used in URL shortening backends. Covers consistent hashing, vector clocks, and eventual consistency — exactly what large-scale URL systems use.
→ Free PDF
Snowflake: A Distributed Unique ID Generator
Twitter Engineering Blog · 2010
The canonical approach to generating unique, roughly sortable 64-bit IDs without coordination, used in many URL shortener ID generation strategies.
→ Twitter Blog
02

How Meta Handles 11.5 Million Serverless Function Calls per Second

FaaS Serverless Cold Start Edge Computing

System Overview

Meta's serverless platform (used for content ranking, ads, and feed computation) runs thousands of isolated function workers, each responding to event triggers. At 11.5M calls/second the platform needs near-zero cold-start overhead, automatic scaling, fault isolation, and efficient bin-packing of compute resources. Meta's internal FaaS differs from AWS Lambda in that it uses persistent warm workers and aggressive pre-warming based on traffic prediction.

Function Calls/sec
11.5 Million
Runtime
Hack (PHP-derived), Python
Isolation
Container + cgroups
Scheduling
Tupperware (internal)

Key Insight: The Cold Start Problem

Cold start latency (50–500ms) is the #1 enemy of serverless at scale. Meta solves it via pre-warming (keep N idle workers hot based on traffic forecast), snapshot restore (fork from a frozen process image), and co-location with data dependencies to minimize network round trips.
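
A minimal sketch of the pre-warming idea, assuming a traffic forecast is available from some prediction service (hypothetical function and parameter names, not Meta's actual system): size the warm pool from Little's law, arrival rate times service time, plus headroom for bursts.

    import math

    def warm_pool_target(forecast_rps: float, avg_exec_ms: float, headroom: float = 1.3) -> int:
        """Little's law: average concurrency ~= arrival rate x service time."""
        concurrency = forecast_rps * (avg_exec_ms / 1000.0)
        return math.ceil(concurrency * headroom)   # spare warm workers absorb bursts

    # A function forecast at 20k calls/sec with a 15 ms average runtime needs
    # roughly 20000 * 0.015 * 1.3 = 390 workers kept resident and warm.
    print(warm_pool_target(20_000, 15))            # -> 390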

Research Papers & Key Resources

Serverless Computing: One Step Forward, Two Steps Back
Hellerstein et al. · UC Berkeley · CIDR 2019
Critical analysis of serverless limitations: poor for stateful apps, high cold starts, communication overhead. Sets the research agenda that Meta's platform attempts to solve.
→ Free PDF
SOCK: Rapid Task Provisioning with Serverless-Optimized Containers
Oakes et al. · UW-Madison · USENIX ATC 2018
Introduces optimized container startup for serverless functions. Achieves 18x lower cold-start latency using lean containers. Directly relevant to Meta's approach.
→ Free PDF
Firecracker: Lightweight Virtualization for Serverless Applications
Agache et al. · Amazon · NSDI 2020
AWS Lambda's microVM approach. 125ms boot time, 5MB overhead per VM. Foundation for modern FaaS isolation. Comparable to Meta's container strategy.
→ Free PDF
Faasm: Lightweight Isolation for Efficient Stateful Serverless Computing
Shillaker & Pietzuch · Imperial College · USENIX ATC 2020
WebAssembly-based isolation for serverless with shared memory regions. Pushes towards stateful serverless, a key concern at Meta's scale.
→ Free PDF
Twine: A Unified Cluster Management System for Shared Infrastructure
Tang et al. · Meta · OSDI 2020
Meta's actual cluster management system that underlies their serverless infrastructure. Describes how they schedule millions of tasks across fleets.
→ USENIX
03

How Google Docs Works

OT / CRDT Real-time Collaboration WebSocket Conflict Resolution

System Overview

Google Docs uses Operational Transformation (OT) to synchronize edits from multiple users in real time. Every character insertion/deletion is represented as an operation that gets transformed against concurrent operations before being applied. The server is the single source of truth and serializes all operations. Google open-sourced a closely related OT implementation as part of Google Wave.

Algorithm
Operational Transformation (OT)
Connection
WebSocket + Long Polling
Storage
Bigtable + Colossus (GFS2)
Alternative
CRDTs (used by Figma, Notion)

OT vs CRDTs

OT requires a central server to serialize operations — simpler reasoning but a bottleneck. CRDTs (Conflict-free Replicated Data Types) are commutative by design and need no coordinator, making them suitable for offline-first and P2P collaboration. Both aim to satisfy convergence (all replicas eventually reach the same state) and intention preservation.
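
A minimal sketch of one of the simplest CRDTs, the grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the element-wise maximum, so replicas converge regardless of delivery order, with no central server required.

    class GCounter:
        """Grow-only counter CRDT: one slot per replica, merge = element-wise max."""
        def __init__(self, replica_id: str):
            self.replica_id = replica_id
            self.counts = {}                       # replica_id -> local increments

        def increment(self, n: int = 1):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

        def value(self) -> int:
            return sum(self.counts.values())

        def merge(self, other: "GCounter"):
            # Commutative, associative, idempotent: any merge order converges.
            for rid, c in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), c)

    # Two replicas increment concurrently, then exchange state in either order.
    a, b = GCounter("a"), GCounter("b")
    a.increment(3); b.increment(2)
    a.merge(b); b.merge(a)
    assert a.value() == b.value() == 5

Sequence CRDTs for collaborative text (as used in Figma or Notion) are far more involved, but the convergence argument is the same.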

Research Papers & Key Resources

Operational Transformation in Real-Time Group Editors
Ellis & Gibbs · MCC · CSCW 1989
The original OT paper. Foundational to Google Docs architecture. Introduces the GROVE (Group Outline Viewing and Editing) system. All collaborative editor systems reference this work.
Google Wave Operational Transformation
Wang et al. · Google · 2010
Google's whitepaper on the OT algorithm used in Wave and subsequently Google Docs. Covers transformation functions for rich text documents.
→ Free PDF
CRDTs: Consistency Without Concurrency Control
Shapiro et al. · INRIA · 2011
Comprehensive introduction to Conflict-free Replicated Data Types — the alternative to OT used in Figma, Notion, and offline-first apps. Covers G-Counter, PN-Counter, LWW-Element-Set.
→ Free PDF
LSEQ: an Adaptive Structure for Sequences in Distributed Collaborative Editing
Nédelec et al. · DocEng 2013
Solves the "tombstone" problem in sequence CRDTs. Highly cited in collaborative editing literature. Used in modern CRDT-based editors.
Real Differences Between OT and CRDT for Co-Editors
Sun & Sun · ACM CSCW 2020
Empirical comparison of OT vs CRDT approaches in production collaborative editors. Breaks down performance, consistency, and implementation complexity trade-offs.
04

How Apache Kafka Works

Distributed Log Pub-Sub Stream Processing Exactly-Once Semantics

System Overview

Kafka is a distributed commit log. Topics are divided into partitions, each an ordered, immutable sequence of records stored on disk. Producers append to partition tails; consumers read at their own pace using offsets. The key insight: Kafka's performance comes from sequential disk I/O (fast on all storage types), page cache reliance (avoids double buffering), and zero-copy transfer via sendfile() system call.

Original Publisher
LinkedIn (2011)
Throughput
Millions msgs/sec per broker
Retention
Time-based (default 7 days)
Coordination
ZooKeeper → KRaft (2.8+)
Delivery
At-least, At-most, Exactly-once
Replication
ISR (In-Sync Replicas)

The Log: Foundation of Modern Data Systems

Jay Kreps (Kafka co-creator) argues in "The Log: What every software engineer should know about real-time data's unifying abstraction" that the append-only log is the simplest yet most powerful abstraction in distributed systems — it unifies databases, stream processors, and messaging queues.
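
A toy model of that abstraction: an append-only partition addressed by offset, with consumers that own their read position (this is the idea, not Kafka's API).

    class Partition:
        """Toy Kafka partition: an ordered, append-only list addressed by offset."""
        def __init__(self):
            self.records = []

        def append(self, record: bytes) -> int:
            self.records.append(record)
            return len(self.records) - 1           # the record's offset

        def read(self, offset: int, max_records: int = 100):
            return self.records[offset:offset + max_records]

    class Consumer:
        """Consumers track their own offset, so they can replay the log at will."""
        def __init__(self, partition: Partition):
            self.partition, self.offset = partition, 0

        def poll(self, n: int = 10):
            batch = self.partition.read(self.offset, n)
            self.offset += len(batch)              # committing would persist this offset
            return batch

    p = Partition()
    for i in range(5):
        p.append(f"event-{i}".encode())
    c = Consumer(p)
    print(c.poll())                                # the first five records
    c.offset = 0                                   # rewinding the offset replays the log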

Research Papers & Key Resources

Kafka: A Distributed Messaging System for Log Processing
Kreps, Narkhede, Rao · LinkedIn · NetDB 2011
The original Kafka paper. Describes the pull-based consumer model, sequential disk access, zero-copy data transfer, and the design decisions that gave Kafka orders-of-magnitude better throughput than ActiveMQ and RabbitMQ. Essential reading.
→ Free PDF
Samza: Stateful Scalable Stream Processing at LinkedIn
Noghabi et al. · LinkedIn · VLDB 2017
LinkedIn's stream processing system built on Kafka. Covers stateful operators, fault tolerance via checkpointing, and how Kafka's log enables replay-based recovery.
→ Free PDF
Exactly-Once Semantics Are Possible: Here's How Kafka Does It
Narkhede · Confluent Engineering Blog · 2017
Explains the idempotent producer + transactional API added in Kafka 0.11. Two-phase commit within Kafka's log. Critical for payment and financial pipelines.
→ Blog
The Log: What Every Software Engineer Should Know
Kreps · LinkedIn Engineering · 2013
Seminal blog post (read as a paper) explaining logs as a unifying abstraction. Explains how databases, distributed systems, and Kafka all reduce to append-only logs.
→ Blog
Apache Flink: Stream and Batch Processing in a Single Engine
Carbone et al. · TU Berlin · IEEE Data Eng. Bulletin 2015
Flink is the most common stream processor on top of Kafka. This paper covers stateful stream operators, event time processing, and exactly-once checkpointing.
→ Free PDF
05

How Stock Exchange Works

Order Matching Low Latency LMAX Disruptor FIX Protocol

System Overview

A stock exchange is one of the most latency-sensitive systems in the world. NYSE processes ~20M trades/day with microsecond-scale matching. The order book (bid/ask) is a priority queue maintained in memory. The matching engine is single-threaded to avoid locks. Modern exchanges use FPGAs and kernel-bypass networking (DPDK, RDMA) to push latencies into the nanosecond-to-microsecond range.

Matching Engine
Single-threaded, in-memory
Order Book DS
Price-time priority (heap/tree)
Latency Target
<10µs (NASDAQ), <1µs (HFT)
Network
Kernel-bypass (DPDK, RDMA)
Protocol
FIX, ITCH, OUCH
Persistence
LMAX Disruptor ring buffer
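
A minimal price-time-priority matching sketch using two heaps. This is only for intuition about the data structure; a production engine is a single-threaded loop over pre-allocated structures (LMAX-style), not per-order heap allocations.

    import heapq, itertools

    seq = itertools.count()            # arrival order breaks price ties (time priority)
    bids, asks = [], []                # bids: max-heap via negated price; asks: min-heap

    def submit(side: str, price: float, qty: int):
        """Insert an order, then match while the best bid crosses the best ask."""
        if side == "buy":
            heapq.heappush(bids, (-price, next(seq), [qty]))
        else:
            heapq.heappush(asks, (price, next(seq), [qty]))
        while bids and asks and -bids[0][0] >= asks[0][0]:
            _, _, bid_qty = bids[0]
            ask_price, _, ask_qty = asks[0]
            traded = min(bid_qty[0], ask_qty[0])
            print(f"trade {traded} @ {ask_price}")   # toy simplification: fill at the ask
            bid_qty[0] -= traded; ask_qty[0] -= traded
            if bid_qty[0] == 0: heapq.heappop(bids)
            if ask_qty[0] == 0: heapq.heappop(asks)

    submit("sell", 101.0, 50)
    submit("sell", 100.5, 30)
    submit("buy", 101.0, 60)           # fills 30 @ 100.5, then 30 @ 101.0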

Research Papers & Key Resources

LMAX Disruptor: High Performance Alternative to Bounded Queues
Thompson et al. · LMAX Exchange · 2011
The architecture behind LMAX Exchange — 6 million orders/sec on a single thread. Uses a pre-allocated ring buffer with memory barriers instead of locks. Revolutionary for financial systems engineering.
→ White Paper
Mechanical Sympathy: The Art of Low-Latency Architecture
Martin Thompson · GOTO Conference 2012
Discusses CPU cache lines, false sharing, and memory access patterns. Essential for understanding why exchange matching engines are designed the way they are.
Design Patterns for Low Latency Applications with Intel® DPDK
Intel · 2014
Kernel-bypass networking for sub-microsecond packet processing. Used by exchanges to eliminate OS overhead from the network stack.
→ Intel
High Frequency Trading: Market Microstructure and Economics
Aldridge · John Wiley & Sons · 2013 (Book)
Covers the economic motivation behind HFT and the technical architecture needed. Essential context for why exchanges push to nanosecond latency.
06

How Uber Finds Nearby Drivers at 1 Million Requests/Second

Geospatial S2 Geometry Real-time Location Dispatch System

System Overview

Uber's dispatch system must find the nearest available driver within milliseconds. The naive approach (iterate all drivers) fails at scale. Uber uses Google's S2 library which maps Earth's surface to 1D using a Hilbert space-filling curve (preserves spatial locality). Each driver location is stored in a key-value store indexed by S2 cell ID. Nearby search = range scan on adjacent cell IDs.

Spatial Index
Google S2 (Hilbert Curve)
Location Update
Every 4 seconds per driver
Storage
Redis (in-memory, geo commands)
Peak Load
1M requests/second
Architecture
DISCO / Ringpop (old); Cadence
DB
Schemaless (MySQL → Cassandra)
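
A simplified sketch of the cell-ID trick, using a flat lat/lng grid in place of real S2/Hilbert cells so it stays self-contained; the point is that "find nearby drivers" collapses to looking up a handful of adjacent cell keys rather than scanning every driver.

    from collections import defaultdict

    CELL_DEG = 0.01                        # ~1 km grid cell; a stand-in for an S2 level

    def cell_id(lat: float, lng: float) -> tuple:
        return (int(lat // CELL_DEG), int(lng // CELL_DEG))

    drivers_by_cell = defaultdict(set)     # cell -> driver ids (Redis/KV in production)

    def update_location(driver_id: str, lat: float, lng: float):
        drivers_by_cell[cell_id(lat, lng)].add(driver_id)

    def nearby_drivers(lat: float, lng: float, ring: int = 1):
        """Scan the rider's cell plus its neighbours instead of every driver."""
        cx, cy = cell_id(lat, lng)
        found = set()
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                found |= drivers_by_cell.get((cx + dx, cy + dy), set())
        return found

    update_location("d1", 37.7750, -122.4194)
    update_location("d2", 37.8000, -122.5000)      # several cells away
    print(nearby_drivers(37.7753, -122.4190))      # -> {'d1'}

Real S2 cells additionally preserve locality along the Hilbert curve, so "adjacent cells" becomes a small number of contiguous ID ranges that a key-value store can range-scan.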

Research Papers & Key Resources

Space-Filling Curves: An Introduction With Applications in Scientific Computing
Bader · Springer · 2012 (Book)
Mathematical foundation for Hilbert curves used in S2. Understanding this is key to grasping why Uber's spatial index is efficient — locality-preserving mapping collapses 2D to 1D.
Uber's Schemaless: A Scalable MySQL-Based Key-Value Store
Uber Engineering Blog · 2016
Describes how Uber stored driver/trip data at scale using MySQL with a denormalized key-value layer. Evolved into the Docstore system (also built on MySQL).
→ Uber Blog
Ringpop: Scalable Fault-Tolerant Application-Layer Sharding
Uber Engineering · 2015
Uber's consistent-hash ring for distributing driver location ownership across their dispatch fleet. Enables Uber to shard "who owns which S2 cell" across thousands of servers.
→ Uber Blog
H3: Uber's Hexagonal Hierarchical Spatial Index
Brodsky · Uber Open Source · 2018
Uber's open-source replacement for S2. Uses hexagonal grids (better for distance approximations) at 16 resolution levels. Now used for surge pricing, demand forecasting, and driver matching.
→ Uber Blog
Reliable, Scalable Dispatching Architecture: Uber's Dispatch System
Uber Engineering Blog · 2015
Covers the full dispatch pipeline from rider request to driver assignment. Discusses the Supply Service, Demand Service, and Dispatch Service architecture with Ringpop sharding.
→ Uber Blog
07

How Reddit Works

Vote Aggregation Feed Generation Memcached Hot Ranking

System Overview

Reddit's hotness algorithm combines score (upvotes − downvotes) with time decay. The ranking is recomputed periodically and cached aggressively in Memcached. Reddit uses PostgreSQL for persistent storage, Cassandra for vote aggregation, and Kafka for event streaming. The main scaling challenges are "vote bombing" (millions of votes on trending posts) and front-page cache invalidation.

Hot Algorithm
Log-scaled score + time decay
Primary DB
PostgreSQL (sharded)
Vote Store
Cassandra (eventual consistency)
Cache
Memcached (Mcrouter)
Search
Elasticsearch
CDN
Fastly
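
For intuition, here is a close paraphrase of the hot-ranking function Reddit has open-sourced (the epoch constant and 45000-second divisor come from that public code): log-scaled net score plus a linear recency bonus, so every ~12.5 hours of age costs a post one factor of 10 in score.

    import time
    from math import log10

    REDDIT_EPOCH = 1134028003              # constant from Reddit's open-sourced ranking code

    def hot(ups: int, downs: int, posted_unix: float) -> float:
        s = ups - downs
        order = log10(max(abs(s), 1))      # 10 points vs 100 points differs by 1, not 90
        sign = 1 if s > 0 else (-1 if s < 0 else 0)
        seconds = posted_unix - REDDIT_EPOCH
        return round(sign * order + seconds / 45000, 7)

    now = time.time()
    # 45000 s = 12.5 h: a post that is 12.5 hours older needs ~10x the net score to tie.
    print(hot(1_000, 50, now - 45_000), hot(95, 0, now))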

Research Papers & Key Resources

Reddit's Architecture: Scaling a Community Platform
Reddit Engineering Blog · Various Posts 2012–2020
Reddit's engineering blog covers their PostgreSQL sharding strategy, move to AWS, Cassandra adoption for votes, and Kafka for event streaming. Practical case study in iterative scaling.
→ Reddit Eng
How Not to Sort by Average Rating (Wilson Score)
Miller · 2009
Explains the Wilson score confidence interval, the statistically sound way to rank items by positive/negative ratings at varying sample sizes. Reddit uses it for its "best" comment sort (the front-page hot ranking uses log-score plus time decay instead).
→ Article
Scaling Memcache at Facebook
Nishtala et al. · Facebook · NSDI 2013
Facebook's battle-tested Memcached deployment (analogous to Reddit's Mcrouter). Covers lease-based cache invalidation, regional replication, and thundering herd prevention — all directly applicable to Reddit's caching layer.
→ Free PDF
Cassandra: A Decentralized Structured Storage System
Lakshman & Malik · Facebook · LADIS 2009
The Cassandra paper — used by Reddit for vote storage. Wide-column, tunable consistency. Combines Dynamo's distribution model with BigTable's data model.
→ Free PDF
08

How Tinder Works

Geo Matching Recommendation ELO Score Swipe Engine

System Overview

Tinder's core challenge: for a given user, return a stack of nearby profiles ordered by desirability, in real time, from a pool of 50M+ active users. Tinder uses a geo-sharded user store (Geosharded MongoDB initially, then Cassandra), an ELO-like attractiveness score for ranking, and a Recs service that pre-computes recommendation stacks. When two users both right-swipe, they match.

Geo Index
S2 Cells / Geohash
Ranking
ELO score (chess-derived)
Storage
Cassandra + Redis
Recs
Pre-computed, cached in Redis
Images
S3 + CloudFront CDN
Streaming
Kafka for match events
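
Tinder has said it used an Elo-like "desirability" score before moving to ML-based ranking; a standard Elo update applied to swipes might look like the sketch below, treating a right-swipe received as a "win" for the swiped profile (the k-factor and ratings are illustrative, not Tinder's actual values).

    def expected(r_a: float, r_b: float) -> float:
        """Probability that profile A receives the positive outcome against B."""
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

    def elo_update(r_swiped: float, r_swiper: float, liked: bool, k: float = 16) -> float:
        """A like from a highly rated user moves your score more than one from a low-rated user."""
        outcome = 1.0 if liked else 0.0
        return r_swiped + k * (outcome - expected(r_swiped, r_swiper))

    r = 1500.0
    r = elo_update(r, 1900.0, liked=True)      # like from a high-rated user: large gain
    r = elo_update(r, 1200.0, liked=False)     # pass from a low-rated user: large loss
    print(round(r, 1))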

Research Papers & Key Resources

Tinder's Architecture: Petabytes of Data, Billions of Swipes
Tinder Engineering Blog · 2020
Direct technical blog post from Tinder engineers covering their petabyte-scale architecture, Cassandra adoption, and how they moved from monolith to microservices.
→ Tinder Blog
Collaborative Filtering for Implicit Feedback Datasets
Hu, Koren, Volinsky · ICDM 2008
Foundation for recommendation systems. Tinder's "Smart Photos" and profile ordering use collaborative filtering variants. Understanding implicit feedback (swipes) vs explicit ratings is critical.
Real-Time Machine Learning for Recommendations at Tinder
Tinder Engineering · 2020
Describes how Tinder moved from ELO to ML-based scoring. Uses online learning with user swipe feedback to improve recommendation stacks in real time.
→ Blog
ELO Rating System in Competitive Environments
Elo, A.E. · Original Chess System · 1960s
Tinder publicly acknowledged using an ELO-like system (called "desirability score") before switching to ML. Understanding the original ELO math is essential context.
09

How Spotify Works

Audio CDN Recommendation ML Backstage Microservices

System Overview

Spotify serves 100M+ songs to 600M+ users. Audio files are stored on GCS (Google Cloud Storage) and served via CDN with adaptive bitrate streaming. Spotify's recommendation engine (Discover Weekly, Daily Mix) combines collaborative filtering (matrix factorization on listen history), NLP over playlist and music-related text, and deep-learning audio analysis. The backend runs on GCP with 800+ microservices.

Audio Storage
Google Cloud Storage
Streaming
OGG Vorbis (bitrate adaptive)
Reco Engine
Matrix Factorization + NLP
Event Stream
Kafka + BigQuery
Service Catalog
Backstage (open-sourced 2020)
ML Platform
Kubeflow + Vertex AI
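
The "playlists as sentences, tracks as words" idea can be sketched with an off-the-shelf Word2Vec implementation (assuming the gensim package; the track IDs and playlists below are made up):

    from gensim.models import Word2Vec

    # Toy co-occurrence data: two electronic playlists and two jazz playlists.
    playlists = [
        ["trk_daft_punk", "trk_justice", "trk_moderat"],
        ["trk_daft_punk", "trk_justice", "trk_chemical_bros"],
        ["trk_miles_davis", "trk_coltrane", "trk_bill_evans"],
        ["trk_coltrane", "trk_bill_evans", "trk_miles_davis"],
    ]

    # Skip-gram embeddings: tracks that co-occur in playlists end up with nearby vectors.
    model = Word2Vec(sentences=playlists, vector_size=32, window=3,
                     min_count=1, sg=1, epochs=200, seed=7)

    # Nearest neighbours of a track should come from the same "genre" cluster.
    print(model.wv.most_similar("trk_daft_punk", topn=2))

At Spotify's scale the same idea runs over billions of playlists, and the resulting vectors feed Related Artists, Radio, and candidate generation for Discover Weekly.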

Research Papers & Key Resources

Deep Content-Based Music Recommendation
van den Oord et al. · Ghent University · NeurIPS 2013
Spotify's approach to cold-start problem: use deep CNNs on raw audio spectrograms to predict latent factors, enabling recommendations for new songs with no listen history.
→ Free PDF
Word2Vec Applied to Recommendations: Spotify's Metadata Embeddings
Spotify Research · 2016
Spotify treats songs as "words" and playlists as "sentences," training Word2Vec to learn song embeddings. Similar songs have similar vectors. Powers Related Artists and Radio.
→ Blog
Podcast Recommendations at Spotify
Spotify Research · 2020
Extends recommendation to audio content with transcripts. Combines BERT-based text understanding with collaborative filtering signals. Describes the multi-modal recommendation architecture.
Backstage: Empowering Product Teams with a Service Catalog
Spotify Engineering · 2020 (Open Source)
Spotify's internal developer portal, open-sourced and now a CNCF project. Solved the "800+ microservices, no way to navigate them" problem via a software catalog and TechDocs.
→ Blog
10

How WhatsApp Works

Erlang/OTP E2E Encryption XMPP 50B msgs/day

System Overview

WhatsApp serves 50 billion messages/day with just ~50 engineers at acquisition (2014). The secret: Erlang's actor model gives each user a lightweight process (~2KB), enabling millions of concurrent connections per server. The Signal Protocol provides end-to-end encryption. Messages are persisted in Mnesia (Erlang's built-in distributed DB) and deleted after delivery. Media is stored in S3, not the message store.

Backend Language
Erlang (BEAM VM)
Protocol
Modified XMPP over WebSocket
Encryption
Signal Protocol (E2E)
Message DB
Mnesia (Erlang native)
Media
Amazon S3 (deleted post-download)
OS
FreeBSD (for high file handles)
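
Erlang itself is the key ingredient here, but the shape of the design can be illustrated in Python: dedicate one cheap, isolated unit of concurrency to each connection (asyncio tasks standing in for Erlang processes; this is an analogy, not WhatsApp's stack).

    import asyncio

    async def handle_user(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
        """One lightweight task per connected client; a crash here stays isolated."""
        try:
            while data := await reader.readline():
                writer.write(b"ack:" + data)       # echo stands in for message routing
                await writer.drain()
        finally:
            writer.close()

    async def main():
        server = await asyncio.start_server(handle_user, "127.0.0.1", 8888)
        async with server:
            await server.serve_forever()

    # asyncio.run(main())   # uncomment to run; each new client gets its own task

Erlang goes further: preemptive scheduling, per-process heaps, supervision trees that restart crashed processes, and hot code reloading, which is part of why a small team could operate the service.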

Research Papers & Key Resources

The Signal Protocol: A Modern Cryptographic Standard
Marlinspike & Perrin · Open Whisper Systems · 2016
Technical specification for the Signal Protocol used in WhatsApp, Signal, and Facebook Messenger. Covers the Double Ratchet algorithm (combining X3DH key agreement with message ratcheting) for perfect forward secrecy.
→ Specification
Scaling Erlang for Messaging: WhatsApp's Approach
Fred Hébert · Erlang Factory · 2014
Explains how Erlang's actor model (gen_server processes per user) allows WhatsApp to handle 2.5M+ concurrent connections per node. Covers OTP supervision trees and hot code reloading.
→ Slides
End-to-End Encryption in Messaging: Technical and Policy Dimensions
Green et al. · 2020
Comprehensive survey of E2E encryption in messaging apps. Covers WhatsApp's multi-device challenges, key management, and the policy debates around backdoors.
Engineering WhatsApp for Two Billion Users
WhatsApp/Facebook Engineering · 2020
Blog post detailing the multi-device architecture introduced in 2021, where the same account can be active on multiple devices simultaneously, requiring a new key distribution model.
→ FB Engineering
11

How Bluesky Works

AT Protocol Federated Social DID (Decentralized ID) PDS / Relay / AppView

System Overview

Bluesky is built on the AT Protocol (Authenticated Transfer Protocol) — a federated social protocol where users own their data. Each user has a Personal Data Server (PDS) holding their signed repository (a Merkle tree of all posts/follows). A Relay ("Firehose") crawls all PDSes and emits a global event stream. AppViews (e.g., the Bluesky app) subscribe to the relay and build indexes for search/feeds. Users can migrate between PDSes without losing followers.

Identity
DID (Decentralized Identifier)
Data Model
Merkle DAG (signed repos)
Protocol
AT Protocol (atproto.com)
Components
PDS → Relay → AppView
Algorithms
Custom feed generators (open)
Inspiration
ActivityPub, SSB, Fediverse
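
A toy content-addressed Merkle tree shows why signing a single root commits a user to their whole repository (the real atproto repo is a Merkle Search Tree over CBOR records identified by CIDs; this sketch only illustrates the principle):

    import hashlib

    def cid(data: bytes) -> str:
        """Toy content identifier: a hash of the bytes stands in for a real CID."""
        return hashlib.sha256(data).hexdigest()

    def merkle_root(record_hashes: list) -> str:
        """Pair hashes level by level until one root commits to every record."""
        level = record_hashes or [cid(b"")]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])            # duplicate the odd hash out
            level = [cid((a + b).encode()) for a, b in zip(level[::2], level[1::2])]
        return level[0]

    posts = [cid(b'{"text": "hello from my PDS"}'), cid(b'{"text": "second post"}')]
    root = merkle_root(posts)
    # Signing `root` with the key behind the user's DID vouches for the entire repo:
    # altering or dropping any record changes the root, so relays and AppViews can
    # verify the data they crawl without trusting the PDS that served it.
    print(root)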

Research Papers & Key Resources

The AT Protocol (atproto) Specification
Bluesky Team · 2023
The full technical specification for the Authenticated Transfer Protocol. Covers lexicons (schema language), repo structure (Merkle trees), DIDs, handle resolution, and the PDS/Relay/AppView architecture.
→ atproto.com
Decentralized Social Networks
Raman et al. · IMC 2019
Measurement study of Mastodon/Fediverse. Covers federation dynamics, content moderation challenges, and cross-instance data distribution — directly applicable to Bluesky's design decisions.
→ ACM DL
Decentralized Identifiers (DIDs) Specification
W3C · 2022
The W3C standard for DIDs used in Bluesky. A DID is a self-sovereign identifier cryptographically bound to a key pair. Enables portability without a central authority (similar to domain ownership).
→ W3C Spec
Merkle DAGs and Content-Addressed Storage
Benet · IPFS Whitepaper · 2014
IPFS whitepaper covering Merkle DAGs — the data structure Bluesky uses for signed user repositories. Content-addressing ensures data authenticity and enables verifiable history.
→ PDF
12

How Slack Works

WebSocket Channel Fan-out Message Search Presence

System Overview

Slack maintains persistent WebSocket connections to every active client. When a message is sent to a channel with 1000 members, Slack must fan out that message to all 1000 connections simultaneously. The "presence" system (online/typing indicators) generates 10x more traffic than messages themselves. Slack uses Vitess (MySQL sharding) for message storage and Elasticsearch for search across billions of messages.

Connection
WebSocket (persistent)
Message Store
MySQL + Vitess
Search
Elasticsearch
Cache
Memcached
Fan-out
Channel membership lookup → push
Job Queue
Redis
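
A minimal sketch of the write path: look up channel membership (served from an edge cache like Flannel in production, not the database), then push the message to every member's open connection concurrently. The data structures and names here are illustrative.

    import asyncio

    channel_members = {"C123": ["u1", "u2", "u3"]}                 # channel -> member ids
    connections = {uid: asyncio.Queue() for uid in ["u1", "u2"]}   # user -> open socket (u3 offline)

    async def send_message(channel_id: str, sender: str, text: str):
        """Fan one message out to every connected member of the channel."""
        payload = {"channel": channel_id, "user": sender, "text": text}
        pushes = [
            connections[uid].put(payload)
            for uid in channel_members.get(channel_id, [])
            if uid in connections            # offline members catch up on reconnect
        ]
        await asyncio.gather(*pushes)        # one concurrent push per socket

    async def demo():
        await send_message("C123", "u1", "standup in 5")
        print(await connections["u2"].get())

    asyncio.run(demo())

For a 1,000-member channel this loop runs 1,000 pushes per message, which is why membership lookups must come from cache and why presence updates (far chattier than messages) are aggregated before they are fanned out.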

Research Papers & Key Resources

Slack's Architecture: Moving to a Cellular Design
Slack Engineering · 2020
Describes Slack's "grid" architecture transition where each workspace's message store is isolated in its own cell (MySQL shard), enabling independent scaling and fault isolation per workspace.
→ Slack Eng
Vitess: Scaling MySQL for the Web
Sougou · YouTube/PlanetScale · 2011
Vitess provides MySQL sharding with connection pooling and query routing. Used by Slack and YouTube to scale MySQL horizontally. Covers resharding, vertical splits, and cross-shard queries.
→ Vitess Docs
Flannel: An Application-Level Edge Cache for Slack
Slack Engineering · 2019
Slack built an edge caching layer (Flannel) using Memcached to cache per-workspace session membership data, dramatically reducing database load during message fan-out to large channels.
→ Blog
Presence at Scale: Tracking Millions of Users
Slack Engineering · 2017
The presence system (green dot = online) generates enormous traffic. Slack's solution: gossip-based presence aggregation with hysteresis (delay offline status to avoid flicker).
13

How Stripe Prevents Double Payment Using Idempotent API

Idempotency Keys Exactly-Once Distributed Transactions Payment Systems

System Overview

In payment systems, network retries can cause duplicate charges. Stripe's solution: clients include an idempotency key (a UUID) with each request. Stripe stores the key + response in a database. If a retry comes in with the same key, Stripe returns the cached response without re-executing. The key must be scoped to a user account, stored before the charge, and garbage collected after 24 hours.

Idempotency Key
UUID, 24hr TTL
Storage
Database (before charge)
Retry Strategy
Exponential backoff + jitter
Consistency
Serializable isolation (Postgres)
Core DB
PostgreSQL + Vitess
Events
Kafka for async webhooks

The Danger of Retries Without Idempotency

In a distributed system, "did the request succeed?" is unknowable if you lose the connection after sending but before receiving the response. Without idempotency keys, every retry risks a double charge. The fix: generate the idempotency key client-side, store it server-side before acting, and return the same result for any retry with that key.
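
A server-side sketch of that recipe, using sqlite3 so it stays self-contained (table and helper names are hypothetical, not Stripe's schema): reserve the key first, execute the charge once, record the response, and replay the recorded response for any retry carrying the same key.

    import json, sqlite3, uuid

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE idempotency (key TEXT PRIMARY KEY, response TEXT)")

    def charge_card(amount_cents: int) -> dict:
        """Stand-in for the real side effect that must not be repeated."""
        return {"charge_id": str(uuid.uuid4()), "amount": amount_cents, "status": "succeeded"}

    def create_charge(idempotency_key: str, amount_cents: int) -> dict:
        try:
            # Reserve the key BEFORE acting; the primary key makes this atomic.
            db.execute("INSERT INTO idempotency (key, response) VALUES (?, NULL)", (idempotency_key,))
        except sqlite3.IntegrityError:
            row = db.execute("SELECT response FROM idempotency WHERE key = ?", (idempotency_key,)).fetchone()
            if row[0] is not None:
                return json.loads(row[0])          # duplicate request: no second charge
            raise RuntimeError("original request still in flight; retry later")
        result = charge_card(amount_cents)
        db.execute("UPDATE idempotency SET response = ? WHERE key = ?", (json.dumps(result), idempotency_key))
        db.commit()
        return result

    key = str(uuid.uuid4())                        # generated client-side, one per logical request
    first = create_charge(key, 2000)
    retry = create_charge(key, 2000)               # e.g. the client timed out and retried
    assert first == retry                          # same charge_id: the customer paid once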

Research Papers & Key Resources

Idempotency
Stripe Engineering Blog · Brandur Leach · 2017
The definitive blog post on idempotency in payments. Covers idempotency keys, database-level atomicity, API design patterns, and how to handle partial failures in multi-step payment workflows.
→ Stripe Blog
Exactly-Once Delivery and Transactional Messaging at Scale
Kafka Engineering · 2017
Explains the producer idempotency + transactions API in Kafka 0.11, enabling exactly-once message delivery — directly used in Stripe's event pipeline for webhook delivery.
→ Blog
Designing Data-Intensive Applications (Chapter 9: Consistency and Consensus)
Kleppmann · O'Reilly · 2017 (Book)
The industry's most-read book on distributed systems. Chapter 9 covers linearizability, distributed transactions, and exactly-once semantics — the theoretical foundation for idempotent APIs.
→ O'Reilly
Patterns for Distributed Transactions within a Microservices Architecture
Richardson · 2019
Covers the Saga pattern (choreography and orchestration) for multi-step financial workflows like a payment: reserve funds → charge card → update ledger → issue receipt. Each step must be idempotent.
→ Article
14

How ChatGPT Works

Transformer Inference Token Streaming RLHF GPU Clusters

System Overview

ChatGPT is a GPT-4-class autoregressive Transformer that generates responses token by token. Inference is computationally dominated by matrix multiplications on GPU tensors. To serve millions of requests, OpenAI runs massive GPU clusters (A100/H100) with tensor parallelism (split model layers across GPUs) and KV-cache (cache K/V attention matrices to avoid recomputation per token). Responses are streamed via Server-Sent Events (SSE).

Architecture
GPT (Transformer Decoder)
Training
RLHF (PPO + SFT)
Parallelism
Tensor + Pipeline + Data
Hardware
NVIDIA A100/H100 clusters
Optimization
KV Cache, FlashAttention
Streaming
Server-Sent Events (SSE)
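
A toy single-head decode loop shows why the KV cache matters: each new token projects only itself into keys/values and attends over everything cached so far, instead of recomputing attention inputs for the whole prefix every step (numpy, dimensions illustrative).

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16                                         # head dimension (illustrative)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    k_cache, v_cache = [], []                      # grows by one entry per generated token

    def decode_step(x_new):
        """One autoregressive step: project only the new token, attend over the cache."""
        q = x_new @ Wq
        k_cache.append(x_new @ Wk)                 # O(1) new projection work per step,
        v_cache.append(x_new @ Wv)                 #   instead of O(sequence length)
        K, V = np.stack(k_cache), np.stack(v_cache)
        scores = K @ q / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V                         # attention output for the new position

    for _ in range(5):                             # stand-ins for embedded tokens
        out = decode_step(rng.standard_normal(d))
    print(out.shape, len(k_cache))                 # (16,) 5

The price is memory: the cache grows linearly with context length per request, which is exactly what continuous-batching servers in the Orca/vLLM lineage are built to manage across many concurrent requests.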

Research Papers & Key Resources

Attention Is All You Need
Vaswani et al. · Google Brain · NeurIPS 2017
The Transformer paper — the architectural foundation of GPT, BERT, and all modern LLMs. Introduces multi-head self-attention, positional encoding, and a purely attention-based encoder-decoder architecture. Arguably the most important ML paper of the decade.
→ arXiv
Language Models are Few-Shot Learners (GPT-3)
Brown et al. · OpenAI · NeurIPS 2020
GPT-3 paper demonstrating in-context learning at scale. Directly precedes ChatGPT's architecture. Shows that scale alone (175B params) produces remarkable few-shot capabilities without fine-tuning.
→ arXiv
Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
Ouyang et al. · OpenAI · NeurIPS 2022
The RLHF paper behind ChatGPT. Shows how SFT + reward model + PPO fine-tuning aligns LLMs with human preferences. A 1.3B InstructGPT model outperforms 175B GPT-3 on human evaluations.
→ arXiv
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Dao et al. · Stanford · NeurIPS 2022
Reduces the memory complexity of attention from O(n²) to O(n) by tiling GPU SRAM operations. Enables 2-4x faster training and inference for long-context models. Used by every major LLM serving system.
→ arXiv
Efficient Large Scale Language Modeling with Mixtures of Experts
Artetxe et al. · Meta AI · EMNLP 2021
MoE architecture (suspected in GPT-4) where only a subset of parameters activate per token. Achieves dense model quality at a fraction of the compute cost. Key to serving trillion-parameter models economically.
→ arXiv
Orca: A Distributed Serving System for Transformer-Based Generative Models
Yu et al. · USENIX OSDI 2022
Introduces continuous batching for LLM inference — serving multiple requests simultaneously, inserting new requests mid-batch. Used by vLLM and TGI. Dramatically improves GPU utilization.
→ USENIX
15

How Twitter Timeline Works

Fan-out on Write Celebrity Problem Redis Timeline Cache Hybrid Approach

System Overview

Twitter's timeline is a merge of tweets from all accounts you follow, sorted by recency. Twitter uses a hybrid fan-out: when a normal user tweets, their tweet is immediately pushed (fan-out on write) to all followers' Redis timeline caches. But for celebrities (Obama, Beyoncé — millions of followers), fan-out would take minutes. So Twitter uses fan-out on read for celebrities, merging their tweets at query time.

Timeline Cache
Redis (sorted set per user)
Fan-out (normal)
Write-time push to followers
Fan-out (celebrity)
Read-time merge
Tweet Store
Manhattan (Twitter's own KV)
Media
S3 + Fastly CDN
Search
Earlybird (Lucene-based)
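
A sketch of the hybrid policy, with plain dicts standing in for Redis sorted sets (the follower-count cutoff and structures are illustrative): normal accounts are fanned out at write time, celebrity tweets are skipped and merged in at read time.

    import heapq, itertools

    CELEB_THRESHOLD = 10_000                       # illustrative follower-count cutoff
    ts = itertools.count()                         # stands in for a timestamp/Snowflake ID

    followers = {"alice": ["bob"], "beyonce": ["bob"]}
    follower_counts = {"alice": 120, "beyonce": 50_000_000}
    home_timelines = {"bob": []}                   # per-user cache (Redis sorted set in production)
    celebrity_tweets = {}                          # kept only with the author, merged at read time

    def post_tweet(author: str, text: str):
        tweet = (next(ts), author, text)
        if follower_counts[author] >= CELEB_THRESHOLD:
            celebrity_tweets.setdefault(author, []).append(tweet)      # fan-out on read
        else:
            for f in followers.get(author, []):
                home_timelines.setdefault(f, []).append(tweet)         # fan-out on write

    def home_timeline(user: str, followed_celebs=("beyonce",), limit: int = 20):
        """Merge the precomputed timeline with followed celebrities' recent tweets."""
        merged = home_timelines.get(user, []) + [
            t for c in followed_celebs for t in celebrity_tweets.get(c, [])
        ]
        return heapq.nlargest(limit, merged)       # newest first by the ID/timestamp field

    post_tweet("alice", "shipping the new feature")
    post_tweet("beyonce", "hello, everyone")
    print(home_timeline("bob"))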

Research Papers & Key Resources

Rainbird: Realtime Analytics at Twitter
Twitter Engineering · 2011
Twitter's real-time counters system (like/retweet counts). Uses Cassandra with counter columns. Foundational to understanding Twitter's data infrastructure that serves the timeline system.
→ Twitter Blog
The Architecture Twitter Uses to Deal with 150M+ Active Users
High Scalability · 2013
Comprehensive breakdown of Twitter's 2013 architecture: Flock (social graph), Gizzard (sharding), Snowflake (ID generation), Manhattan (NoSQL), and the timeline fanout service.
→ Article
Earlybird: Real-Time Search at Twitter
Busch et al. · Twitter · ICDE 2012
Twitter's real-time search engine based on Lucene. Tweets are indexed in seconds. Uses an inverted index with 100% in-memory segments for recency, and disk for older tweets.
Manhattan: Twitter's Real-Time, Multi-Tenant Distributed Database
Twitter Engineering · 2014
Twitter's custom distributed NoSQL database replacing Cassandra for their most critical workloads. Covers their approach to multi-tenancy, strong consistency, and predictable latency.
→ Blog
16

How Amazon S3 Works

Object Storage Erasure Coding 11-Nine Durability Bitrot Detection

System Overview

S3 stores trillions of objects with 99.999999999% (11 nines) durability by distributing erasure-coded chunks across multiple Availability Zones. An object is split into k data chunks + m parity chunks using Reed-Solomon coding; any m failures can be tolerated. S3's metadata (bucket/key → object location) is stored in a separate distributed database. S3 uses "scrubbing" to detect and repair bitrot proactively.

Durability
99.999999999% (11 nines)
Coding
Reed-Solomon erasure coding
Replication
≥3 Availability Zones
Consistency
Strong read-after-write (2020)
Storage Classes
Standard, IA, Glacier, etc.
Objects Stored
~280 Trillion (2023 estimate)
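
S3's actual scheme is Reed-Solomon with several parity chunks spread across AZs, but a single XOR parity chunk (k data chunks, m = 1) is enough to show the mechanics: any one lost chunk is rebuilt from the survivors.

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(obj: bytes, k: int = 4):
        """Split into k equal data chunks plus one XOR parity chunk (toy m = 1 case)."""
        obj = obj.ljust(-(-len(obj) // k) * k, b"\0")        # pad to a multiple of k
        size = len(obj) // k
        chunks = [obj[i * size:(i + 1) * size] for i in range(k)]
        parity = chunks[0]
        for c in chunks[1:]:
            parity = xor_bytes(parity, c)
        return chunks + [parity]

    def rebuild(chunks: list, lost: int) -> bytes:
        """Recover the one missing chunk by XORing every surviving chunk."""
        survivors = [c for i, c in enumerate(chunks) if i != lost]
        out = survivors[0]
        for c in survivors[1:]:
            out = xor_bytes(out, c)
        return out

    stored = encode(b"object payload destined for multiple availability zones")
    assert rebuild(stored, lost=2) == stored[2]              # one lost chunk, fully recovered

Reed-Solomon generalizes this to m parity chunks tolerating any m simultaneous losses, which is what turns per-disk failure rates into eleven nines of object durability.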

Research Papers & Key Resources

Amazon S3 — How It Works (AWS Documentation)
Amazon Web Services · 2023
Official documentation covering bucket policies, storage classes, lifecycle rules, and replication. The most authoritative source on S3's operational behavior and architecture decisions.
→ AWS Docs
The Google File System
Ghemawat, Gobioff, Leung · Google · SOSP 2003
GFS inspired S3's chunked storage model. First paper to propose treating component failures as the norm, not the exception. Pioneered chunk replication and a master for metadata. Essential reading for object storage.
→ Free PDF
Erasure Coding in Windows Azure Storage
Huang et al. · Microsoft · USENIX ATC 2012
Azure's erasure coding approach (similar to S3). Reduces storage overhead by 1.33x vs 3x replication while maintaining durability. Describes the Local Reconstruction Code (LRC) design.
→ Free PDF
Building and Operating a Large Distributed Storage System
Amazon re:Invent Talks · 2022
S3 Principal Engineer Marc Brooker's talks on S3's shuffle sharding, availability design, and how they achieve 11 nines durability. Includes the "shuffle sharding" technique to limit blast radius.
→ Werner's Blog
17

How YouTube Supports 2.49 Billion Users with MySQL

Vitess / MySQL Video Transcoding CDN Bigtable

System Overview

YouTube is one of the world's largest MySQL deployments, made possible by Vitess — a sharding middleware that makes MySQL horizontally scalable. Uploaded videos go through a transcoding pipeline on Google's internal infrastructure that produces hundreds of format/resolution renditions per video. Video chunks are served from Google's CDN. YouTube uses Bigtable for view counts and recommendation features, and Spanner for globally consistent data.

Primary DB
MySQL + Vitess (sharding)
Video Storage
Google Colossus (GFS2)
Transcoding
500+ format/resolution combos
Streaming
DASH / HLS (adaptive bitrate)
Counts
Bigtable (approx. at scale)
Reco
Deep Neural Nets (DNN-based)
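
A sketch of the keyspace-ID idea behind Vitess-style sharding (the ranges, shard names, and hash function here are illustrative; Vitess uses its own hash vindexes): hash the sharding key into a 64-bit keyspace and route each query to the shard whose range covers it.

    import bisect, hashlib

    # Four shards covering the 64-bit keyspace-ID space, named by hex range.
    SHARD_UPPER_BOUNDS = [0x4000000000000000, 0x8000000000000000,
                          0xC000000000000000, 0x10000000000000000]
    SHARD_NAMES = ["-40", "40-80", "80-c0", "c0-"]

    def keyspace_id(sharding_key: str) -> int:
        """Hash the sharding key (e.g. a channel or user id) into the keyspace."""
        return int.from_bytes(hashlib.sha256(sharding_key.encode()).digest()[:8], "big")

    def shard_for(sharding_key: str) -> str:
        kid = keyspace_id(sharding_key)
        return SHARD_NAMES[bisect.bisect_right(SHARD_UPPER_BOUNDS, kid)]

    # Every row for a given channel hashes to the same shard, so the common
    # queries (all videos of a channel, comments keyed by channel) stay single-shard.
    for channel in ("channel_veritasium", "channel_mkbhd"):
        print(channel, "->", shard_for(channel))

Resharding then means splitting a range (e.g. "40-80" into "40-60" and "60-80") and moving only the rows whose keyspace IDs fall in the moved half.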

Research Papers & Key Resources

Vitess: Scalable MySQL for the Web
Sougou · YouTube · VLDB 2012
Original paper on Vitess by its YouTube creator. Describes connection pooling, query routing to shards, row-based replication management, and how it makes MySQL horizontally scalable without application changes.
→ Vitess
Deep Neural Networks for YouTube Recommendations
Covington, Adams, Sargin · Google · RecSys 2016
YouTube's 2-stage recommendation system: candidate generation (recall ~100 videos from millions) + ranking (precision scoring). Uses DNN with user watch history embeddings. One of the most-cited recommendation papers.
→ Google Research
Bigtable: A Distributed Storage System for Structured Data
Chang et al. · Google · TOCS 2008
Bigtable powers YouTube's view counter, metadata, and ML feature store. Wide-column store with row-key range scans. Handles millions of reads/writes per second. Inspired Cassandra, HBase.
→ Free PDF
YouTube View Counts: Accuracy vs Speed Trade-off
YouTube Engineering · 2012
Why YouTube famously froze view counts at 301 and batch-updated them. At scale, approximate counting (HyperLogLog) and deferred writes are necessary to avoid hot-row contention on viral videos.
→ YouTube Blog
MPEG-DASH Standard: Dynamic Adaptive Streaming over HTTP
ISO/IEC 23009-1 · 2012
The international standard for adaptive bitrate video streaming used by YouTube. Client switches between quality levels based on available bandwidth, eliminating buffering.
18

How to Scale an App to 10 Million Users on AWS

Auto Scaling CDN / CloudFront Database Tiers Multi-Region

Scaling Progression

1 user: Single EC2 + single RDS.
1K users: Add ELB, move to RDS Multi-AZ, add ElastiCache (Redis).
100K users: Add CloudFront CDN, read replicas on RDS, separate static asset servers, Route 53 health checks.
1M users: RDS Aurora (auto-scaling), S3 for media, SQS/SNS for async, EC2 Auto Scaling Groups, Lambda for edge compute.
10M users: Multi-region Active-Active, Global Accelerator, DynamoDB Global Tables, database sharding or migration to Aurora Global, chaos engineering.

Research Papers & Key Resources

On Designing and Deploying Internet-Scale Services
Hamilton · Microsoft · USENIX LISA 2007
James Hamilton's seminal paper on what it takes to build internet-scale services. Covers operability, health monitoring, graceful degradation, automatic recovery, and avoiding single points of failure. Required reading.
→ Free PDF
The AWS Well-Architected Framework
Amazon Web Services · 2023
AWS's official framework covering Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. The practical blueprint for scaling applications on AWS.
→ AWS
Chaos Engineering: Building Confidence in System Behavior through Experiments
Basiri et al. · Netflix · IEEE Software 2016
Netflix Chaos Monkey and the chaos engineering discipline. At 10M+ users, you must assume failures are happening at all times. This paper establishes the principles of proactive failure injection.
→ IEEE
Harvest, Yield, and Scalable Tolerant Systems
Fox & Brewer · HOTOS 1999
Introduces the harvest/yield framing: degrade harvest (return partial data) before yield (availability) drops. Real-world application of CAP theorem for 10M-user systems.
→ Free PDF


Distributed Database Systems — Deep Dive

Consensus CAP Theorem Replication Sharding

Foundational Concepts

The CAP Theorem (Brewer, 2000) states a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition Tolerance. In practice, network partitions are unavoidable, so systems choose between CP (strong consistency, may be unavailable) or AP (always available, may be inconsistent). Most modern systems are tunable between the two.

The Essential 10 Papers — Every Distributed Systems Engineer Must Read

1. The Google File System (GFS)
Ghemawat, Gobioff, Leung · Google · SOSP 2003
Introduced fault-tolerant distributed file storage on commodity hardware. Chunked storage (64MB), master-slave coordination, pipelined replication. Inspired HDFS, Amazon S3's architecture. The paper that started modern distributed storage.
→ Free PDF
2. MapReduce: Simplified Data Processing on Large Clusters
Dean & Ghemawat · Google · OSDI 2004
Programming model for parallel processing of large datasets. Map phase (transform) → shuffle (redistribute by key) → Reduce (aggregate). Directly inspired Apache Hadoop. Revolutionized big data processing.
→ Free PDF
3. Bigtable: A Distributed Storage System for Structured Data
Chang et al. · Google · TOCS 2008
Wide-column NoSQL database. Data model: (row key, column family:qualifier, timestamp) → value. Tablets split at 200MB, assigned to Tablet Servers via Master. Inspired Cassandra, HBase, DynamoDB.
→ Free PDF
4. Dynamo: Amazon's Highly Available Key-Value Store
DeCandia et al. · Amazon · SOSP 2007
Eventual consistency, consistent hashing, vector clocks, quorum reads/writes (R+W>N), Merkle trees for anti-entropy, gossip for membership. Every NoSQL database designer references this paper.
→ Free PDF
5. Cassandra: A Decentralized Structured Storage System
Lakshman & Malik · Facebook · LADIS 2009
Combines Bigtable's data model with Dynamo's distribution model. Tunable consistency (ONE/QUORUM/ALL), wide-column, ring topology, no single point of failure. Now one of the most widely deployed open-source NoSQL databases.
→ Free PDF
6. Spanner: Google's Globally-Distributed Database
Corbett et al. · Google · OSDI 2012 / TOCS 2013
First globally distributed database with external consistency. TrueTime API (atomic clocks + GPS) assigns globally ordered timestamps. Enables serializable distributed transactions at global scale. Influenced CockroachDB, YugabyteDB.
→ Free PDF
7. In Search of an Understandable Consensus Algorithm (Raft)
Ongaro & Ousterhout · Stanford · USENIX ATC 2014
Raft was designed to be easier to understand than Paxos. Leader election + log replication + safety guarantees. Used in etcd (Kubernetes), CockroachDB, TiKV. The go-to consensus algorithm for new systems.
→ Free PDF
8. Paxos Made Simple
Lamport · Microsoft Research · 2001
The simplest explanation of the Paxos consensus algorithm. Paxos underpins Google Chubby, Google Spanner, Apache Zookeeper, and many distributed databases. A 2-phase protocol: Prepare/Promise then Accept/Accepted.
→ Free PDF
9. ZooKeeper: Wait-free Coordination for Internet-Scale Systems
Hunt et al. · Yahoo · USENIX ATC 2010
Distributed coordination service: configuration management, leader election, distributed locks, barriers. Used by Kafka (pre-KRaft), HBase, Storm. ZAB protocol (Zookeeper Atomic Broadcast) is a Paxos variant.
→ Free PDF
10. Chord: A Scalable P2P Lookup Service for Internet Applications
Stoica et al. · MIT · SIGCOMM 2001
Distributed Hash Table using consistent hashing on a ring. O(log N) lookup, O(log N) routing table per node. Foundational for understanding how Dynamo, Cassandra, and Riak distribute data without a central coordinator.
→ Free PDF

Advanced Topics — Consistency, Replication & Transactions

CAP Twelve Years Later: How the "Rules" Have Changed
Brewer · IEEE Computer · 2012
Brewer himself refines the CAP theorem: the choice between C and A is not binary, partitions are rare in practice, and systems should focus on minimizing impact rather than picking one side permanently.
→ IEEE
Lamport Clocks — Time, Clocks, and the Ordering of Events in a Distributed System
Lamport · ACM · 1978
Foundational paper on logical clocks. In a distributed system, there's no global clock. Lamport clocks assign logical timestamps to events, establishing a partial order (happens-before relation). Prerequisite for understanding vector clocks.
→ Free PDF
Impossibility of Distributed Consensus with One Faulty Process (FLP)
Fischer, Lynch, Paterson · ACM JACM · 1985
The FLP impossibility result: in an asynchronous system with even one potentially faulty process, there is no deterministic algorithm that guarantees consensus. This is why practical systems use timeouts and randomization.
A Comprehensive Study of Convergent and Commutative Replicated Data Types
Shapiro et al. · INRIA · 2011 (CRDTs)
The seminal CRDT paper. Introduces state-based (CvRDT) and operation-based (CmRDT) replicated data types that converge without coordination. Used in Riak, Redis, Figma, Notion, and Bluesky's data model.
→ Free PDF
The Byzantine Generals Problem
Lamport, Shostak, Pease · ACM TOPLAS · 1982
Consensus in the presence of traitors (malicious nodes). BFT (Byzantine Fault Tolerant) consensus requires 3f+1 nodes to tolerate f Byzantine failures. Foundation for blockchain consensus (PBFT, HotStuff).
→ Free PDF
Percolator: Large-Scale Incremental Processing Using Distributed Transactions and Notifications
Peng & Dabek · Google · OSDI 2010
Google's distributed transaction layer over Bigtable (used by Google Search's index). Introduces optimistic concurrency via MVCC with a two-phase commit protocol backed by Chubby locks.
→ Free PDF

Modern Distributed Databases — Research & Architecture

CockroachDB: The Resilient Geo-Distributed SQL Database
Taft et al. · Cockroach Labs · SIGMOD 2020
Open-source Spanner-inspired database. Uses Raft for consensus, MVCC for transactions, and a SQL layer on top of a distributed KV store. Describes geo-partitioning for data residency compliance.
→ ACM DL
TiDB: A Raft-based HTAP Database
Huang et al. · PingCAP · VLDB 2020
Hybrid Transactional/Analytical Processing (HTAP) database. TiKV (Raft-based distributed KV) + TiFlash (columnar replica). Enables real-time analytics on fresh transactional data without ETL.
→ Free PDF
Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
Verbitski et al. · Amazon · SIGMOD 2017
Aurora's key innovation: the log IS the database. Redo log records are shipped to 6 storage nodes across 3 AZs; storage nodes apply log records lazily. Eliminates the write-amplification of traditional MySQL replication.
→ ACM DL
F1: A Distributed SQL Database That Scales
Shute et al. · Google · VLDB 2013
Google's distributed SQL database built on Spanner, used for AdWords (billions of dollars). Full SQL, ACID transactions, automatically maintained indexes, optimistic locking. Describes SQL at global scale.
→ Free PDF
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
Baker et al. · Google · CIDR 2011
Google's database bridging the gap between OLTP and scalable NoSQL. Entity groups (mini-databases with ACID transactions) replicated via Paxos. Precursor to Spanner. Powers Gmail, App Engine.
→ Free PDF
CRAQ: Chain Replication with Apportioned Queries
Terrace & Freedman · Princeton · USENIX ATC 2009
Extends chain replication to allow reads at any replica (not just tail), dramatically improving read throughput. Highly relevant for understanding replication strategies in distributed databases.
→ Free PDF

Sharding, Partitioning & Data Distribution

Consistent Hashing and Random Trees
Karger et al. · MIT · STOC 1997
Introduces consistent hashing — when N nodes change to N±1, only ~1/N of keys move. Used in every distributed cache and database (Dynamo, Cassandra, Redis Cluster, CDNs); a minimal demonstration appears after this list.
→ ACM DL
Slicer: Auto-Sharding for Datacenter Applications
Adya et al. · Google · OSDI 2016
Google's automatic sharding system for stateful services. Auto-rebalances load across servers based on observed traffic, avoiding hot shards that plague manual sharding approaches.
→ USENIX
PNUTS: Yahoo!'s Hosted Data Serving Platform
Cooper et al. · Yahoo · VLDB 2008
Yahoo's geographically distributed database. Timeline consistency (per-record sequential writes), regional master per record, pub-sub for replication. Alternative consistency model between Dynamo's AP and Spanner's CP.
→ ACM DL
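
A minimal ring with virtual nodes demonstrates the property the Karger et al. paper proves (a sketch, not any particular library): adding a node remaps only about 1/N of the keys, instead of nearly all of them as with modulo hashing.

    import bisect, hashlib

    def h(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    class Ring:
        """Consistent-hash ring with virtual nodes per physical node."""
        def __init__(self, nodes, vnodes: int = 100):
            self.ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
            self.points = [p for p, _ in self.ring]

        def node_for(self, key: str) -> str:
            idx = bisect.bisect(self.points, h(key)) % len(self.ring)
            return self.ring[idx][1]            # first virtual node clockwise of the key

    keys = [f"url:{i}" for i in range(10_000)]
    before = Ring(["n1", "n2", "n3", "n4"])
    after = Ring(["n1", "n2", "n3", "n4", "n5"])               # add one node
    moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
    print(f"{moved / len(keys):.1%} of keys moved")            # ~20%, i.e. ~1/N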

Essential Books for Distributed Systems

Kleppmann (2017)
Designing Data-Intensive Applications — The bible
Tanenbaum (2017)
Distributed Systems: Principles and Paradigms
van Steen (2023)
Distributed Systems (free online)
Coulouris (2011)
Distributed Systems: Concepts & Design
Lamport (multiple)
All papers at lamport.azurewebsites.net
Martin Fowler
Patterns of Enterprise Application Architecture

Key Research Venues to Follow

SOSP
ACM Symposium on Operating Systems Principles
OSDI
USENIX Symposium on OS Design & Impl.
VLDB
Very Large Data Bases Conference
SIGMOD
ACM Management of Data Conference
NSDI
USENIX Networked Systems Design & Impl.
EUROSYS
European Conference on Computer Systems