How URL Shortener Works
System Overview
A URL shortener converts long URLs into short slugs (6–8 characters), stores the mapping in a fast key-value store, and issues an HTTP 301 or 302 redirect on lookup. Core challenges include generating unique short codes at high write throughput, serving lookups globally at low latency (sub-10 ms), and handling read/write ratios of roughly 100:1 at scale.
Key Design Decisions
301 vs 302 Redirect: 301 (permanent) is cached by browsers, reducing load on servers but breaking analytics. 302 (temporary) routes every request through your servers, enabling click tracking but increasing server load.
Approach 1 — Counter-based: Use a global distributed counter (ZooKeeper, Redis INCR), convert to Base62. Simple but requires coordination.
Approach 2 — Hash-based: Hash the long URL (MD5/SHA-256) and take the first 7 characters. Collisions become likely at scale (the birthday problem), so the system must detect them and re-hash with a salt or append a disambiguating suffix.
Approach 3 — Snowflake ID: 64-bit ID from datacenter + machine + timestamp + sequence. Decentralized, collision-free, sortable.
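Approach 1 above reduces to a counter-to-Base62 conversion. A minimal sketch, assuming the counter comes from some coordinated source (e.g. Redis INCR) and using one conventional alphabet ordering:

```python
# Base62-encode a counter value into a short slug (Approach 1).
# The alphabet ordering is a convention, not a requirement.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 slug."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))
```

Six Base62 characters cover 62^6 ≈ 56.8 billion IDs, which is why 6–8 characters suffice for the slug.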
Research Papers & Key Resources
How Meta Handles 11.5 Million Serverless Function Calls per Second
System Overview
Meta's serverless platform (used for content ranking, ads, and feed computation) runs thousands of isolated function workers, each responding to event triggers. At 11.5M calls/second it needs near-instant cold starts, automatic scaling, fault isolation, and efficient bin-packing of compute resources. Meta's internal FaaS differs from AWS Lambda in that it uses persistent warm workers and aggressive pre-warming based on traffic prediction.
Key Insight: The Cold Start Problem
Cold start latency (50–500ms) is the #1 enemy of serverless at scale. Meta solves it via pre-warming (keep N idle workers hot based on traffic forecast), snapshot restore (fork from a frozen process image), and co-location with data dependencies to minimize network round trips.
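The pre-warming idea can be sketched as a simple forecast-driven pool sizer. The forecast input, per-worker capacity, and headroom factor below are illustrative assumptions, not Meta's actual policy:

```python
# Toy pre-warming policy: size the warm-worker pool from a traffic
# forecast, with a safety margin to absorb prediction error.
import math

def warm_pool_target(forecast_rps: float, per_worker_rps: float,
                     headroom: float = 1.2) -> int:
    """Workers to keep warm: forecast demand plus a safety margin."""
    return math.ceil(forecast_rps * headroom / per_worker_rps)

def scaling_action(current_warm: int, target: int) -> int:
    """Positive = pre-warm this many workers now; negative = retire idle ones."""
    return target - current_warm
```

The point is that scaling happens ahead of demand: workers are forked or snapshot-restored before the requests arrive, so callers never pay the 50–500 ms cold-start cost.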
Research Papers & Key Resources
How Google Docs Works
System Overview
Google Docs uses Operational Transformation (OT) to synchronize edits from multiple users in real time. Every character insertion/deletion is represented as an operation that is transformed against concurrent operations before being applied. The server is the single source of truth and serializes all operations. The approach descends from the Jupiter collaboration system; Google later open-sourced a related OT implementation as part of Google Wave.
OT vs CRDTs
OT requires a central server to serialize operations — simpler reasoning but a bottleneck. CRDTs (Conflict-free Replicated Data Types) are commutative by design and need no coordinator, making them suitable for offline-first and P2P collaboration. Both aim to satisfy convergence (all replicas eventually reach same state) and intention preservation.
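The core of OT is the transform function. A minimal sketch of the textbook case, two concurrent character inserts (real OT, including Google's, also handles deletes, rich text, and server-side serialization):

```python
# Transform insert B's position to account for a concurrent insert A,
# so both replicas converge on the same document.
def transform_insert(pos_b: int, pos_a: int, a_wins_ties: bool = True) -> int:
    """If A inserted at or before B's position, B shifts right by one."""
    if pos_a < pos_b or (pos_a == pos_b and a_wins_ties):
        return pos_b + 1
    return pos_b

# Two users edit "abc" concurrently:
doc = "abc"
op_a, op_b = 1, 2                                   # A inserts "X" at 1; B inserts "Y" at 2
after_a = doc[:op_a] + "X" + doc[op_a:]             # server applies A first: "aXbc"
b_prime = transform_insert(op_b, op_a)              # B's index 2 becomes 3
after_b = after_a[:b_prime] + "Y" + after_a[b_prime:]  # "aXbYc" — B's intent preserved
```

Without the transform, B's insert would land one character too early after A's edit; the shift is what "intention preservation" means in the two-insert case.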
Research Papers & Key Resources
How Apache Kafka Works
System Overview
Kafka is a distributed commit log. Topics are divided into partitions, each an ordered, immutable sequence of records stored on disk. Producers append to partition tails; consumers read at their own pace using offsets. The key insight: Kafka's performance comes from sequential disk I/O (fast on all storage types), page cache reliance (avoids double buffering), and zero-copy transfer via sendfile() system call.
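The partition model above fits in a few lines. A toy sketch of the semantics (not Kafka's on-disk format or real client API):

```python
# Toy partition: an append-only record list with offset-based reads.
class Partition:
    def __init__(self):
        self.log = []                      # ordered, immutable sequence of records

    def append(self, record: bytes) -> int:
        """Producer path: append at the tail; the offset is the position."""
        self.log.append(record)
        return len(self.log) - 1

    def read(self, offset: int, max_records: int = 100) -> list:
        """Consumer path: read forward from any offset, at the consumer's pace."""
        return self.log[offset:offset + max_records]

p = Partition()
for msg in (b"a", b"b", b"c"):
    p.append(msg)                          # offsets 0, 1, 2
```

Because consumers only track an offset, any number of them can read the same partition independently, and re-reading (replay) is just resetting the offset.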
The Log: Foundation of Modern Data Systems
Jay Kreps (Kafka co-creator) argues in "The Log: What every software engineer should know about real-time data's unifying abstraction" that the append-only log is the simplest yet most powerful abstraction in distributed systems — it unifies databases, stream processors, and messaging queues.
Research Papers & Key Resources
How Stock Exchange Works
System Overview
A stock exchange is one of the most latency-sensitive systems in the world. NYSE processes ~20M trades/day with matching latencies measured in microseconds. The order book (bids and asks) is a priority queue maintained in memory. The matching engine is single-threaded to avoid locks. Modern exchanges use FPGAs and kernel-bypass networking (DPDK, RDMA) to push critical-path latencies toward the nanosecond range.
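Price-time priority falls out naturally from a heap keyed on (price, arrival sequence). A minimal sketch of one side of the book — incoming buys matched against resting asks — with real engines using far more specialized structures:

```python
# Toy single-threaded matching: resting sell orders in a min-heap,
# ordered by price, then arrival sequence (time priority).
import heapq, itertools

_seq = itertools.count()          # arrival sequence number = time priority
asks = []                         # min-heap of (price, seq, qty)

def add_ask(price: float, qty: int) -> None:
    heapq.heappush(asks, (price, next(_seq), qty))

def match_buy(price: float, qty: int) -> list:
    """Match an incoming buy against resting asks, best price first."""
    fills = []
    while qty > 0 and asks and asks[0][0] <= price:
        ask_price, s, ask_qty = heapq.heappop(asks)
        traded = min(qty, ask_qty)
        fills.append((ask_price, traded))
        qty -= traded
        if ask_qty > traded:              # partial fill: remainder rests
            heapq.heappush(asks, (ask_price, s, ask_qty - traded))
    return fills
```

Price priority comes from the heap order; time priority from the monotonically increasing sequence used as a tie-breaker. Single-threading means this loop never needs a lock.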
Research Papers & Key Resources
How Uber Finds Nearby Drivers at 1 Million Requests/Second
System Overview
Uber's dispatch system must find the nearest available driver within milliseconds. The naive approach (iterate all drivers) fails at scale. Uber uses Google's S2 library which maps Earth's surface to 1D using a Hilbert space-filling curve (preserves spatial locality). Each driver location is stored in a key-value store indexed by S2 cell ID. Nearby search = range scan on adjacent cell IDs.
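The mapping from 2D location to 1D cell ID can be illustrated with a simpler Z-order (Morton) interleave standing in for S2's Hilbert curve — the cell-size constants and index shape here are assumptions for illustration, not Uber's schema:

```python
# Quantize lat/lng and interleave the bits into one integer: nearby
# points get numerically close cell IDs, so "find nearby drivers"
# becomes a scan over a handful of adjacent cells.
from collections import defaultdict

def cell_id(lat: float, lng: float, bits: int = 16) -> int:
    y = int((lat + 90) / 180 * ((1 << bits) - 1))
    x = int((lng + 180) / 360 * ((1 << bits) - 1))
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

# Driver index: cell ID -> set of driver IDs currently in that cell.
index = defaultdict(set)

def update_driver(driver_id: str, lat: float, lng: float) -> None:
    index[cell_id(lat, lng)].add(driver_id)
```

Fewer bits means coarser cells; a nearby-driver query gathers candidates from the rider's cell and its neighbors, then ranks the handful of survivors by actual distance.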
Research Papers & Key Resources
How Reddit Works
System Overview
Reddit's hotness algorithm combines score (upvotes − downvotes) with time decay. The ranking is recomputed periodically and cached aggressively in Memcached. Reddit uses PostgreSQL for persistent storage, Cassandra for vote aggregation, and Kafka for event streaming. The main scaling challenge is "vote bombing" (millions of votes on trending posts) and the frontpage cache invalidation problem.
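The score-plus-time-decay ranking was visible in Reddit's formerly open-source codebase; a faithful sketch of that classic "hot" formula:

```python
# Reddit's classic "hot" rank: log-scaled net votes plus a time term.
# One unit of hot = 10x the votes, and 45,000 seconds (12.5 hours) of
# age buys exactly one unit — so newer posts beat older ones unless the
# older post has an order of magnitude more votes.
from math import log10

EPOCH_SECONDS = 1134028003      # Reddit's epoch constant from the open-source code

def hot(ups: int, downs: int, created_utc: float) -> float:
    s = ups - downs
    order = log10(max(abs(s), 1))
    sign = 1 if s > 0 else -1 if s < 0 else 0
    return round(sign * order + (created_utc - EPOCH_SECONDS) / 45000, 7)
```

Because the formula depends only on votes and creation time, it can be recomputed cheaply on each vote and the sorted results cached in Memcached, which is exactly the pattern described above.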
Research Papers & Key Resources
How Tinder Works
System Overview
Tinder's core challenge: for a given user, return a stack of nearby profiles ordered by desirability, in real time, from a pool of 50M+ active users. Tinder uses a geo-sharded user store (Geosharded MongoDB initially, then Cassandra), an ELO-like attractiveness score for ranking, and a Recs service that pre-computes recommendation stacks. When two users both right-swipe, they match.
Research Papers & Key Resources
How Spotify Works
System Overview
Spotify serves 100M+ songs to 600M+ users. Audio files are stored on GCS (Google Cloud Storage) and served via CDN with adaptive bitrate streaming. Spotify's recommendation engine (Discover Weekly, Daily Mix) blends collaborative filtering (matrix factorization on listening history), NLP over playlist titles and music-related text, and audio analysis of the tracks themselves. The backend runs on GCP with 800+ microservices.
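Matrix factorization for collaborative filtering can be sketched in plain Python. Production systems factorize implicit listen counts at enormous scale; the SGD loop, hyperparameters, and tiny ratings matrix below are illustrative assumptions only:

```python
# Learn latent user/item vectors so the dot product u·v approximates
# observed ratings; unobserved (user, item) pairs get predictions too,
# which is what makes this a recommender.
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200):
    """ratings: list of (user, item, value). Returns (U, V) factor matrices."""
    rng = random.Random(0)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):                 # SGD step with L2 regularization
                U[u][f] += lr * (err * V[i][f] - reg * U[u][f])
                V[i][f] += lr * (err * U[u][f] - reg * V[i][f])
    return U, V
```

After training, ranking a user's candidate tracks is just sorting by the dot product of their vector with each item vector.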
Research Papers & Key Resources
How WhatsApp Works
System Overview
WhatsApp serves 50 billion messages/day with just ~50 engineers at acquisition (2014). The secret: Erlang's actor model gives each user a lightweight process (~2KB), enabling millions of concurrent connections per server. The Signal Protocol provides end-to-end encryption. Messages are persisted in Mnesia (Erlang's built-in distributed DB) and deleted after delivery. Media is stored in S3, not the message store.
Research Papers & Key Resources
How Bluesky Works
System Overview
Bluesky is built on the AT Protocol (Authenticated Transfer Protocol) — a federated social protocol where users own their data. Each user has a Personal Data Server (PDS) holding their signed repository (a Merkle tree of all posts/follows). A Relay ("Firehose") crawls all PDSes and emits a global event stream. AppViews (e.g., the Bluesky app) subscribe to the relay and build indexes for search/feeds. Users can migrate between PDSes without losing followers.
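The signed-repository idea rests on Merkle trees: one root hash commits to every record, so any tampering is detectable. A sketch with a plain binary Merkle tree — the AT Protocol's actual repo uses a Merkle Search Tree, which differs in structure:

```python
# Fold leaf hashes pairwise up to a single root; changing any record
# changes the root, which is what the user's key ultimately signs.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Compute the root hash over a non-empty list of records."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate the last node
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

This is why PDS migration can preserve identity: followers verify the signed root rather than trusting whichever server happens to host the repository.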
Research Papers & Key Resources
How Slack Works
System Overview
Slack maintains persistent WebSocket connections to every active client. When a message is sent to a channel with 1000 members, Slack must fan out that message to all 1000 connections simultaneously. The "presence" system (online/typing indicators) generates 10x more traffic than messages themselves. Slack uses Vitess (MySQL sharding) for message storage and Elasticsearch for search across billions of messages.
Research Papers & Key Resources
How Stripe Prevents Double Payment Using Idempotent API
System Overview
In payment systems, network retries can cause duplicate charges. Stripe's solution: clients include an idempotency key (a UUID) with each request. Stripe stores the key + response in a database. If a retry comes in with the same key, Stripe returns the cached response without re-executing. The key must be scoped to a user account, stored before the charge, and garbage collected after 24 hours.
The Danger of Retries Without Idempotency
In a distributed system, "did the request succeed?" is unknowable if you lose the connection after sending but before receiving the response. Without idempotency keys, every retry risks a double charge. The fix: generate the idempotency key client-side, store it server-side before acting, and return the same result for any retry with that key.
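The fix above can be sketched in a few lines. The storage dict and charge callback are illustrative stand-ins; a production version also needs an "in-progress" record written before executing (to guard concurrent retries) and TTL cleanup, per the 24-hour pruning mentioned above:

```python
# Server-side idempotency sketch: cache the response under the client's
# key and replay it on retry instead of re-executing the charge.
responses = {}                      # idempotency_key -> cached response

def charge_once(idempotency_key: str, amount_cents: int, execute_charge) -> dict:
    if idempotency_key in responses:        # retry: replay, don't re-execute
        return responses[idempotency_key]
    result = execute_charge(amount_cents)   # the side-effecting call
    responses[idempotency_key] = result
    return result
```

From the client's perspective, retrying with the same key is always safe: it either triggers the first execution or receives the exact response the first execution produced.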
Research Papers & Key Resources
How ChatGPT Works
System Overview
ChatGPT is a GPT-4-class autoregressive Transformer that generates responses token by token. Inference is computationally dominated by matrix multiplications on GPU tensors. To serve millions of requests, OpenAI runs massive GPU clusters (A100/H100) with tensor parallelism (each layer's weight matrices are split across GPUs) and a KV-cache (attention keys/values for past tokens are cached so they are not recomputed on every new token). Responses are streamed via Server-Sent Events (SSE).
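The KV-cache idea can be shown with a deliberately tiny toy: scalar "embeddings" and made-up 1-D projection weights stand in for the per-head matrices of a real model. The point is only that each decode step computes K/V for the newest token and reuses everything cached:

```python
# Toy autoregressive attention step with a KV cache: per step, only the
# new token's key/value are computed; all earlier ones come from the cache.
import math

W_K, W_V, W_Q = 0.5, 2.0, 1.0        # illustrative 1-D "projection weights"

def attend(query: float, keys: list, values: list) -> float:
    """Softmax-weighted sum of values, scored by query*key."""
    scores = [query * k for k in keys]
    m = max(scores)                   # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(e / z * v for e, v in zip(exps, values))

def decode_step(token_emb: float, kv_cache: tuple) -> tuple:
    """One generation step: extend the cache by one K/V pair, then attend."""
    keys, values = kv_cache
    keys.append(W_K * token_emb)      # only the NEW token's K and V are computed
    values.append(W_V * token_emb)
    out = attend(W_Q * token_emb, keys, values)
    return out, (keys, values)
```

Without the cache, step t would recompute K/V for all t previous tokens, turning linear per-step work into quadratic total work over the sequence.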
Research Papers & Key Resources
How Twitter Timeline Works
System Overview
Twitter's timeline is a merge of tweets from all accounts you follow, sorted by recency. Twitter uses a hybrid fan-out: when a normal user tweets, their tweet is immediately pushed (fan-out on write) to all followers' Redis timeline caches. But for celebrities (Obama, Beyoncé — millions of followers), fan-out would take minutes. So Twitter uses fan-out on read for celebrities, merging their tweets at query time.
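The hybrid split can be sketched directly. The follower threshold and in-memory dicts are illustrative assumptions (production uses Redis timeline caches, as described above):

```python
# Hybrid fan-out: push normal users' tweets to followers at write time;
# pull celebrities' tweets in at read time and merge.
CELEBRITY_THRESHOLD = 10_000

followers = {}        # user -> set of follower ids
following = {}        # user -> set of accounts they follow
timelines = {}        # user -> list of (ts, tweet) pushed at write time
celeb_tweets = {}     # celebrity -> their own list of (ts, tweet)

def post(user, ts, tweet):
    if len(followers.get(user, ())) >= CELEBRITY_THRESHOLD:
        celeb_tweets.setdefault(user, []).append((ts, tweet))    # fan-out on read
    else:
        for f in followers.get(user, ()):                        # fan-out on write
            timelines.setdefault(f, []).append((ts, tweet))

def read_timeline(user):
    merged = list(timelines.get(user, []))
    for u in following.get(user, ()):
        merged += celeb_tweets.get(u, [])                        # merge at query time
    return sorted(merged, reverse=True)
```

The trade-off is explicit: normal tweets cost O(followers) at write time and O(1) at read time, while celebrity tweets cost O(1) at write time and a small merge at read time.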
Research Papers & Key Resources
How Amazon S3 Works
System Overview
S3 stores trillions of objects with 99.999999999% (11 nines) durability by distributing erasure-coded chunks across multiple Availability Zones. An object is split into k data chunks + m parity chunks using Reed-Solomon coding; any m failures can be tolerated. S3's metadata (bucket/key → object location) is stored in a separate distributed database. S3 uses "scrubbing" to detect and repair bitrot proactively.
Research Papers & Key Resources
How YouTube Supports 2.49 Billion Users with MySQL
System Overview
YouTube is one of the world's largest MySQL deployments, made possible by Vitess — a sharding middleware that makes MySQL horizontally scalable. Uploaded videos go through an internal transcoding pipeline that produces hundreds of format/resolution variants per video. Video chunks are served from Google's CDN. YouTube uses Bigtable for view counts and recommendation features, and Spanner for globally consistent data.
Research Papers & Key Resources
How to Scale an App to 10 Million Users on AWS
Scaling Progression
1 user: Single EC2 + single RDS.
1K users: Add ELB, move to RDS Multi-AZ, add ElastiCache (Redis).
100K users: Add CloudFront CDN, read replicas on RDS, separate static asset servers, Route 53 health checks.
1M users: RDS Aurora (auto-scaling), S3 for media, SQS/SNS for async, EC2 Auto Scaling Groups, Lambda for edge compute.
10M users: Multi-region Active-Active, Global Accelerator, DynamoDB Global Tables, database sharding or migration to Aurora Global, chaos engineering.
Research Papers & Key Resources
Distributed Database Systems — Deep Dive
Foundational Concepts
The CAP Theorem (Brewer, 2000) states a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition Tolerance. In practice, network partitions are unavoidable, so systems choose between CP (strong consistency, may be unavailable) or AP (always available, may be inconsistent). Most modern systems are tunable between the two.