System Design 101: How Every App Handles Millions of Users
Netflix, Uber, Discord, Google — they all follow the same pattern. One click. Seven layers. Under 100 milliseconds.
Scroll to explore ↓
Every app you use — Netflix, Uber, Instagram, Google — follows the same fundamental architecture. A request leaves your browser, hops through DNS servers, CDN edge nodes, load balancers, and API gateways before hitting a database or cache and returning with your data. The entire journey takes under 100 milliseconds.
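That hop-by-hop journey can be turned into a back-of-envelope latency budget. The figures below are illustrative orders of magnitude, not measurements from any particular app:

```python
# Rough per-hop latency budget for one request (illustrative typical
# values -- real numbers vary by provider, region, and cache state).
HOPS_MS = {
    "DNS lookup (cached at resolver)": 1,
    "TLS handshake (resumed session)": 5,
    "CDN edge / load balancer": 2,
    "API gateway routing": 1,
    "Application service": 10,
    "Cache lookup (in-memory, e.g. Redis)": 1,
    "Database query (on cache miss)": 20,
    "Response back over the network": 15,
}

total = sum(HOPS_MS.values())
for hop, ms in HOPS_MS.items():
    print(f"{hop:<40} {ms:>3} ms")
print(f"{'Total':<40} {total:>3} ms")
assert total < 100  # the whole round trip fits under 100 ms
```

Even with a database hit on the critical path, the budget closes well under 100ms — which is exactly why so much engineering effort goes into keeping the slowest hops (database, cross-network) off the common path.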
System design is the art of choosing the right components for the right job. Netflix uses Cassandra for viewing history because it needs availability over consistency — a stale “Continue Watching” position is annoying, but Netflix being down is unacceptable. They use MySQL for billing because ACID transactions are non-negotiable when money is involved. Discord stores trillions of messages in ScyllaDB; migrating off Cassandra dropped their p99 latency from 200ms to 5ms.
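The “right database for each job” idea can be sketched as a toy routing function. The store names mirror the Netflix stack described above; the requirement fields and thresholds are invented purely for illustration:

```python
# Toy illustration of polyglot persistence: route each workload to the
# store whose strengths match it. The decision rules here are invented
# for illustration, not Netflix's actual logic.
def pick_store(workload: dict) -> str:
    if workload.get("needs_acid"):           # money: transactions are non-negotiable
        return "MySQL"
    if workload.get("full_text_search"):     # search queries
        return "Elasticsearch"
    if workload.get("latency_budget_ms", 100) <= 1:  # hot-path reads
        return "EVCache"
    # Everything else: favor availability and write throughput
    return "Cassandra"

print(pick_store({"needs_acid": True}))        # billing
print(pick_store({"full_text_search": True}))  # catalog search
print(pick_store({"latency_budget_ms": 1}))    # homepage personalization
print(pick_store({"write_heavy": True}))       # viewing history
```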
This page covers the complete system design landscape. You won't just read about these concepts — you'll interact with them, break them, and build intuition that sticks.
The Living Architecture Map
Every web app follows this blueprint. Hover to see connections. Click any component to learn what it does and who uses it.
Why does every app look like this?
It wasn't always this way. In 2004, Facebook ran on a single MySQL database. Twitter launched in 2006 on a single Ruby on Rails server. Instagram handled its first million users with just 3 engineers and a handful of AWS instances.
But as users grew — 10x, 100x, 1000x — each company hit the same walls in the same order. The database became a bottleneck. Latency spiked for distant users. A single server failure meant total downtime. And one by one, they added the same components: load balancers, caches, CDNs, message queues, and database replicas.
The architecture map above isn't a design philosophy. It's convergent evolution — every team solving the same scaling problems arrives at the same answer, independently.
Scale the App
Start with one server. Drag the slider to add users. Watch things break — then fix them.
No problems. Life is good.
One machine runs everything — your app, database, and file storage. A $5/month DigitalOcean droplet.
“Every startup begins here. One machine doing everything.”
The $20 to $500K journey
Notice the cost column. At 10 users, you're paying $20/month for a DigitalOcean droplet. At 10 million users, you're paying $500K/month for a fleet of servers, databases, caches, CDNs, and a team of 50+ engineers to keep it all running.
This is why premature optimization is a trap. Don't shard your database at 1,000 users. Don't add Kafka before you need async processing. Each component adds operational complexity — monitoring, debugging, on-call rotations. Add them when the pain of NOT having them exceeds the pain of maintaining them.
Stack Overflow is the canonical example: 1.3 billion page views per month on 9 web servers and 2 SQL Servers. They got absurdly far with vertical scaling, aggressive caching, and excellent engineering before touching horizontal scaling. The lesson? Simple architectures that you deeply understand will outperform complex architectures that nobody fully grasps.
How Real Companies Actually Build This
Theory is one thing. Here's how Netflix, Discord, and Uber actually chose their tech stacks — and the hard-won reasons behind each decision.
300M+ subscribers, 15% of global internet traffic
Netflix doesn't use one database. They use the RIGHT database for each job: Cassandra for availability, MySQL for consistency, Elasticsearch for search, EVCache for speed.
Source: Netflix Tech Blog (netflixtechblog.com)
Three lessons from these architectures
1. Nobody uses just one database. Netflix uses Cassandra AND MySQL AND Elasticsearch AND EVCache. Uber uses MySQL AND Cassandra AND Redis. Each database excels at something and fails at something else. The skill is matching the right database to the right data.
2. Microservices are a consequence, not a goal. Uber has 4,000 microservices because they have 10,000 engineers. Shopify is a monolith serving 2 million stores. The question isn't “should we use microservices?” — it's “is our team large enough that a monolith slows us down?”
3. Every decision is a tradeoff. Discord chose ScyllaDB over Cassandra: faster, but a smaller community. Netflix chose to build their own CDN: cheaper at scale, but massive upfront investment. There's no universally correct architecture — only the right architecture for your constraints.
Every Concept at a Glance
16 building blocks that every system uses. Click any card to see when to use it — and when not to. Filter by layer to focus.
Click any card to see when to use it and when not to.
The Tradeoffs You Can't Escape
There are no right answers in system design. Only tradeoffs. Click each side to see who chose what — and why.
The best engineers don't pick the “right” technology. They pick the right tradeoff for their specific problem.
Thinking in tradeoffs
Junior engineers ask “what's the best database?” Senior engineers ask “what are we optimizing for?”
This mindset shift is the most important thing system design teaches you. There is no “best” technology. PostgreSQL is excellent — until you need 100,000 writes per second across multiple continents. Cassandra handles that beautifully — until you need a JOIN. Redis is blindingly fast — until the server loses power and your cached data vanishes.
The six tradeoffs above aren't just interview trivia. They're the lens through which every architectural decision gets made. When Amazon chose DynamoDB (AP) for their shopping cart, they decided that an available but slightly stale cart was better than an unavailable but perfectly consistent one. When banks choose PostgreSQL (CP), they decide that rejecting a transaction is better than processing a wrong one.
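Amazon's cart decision can be sketched in a few lines. This is a minimal illustration of the idea behind a Dynamo-style merge, not their actual implementation: when replicas diverge during a partition, reconcile by set union, accepting that a deleted item may occasionally resurface:

```python
# Minimal sketch of the AP shopping-cart idea: during a network partition,
# both replicas keep accepting writes. Afterward, merge by set union --
# no add is ever lost, but an item deleted on one side can come back
# (the known anomaly described in the Dynamo paper).
def merge_carts(replica_a: set, replica_b: set) -> set:
    return replica_a | replica_b

# Each replica accepted writes independently during the partition:
cart_us = {"book", "headphones"}   # user added headphones via one region
cart_eu = {"book", "coffee"}       # ...and coffee via another

print(sorted(merge_carts(cart_us, cart_eu)))  # ['book', 'coffee', 'headphones']
```

A CP system would instead reject one side's writes during the partition — the banking choice: better to refuse a transaction than to process a wrong one.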
The Numbers That Matter
Not all operations are created equal. The gap between a RAM access and a cross-continent packet is 1,500,000x. The race below makes it visceral.
Throughput: How Fast Can You Move Data?
SSD is 33x faster than HDD. This is why every database guide says “use SSDs.”
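To see what 33x means in practice, here's a back-of-envelope scan-time comparison. The throughput figures are illustrative assumptions (a spinning disk at roughly 150 MB/s sequential, an NVMe SSD at roughly 5 GB/s), not benchmarks:

```python
# How long does it take to scan a 100 GB table end to end?
# Throughput figures are illustrative assumptions, not benchmarks.
TABLE_GB = 100
HDD_MBPS = 150     # ~sequential throughput of a spinning disk
SSD_MBPS = 5_000   # ~sequential throughput of an NVMe SSD

hdd_seconds = TABLE_GB * 1_000 / HDD_MBPS   # ~667 s -- over 11 minutes
ssd_seconds = TABLE_GB * 1_000 / SSD_MBPS   # 20 s
print(f"HDD: {hdd_seconds:.0f} s, SSD: {ssd_seconds:.0f} s "
      f"({hdd_seconds / ssd_seconds:.0f}x faster)")
```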
Every architectural decision comes back to these numbers
Why does every company use Redis? Because RAM is 1,500x faster than SSD. Why do CDNs exist? Because a cross-continent packet takes 150ms but an edge server nearby takes 1ms. Why does Netflix build their CDN boxes inside ISPs? Because even one network hop matters when you're serving 15% of global internet traffic.
Amazon famously found that every 100ms of latency cost them 1% of sales. Google found that adding 500ms to search results dropped traffic by 20%. These aren't abstract numbers — they're the reason billion-dollar companies invest millions in infrastructure to shave off milliseconds.
When you're doing back-of-envelope calculations in a system design interview — or in real life — these latency numbers are your foundation. Memory is fast. Disk is slow. Network is slower. And crossing the globe is an eternity in computer time.
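Those numbers are easy to put to work. The sketch below uses approximate, canonical latency figures (exact values vary by hardware generation) to estimate the effective read latency of a RAM cache in front of an SSD-backed database:

```python
# Approximate "latency numbers every programmer should know"
# (orders of magnitude; exact figures vary by hardware generation).
NS = {
    "L1 cache reference": 1,
    "Main memory reference": 100,
    "SSD random read": 150_000,                   # 150 us
    "Disk seek": 10_000_000,                      # 10 ms
    "Round trip, same datacenter": 500_000,       # 0.5 ms
    "Round trip, cross-continent": 150_000_000,   # 150 ms
}

# Why caches pay for themselves: effective read latency with a 95% hit
# rate served from RAM, misses falling through to SSD-backed storage.
hit_rate = 0.95
effective_ns = (hit_rate * NS["Main memory reference"]
                + (1 - hit_rate) * NS["SSD random read"])
print(f"Effective read latency at 95% hit rate: {effective_ns / 1000:.1f} us")
# vs. ~150 us if every read went to SSD -- the cache cuts it ~20x
```

The same table gives you the 1,500,000x headline: a cross-continent round trip (150ms) divided by a main-memory reference (100ns).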
Go Deeper
This was the bird's eye view. Each topic below gets its own interactive deep dive with hands-on simulations. All seven are live — follow in order or jump to any topic.
Page 2: The Network Layer
DNS, HTTP, Load Balancers, CDN, Proxies, WebSockets — every hop explained.
Explore →
Page 3: The Data Layer
SQL vs NoSQL, caching strategies, message queues, replication, object storage.
Explore →
Page 4: Distributed Systems
CAP theorem, consistent hashing, rate limiting, fault tolerance patterns.
Explore →
Page 5: Security & Authentication
TLS, OAuth, JWT, common attacks, Zero Trust — how systems protect themselves.
Explore →
Page 6: Monitoring & Observability
Golden signals, SLOs, distributed tracing, alerting, incident response.
Explore →
Page 7: The Interview Framework
RESHADED framework, estimation gym, latency numbers, common mistakes.
Explore →
Page 8: Design Netflix
Full case study. Every concept from the series applied to one real system.
Explore →
Frequently Asked Questions
What is system design?
Why do I need to learn system design?
What's the difference between a load balancer and an API gateway?
When should I use SQL vs NoSQL?
What is the CAP theorem?
Why does Netflix use multiple databases?
What is caching and why is it so important?
How do I start learning system design?
Sources & References
Jeff Dean & Peter Norvig — Latency Numbers Every Programmer Should Know
Donne Martin — The System Design Primer (github.com/donnemartin/system-design-primer)
ByteByteGo — System Design 101 (github.com/ByteByteGoHq/system-design-101)
Netflix Tech Blog (netflixtechblog.com) — Architecture, EVCache, Zuul, Chaos Engineering
Discord Engineering Blog — Scaling to Trillions of Messages
Uber Engineering Blog — Load Balancing, Microservices Architecture
Stack Overflow Architecture (nickcraver.com) — Performance and Scaling