TheHowPage

System Design — Page 8

How Netflix Serves 300 Million Users

The ultimate system design case study. Every concept from the previous 4 pages — DNS, CDN, microservices, Kafka, Cassandra, chaos engineering — applied to one system. Plus the content pipeline most diagrams skip.

ORIGINAL SERIES

System Design: The Show

S1 E5 · The Netflix Architecture

Netflix accounts for 15% of all downstream internet traffic worldwide. More than YouTube. More than every social media platform combined. When 300 million subscribers press Play, the request touches DNS, an API gateway, microservices, a custom CDN, multiple databases, and an event pipeline processing 2 trillion events per day. And it all happens in under 2 seconds.

This page is the capstone of the system design series. Every concept from the previous pages — DNS, load balancing, CDN, caching, databases, queues, replication, consistent hashing, circuit breakers — appears here, applied to a real system at massive scale. Netflix isn't just using these concepts. They invented several of them. Chaos Monkey, EVCache, Zuul, and Open Connect all came from Netflix engineering.

But most architecture diagrams only show the user-facing flow: you press Play, video arrives. They skip the other half — how content gets from a studio's editing suite to 18,000+ servers inside ISPs worldwide. This page covers both sides: the request journey AND the content pipeline. Plus the estimation math that system design interviews actually test.

The Request Journey — What Happens When You Press Play

One click. Seven systems. Each step links back to a concept from the previous pages. Click through to trace the full journey.

1. DNS Resolution: "Where is Netflix?" (concept from Page 2: DNS)

Your device asks DNS: 'Where is netflix.com?' Route 53 responds with the IP of the nearest AWS region.

Latency-based routing: Mumbai → Asia Pacific, London → EU-West. Not just 'here's an IP' but 'here's the BEST IP for you.'

One click. Seven systems. Under 2 seconds.

When you press Play, your device doesn't download the entire movie. It receives a manifest — a list of URLs for 4-second video chunks at various quality levels. Your device then fetches chunks one at a time from the nearest Open Connect Appliance, adapting quality per chunk based on your current bandwidth.

This is why Netflix never fully buffers. It only needs to be a few chunks ahead. If your WiFi drops, it plays the next chunk at lower quality rather than buffering. When bandwidth recovers, quality climbs back up. The user barely notices. This adaptive approach is why Netflix can stream 4K globally — they don't need consistent 25 Mbps, just enough bandwidth to stay a few chunks ahead.
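The per-chunk decision described above can be sketched as a simple greedy rule. This is a minimal sketch with invented ladder values and an assumed 0.8 safety factor; real ABR algorithms, including Netflix's, also model buffer occupancy rather than bandwidth alone.

```python
# Illustrative per-chunk quality selection; the ladder values and the 0.8
# safety factor are assumptions, not Netflix's production algorithm.
LADDER_KBPS = [235, 750, 1750, 3000, 5800, 15600]   # hypothetical bitrate ladder

def pick_bitrate(measured_kbps: float, safety: float = 0.8) -> int:
    """Highest rung that fits within a safety margin of the bandwidth
    measured while fetching the previous chunk."""
    budget = measured_kbps * safety
    viable = [rung for rung in LADDER_KBPS if rung <= budget]
    return viable[-1] if viable else LADDER_KBPS[0]   # never stall: drop to lowest

# Bandwidth dips mid-stream, then recovers; quality follows chunk by chunk.
for bw in (8000, 2000, 400, 6000):
    print(f"{bw} kbps measured -> request {pick_bitrate(bw)} kbps chunk")
```

Because each chunk is only 4 seconds long, a bad guess costs at most one low-quality chunk before the next measurement corrects it.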

The investment in AV1 codec is strategic. AV1 delivers the same visual quality as H.264 at 30% less bandwidth. At Netflix's scale (500+ petabytes per day), 30% less bandwidth means hundreds of millions of dollars in CDN and ISP costs saved annually. This is why Netflix co-founded the Alliance for Open Media to develop AV1.
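A quick back-of-envelope check of what that 30% means at the stated traffic level, using only the page's own figures:

```python
# Rough check of the AV1 saving; both inputs are the page's claims,
# not measurements.
daily_egress_pb = 500       # "500+ petabytes per day"
av1_saving      = 0.30      # AV1 vs H.264 bandwidth reduction
saved_pb = daily_egress_pb * av1_saving
print(f"~{saved_pb:.0f} PB/day of egress avoided")
```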

The Full Architecture — Every Component Connected

20 components across 6 layers. Each node is a system design concept from the previous pages — applied at Netflix scale. Click any node to explore.

Layers: Client · Edge/CDN · Microservices · Data · Pipeline

Components: User Device 📱 · Route 53 DNS 🌐 · AWS ALB ⚖️ · Zuul Gateway 🚪 · CloudFront ☁️ · Playback Service ▶️ · User Profile 👤 · Recommendations 🎯 · Search 🔍 · Billing 💳 · A/B Testing 🧪 · Open Connect CDN 📦 · Cassandra 🗄️ · MySQL 🔒 · ElasticSearch 📑 · EVCache · Kafka / Keystone 📨 · Apache Spark 🧠 · Atlas (Telemetry) 📊
~1,000 microservices · 18,000+ OCA servers · 100+ Kafka clusters · 6,000+ ISP partners

The 7-year migration that changed everything

In 2007, Netflix was a monolith running in its own data center. A single database corruption incident in August 2008 took down DVD shipping for 3 days. Engineers couldn't isolate the problem because everything was entangled. That was the turning point.

The migration to AWS and microservices took 7 years (2008-2015). They didn't do a big-bang rewrite. They strangled the monolith: new features went into microservices, while existing functionality was gradually extracted. By 2015, the last monolith service was retired. Today, ~1,000 microservices communicate via gRPC and Kafka, each owned by a small team and deployed independently; across the fleet, that adds up to 100+ production deploys per day.

In 2025, Netflix expanded its modernization through a zero-configuration service mesh built on Envoy proxies, enabling unified resilience, routing, and observability without application-level libraries. They also adopted GraphQL Federation for more efficient API composition across hundreds of services.

The Content Pipeline — From Studio to Your Screen

Before you press Play, every title passes through a 6-stage pipeline. One movie becomes 1,200+ files, pushed to 18,000+ servers worldwide. This is the backend story most architecture diagrams skip.

1. Content Ingestion: studio uploads master file

Studios upload original content (typically 4K or 8K master files) to Netflix's cloud storage on Amazon S3. Each upload includes metadata: title, cast, audio tracks, subtitle files, and content ratings.

A single 2-hour 4K HDR master can be 200-500 GB. Netflix receives thousands of new assets daily across 190+ countries.

200-500 GB per 4K master · Amazon S3 storage · Includes metadata + subtitles

Why 1,200+ Files? — The Encoding Math

6 resolutions: 240p · 360p · 480p · 720p · 1080p · 4K
3 codecs: H.264 · VP9 · AV1
3 audio formats: Stereo · 5.1 Surround · Dolby Atmos
60+ languages: 30+ audio · 30+ subtitles

6 resolutions × 3 codecs × 3 audio formats × 60+ languages ≈ 1,200+ files per title

Stranger Things S4E1 at 4K HDR ≈ 7 GB/hour · At 480p ≈ 700 MB/hour · AV1 saves 30% vs H.264
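The factor table can be sanity-checked in a few lines. Note that multiplying all four factors literally gives 3,240, well above the quoted 1,200+; in practice not every resolution/codec/language combination is produced, which is presumably why the page says "1,200+" rather than an exact product.

```python
from itertools import product

resolutions = ["240p", "360p", "480p", "720p", "1080p", "4K"]
codecs      = ["H.264", "VP9", "AV1"]
audio       = ["Stereo", "5.1 Surround", "Dolby Atmos"]
languages   = 60   # 30+ audio dubs plus 30+ subtitle tracks

video_audio_variants = len(list(product(resolutions, codecs, audio)))
print(video_audio_variants)              # 54 resolution/codec/audio combinations
print(video_audio_variants * languages)  # 3240 if every language got every variant
```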

End-to-End Timeline

  • Studio Upload: hours
  • Transcode: 2-8 hrs
  • QA + VMAF: 1-4 hrs
  • DRM Encrypt: < 1 hr
  • CDN Fill: 2-12 hrs
  • Live

Total: 6-24 hours from studio upload to globally available. Pre-scheduled releases (like Squid Game S2) are distributed to OCAs days in advance.

Why Netflix builds its own hardware

Open Connect Appliances (OCAs) are custom-built by Netflix. Each box packs 100-200 TB of SSD storage, optimized for sequential read throughput. Netflix gives these servers to ISPs for free. The deal is straightforward: ISPs get 95% reduction in upstream bandwidth costs (a Stranger Things binge stays on their local network), and Netflix gets single-hop delivery to users.

The economics are decisive. At 15% of global internet traffic, renting CDN capacity from Akamai at market rates would cost billions per year. Building custom hardware costs millions. The back-of-envelope math made the decision obvious — build, don't rent. Today, 18,000+ OCAs sit inside 6,000+ ISPs across 190+ countries. Nearly all video traffic stays within the ISP's network.

Content distribution is intelligent: not every OCA gets every title. Netflix's control plane predicts which titles will be popular in each region and pre-caches them during off-peak hours. A global hit like Squid Game goes everywhere. A niche documentary caches primarily in regions where it's likely to be watched — but also in cities with relevant diaspora communities.
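The placement logic above can be sketched as a greedy fill: rank titles by predicted regional demand and load each appliance with the top entries during off-peak hours. Every title name, demand score, and the two-slot capacity below are invented for illustration; Netflix's actual control plane is far more sophisticated.

```python
# Hypothetical greedy pre-cache plan (all data invented for illustration).
predicted_demand = {
    "mumbai": {"Squid Game": 0.9, "Cricket Doc": 0.7, "Niche Film": 0.1},
    "dublin": {"Squid Game": 0.8, "Niche Film": 0.4, "Cricket Doc": 0.1},
}
OCA_SLOTS = 2   # pretend each appliance only holds two titles

def fill_plan(demand: dict, slots: int) -> dict:
    """Load each region's OCA with its highest-demand titles."""
    return {
        region: sorted(scores, key=scores.get, reverse=True)[:slots]
        for region, scores in demand.items()
    }

# The global hit lands everywhere; the long tail caches where it is watched.
print(fill_plan(predicted_demand, OCA_SLOTS))
```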

Chaos Engineering — Breaking Things on Purpose

Netflix runs Chaos Monkey in PRODUCTION. During business hours. Every weekday. Engineers have no idea which instance will die next. This forces every team to build resilient services.
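The core idea reduces to a few lines: pick a random instance per service and terminate it, forcing every service to prove it can cope. This is a toy sketch only; the real tool integrates with Spinnaker and honors schedules and opt-outs.

```python
import random

# Toy Chaos Monkey: terminate one random instance per service.
def chaos_monkey(fleet: dict, rng: random.Random) -> dict:
    killed = {}
    for service, instances in fleet.items():
        if instances:
            victim = rng.choice(instances)
            instances.remove(victim)   # the instance is gone; the service must cope
            killed[service] = victim
    return killed

fleet = {"playback": ["i-01", "i-02", "i-03"], "search": ["i-11", "i-12"]}
print(chaos_monkey(fleet, random.Random(42)))
```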

Chaos Monkey

Targets: Playback · Profile · Recommend · Search · Billing · Encoding · A/B Test · Analytics

Chaos Kong — Region Killer

Regions: US-East (Virginia) · US-West (Oregon) · EU-West (Dublin), all healthy

The Simian Army

  • Chaos Monkey: kills random production instances
  • Chaos Kong: kills entire AWS regions
  • Latency Monkey: injects artificial network delay
  • Conformity Monkey: finds instances not following best practices
  • Security Monkey: finds security violations and misconfigs
  • Janitor Monkey: cleans up unused resources

“The best way to avoid failure is to fail constantly — on your own terms.”

Chaos engineering is a business decision, not a technical one

Netflix's revenue is $39+ billion per year: 300 million subscribers paying an average of $11/month. That works out to roughly $4.5 million of revenue per hour, so every hour of global downtime puts millions in direct revenue at risk, plus immeasurable brand damage. The 2008 DVD outage lasted 3 days. That kind of failure at today's scale would be catastrophic.

The Chaos Monkey investment paid off definitively on Christmas Eve 2012. AWS Elastic Load Balancing had a major outage. Services across the internet went down — Reddit, Heroku, and numerous others. Netflix stayed up. Their circuit breakers detected the ELB failures, stopped routing through affected paths, and continued streaming from healthy instances.
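The circuit-breaker behavior credited above can be sketched minimally: after N consecutive failures the breaker opens, and calls fail fast to a fallback instead of waiting on the broken dependency. The threshold and fallback below are illustrative; Netflix's historical implementation was the Hystrix library, since succeeded by the service mesh and resilience4j-style tooling.

```python
class CircuitBreaker:
    """Minimal illustrative breaker: no half-open state, timers, or metrics."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn, fallback):
        if self.open:
            return fallback()      # fail fast instead of waiting on a dead dependency
        try:
            result = fn()
            self.failures = 0      # any success closes the breaker
            return result
        except Exception:
            self.failures += 1
            return fallback()


def broken_elb():
    raise ConnectionError("ELB outage")

breaker = CircuitBreaker()
for _ in range(5):
    # First 3 calls hit the failing dependency; afterwards the breaker is
    # open and the fallback is served without touching it.
    print(breaker.call(broken_elb, fallback=lambda: "serve via healthy path"))
```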

Netflix now processes 38 million QoE (Quality of Experience) events per second during live events like the Jake Paul vs. Mike Tyson fight, which streamed to 60+ million concurrent viewers. Their observability platform Atlas processes 17 billion metrics and 700 billion distributed traces daily on 1.5 petabytes of log data.

Back-of-Envelope Calculator — The Math Explained

System design interviews require estimation skills. Here's how engineers calculate Netflix's infrastructure needs. Adjust the sliders, then click any metric to see the formula and reasoning.

Default inputs: 300M subscribers · 50% daily active · 2 h viewing per day · 5 Mbps average bitrate

Bitrate → GB/hour Conversion

Formula: Mbps × 3600 s ÷ 8 bits/byte ÷ 1000 MB/GB

  • 1.5 Mbps (480p mobile) = 0.68 GB/hr
  • 5 Mbps (1080p HD) = 2.25 GB/hr
  • 15 Mbps (4K SDR) = 6.75 GB/hr
  • 25 Mbps (4K HDR) = 11.25 GB/hr
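The conversion is a single formula; a small helper makes it reusable in any capacity estimate (this assumes the decimal convention the page uses, 1 GB = 1000 MB):

```python
def mbps_to_gb_per_hour(mbps: float) -> float:
    """Mbps × 3600 s/hr ÷ 8 bits/byte ÷ 1000 MB/GB -> GB per hour."""
    return mbps * 3600 / 8 / 1000

for mbps in (1.5, 5, 15, 25):
    print(f"{mbps} Mbps -> {round(mbps_to_gb_per_hour(mbps), 3)} GB/hr")
```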


Why back-of-envelope matters in interviews and architecture

System design interviews at Google, Meta, Amazon, and Netflix all include estimation questions. Not because they want exact numbers — they want to see you think about scale. Can you reason about whether a single server suffices, or whether you need a distributed system? Can you identify the bottleneck before building anything?

The calculator above shows each formula and its derivation. The key insight: start with users, derive everything else. Subscribers → DAU → concurrent streams → bandwidth → storage → API calls → events. Each step multiplies. A 2x increase in subscribers doesn't just mean 2x servers — it means 2x bandwidth, 2x storage, 2x events, and potentially 4x peak load if the growth concentrates in one timezone.
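The subscribers → DAU → concurrency → bandwidth chain can be written out as a few lines of arithmetic. The 50% active, 10% peak-concurrency, and 5 Mbps figures are the calculator's defaults, not measured values.

```python
# Back-of-envelope chain: subscribers -> DAU -> hours -> concurrency -> bandwidth.
subscribers       = 300_000_000
daily_active_rate = 0.50
hours_per_dau     = 2
peak_concurrency  = 0.10        # share of DAU streaming simultaneously at peak
avg_mbps          = 5

dau             = subscribers * daily_active_rate        # 150M daily actives
view_hours      = dau * hours_per_dau                    # 300M viewing hours/day
peak_streams    = dau * peak_concurrency                 # 15M concurrent streams
peak_tbps       = peak_streams * avg_mbps / 1_000_000    # Mbps -> Tbps
gb_per_hour     = avg_mbps * 3600 / 8 / 1000             # 2.25 GB/hr at 5 Mbps
daily_egress_pb = view_hours * gb_per_hour / 1_000_000   # GB -> PB

print(f"DAU: {dau:,.0f}")
print(f"Viewing hours/day: {view_hours:,.0f}")
print(f"Peak concurrent streams: {peak_streams:,.0f}")
print(f"Peak bandwidth: {peak_tbps:.0f} Tbps")
print(f"Daily egress: {daily_egress_pb:.0f} PB")
```

The 75 Tbps peak matches the FAQ's figure below, and the ~675 PB/day egress is consistent with the "500+ petabytes per day" quoted earlier.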

Design Decisions — The WHY Behind Every Choice

Five architectural decisions that define Netflix. Click each card to reveal the reasoning.

Every Concept — Applied at Netflix Scale

This is the consolidation. Every system design concept from the previous 4 pages appears in Netflix's architecture. Click any concept to see exactly where and how Netflix uses it.

14 system design concepts from 4 pages — all used in one system.

This is why Netflix is the definitive system design case study.

How does Netflix handle 300 million subscribers?

Netflix uses ~1,000 microservices running on AWS, with their own CDN (Open Connect) for video delivery. Video comes from 18,000+ custom servers inside 6,000+ ISPs, not from the cloud. API traffic goes through Zuul gateway. Data is distributed across Cassandra (viewing data), MySQL (billing), ElasticSearch (search), and EVCache (caching). Kafka's Keystone pipeline handles 2 trillion events per day.

Why did Netflix build their own CDN?

At 15% of global internet traffic, renting CDN capacity would cost billions per year. Netflix built Open Connect — custom hardware boxes placed inside ISPs for free. ISPs accept because it reduces their upstream bandwidth by 95%. The video travels one network hop from a box in your ISP's building to your home, instead of crossing the internet.

What happens when you press Play on Netflix?

DNS resolves to the nearest AWS region. Zuul gateway authenticates you. Playback Service checks your plan, device, and network to pick the optimal video file from 1,200+ variants. It returns a manifest of CDN URLs for 4-second chunks. Your device fetches chunks from the nearest Open Connect Appliance. Adaptive bitrate adjusts quality per chunk based on bandwidth.

Why does Netflix use Cassandra instead of PostgreSQL?

Viewing data needs: multi-region replication (users travel), always available (downtime = no streaming), 100K+ writes/sec. Cassandra is AP (Available + Partition Tolerant). Tradeoff: 'Continue Watching' might be 5 seconds stale. But Netflix being DOWN affects 15M+ concurrent users at peak. Availability wins. MySQL is used for billing where consistency is non-negotiable.

What is Netflix's Chaos Monkey?

Chaos Monkey randomly kills production instances during business hours. The philosophy: if you can't survive random failures in controlled conditions, you'll fail during real outages. Netflix expanded this to the 'Simian Army' — Chaos Kong kills entire regions, Latency Monkey injects delays. Result: Netflix survived the 2012 Christmas AWS outage while other services went down.

How does the content pipeline work?

Studios upload master files (200-500 GB for 4K) to Amazon S3. Netflix's encoding pipeline splits each title into chunks and encodes them in parallel: 6 resolutions × 3 codecs × 3 audio formats × 60+ languages = ~1,200 files. Each is quality-scored with VMAF, DRM-encrypted, then pushed to 18,000+ OCA servers during off-peak hours. Total: 6-24 hours from upload to globally available.

How does adaptive bitrate streaming work?

Netflix doesn't send a continuous video stream. It sends 4-second chunks. Each chunk can be encoded at different quality levels (240p to 4K). The client measures available bandwidth and requests the highest quality chunk it can receive without buffering. If bandwidth drops, the next chunk is lower quality. When bandwidth recovers, quality climbs back up.

What is back-of-envelope estimation in system design?

It's quickly calculating infrastructure needs using rough numbers. For Netflix: 300M subscribers × 50% daily active = 150M DAU × 2 hours = 300M viewing hours/day. Peak concurrent = ~10% of DAU = 15M streams. At 5 Mbps average = 75 Tbps peak bandwidth. Each formula builds on the previous one. This helps architects plan capacity before building anything.

How does Netflix handle regional failures?

Netflix practices 'Chaos Kong' — deliberately killing entire AWS regions. Route 53 DNS detects the unhealthy region and reroutes traffic to surviving regions. Cassandra data is already replicated multi-region. Users experience a brief blip (~30-60 seconds of higher latency) then everything returns to normal. Most users don't notice.

Why does one Netflix title exist as 1,200+ files?

Each title is encoded across: 6+ resolution levels (240p to 4K), 3+ codecs (H.264, VP9, AV1), multiple audio formats (stereo, 5.1, Atmos), 30+ audio languages, and 30+ subtitle languages. The combinations multiply. Netflix uses per-chunk complexity analysis — an action scene gets higher bitrate than a dialogue scene. AV1 codec saves 30% bandwidth vs H.264.

Series Complete

You've covered the full system design stack: from the big picture to networking, databases, distributed patterns, security, monitoring, interview technique, and a real-world case study. Every concept connects.

Sources

  • Netflix Tech Blog (netflixtechblog.com)
  • Netflix Open Connect (openconnect.netflix.com)
  • Mastering Chaos — Netflix Guide to Microservices (InfoQ, Josh Evans)
  • How Netflix Scales its API with GraphQL Federation (Netflix blog)
  • Netflix: What Happens When You Press Play? (High Scalability)
  • Completing the Netflix Cloud Migration (Netflix blog, 2016)
  • The Netflix Simian Army (Netflix blog)
  • AV1 Codec Evaluation (Netflix Research)
  • Netflix Keystone Pipeline — 2 Trillion Events/Day (Netflix TechBlog)
  • Atlas: Netflix Observability Platform — 17B Metrics/Day
  • From On-Demand to Live: Netflix Streaming to 100M Devices (InfoQ, 2025)
  • ByteByteGo — Netflix Architecture Analysis

Every explainer is free. No ads, no paywall, no login.

If this helped you, consider supporting the project.

Buy us a coffee