Designing Instagram or YouTube in a system design interview means solving one of the hardest problems in distributed systems: handling massive amounts of user-generated media while keeping feeds fast, storage cheap, and discovery accurate. This walkthrough covers every layer of the problem, from requirements to scaling trade-offs.
Understanding the Problem
Instagram is a photo gallery for the entire world. YouTube is a TV station where everyone is both the broadcaster and the audience. What makes them unique is not the complexity of any single feature — it is the combination of extreme scale across four dimensions simultaneously: storage, bandwidth, computation, and social graph.
Think about what happens when you open Instagram. In under a second, your phone downloads and displays a scrollable feed of high-resolution images from hundreds of people you follow. Behind that one-second experience, dozens of systems coordinated: your feed was pre-computed, images were pulled from a nearby CDN edge, your social graph was queried to determine whose posts to show, and a recommendation algorithm decided the ordering. Each of those systems must handle millions of requests per second.
The key challenges break down into four areas:
Storage — billions of photos and videos, growing by terabytes per day, that must persist for years
Delivery — serving those files to billions of users with low latency, which means CDN distribution
Feed generation — deciding what each user sees and pre-computing it before they open the app
Discovery — helping users find new content through search, recommendations, and trending
Requirements Gathering
Every system design interview starts with requirements. For a media platform, we need to distinguish between what the system must do (functional) and how well it must do it (non-functional).
Functional Requirements
Upload media — users can upload photos and videos with captions
View feed — each user sees a personalized timeline of posts from people they follow
Social interactions — like, comment, follow/unfollow other users
Search — find posts, users, and hashtags by text
User profiles — view a user’s uploaded media and follower/following counts
Stories and Reels — ephemeral and short-form content (bonus scope)
Media processing — resize, compress, transcode uploaded content automatically
Non-Functional Requirements
High availability — 99.99% uptime (52 minutes of downtime per year)
Low latency — feed loads in under 300ms, media begins streaming in under 500ms
Storage efficiency — store petabytes of media cost-effectively
Scalability — support 1B+ users with linear horizontal scaling
CDN delivery — media served from edge locations, not origin servers
Scale Assumptions
| Metric | Instagram | YouTube |
|---|---|---|
| Total users | 1B+ | 2B+ |
| Daily active users | 500M | 1B |
| Uploads | 100M photos/day | 500 hours of video/min |
| Daily views | 500M+ | 5B+ |
| Avg file size | 2 MB (photo) | 50 MB (video) |
Out of Scope
Payments, ads system, content moderation (mentioned in trade-offs), messaging, live streaming.
Capacity Estimation
Capacity estimation is where you show you can do back-of-the-napkin math. The goal is not precision — it is showing you understand the orders of magnitude. The numbers below are rough estimates based on public data. The exact values matter less than the method: multiply users by actions by size, then convert to useful units.
Instagram Estimates
Storage: 100M photos/day x 2 MB = 200 TB/day, or about 73 PB/year
Read QPS: 500M DAU x 3 feed loads/day / 86,400 s = ~17,400 QPS average, ~50,000 peak (assuming peak is roughly 3x average)
Write QPS: 100M uploads/day / 86,400 s = ~1,160 QPS (each upload is large and triggers resizing or transcoding)
Worked example (parameters: 1B total users, 500M DAU, 100M photos/day at 2 MB average, 10M videos/day at 50 MB average):
Daily storage: 100M photos x 2 MB + 10M videos x 50 MB = 200 TB + 500 TB = 700 TB/day
Yearly storage: 700 TB x 365 days = ~255.5 PB/year
Read QPS (average): 500M DAU x 3 feed loads / 86,400 s = ~17,361 QPS
Write QPS (average): 110M uploads / 86,400 s = ~1,273 QPS
Daily bandwidth: 500M DAU x 10 views x 2 MB = 10 PB/day served; if the CDN absorbs ~90% of requests, about 1 PB/day reaches the origin
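The same arithmetic fits in a small helper so the inputs can be varied during an interview. This is a minimal sketch: the function name is hypothetical, and the defaults (3 feed loads and 10 media views per user per day, ~10% CDN miss rate, every view costing one average photo) mirror the assumptions in the worked example above rather than measured values.

```python
# Decimal units, as is conventional for back-of-the-envelope estimates.
MB = 10**6
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 86_400


def estimate(dau, photos_per_day, photo_mb, videos_per_day, video_mb,
             feed_loads_per_user=3, views_per_user=10, cdn_miss_rate=0.10):
    """Hypothetical capacity helper: storage, average QPS, and bandwidth."""
    daily_bytes = (photos_per_day * photo_mb + videos_per_day * video_mb) * MB
    uploads = photos_per_day + videos_per_day
    # Assumes every view downloads one average-sized photo.
    served_bytes = dau * views_per_user * photo_mb * MB
    return {
        "daily_storage_tb": daily_bytes / TB,
        "yearly_storage_pb": daily_bytes * 365 / PB,
        "read_qps_avg": dau * feed_loads_per_user / SECONDS_PER_DAY,
        "write_qps_avg": uploads / SECONDS_PER_DAY,
        "served_pb_per_day": served_bytes / PB,
        "origin_pb_per_day": served_bytes * cdn_miss_rate / PB,
    }


est = estimate(dau=500e6, photos_per_day=100e6, photo_mb=2,
               videos_per_day=10e6, video_mb=50)
for name, value in est.items():
    print(f"{name:>20}: {value:,.1f}")
```

Keeping the units as named constants makes the most common interview slip, mixing up MB, TB, and PB, easy to spot.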
API Design
Clean API design shows the interviewer you think about the interface before the implementation. Each endpoint should have a clear purpose and a well-defined request/response format, and should account for edge cases such as pagination, rate limiting, and authentication.
Key Endpoints
POST /api/media/upload — Upload media (multipart or presigned URL flow)
GET /api/feed — Paginated feed using cursor-based pagination
POST /api/media/{id}/like: Like a post (idempotent: repeating the request still leaves a single like)
POST /api/media/{id}/comments — Add a comment
GET /api/search?q=... — Full-text search with faceted results
GET /api/users/{id} — User profile with media and social counts
POST /api/users/{id}/follow — Follow/unfollow
Design Decisions
Cursor-based pagination (not offset-based) for the feed. Offsets break when new posts are inserted between pages. A cursor encodes the sort position (e.g., base64 of created_at + post_id) so results are stable even as new content arrives.
Presigned URLs for media uploads. Instead of proxying a 50 MB video through API servers, the client gets a temporary S3 URL and uploads directly. The API only handles metadata.
Idempotent operations for likes and follows. Sending “like” twice should not create two likes or return an error.
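The cursor scheme can be sketched as follows, assuming the feed is ordered by (created_at, post_id) descending. The helper names and the in-memory post list are hypothetical; a real service would push the same "strictly older than the cursor" comparison into the database query.

```python
import base64
import json


def encode_cursor(created_at: int, post_id: int) -> str:
    """Pack the sort position of the last item on a page into an opaque token."""
    raw = json.dumps([created_at, post_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple:
    created_at, post_id = json.loads(base64.urlsafe_b64decode(cursor))
    return created_at, post_id


def get_page(posts, cursor=None, limit=2):
    """Return (page, next_cursor) for posts ordered by (created_at, post_id) DESC."""
    ordered = sorted(posts, key=lambda p: (p["created_at"], p["id"]), reverse=True)
    if cursor is not None:
        after = decode_cursor(cursor)
        # Keep only posts strictly older than the cursor position. Unlike an
        # offset, this boundary does not shift when newer posts are inserted.
        ordered = [p for p in ordered if (p["created_at"], p["id"]) < after]
    page = ordered[:limit]
    next_cursor = None
    if len(ordered) > limit:
        last = page[-1]
        next_cursor = encode_cursor(last["created_at"], last["id"])
    return page, next_cursor


posts = [{"id": i, "created_at": 100 + i} for i in range(5)]
page1, cursor = get_page(posts)                  # ids 4, 3
posts.append({"id": 5, "created_at": 200})       # a new post arrives between requests
page2, _ = get_page(posts, cursor)               # ids 2, 1: nothing repeated or skipped
print([p["id"] for p in page1], [p["id"] for p in page2])
```

With offset pagination, the new post would push id 3 onto the second page and the client would see it twice.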
Database Schema Design
The schema needs to support the core operations efficiently: writing posts, reading feeds, tracking social relationships, and aggregating engagement metrics.
Index on comments(parent_id): supports threaded replies under a parent comment.
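Only the comments(parent_id) index is named here. The sketch below shows one possible schema covering posts, follows, likes, and threaded comments, using SQLite purely for illustration. All table and column names other than comments(parent_id) are assumptions. The composite primary key on likes is one way to get the idempotent "like" described in the API section.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    username    TEXT NOT NULL UNIQUE,
    created_at  INTEGER NOT NULL
);
CREATE TABLE posts (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(id),
    caption     TEXT,
    media_key   TEXT NOT NULL,          -- object-storage key, not the media bytes
    created_at  INTEGER NOT NULL
);
CREATE INDEX idx_posts_user_time ON posts(user_id, created_at DESC);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE likes (
    user_id     INTEGER NOT NULL REFERENCES users(id),
    post_id     INTEGER NOT NULL REFERENCES posts(id),
    PRIMARY KEY (user_id, post_id)      -- at most one like per user per post
);
CREATE TABLE comments (
    id          INTEGER PRIMARY KEY,
    post_id     INTEGER NOT NULL REFERENCES posts(id),
    user_id     INTEGER NOT NULL REFERENCES users(id),
    parent_id   INTEGER REFERENCES comments(id),   -- NULL for top-level comments
    body        TEXT NOT NULL,
    created_at  INTEGER NOT NULL
);
CREATE INDEX idx_comments_parent ON comments(parent_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1, 'alice', 0)")
conn.execute("INSERT INTO posts VALUES (10, 1, 'sunset', 'media/10.jpg', 0)")

# Idempotent like: the repeated request is ignored instead of creating a duplicate.
for _ in range(2):
    conn.execute("INSERT OR IGNORE INTO likes VALUES (1, 10)")
like_count = conn.execute("SELECT COUNT(*) FROM likes WHERE post_id = 10").fetchone()[0]
print(like_count)  # 1
```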
Media Storage Pipeline
The media pipeline is what makes these platforms fundamentally different from a typical web app. You are not storing small JSON records — you are ingesting, processing, and distributing enormous files at massive scale.
Upload Flow
Client requests a presigned URL from the API
Client uploads the file directly to object storage (S3)
Client sends a confirmation request with metadata (caption, type)
API writes post metadata to the database
A message is published to a queue for processing
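Steps 1 and 2 rely on a URL that grants time-limited write access to a single object key. S3 implements this with its own SigV4 request signing; the sketch below is not that scheme, but a simplified stand-in that signs the key and expiry with an HMAC to show the idea. The secret, host name, and query parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"                  # hypothetical; never sent to clients
UPLOAD_HOST = "https://uploads.example.com"     # hypothetical storage endpoint


def _sign(key: str, expires: int) -> str:
    msg = f"PUT\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign_upload(key: str, ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """API server: issue a URL that allows a PUT to `key` until it expires."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _sign(key, expires)})
    return f"{UPLOAD_HOST}/{key}?{query}"


def verify_upload(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only an unexpired URL whose signature matches its key."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    sig = params["sig"][0]
    current = time.time() if now is None else now
    return current < expires and hmac.compare_digest(sig, _sign(key, expires))


url = presign_upload("media/u42/abc.jpg", ttl_seconds=900, now=1_000)
print(verify_upload(url, now=1_500))                                 # True: still valid
print(verify_upload(url, now=2_000))                                 # False: expired
print(verify_upload(url.replace("abc.jpg", "xyz.jpg"), now=1_500))   # False: wrong key
```

The API server never touches the file bytes; it only signs, and the storage layer only verifies.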
Processing Pipeline
The processing pipeline runs asynchronously. It does not block the upload response.
For photos:
Validate format (JPEG, PNG, HEIC)
Extract EXIF metadata (camera, location, date)
Generate multiple sizes: original, large (1080px), medium (640px), thumbnail (150px)
Encode to WebP, which typically yields smaller files than JPEG at comparable visual quality
Store all variants in object storage
Update CDN configuration
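Computing the variant sizes is mostly arithmetic: scale each image down to the target while preserving its aspect ratio, and never upscale. This is a minimal sketch that assumes the 1080/640/150 px targets apply to the longest edge (the text does not say which edge). The actual resampling and WebP encoding would be done by an image library and are omitted here.

```python
VARIANTS = {"large": 1080, "medium": 640, "thumbnail": 150}  # target longest edge, px


def variant_sizes(width, height):
    """Output dimensions per variant, preserving aspect ratio and never upscaling."""
    sizes = {"original": (width, height)}
    longest = max(width, height)
    for name, target in VARIANTS.items():
        if longest <= target:
            sizes[name] = (width, height)        # already small enough: keep as-is
        else:
            # Integer arithmetic with round-half-up keeps results deterministic.
            sizes[name] = (max(1, (width * target + longest // 2) // longest),
                           max(1, (height * target + longest // 2) // longest))
    return sizes


print(variant_sizes(4032, 3024))
# {'original': (4032, 3024), 'large': (1080, 810), 'medium': (640, 480), 'thumbnail': (150, 113)}
```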
For video:
All photo processing steps plus:
Transcode to multiple resolutions (4K, 1080p, 720p, 480p, 360p)
Generate adaptive bitrate manifests (HLS/DASH)
Extract keyframes as thumbnails
Content fingerprinting for copyright detection
This can take 5-30 minutes per video for long content
CDN Delivery
CDN (Content Delivery Network) is non-negotiable at this scale. Without it, every image request would travel to your origin servers, which are likely in one region. A user in Tokyo would fetch every image from Virginia.
The flow: Origin (S3) -> CDN PoPs (Points of Presence in 50+ regions) -> User. The first request for an image at a given PoP misses and is fetched from the origin; the CDN then caches it. Subsequent requests served by that PoP hit the edge cache (until the entry expires or is evicted), typically within 20-50ms.
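The miss-then-hit behavior is a pull-through cache with a TTL. The sketch below models a single edge location. The class name, the TTL, and the `fetch_from_origin` callback are hypothetical, and real CDNs add eviction, request coalescing, and origin shielding on top of this.

```python
import time
from typing import Callable, Optional


class EdgeCache:
    """One CDN point of presence: serve from local cache, fall back to origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float = 86_400):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl_seconds
        self.store = {}                  # key -> (body, expires_at)
        self.origin_requests = 0

    def get(self, key: str, now: Optional[float] = None) -> bytes:
        now = time.time() if now is None else now
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # edge hit: origin is not contacted
        self.origin_requests += 1        # miss or expired entry
        body = self.fetch_from_origin(key)
        self.store[key] = (body, now + self.ttl)
        return body

    def invalidate(self, key: str) -> None:
        """Drop a cached object, e.g. after the media is deleted."""
        self.store.pop(key, None)


edge = EdgeCache(lambda key: b"bytes of " + key.encode(), ttl_seconds=3600)
for t in (0, 10, 20):                    # first request misses, the rest hit the edge
    edge.get("posts/1/large.webp", now=t)
print(edge.origin_requests)              # 1
edge.get("posts/1/large.webp", now=4000) # TTL expired: refetch from origin
print(edge.origin_requests)              # 2
```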
Pipeline stages: 1. Upload -> 2. Object Storage -> 3. Processing -> 4. Transcoding -> 5. CDN Distribution -> 6. Ready
Transcoding ladder:
| Rendition | Resolution | Bitrate |
|---|---|---|
| 4K | 3840x2160 | 20 Mbps |
| 1080p | 1920x1080 | 8 Mbps |
| 720p | 1280x720 | 5 Mbps |
| 480p | 854x480 | 2.5 Mbps |
| 360p | 640x360 | 1 Mbps |
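The adaptive bitrate manifest ties the ladder together. This minimal sketch writes an HLS master playlist for the renditions above. The per-rendition playlist paths (e.g. `720p/index.m3u8`) are a hypothetical naming scheme, and a real packager would also emit CODECS attributes and the segment-level media playlists.

```python
LADDER = [  # (name, width, height, bitrate in bits per second)
    ("4k", 3840, 2160, 20_000_000),
    ("1080p", 1920, 1080, 8_000_000),
    ("720p", 1280, 720, 5_000_000),
    ("480p", 854, 480, 2_500_000),
    ("360p", 640, 360, 1_000_000),
]


def master_playlist(ladder) -> str:
    """Build an HLS master playlist with one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for name, width, height, bps in ladder:
        # BANDWIDTH is the peak bitrate of the variant in bits per second.
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bps},RESOLUTION={width}x{height}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"


print(master_playlist(LADDER))
```

The player downloads this file first, then picks one variant playlist based on measured bandwidth and can switch between them at segment boundaries.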
News Feed Generation
The news feed is the hardest problem in media platform design. Not because it is algorithmically complex, but because it must be fast for billions of users while the underlying data changes constantly.
The Fan-out Problem
When a user with 1 million followers posts a photo, where does that post go? If each follower has a pre-computed feed cache, you need to write that post to 1 million caches. That is 1 million write operations for a single post. This is the fan-out problem.
There are three strategies:
Fan-out on Write (Push Model)
When a user creates a post, immediately write it to the feed cache of every follower.
Pros: Feed reads are instant (just read from cache). Zero query time.
Cons: Celebrity posts (millions of followers) cause a write storm. A post from an account with 100M followers requires 100M cache writes.
Best for: Users with fewer than ~10,000 followers.
Fan-out on Read (Pull Model)
Do not pre-compute anything. When a user opens their feed, query the database for recent posts from all users they follow.
Pros: Zero write cost for posting. Handles celebrities trivially.
Cons: Feed loads are slow (must query hundreds of users). Load spikes when many users open the app simultaneously.
Best for: Celebrity accounts or systems with few active readers.
Hybrid Approach (Push + Pull)
The production solution used by Instagram and similar platforms:
For users with fewer than N followers (e.g., 10,000), fan out on write to their followers’ feed caches
For celebrity accounts, do not fan out. When followers open their feed, pull the celebrity’s posts on demand
Maintain a “fan-out threshold” that can be tuned per account
Because accounts above the threshold are rare, the vast majority of posts get the fast-read benefit of push. At read time, the feed service only pulls recent posts from the few celebrity accounts a user follows and merges them with the pre-computed feed, so the extra read cost stays small.
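A minimal in-memory sketch of the hybrid strategy follows. The plain dicts stand in for the follow graph, per-user feed caches, and per-author timelines; the class name and the default threshold of 10,000 are illustrative. A production system would use Redis lists or sorted sets and run the fan-out asynchronously from a queue.

```python
import heapq
from collections import defaultdict


class HybridFeed:
    """In-memory stand-in for a hybrid push/pull feed (hypothetical structure)."""

    def __init__(self, threshold=10_000):
        self.threshold = threshold               # fan-out threshold, tunable
        self.followers = defaultdict(set)        # author -> follower ids
        self.following = defaultdict(set)        # user -> followed author ids
        self.feed_cache = defaultdict(list)      # user -> [(ts, post_id)] pushed at write time
        self.timeline = defaultdict(list)        # author -> [(ts, post_id)], oldest first

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > self.threshold

    def publish(self, author, post_id, ts):
        """Record a post; return the number of feed-cache writes it caused."""
        self.timeline[author].append((ts, post_id))
        if self.is_celebrity(author):
            return 0                             # pull: followers fetch it at read time
        for user in self.followers[author]:      # push: one write per follower
            self.feed_cache[user].append((ts, post_id))
        return len(self.followers[author])

    def read_feed(self, user, limit=20):
        candidates = set(self.feed_cache[user])  # pre-computed entries
        for author in self.following[user]:      # merge in celebrity posts on demand
            if self.is_celebrity(author):
                candidates.update(self.timeline[author][-limit:])
        return [post_id for _, post_id in heapq.nlargest(limit, candidates)]


feed = HybridFeed(threshold=2)
for fan in ("u1", "u2", "u3"):
    feed.follow(fan, "celeb")                    # 3 followers > threshold: pull
feed.follow("u1", "friend")                      # 1 follower: push
writes_friend = feed.publish("friend", "p1", ts=1)
writes_celeb = feed.publish("celeb", "p2", ts=2)
print(writes_friend, writes_celeb, feed.read_feed("u1"))   # 1 0 ['p2', 'p1']
```

Using a set for candidates avoids showing a post twice if an account crosses the threshold after some of its posts were already pushed.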
| Model | Push (fan-out on write) | Pull (fan-out on read) |
|---|---|---|
| Write cost per post | One cache write per follower (1,000 for a 1K-follower account) | None |
| Read cost per feed open | None beyond reading the cache | One query per followed account |
| Trade-offs | Fast feed reads; high write cost for celebrities; deleted posts must be removed from every follower's cache | Zero write cost; slow feed reads |
| Best for | Small creators (<10K followers) | Celebrity accounts |
Feed Ranking
Modern feeds are not purely chronological. They use a relevance score to rank posts:
Engagement rate: likes + comments + shares, normalized by the poster’s follower count
Recency decay: posts lose value over time (exponential decay)
User affinity: how often the viewer interacts with the poster
This scoring happens at write time for pre-computed feeds, or at read time for pull-based feeds. The trade-off is freshness (pre-computed scores become stale) vs. latency (real-time scoring adds query time).
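One way to combine the three signals is a product of engagement rate, recency decay, and affinity. This is a minimal sketch: the 6-hour half-life, the 0.5 affinity offset, and the multiplicative form are assumptions for illustration, not a published formula from either platform.

```python
import math


def relevance(likes, comments, shares, author_followers, age_hours, affinity,
              half_life_hours=6.0):
    """Hypothetical relevance score combining the three signals above."""
    # Engagement rate: interactions normalized by the poster's follower count.
    engagement = (likes + comments + shares) / max(author_followers, 1)
    # Recency decay: exponential, halving every `half_life_hours`.
    decay = math.exp(-math.log(2) * age_hours / half_life_hours)
    # User affinity in [0, 1]; the 0.5 offset keeps unfamiliar authors from scoring zero.
    return engagement * decay * (0.5 + affinity)


fresh = relevance(300, 20, 5, author_followers=10_000, age_hours=0, affinity=0.5)
older = relevance(300, 20, 5, author_followers=10_000, age_hours=6, affinity=0.5)
print(fresh, older)   # the 6-hour-old post scores half as much
```

Because decay depends on the current time, a score computed at write time goes stale, which is the freshness trade-off described above.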
How the pre-computed feed is served: in the simplest version, posts are sorted by created_at descending. Because the feed is pre-computed (fan-out on write), most posts are served straight from the cache; a cache miss triggers a real-time query for recent posts from the accounts the user follows.
Search & Discovery
Search on a media platform has two distinct use cases: finding specific things (a user, a hashtag) and discovering new content (exploration, recommendations).
Full-Text Search with Elasticsearch
The core of search is an inverted index. Instead of scanning every document to find “sunset”, you maintain a mapping from every word to the documents that contain it. Lucene implements this structure, and Elasticsearch and Solr are built on top of Lucene.
For example, if Document 1 contains “sunset beach” and Document 3 contains “sunset mountain”, the index holds:
sunset -> [1, 3]
beach -> [1]
mountain -> [3]
When a user searches “sunset”, you look up “sunset” in the index and immediately get documents 1 and 3. No scanning required.
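A minimal sketch of building and querying an inverted index, using the example documents plus one more. Tokenization here is a naive lowercase word split and multi-term queries use AND semantics; both are simplifying assumptions, since Lucene-based engines add analyzers, stemming, and compressed posting lists.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """AND query: ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "sunset beach", 2: "city skyline", 3: "sunset mountain"}
index = build_index(docs)
print(sorted(index["sunset"]))                 # [1, 3]
print(sorted(search(index, "sunset beach")))   # [1]
```

Lookup cost depends on the length of the posting lists touched, not on the total number of documents.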
TF-IDF Scoring
Classic relevance ranking uses TF-IDF (Term Frequency - Inverse Document Frequency). Elasticsearch's default similarity since version 5.0 is BM25, a refinement of the same idea:
TF (Term Frequency): How often the search term appears in this document. A document mentioning “sunset” 5 times scores higher than one mentioning it once.
IDF (Inverse Document Frequency): How rare the term is across all documents. A rare term like “astrophotography” is more valuable than a common term like “photo” for distinguishing relevant documents.
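A minimal sketch of the classic formulation, with tf normalized by document length and idf = log(N / df). Variants differ in smoothing and tf scaling, and BM25 additionally saturates tf, so this is illustrative rather than Lucene's exact scoring.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query):
    """Score each document as the sum over query terms of tf(t, d) * idf(t).

    docs: {doc_id: [token, ...]}, query: [token, ...]
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        score = 0.0
        for term in query:
            if df[term] == 0:
                continue                                  # term appears nowhere
            tf = counts[term] / max(len(terms), 1)        # length-normalized frequency
            idf = math.log(n_docs / df[term])             # rarer term -> larger weight
            score += tf * idf
        scores[doc_id] = score
    return scores


docs = {
    1: ["photo", "sunset", "photo"],
    2: ["photo", "astrophotography"],
    3: ["photo", "beach"],
}
scores = tf_idf_scores(docs, ["photo", "astrophotography"])
print(max(scores, key=scores.get))   # 2
```

"photo" appears in every document, so its idf is log(1) = 0 and it contributes nothing; the rare term decides the ranking, which simple keyword matching cannot do.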
Trending and Recommendations
Trending topics are pre-computed every few minutes using a sliding window of recent post activity. They are cached and served from Redis with a very short TTL.
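A minimal sketch of the sliding-window count behind trending. It assumes events arrive in timestamp order and fit in memory. At scale this would typically run as a stream-processing job or over time-bucketed counters, with the top-k result written to Redis as described above. The class name is hypothetical.

```python
from collections import Counter, deque


class TrendingCounter:
    """Hashtag counts over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.events = deque()        # (timestamp, hashtag), oldest first
        self.counts = Counter()      # hashtag -> uses inside the window

    def record(self, hashtag, ts):
        self.events.append((ts, hashtag))
        self.counts[hashtag] += 1

    def top(self, k, now):
        # Evict events that have slid out of the window, then rank what remains.
        while self.events and self.events[0][0] <= now - self.window:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]
        return self.counts.most_common(k)


trending = TrendingCounter(window_seconds=60)
for tag, ts in [("#sunset", 0), ("#sunset", 30), ("#pasta", 50)]:
    trending.record(tag, ts)
first = trending.top(2, now=55)
print(first)                      # [('#sunset', 2), ('#pasta', 1)]
later = trending.top(2, now=75)   # the ts=0 event has left the window
print(later)
```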
Recommendations use collaborative filtering: “users who liked X also liked Y”. At scale, this is computed offline using matrix factorization on user-item interaction data, then served as pre-computed recommendation lists.
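Matrix factorization does not fit in a few lines, so the sketch below swaps in a simpler technique that captures the same "users who liked X also liked Y" intuition: item-to-item co-occurrence counting. The data layout and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_liked(likes):
    """For each item, count how many users liked it together with each other item."""
    pairs = defaultdict(Counter)
    for items in likes.values():
        for a, b in permutations(items, 2):
            pairs[a][b] += 1
    return pairs


def recommend(user, likes, pairs, k=3):
    """Rank items the user has not liked by co-occurrence with items they have liked."""
    scores = Counter()
    for item in likes[user]:
        for other, count in pairs[item].items():
            if other not in likes[user]:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]


likes = {
    "ana": {"sunset", "beach"},
    "ben": {"sunset", "beach", "surf"},
    "cy": {"sunset", "surf"},
    "dee": {"pasta"},
}
pairs = co_liked(likes)
print(recommend("ana", likes, pairs))   # ['surf']
```

Like the matrix-factorization approach, this would be computed offline and the resulting lists served from a cache.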
High-Level Design
Now we assemble all the pieces into a complete architecture. Think of this as connecting every service we discussed into a coherent system.
Components
Client Layer: Mobile apps and web browsers. Handle local caching, media display, and user interaction.
CDN: Static asset delivery for images and video streams. CloudFront, Akamai, or similar. Edge locations in 50+ regions worldwide.
Load Balancer: Distribute API traffic across server instances. L7 load balancing for HTTP/HTTPS with SSL termination.
API Gateway: Rate limiting, authentication, request routing. Routes requests to the appropriate service.
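Microservices:
Media Service: Handles uploads and metadata storage, and triggers the processing pipeline
Feed Service: Serves each user's feed from the Redis cache, falling back to a database query on demand
Search Service: Queries Elasticsearch and returns ranked results
Data stores and infrastructure:
Object Storage (S3): Raw and processed media files
Redis: Feed, profile, and media metadata caches
Elasticsearch: Full-text search index
Message Queue (Kafka/RabbitMQ): Async communication between services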
Request Flows:
Upload: Client -> API Gateway -> Media Service -> S3 (presigned URL) -> Kafka -> Processing Workers -> S3 (processed files) -> CDN
View Feed: Client -> CDN (images) + API Gateway -> Feed Service -> Redis (cached feed) or DB (on-demand query)
Search: Client -> API Gateway -> Search Service -> Elasticsearch -> ranked results
Architecture Diagram
The system follows a layered architecture: client -> CDN/LB -> API Gateway -> Services -> Data stores. Async processing flows through message queues. All inter-service communication uses a combination of synchronous REST/gRPC calls (for user-facing requests) and asynchronous message queues (for processing and notifications).
Scaling Considerations
Sharding
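Users by user_id: Consistent hashing distributes user data across database shards. Keeping a user’s profile, posts, and social graph on the same shard gives locality: a profile page can be served by a single shard.
Posts by post_id: If posts are stored separately from user data, they can instead be distributed by post_id (for example, each shard stores a range of post IDs). This spreads write load independently of any single user, but a profile page must then gather posts from several shards.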
Media by object key: Object storage (S3) handles this natively — keys are hashed across storage nodes automatically.
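A minimal sketch of consistent hashing with virtual nodes, as used to place users on shards. The shard names, 100 virtual nodes per shard, and SHA-256 as the hash function are arbitrary choices for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map keys (e.g. user_id) to shards; adding a shard moves only ~1/N of keys."""

    def __init__(self, shards, vnodes=100):
        self.ring = []                                  # sorted (hash, shard) points
        for shard in shards:
            for i in range(vnodes):                     # virtual nodes smooth the distribution
                self.ring.append((self._hash(f"{shard}#{i}"), shard))
        self.ring.sort()
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def shard_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self.hashes, self._hash(str(key))) % len(self.ring)
        return self.ring[idx][1]


ring3 = ConsistentHashRing(["db0", "db1", "db2"])
ring4 = ConsistentHashRing(["db0", "db1", "db2", "db3"])
moved = [u for u in range(10_000) if ring3.shard_for(u) != ring4.shard_for(u)]
print(len(moved) / 10_000)                     # roughly a quarter of the keys move...
print({ring4.shard_for(u) for u in moved})     # {'db3'}: ...and only onto the new shard
```

With naive `hash(key) % N`, going from 3 to 4 shards would move about three quarters of all keys instead.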
Caching Strategy
Feed cache (Redis): Pre-computed feed for each user. TTL of 5-15 minutes. Invalidated on new post from a followed user (or allowed to go stale for eventual consistency).
User profile cache (Redis): Profile data, follower counts. Updated asynchronously.
Media metadata cache (Redis): Post metadata (caption, like count, comment count). Invalidated on write.
CDN cache: Media files cached at edge for hours to days. Cache invalidation on update or delete.
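The metadata cache can follow a cache-aside pattern: reads go through the cache, and a write updates the database first, then deletes the cached entry so the next read repopulates it. This minimal sketch uses dicts as stand-ins for Redis and the database; the class and method names are hypothetical.

```python
class PostMetadataStore:
    """Cache-aside reads with delete-on-write invalidation (in-memory stand-ins)."""

    def __init__(self):
        self.db = {}        # stand-in for the database (source of truth)
        self.cache = {}     # stand-in for Redis
        self.db_reads = 0

    def get(self, post_id):
        if post_id in self.cache:
            return self.cache[post_id]                  # cache hit
        self.db_reads += 1                              # cache miss: load and populate
        value = dict(self.db[post_id])
        self.cache[post_id] = value
        return value

    def update(self, post_id, **fields):
        self.db.setdefault(post_id, {}).update(fields)  # 1. write the source of truth
        self.cache.pop(post_id, None)                   # 2. invalidate rather than patch


store = PostMetadataStore()
store.update(1, caption="sunset", like_count=0)
store.get(1)                       # miss: one database read
store.get(1)                       # hit
store.update(1, like_count=1)      # invalidates the cached entry
latest = store.get(1)              # miss again: fresh value from the database
print(latest["like_count"], store.db_reads)   # 1 2
```

Deleting instead of updating the cached value avoids two concurrent writers leaving the cache in the wrong order; a short stale window can remain, which a TTL bounds.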
Async Processing
Anything that does not need to happen synchronously goes through a message queue:
Media transcoding and thumbnail generation
Push notifications (likes, comments, follows)
Analytics and metrics aggregation
Search index updates
Feed cache invalidation and re-computation
This keeps API response times fast and makes the system resilient to processing failures.
Video-Specific Challenges
Video adds an entire layer of complexity on top of photo handling:
Streaming: Video is not downloaded as a single file. It is streamed using HLS (HTTP Live Streaming) or DASH, which splits video into small chunks (2-10 seconds each). The client fetches chunks sequentially, adapting quality based on network conditions.
Adaptive bitrate: Multiple resolution encodings are stored. The client switches between them mid-stream as bandwidth fluctuates.
Storage cost: An average video (50 MB) is about 25x larger than an average photo (2 MB). Videos older than 30 days that are rarely viewed can be moved to a colder, cheaper storage class (for example, S3 Glacier).
Processing time: Transcoding a 10-minute 4K video can take 30+ minutes. This must be fully async with progress tracking.
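The client-side decision in adaptive bitrate streaming can be as simple as picking the highest rendition whose bitrate fits within a safety margin of the measured throughput. This minimal sketch uses the bitrate ladder from the transcoding step. The 0.8 safety factor and the throughput-only rule are assumptions; production players also weigh buffer level and smooth their bandwidth estimates.

```python
LADDER_KBPS = [("4k", 20_000), ("1080p", 8_000), ("720p", 5_000),
               ("480p", 2_500), ("360p", 1_000)]   # highest quality first


def pick_rendition(measured_kbps, safety=0.8):
    """Choose the best rendition that fits within `safety` x measured bandwidth."""
    budget = measured_kbps * safety
    for name, kbps in LADDER_KBPS:
        if kbps <= budget:
            return name
    return LADDER_KBPS[-1][0]    # below the lowest rung: still play the lowest quality


# Bandwidth measured after each downloaded segment (hypothetical values).
for kbps in (30_000, 9_000, 2_000, 500):
    print(kbps, "->", pick_rendition(kbps))
```

Because the choice is re-evaluated per segment, the player can step down as soon as the network degrades instead of stalling.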
Trade-offs & Follow-ups
These are the questions interviewers love to ask. Having good answers ready shows depth.
Celebrity Accounts (millions of followers)
Use the hybrid fan-out approach. Do not push to all followers. Instead, maintain a “popular account” flag. When a follower opens their feed, pull recent posts from popular accounts they follow. Combine with the pre-computed posts from normal accounts. This limits the write storm while keeping feed latency acceptable.
Stories (ephemeral content)
Stories expire after 24 hours. This means storage cleanup is critical. Use TTL-based expiration in Redis for story metadata. Store media in object storage with lifecycle policies that auto-delete after 24 hours. Pre-compute a “story ring” (carousel of stories from followed users) and cache it with a short TTL (1-2 minutes).
Content Moderation
At scale, automated moderation is essential. Use a pipeline: image/video fingerprinting (to detect known bad content), ML classification models (to detect nudity, violence, hate speech), and human review for edge cases. Reports from users feed back into the ML models. This is a queue-based async pipeline.
Copyright on YouTube
YouTube uses Content ID: a fingerprinting system where copyright holders upload reference files. When a new video is uploaded, it is fingerprinted and compared against the Content ID database. Matches trigger actions: block, monetize (share revenue), or track (allow but notify the copyright holder). This runs as part of the async processing pipeline.
Comparison Table
| Concern | Instagram | YouTube |
|---|---|---|
| Primary media | Photos (2 MB avg) | Video (50 MB avg) |
| Storage growth | ~73 PB/year (photos only) | ~263 PB/year |
| Bandwidth | ~10 PB/day served (~1 PB/day from origin) | ~250 PB/day served |
| Processing | Resize + compress | Transcode to 5+ resolutions |
| Delivery | Static files from CDN | Adaptive streaming (HLS/DASH) |
| Feed model | Pre-computed + pull hybrid | Recommendation-driven |
| Caching | Aggressive (photos are static) | Moderate (multiple quality variants) |
| Real-time needs | Stories (24h TTL) | Live streaming (ultra-low latency) |
Self-Check
Before walking into an interview on this topic, make sure you can answer each of these:
Can you estimate storage and bandwidth requirements from user counts?
Why use cursor-based pagination instead of offset-based for feeds?
What is the fan-out problem, and what are the three strategies to solve it?
When would you choose push vs. pull for feed generation?
Why does the hybrid approach use a threshold (e.g., 10K followers)?
How does an inverted index make search fast?
What is TF-IDF, and why does it work better than simple keyword matching?
Why use presigned URLs for media uploads instead of proxying through API servers?
What role does a message queue play in media processing?
How does adaptive bitrate streaming work, and why is it needed for video?
What is the difference between CDN caching and application-level caching?
How would you handle a celebrity account with 100M followers differently from a normal account?