How does a social platform deliver posts to 400M+ users' timelines in real-time, handling the asymmetry between users with few followers and celebrities with millions, while keeping timeline delivery under 5 seconds?
Core challenge: A celebrity with 100M followers posts a tweet. You cannot write to 100M timelines simultaneously; that is a fan-out explosion. But followers expect to see it within seconds. How do you balance write amplification against read latency?
400M+ active users: timelines to serve
100M max followers: the celebrity problem
<5s delivery target: post → visible in feed
500K+ tweets/sec peak: during major events
Functional Requirements
Must Have
1. User posts a tweet → it appears in all followers' timelines
2. Timeline is ranked (not purely chronological): relevance + recency
3. Real-time updates: new tweets appear without manual refresh
4. Handle celebrity asymmetry: users with 100M followers
5. Support infinite scroll with pagination
6. Engagement signals update in real time (like, retweet, and reply counts)
Out of Scope
- Tweet composition and media upload
- Search and trending topics
- Direct messages
- Notifications
- Content moderation
Non-Functional Requirements
Property | Target | Design Impact
Latency | <5s tweet → visible in follower timeline | Fan-out workers must process within seconds, not minutes
Read Latency | <100ms timeline load | Pre-computed timeline in Redis (no DB query on read)
Throughput | 500K tweets/sec peak (major events) | Kafka + parallel fan-out workers, auto-scale on lag
Consistency | Eventual | Fan-out is async. Acceptable: not a banking system.
Scalability | 400M users, linear scale with user count | Shard timeline cache by user_id. Add Redis nodes as users grow.
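Sharding the timeline cache by user_id can be sketched as a stable hash followed by a modulo. This is a minimal illustration, not the platform's actual scheme: the function name `shard_for_user` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def shard_for_user(user_id: int, num_shards: int) -> int:
    """Map a user_id to a timeline-cache shard index (hypothetical helper).

    A cryptographic hash is used only because it is stable across processes
    and machines, unlike Python's built-in hash() for strings.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then reduce to a shard.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo sharding remaps most keys when `num_shards` changes, so "add Redis nodes as users grow" usually implies consistent hashing or a fixed slot space (Redis Cluster, for example, maps keys onto 16384 hash slots) rather than this direct modulo.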
High-Level Architecture
Hybrid fan-out: write for normal users, read for celebrities
Why hybrid? Pure fan-out-on-write: a celebrity tweet triggers 100M Redis writes, which means minutes of lag. Pure fan-out-on-read: every timeline load queries all followed users, which makes reads slow. Hybrid: tweets from the ~99% of accounts below the follower threshold fan out on write (fast reads), while celebrity tweets are merged in at read time (no write explosion).
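The write-side decision can be sketched as follows. This is an in-memory illustration under stated assumptions: the class `HybridFanout`, its dictionaries, and the `celebrity_threshold` parameter are hypothetical stand-ins for the social graph, the per-user timeline cache, and the per-author tweet store.

```python
from collections import defaultdict


class HybridFanout:
    """Toy model of the hybrid write path (all names are illustrative)."""

    def __init__(self, followers, celebrity_threshold=10_000):
        self.followers = followers              # author_id -> list of follower ids
        self.celebrity_threshold = celebrity_threshold
        self.timelines = defaultdict(list)      # user_id -> [(timestamp, tweet_id)]
        self.author_tweets = defaultdict(list)  # author_id -> [(timestamp, tweet_id)]

    def is_celebrity(self, author_id):
        return len(self.followers.get(author_id, [])) > self.celebrity_threshold

    def post(self, author_id, tweet_id, timestamp):
        """Store the tweet and return the number of timeline writes performed."""
        # Every tweet is recorded once under its author, so the read path
        # can fetch celebrity tweets later.
        self.author_tweets[author_id].append((timestamp, tweet_id))
        if self.is_celebrity(author_id):
            return 0  # no fan-out: followers merge these tweets at read time
        fans = self.followers.get(author_id, [])
        for follower_id in fans:
            self.timelines[follower_id].append((timestamp, tweet_id))
        return len(fans)
```

The return value makes the write amplification visible: a normal author costs one write per follower, while a celebrity costs none at post time and shifts the work to each follower's read.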
Timeline cache: Redis sorted set per user. Score = timestamp. ZREVRANGE for latest N tweets. Trim to 800 entries (older tweets fetched from DB). Cache hit rate: ~99% for active users.
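The sorted-set behaviour described here can be modelled without a Redis server. The sketch below is an in-memory stand-in, so the class `TimelineCache` and the `timeline:{user_id}` key naming in the comments are assumptions; the comments name the Redis commands each step would correspond to.

```python
class TimelineCache:
    """In-memory stand-in for one Redis sorted set per user (illustrative)."""

    def __init__(self, max_entries=800):
        self.max_entries = max_entries
        self._sets = {}  # user_id -> {tweet_id: timestamp score}

    @staticmethod
    def _ordered(zset, reverse=False):
        # Redis orders by score, then by member for equal scores.
        return sorted(zset, key=lambda t: (zset[t], t), reverse=reverse)

    def add(self, user_id, tweet_id, timestamp):
        # Equivalent to: ZADD timeline:{user_id} <timestamp> <tweet_id>
        zset = self._sets.setdefault(user_id, {})
        zset[tweet_id] = timestamp
        # Equivalent to: ZREMRANGEBYRANK timeline:{user_id} 0 -(max_entries + 1)
        # which drops the lowest-scored (oldest) entries beyond the cap.
        overflow = len(zset) - self.max_entries
        if overflow > 0:
            for old in self._ordered(zset)[:overflow]:
                del zset[old]

    def latest(self, user_id, n):
        # Equivalent to: ZREVRANGE timeline:{user_id} 0 n-1
        zset = self._sets.get(user_id, {})
        return self._ordered(zset, reverse=True)[:n]
```

For example, with `max_entries=3`, adding five tweets with increasing timestamps leaves only the three newest, and `latest(user_id, 2)` returns the two most recent tweet ids, newest first.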
Failure modes: Fan-out worker lag → tweets delayed for some followers (acceptable: eventual). Redis node failure → rebuild from DB (cold start ~30s). Celebrity tweet during event → read path handles gracefully (no write storm).
Real-world: Twitter has publicly described a hybrid approach of this kind; the ~10K-follower threshold used here is an approximate figure. Instagram is reported to use a similar fan-out-on-write model for its feed. Facebook relies primarily on fan-out-on-read with aggressive caching (TAO).
Scale Estimation
Back-of-envelope math for timeline delivery
Given: 400M active users · 500K tweets/sec peak · avg 200 followers · top 1% have >10K followers
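The given figures can be combined into a rough estimate. The sketch below uses only the numbers stated in this document plus two labelled assumptions that are not from the source: 8 bytes per cached tweet ID (raw payload only, ignoring Redis overhead) and a hypothetical aggregate fan-out capacity of 1M timeline writes per second.

```python
# Figures stated in the document.
ACTIVE_USERS = 400_000_000
PEAK_TWEETS_PER_SEC = 500_000
AVG_FOLLOWERS = 200
TIMELINE_ENTRIES = 800          # trim size of each cached timeline
MAX_FOLLOWERS = 100_000_000     # largest celebrity audience

# Assumptions made only for this estimate (not from the source).
BYTES_PER_ENTRY = 8                     # raw tweet ID size, no Redis overhead
FANOUT_WRITES_PER_SEC = 1_000_000       # hypothetical aggregate worker capacity

# If every tweet were fanned out on write at peak:
peak_fanout_writes_per_sec = PEAK_TWEETS_PER_SEC * AVG_FOLLOWERS

# Raw payload if every active user had a full cached timeline:
raw_cache_bytes = ACTIVE_USERS * TIMELINE_ENTRIES * BYTES_PER_ENTRY

# Time to push one celebrity tweet to every follower at the assumed capacity:
celebrity_fanout_seconds = MAX_FOLLOWERS / FANOUT_WRITES_PER_SEC

print(f"peak fan-out writes/sec: {peak_fanout_writes_per_sec:,}")
print(f"raw timeline payload:    {raw_cache_bytes / 1e12:.2f} TB")
print(f"celebrity fan-out time:  {celebrity_fanout_seconds:.0f} s")
```

Under these assumptions, average-follower fan-out alone reaches 100M timeline writes per second at peak, and a single 100M-follower tweet would take about 100 seconds to fan out, well past the <5s target. That is the arithmetic behind routing celebrity tweets to the read path.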
Neo4j (doesn't scale), MySQL (slow for graph traversal)
Ranking | ML model (TensorFlow Serving) | Personalized relevance scoring per user | Chronological only (lower engagement)
Real-time Push | WebSocket / Server-Sent Events | New tweets appear without refresh | Polling (wasteful, laggy)
Interview Cheat Sheet
The 8 things to say for timeline/feed design
1. Hybrid fan-out: write for normal users (<10K followers), read for celebrities (>10K)
2. Redis sorted set per user: score = timestamp, ZREVRANGE for latest, trim to 800
3. Fan-out workers consume from Kafka: tweet event → look up followers → ZADD to each timeline
4. Celebrity merge at read time: query "my celebrity follows" → fetch their recent tweets → merge + rank
5. ML ranking: not purely chronological. Score by relevance, engagement prediction, recency
6. Cache hit rate ~99%: active users' timelines are always warm. Cold start on first login after a long absence.
7. Social graph sharded by user_id: follower lookup is the hot path during fan-out
8. Engagement counters are eventually consistent: likes/RTs are updated async, not blocking delivery
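The celebrity merge at read time (item 4) can be sketched as a k-way merge of already-sorted lists. This is a minimal illustration: the function name `read_timeline` and the `(timestamp, tweet_id)` tuple shape are assumptions, and it orders by recency only, so the ML ranking step would re-score the merged candidates afterwards.

```python
import heapq
from itertools import islice


def read_timeline(cached, celebrity_feeds, limit):
    """Merge a precomputed timeline with celebrity tweets, newest first.

    cached:          list of (timestamp, tweet_id), sorted newest first
    celebrity_feeds: list of such lists, one per followed celebrity
    limit:           number of tweet ids to return
    """
    # heapq.merge is lazy, so only `limit` items are pulled from the inputs.
    # With reverse=True each input must already be sorted largest-first.
    merged = heapq.merge(cached, *celebrity_feeds,
                         key=lambda entry: entry[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

Because the merge is lazy, the cost of a page load grows with the page size and the number of followed celebrities, not with the total number of cached tweets.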