How does Google Ads select the best ad from 10M+ candidates in <100ms?

🎯 Design an ad serving system: 10M candidates, <100ms selection, 10B req/day, real-time budget pacing

Concepts Involved

Caching, Load Balancer, Stream Processing, Kafka, Rate Limiting

Problem Statement

How does an ad serving platform select the optimal ad from 10M+ candidates in under 100ms, ranking by predicted CTR × bid × relevance, while serving 10B requests/day and enforcing real-time budget pacing?

Core challenge: The system can't score all 10M ads per request in 100ms. It needs a funnel approach: fast retrieval (10M → 10K) → lightweight scoring (10K → 100) → heavy ML ranking (100 → 10) → auction, all while tracking budgets in real time across billions of requests.
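To make the scale concrete, the arithmetic behind the core challenge can be sketched as follows. This is a rough back-of-the-envelope calculation that assumes traffic is spread evenly over the day (real traffic peaks higher); the function names are illustrative, and the only inputs are the figures stated above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(requests_per_day: float) -> float:
    """Average requests per second, assuming perfectly even traffic (an assumption)."""
    return requests_per_day / SECONDS_PER_DAY


def heavy_scores_per_second(requests_per_day: float, ads_scored_per_request: int) -> float:
    """Heavy-model evaluations per second if each request scores this many ads."""
    return average_qps(requests_per_day) * ads_scored_per_request


if __name__ == "__main__":
    qps = average_qps(10e9)
    brute_force = heavy_scores_per_second(10e9, 10_000_000)  # score every candidate
    with_funnel = heavy_scores_per_second(10e9, 100)         # deep-rank only 100 ads
    print(f"average load:        {qps:,.0f} requests/s")
    print(f"brute-force scoring: {brute_force:.2e} heavy evaluations/s")
    print(f"funnel scoring:      {with_funnel:.2e} heavy evaluations/s")
```

At roughly 116K requests per second on average, scoring every candidate with the heavy model would require on the order of 10^12 evaluations per second; restricting the heavy model to 100 ads per request cuts that by a factor of 100,000.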

10M+ candidate ads

<100ms end-to-end selection

10B requests / day

Real-time budget pacing

Architecture

Funnel architecture: Retrieval (inverted index, 10M → 10K in ~10ms) → Pre-scoring (lightweight model, 10K → 100 in ~20ms) → Deep ranking (neural net, 100 → 10 in ~50ms) → Auction (top 10 → winner in ~5ms). The stages sum to ~85ms, keeping the total under 100ms end-to-end.
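The funnel can be sketched in a few dozen lines. This is a minimal illustration, not a production design: the `Ad` fields, the keyword-based inverted index, and both scoring functions are hypothetical stand-ins (a real system would call a lightweight model and a neural network where `cheap_score` and `heavy_pctr` appear). Only the stage sizes (10K, 100, 10) come from the text above.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    ad_id: int
    keywords: frozenset
    bid: float
    quality_score: float


def build_inverted_index(ads):
    """Map each keyword to the ads that target it."""
    index = {}
    for ad in ads:
        for keyword in ad.keywords:
            index.setdefault(keyword, []).append(ad)
    return index


def retrieve(index, query_keywords, limit):
    """Stage 1: collect up to `limit` distinct ads that match any query keyword."""
    found = {}
    for keyword in query_keywords:
        for ad in index.get(keyword, []):
            found[ad.ad_id] = ad
            if len(found) >= limit:
                return list(found.values())
    return list(found.values())


def cheap_score(ad, query_keywords):
    """Stage 2 stand-in for a lightweight model: keyword overlap times bid."""
    return len(ad.keywords & query_keywords) * ad.bid


def heavy_pctr(ad, query_keywords):
    """Stage 3 stand-in for the deep CTR model (made-up formula, for illustration)."""
    overlap = len(ad.keywords & query_keywords)
    return min(0.5, 0.01 + 0.02 * overlap)


def select_finalists(index, query_keywords, retrieve_k=10_000, prescore_k=100, rank_k=10):
    """Run retrieval, pre-scoring, and deep ranking; return finalists for the auction."""
    candidates = retrieve(index, query_keywords, retrieve_k)
    shortlist = heapq.nlargest(
        prescore_k, candidates, key=lambda ad: cheap_score(ad, query_keywords)
    )
    # Only the shortlist pays the cost of the expensive model.
    return heapq.nlargest(
        rank_k,
        shortlist,
        key=lambda ad: heavy_pctr(ad, query_keywords) * ad.bid * ad.quality_score,
    )
```

The design point is that each stage is cheaper per ad than the next, so the expensive model only ever sees the small shortlist produced by the cheaper stages.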

Ad Rank formula: pCTR × bid × quality_score. The predicted CTR (pCTR) comes from a deep learning model (user features + ad features + context). The quality score penalizes low-relevance ads. The winning advertiser pays a second-price amount (just enough to beat the next ad).
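A small sketch of the ranking and pricing step follows. It assumes one common reading of "just enough to beat the next ad": the winner pays the lowest bid that would still match the runner-up's Ad Rank, capped at the winner's own bid. The `Candidate` type, the `reserve_price` parameter, and its default are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    ad_id: str
    pctr: float           # predicted click-through rate from the ranking model
    bid: float            # advertiser's maximum cost per click
    quality_score: float

    @property
    def ad_rank(self) -> float:
        return self.pctr * self.bid * self.quality_score


def run_auction(candidates, reserve_price=0.01):
    """Return (winner, price_per_click), or None if no candidate is eligible."""
    eligible = [c for c in candidates if c.ad_rank > 0]
    if not eligible:
        return None
    ranked = sorted(eligible, key=lambda c: c.ad_rank, reverse=True)
    winner = ranked[0]
    if len(ranked) > 1:
        runner_up = ranked[1]
        # Lowest bid at which the winner's Ad Rank still equals the runner-up's.
        price = runner_up.ad_rank / (winner.pctr * winner.quality_score)
    else:
        price = reserve_price  # hypothetical floor when there is no competition
    # Never charge below the floor or above what the winner offered.
    price = min(winner.bid, max(price, reserve_price))
    return winner, price


if __name__ == "__main__":
    a = Candidate("A", pctr=0.05, bid=2.0, quality_score=1.0)  # Ad Rank 0.10
    b = Candidate("B", pctr=0.02, bid=4.0, quality_score=1.0)  # Ad Rank 0.08
    winner, price = run_auction([a, b])
    print(winner.ad_id, round(price, 2))
```

In this example ad A wins despite bidding half as much as B, because its higher predicted CTR gives it the higher Ad Rank, and it pays about 1.60 per click instead of its full bid of 2.00.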

Anti-patterns: Scoring all 10M ads with the heavy model is impossible in 100ms. Without budget pacing, the budget is exhausted by noon, leaving no evening impressions. With a first-price auction, advertisers game their bids and revenue drops. Stale CTR predictions lose revenue through poor ranking.

Budget pacing: Each campaign has a daily budget. A pacing controller adjusts the campaign's participation rate throughout the day so that spend stays even: if 50% of the budget is spent by noon with 50% of the day remaining, maintain the rate. If the campaign is overspending, throttle it; if it is underspending, boost it. Distributed counters track spend in near-real-time.
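One way to picture the pacing controller is as a simple feedback loop that compares actual spend against an even-spend target. The sketch below is an assumption-heavy illustration: the multiplicative `step`, the `min_rate` floor, and the probabilistic throttle are hypothetical choices, and `spent` is presumed to come from the near-real-time spend counters.

```python
import random
from dataclasses import dataclass


@dataclass
class PacingController:
    daily_budget: float
    participation_rate: float = 1.0  # fraction of eligible auctions to enter
    min_rate: float = 0.01           # hypothetical floor so a campaign can recover
    step: float = 0.1                # hypothetical adjustment gain per update

    def update(self, spent: float, day_fraction_elapsed: float) -> float:
        """Adjust the participation rate toward an even spend over the day."""
        if spent >= self.daily_budget:
            self.participation_rate = 0.0  # budget exhausted: stop serving
            return self.participation_rate
        target = self.daily_budget * day_fraction_elapsed
        if spent > target:
            # Overspending: throttle.
            self.participation_rate = max(
                self.min_rate, self.participation_rate * (1 - self.step)
            )
        elif spent < target:
            # Underspending: boost, never above full participation.
            self.participation_rate = min(
                1.0, max(self.participation_rate, self.min_rate) * (1 + self.step)
            )
        # Exactly on target: keep the current rate.
        return self.participation_rate

    def should_participate(self, rng=random) -> bool:
        """Probabilistic throttle: enter roughly `participation_rate` of auctions."""
        return rng.random() < self.participation_rate
```

For a campaign with a budget of 100, calling `update(spent=50.0, day_fraction_elapsed=0.5)` leaves the rate unchanged, while `update(spent=70.0, day_fraction_elapsed=0.5)` lowers it by one step.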

Interview Cheat Sheet

1. Funnel approach: retrieval → pre-scoring → deep ranking → auction (10M → 10K → 100 → 10 → 1)
2. Ad Rank: pCTR × bid × quality_score, with a second-price auction for pricing
3. Budget pacing: spend evenly across the day, throttle/boost participation rate
4. CTR prediction: deep neural net with user/ad/context features, updated hourly
5. Latency budget: retrieval 10ms + pre-score 20ms + deep rank 50ms + auction 5ms = 85ms, within the <100ms target
6. Feedback loop: click/conversion data feeds back to retrain the CTR model continuously
