System Design Case Study

How does Google index 100B+ pages and return ranked results in <200ms?

?? Design a web search engine: 100B+ pages, <200ms results, authority + relevance ranking
Concepts Involved

Problem Statement

How does a web search engine index 100B+ pages and return ranked results in <200ms, combining authority scoring with relevance ranking across a tiered index serving architecture with early termination?

Core challenge: 100B+ pages. User types a query. You must find the most relevant 10 results from 100 billion candidates in under 200ms. Brute-force scan is impossible. The answer: inverted index + tiered serving + early termination.
100B+
indexed pages
petabytes of index
<200ms
query latency
P99 target
100K+
queries / second
8.5B queries/day
1000s
index shards
scatter-gather query

Architecture · Inverted Index + Tiered Serving

QUERY LAYER · Parse, Understand, Plan User Query "system design" 100K queries/sec 8.5B queries/day global traffic Query Parser Tokenize words ? normalize Spell-check correction (did you mean?) Synonym expansion (car ? automobile) Intent classification (navigational/info/transactional) Query Planner Which tier to query? (Tier 1 first, escalate if needed) Cache check: 80% hit rate for popular queries Determine shard fan-out strategy Set latency budget per stage (total <200ms) INDEX LAYER · Tiered Inverted Index + Scatter-Gather + ML Re-Rank Tiered Inverted Index Tier 1: Top 1B pages (in-memory) Serves 90% of queries | fastest path Tier 2: 10B pages (SSD) If Tier 1 insufficient Tier 3: 90B+ pages (disk) Long-tail, niche content word ? [(doc_id, position, tf-idf)] scatter Scatter-Gather (1000s of shards) Query fans out to all shards in tier Each shard returns top-K candidates Early termination: scan in decreasing PageRank order, stop when top-K stable ~0.01% of shard actually scanned Merge results from all shards 10K+ shards | 80ms scatter-gather Tail latency: drop slowest 1% of shards ML Re-Rank BERT on top 1000 candidates Semantic understanding ? Final top 10 results Scoring formula: BM25 · PageRank · Freshness · CTR ? BERT re-rank ~50ms ML inference RESULT LAYER · Ranked Results + Enhancements Ranked Results <200ms total latency Top 10 organic results Snippets + metadata 100K queries/sec served Query Cache Popular queries cached 80% hit rate 60s TTL for freshness Bypass index entirely on hit Ads Integration Sponsored results Auction-based placement Parallel to organic search Doesn't add to latency Knowledge Graph Instant answers Entity cards, facts Zero-click results Structured data extraction RANKING & EARLY TERMINATION Score = BM25(relevance) · PageRank(authority) · Freshness · UserSignals(CTR) ? BERT re-rank top 1000 ? final top 10 Early termination: scan in decreasing PageRank order, stop when top-K stable (~0.01% of shard scanned) Latency: Parse(5ms) ? Cache check(2ms) ? Index lookup(30ms) ? Scatter-gather(80ms) ? ML re-rank(50ms) ? Network(30ms) = <200ms 100B+ pages | <200ms | 100K queries/sec | 10K+ shards
LayerRoleHow
CrawlDiscover and fetch pagesPriority queue by PageRank + change frequency. Politeness (robots.txt). 1000s of crawlers.
Index BuildCreate inverted indexword ? [(doc_id, position, tf-idf score), ...]. MapReduce pipeline. Partitioned by doc_id range.
Tier 1 (Hot)Top 1B most important pagesIn-memory index on SSDs. Serves 90% of queries. Fast path.
Tier 2 (Warm)Next 10B pagesSSD-based index. Queried only if Tier 1 has insufficient results.
Tier 3 (Cold)Long-tail 90B+ pagesDisk-based. Rarely queried. Niche/old content.
RankingScore and order resultsBM25 (text relevance) · PageRank (authority) · freshness · user signals. ML re-ranking (BERT).
ServingScatter-gather across shardsQuery ? fan-out to 1000s of shards ? each returns top-K ? merge ? re-rank ? return top 10.
Early termination: Each shard scores documents in decreasing static rank order (PageRank). Once top-K candidates are "good enough" (score gap threshold), stop scanning. This means most queries only scan ~0.01% of the shard · not all documents containing the query terms.
Ranking signals: BM25 (term frequency, document length normalization). PageRank (link authority). Freshness (recent content boosted for news queries). Click-through rate (user engagement). BERT/LLM (semantic understanding for re-ranking top candidates).
Challenges: Index freshness · breaking news must be searchable in minutes (real-time indexing pipeline). Spam · adversarial SEO (ML classifiers). Tail queries · rare queries with few matching docs (need deep index). Personalization · same query, different intent per user.
Real-world: Google · Caffeine (real-time indexing), Hummingbird (semantic), BERT (understanding). Bing · similar tiered architecture. Elasticsearch · inverted index for enterprise search. Algolia · typo-tolerant, instant search (<50ms).

Scale Estimation

StepDerivationResultDesign Impact
1Queries: 8.5B/day · 86400~100K queries/secMassive parallelism needed · scatter-gather across shards
2Index size: 100B pages · 10KB avg compressed~1 PB indexDistributed across 10K+ machines, tiered by importance
3Shards: 1PB · 100GB per shard~10K shardsEach query fans out to all shards in its tier
4Latency budget: 200ms total~50ms per shard + 50ms merge + 100ms networkEarly termination critical · can't scan full posting list
5Crawl rate: 10B pages · 30 day refresh~3,800 pages/sec crawledContinuous crawl + real-time pipeline for breaking news

Resilience & Edge Cases

FailureImpactRecovery
Shard unavailablePartial results (some docs missing)Serve degraded results from available shards. Replicas take over. User rarely notices.
Spike query (trending topic)Same query 1M times/secQuery result cache (popular queries cached for 60s). Serve from cache without hitting index.
Index corruptionWrong results for affected shardChecksums on index segments. Rebuild from source documents. Serve from replica during rebuild.
Spam/SEO manipulationLow-quality results ranked highML spam classifiers, link graph analysis, manual penalties. Continuous quality evaluation.

Interview Cheat Sheet

The 7 things to say for search system design

1. Inverted index · word ? list of (doc_id, position, score). Core data structure.
2. Tiered serving · Tier 1 (top 1B pages, in-memory) serves 90% of queries. Tier 2/3 for long-tail.
3. Scatter-gather · query fans out to 1000s of shards, each returns top-K, merge + re-rank
4. Early termination · scan docs in decreasing PageRank order, stop when top-K is "good enough"
5. BM25 + PageRank + freshness + ML re-ranking · multi-signal scoring pipeline
6. Real-time indexing for freshness · breaking news searchable in minutes (separate pipeline)
7. Caching at query level · popular queries cached (80% of queries are repeats)