How does Google return search results in <200ms from 100B+ pages?

🎯 Design a web search engine: 100B+ pages, <200ms results, authority + relevance ranking

Concepts Involved

Elasticsearch Sharding Indexing Caching Load Balancer

Problem Statement

How does a web search engine index 100B+ pages and return ranked results in <200ms, combining authority scoring with relevance ranking across a tiered index serving architecture with early termination?

Core challenge: 100B+ pages. User types a query. You must find the most relevant 10 results from 100 billion candidates in under 200ms. Brute-force scan is impossible. The answer: inverted index + tiered serving + early termination.

100B+

indexed pages

petabytes of index

<200ms

query latency

P99 target

100K+

queries / second

8.5B queries/day

1000s

index shards

scatter-gather query

Architecture · Inverted Index + Tiered Serving

Layer	Role	How
Crawl	Discover and fetch pages	Priority queue by PageRank + change frequency. Politeness (robots.txt). 1000s of crawlers.
Index Build	Create inverted index	word → [(doc_id, position, tf-idf score), ...]. MapReduce pipeline. Partitioned by doc_id range.
Tier 1 (Hot)	Top 1B most important pages	In-memory index on SSDs. Serves 90% of queries. Fast path.
Tier 2 (Warm)	Next 10B pages	SSD-based index. Queried only if Tier 1 has insufficient results.
Tier 3 (Cold)	Long-tail 90B+ pages	Disk-based. Rarely queried. Niche/old content.
Ranking	Score and order results	BM25 (text relevance) · PageRank (authority) · freshness · user signals. ML re-ranking (BERT).
Serving	Scatter-gather across shards	Query → fan-out to 1000s of shards → each returns top-K → merge → re-rank → return top 10.

Early termination: Each shard scores documents in decreasing static rank order (PageRank). Once top-K candidates are "good enough" (score gap threshold), stop scanning. This means most queries only scan ~0.01% of the shard · not all documents containing the query terms.

Ranking signals: BM25 (term frequency, document length normalization). PageRank (link authority). Freshness (recent content boosted for news queries). Click-through rate (user engagement). BERT/LLM (semantic understanding for re-ranking top candidates).

Challenges: Index freshness · breaking news must be searchable in minutes (real-time indexing pipeline). Spam · adversarial SEO (ML classifiers). Tail queries · rare queries with few matching docs (need deep index). Personalization · same query, different intent per user.

Real-world: Google · Caffeine (real-time indexing), Hummingbird (semantic), BERT (understanding). Bing · similar tiered architecture. Elasticsearch · inverted index for enterprise search. Algolia · typo-tolerant, instant search (<50ms).

Scale Estimation

Step	Derivation	Result	Design Impact
1	Queries: 8.5B/day · 86400	~100K queries/sec	Massive parallelism needed · scatter-gather across shards
2	Index size: 100B pages · 10KB avg compressed	~1 PB index	Distributed across 10K+ machines, tiered by importance
3	Shards: 1PB · 100GB per shard	~10K shards	Each query fans out to all shards in its tier
4	Latency budget: 200ms total	~50ms per shard + 50ms merge + 100ms network	Early termination critical · can't scan full posting list
5	Crawl rate: 10B pages · 30 day refresh	~3,800 pages/sec crawled	Continuous crawl + real-time pipeline for breaking news

Resilience & Edge Cases

Failure	Impact	Recovery
Shard unavailable	Partial results (some docs missing)	Serve degraded results from available shards. Replicas take over. User rarely notices.
Spike query (trending topic)	Same query 1M times/sec	Query result cache (popular queries cached for 60s). Serve from cache without hitting index.
Index corruption	Wrong results for affected shard	Checksums on index segments. Rebuild from source documents. Serve from replica during rebuild.
Spam/SEO manipulation	Low-quality results ranked high	ML spam classifiers, link graph analysis, manual penalties. Continuous quality evaluation.

Interview Cheat Sheet

The 7 things to say for search system design

1. Inverted index · word → list of (doc_id, position, score). Core data structure.
2. Tiered serving · Tier 1 (top 1B pages, in-memory) serves 90% of queries. Tier 2/3 for long-tail.
3. Scatter-gather · query fans out to 1000s of shards, each returns top-K, merge + re-rank
4. Early termination · scan docs in decreasing PageRank order, stop when top-K is "good enough"
5. BM25 + PageRank + freshness + ML re-ranking · multi-signal scoring pipeline
6. Real-time indexing for freshness · breaking news searchable in minutes (separate pipeline)
7. Caching at query level · popular queries cached (80% of queries are repeats)

System Design Case Study

Problem Statement

Architecture · Inverted Index + Tiered Serving

Scale Estimation

Resilience & Edge Cases

Interview Cheat Sheet