System Design Case Study

How does Google manage sessions for 2B+ users across multiple devices?

Design a session management system: 2B users, instant revocation, multi-device, sliding expiry

Problem Statement

How does a platform manage sessions for 2B+ users across multiple devices, supporting instant revocation (password change invalidates all sessions), sliding expiry, and device-specific session limits without checking a central store on every request?

Core challenge: 2B users × 3 devices each = 6B active sessions. Checking a central session store on every single request would require millions of DB lookups/sec. But you need instant revocation (password change = all sessions dead immediately). How do you balance stateless verification with revocation?
2B+ active users: ~6B sessions
Instant revocation: password change → all sessions dead
Multi-device with per-device limits: phone, laptop, tablet
No central check per request: stateless verification
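
To make the core challenge concrete, here is a rough back-of-envelope estimate. Only the user and device counts come from the text above; the per-session request rate and the share of sessions that are active at any moment are illustrative assumptions, so treat the resulting numbers as orders of magnitude rather than measurements.

```python
# Back-of-envelope load estimate.
# From the text: 2B users, 3 devices each, 15-minute access tokens.
# Assumed for illustration: request rate and active-session share.
USERS = 2_000_000_000
DEVICES_PER_USER = 3
ACCESS_TOKEN_TTL_S = 15 * 60
REQUESTS_PER_SESSION_PER_DAY = 100   # assumption
ACTIVE_FRACTION = 0.05               # assumption: sessions in use at a given moment
SECONDS_PER_DAY = 86_400

sessions = USERS * DEVICES_PER_USER

# Central-store design: one session lookup for every request.
lookups_per_sec = sessions * REQUESTS_PER_SESSION_PER_DAY / SECONDS_PER_DAY

# Token design: the store is only touched when an active session refreshes.
refreshes_per_sec = sessions * ACTIVE_FRACTION / ACCESS_TOKEN_TTL_S

print(f"sessions:            {sessions:,}")
print(f"per-request lookups: {lookups_per_sec:,.0f}/s")
print(f"refresh-only load:   {refreshes_per_sec:,.0f}/s")
```

Under these assumptions the per-request design needs roughly 6.9M lookups/sec, while the refresh-only design needs about 333K/sec, and that second figure no longer grows with how chatty each client is.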

Architecture: Short-Lived JWT + Refresh Token + Revocation List

LAYER 1: AUTH (stateless JWT verification, no DB call)
The client sends Authorization: Bearer <JWT> plus a device fingerprint on every request. The API service then:
1. Verifies the JWT signature locally (HMAC/RSA)
2. Checks the exp claim (expired → 401)
3. Checks the in-memory revocation bloom filter
No DB call and no network hop: O(1) total, sub-ms verification. The service extracts user_id, roles, and device_id, then serves the authorized request.

LAYER 2: REFRESH (token rotation + theft detection)
Every 15 min the client presents its single-use refresh_token. The API validates it, checks that it has not been revoked, and checks that the device matches. The Session DB (Redis) maps user → [active sessions] with per-device tracking and detects replay: an old token reused → kill the whole token family. On success the API issues a new access token (15 min) and a new, rotated refresh token; the old refresh token is killed.

LAYER 3: REVOCATION (password change → instant invalidation)
Trigger: password change, security event, or "sign out all" → revoke all.
1. Delete all of the user's refresh tokens in Redis (all devices affected)
2. Add user_id to the revocation bloom filter and publish it via pub/sub to all services (<1s propagation)
3. Old JWTs are rejected by the bloom check (<1s effective)

TOKEN LIFECYCLE
Access JWT: 15 min TTL; stateless, verified locally; no DB call per request
Refresh token: 30 days TTL; rotated on each use; single-use, detects replay
Revocation: <1s via bloom filter; worst case 15 min (JWT exp)
Device binding: fingerprint in JWT claim + stored in session DB; prevents cross-device theft
Critical actions: money transfer, password change; ALWAYS check the Session DB, never rely on the JWT alone

Sizing: 2B users × 3 devices = 6B sessions; the Redis cluster handles refresh traffic; bloom filter ~10MB for 10M revoked users
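
The sizing note in the diagram (~10MB of bloom filter for 10M revoked users) can be sanity-checked with the standard bloom filter false-positive formula. The sketch below is not from the source: it assumes an optimally chosen number of hash functions and treats 10MB as 80 million bits.

```python
import math

def bloom_stats(n_items, m_bits):
    """Return (near-optimal hash count k, expected false-positive rate)."""
    # Optimal k is (m/n) * ln 2, rounded to a whole number of hash functions.
    k = max(1, round(m_bits / n_items * math.log(2)))
    # Standard approximation: p = (1 - e^(-k*n/m))^k
    p = (1 - math.exp(-k * n_items / m_bits)) ** k
    return k, p

n = 10_000_000            # revoked user IDs held in the filter
m = 10 * 1_000_000 * 8    # 10 MB expressed in bits
k, p = bloom_stats(n, m)
print(f"bits per entry: {m // n}, hash functions: {k}, false positives: {p:.2%}")
```

At 8 bits per entry the expected false-positive rate is about 2%. A bloom filter never produces false negatives, so a revoked user is always caught; the roughly 2% of non-revoked users who are wrongly flagged would need a fallback, for example a confirming lookup in the session DB rather than an outright rejection. That fallback is a suggestion here, not something the source specifies.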
Component | Mechanism | Purpose
Access Token (JWT) | Short-lived (15 min), signed, stateless | Verified locally by any service (no DB call). Contains user_id, roles, device_id, exp.
Refresh Token | Long-lived (30 days), opaque, stored server-side | Used to get a new access token. Stored in the session DB. Rotated on each use.
Session DB | Redis cluster (user_id → [sessions]) | Tracks active refresh tokens per user per device. Enables revocation.
Revocation List | Bloom filter / short-lived blocklist | On password change: add user_id to the blocklist. Services check the blocklist (cached, <1ms).
Device Binding | device_fingerprint in token + session | Token only valid from the same device. Prevents token theft across devices.
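
The local verification path for the access token (signature, exp, device binding) can be sketched with the Python standard library alone. This is a minimal illustration, not the platform's actual implementation: it assumes HS256 with a shared key, and `SECRET`, `issue_access_token`, and `verify_access_token` are hypothetical names. A real deployment would more likely use an established JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # hypothetical; real keys come from a key management service

def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(signing_input):
    return hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest()

def issue_access_token(user_id, roles, device_id, now=None, ttl_s=15 * 60):
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user_id, "roles": roles, "device_id": device_id,
              "iat": now, "exp": now + ttl_s}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_b64url(_sign(f'{header}.{payload}'))}"

def verify_access_token(token, device_id, now=None):
    """Return the claims if the token is valid, else None. No DB or network call."""
    now = int(time.time()) if now is None else now
    try:
        header, payload, signature = token.split(".")
        # 1. Signature check, using a constant-time comparison.
        if not hmac.compare_digest(_sign(f"{header}.{payload}"), _b64url_decode(signature)):
            return None
        claims = json.loads(_b64url_decode(payload))
    except (ValueError, TypeError):
        return None  # malformed token
    # 2. Expiry check.
    if claims.get("exp", 0) <= now:
        return None
    # 3. Device binding: the presenting device must match the bound one.
    if claims.get("device_id") != device_id:
        return None
    return claims

token = issue_access_token("user-1", ["user"], "phone-abc", now=1_000)
print(verify_access_token(token, "phone-abc", now=1_500))   # claims dict
print(verify_access_token(token, "laptop-xyz", now=1_500))  # None: wrong device
print(verify_access_token(token, "phone-abc", now=1_900))   # None: expired
```

The revocation check from the table is deliberately left out of this sketch so that it stays focused on the stateless part; it would slot in as a fourth step after the device check.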
How instant revocation works without a per-request DB check: access tokens are valid for only 15 minutes. On password change:
1. Invalidate all of the user's refresh tokens in the session DB.
2. Add user_id to the revocation bloom filter, which is propagated to all services within seconds via pub/sub.
3. Services check the bloom filter (in-memory, <1ms) before accepting a JWT, so old tokens are rejected as soon as the update arrives.
4. Within 15 min, all old access tokens expire naturally.
Worst case, if a service misses the bloom filter update, the exposure window is 15 min. For critical actions (e.g. transferring money), always check the session DB.
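
The revocation flow can be sketched as follows. Everything here is an in-memory stand-in: `BloomFilter`, `ApiService`, and `change_password` are hypothetical names, the loop over services substitutes for a real pub/sub channel, and a plain dict substitutes for the session DB.

```python
import hashlib

class BloomFilter:
    """Small bloom filter using double hashing over one SHA-256 digest."""

    def __init__(self, m_bits=1 << 20, k=6):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids a zero stride
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

class ApiService:
    """One service replica holding its own in-memory copy of the filter."""

    def __init__(self):
        self.revoked = BloomFilter()

    def on_revocation_event(self, user_id):
        self.revoked.add(user_id)

    def accepts(self, claims):
        # Called after signature and exp checks have already passed.
        return claims["sub"] not in self.revoked

def change_password(user_id, refresh_tokens_by_user, services):
    refresh_tokens_by_user.pop(user_id, None)   # step 1: kill all refresh tokens
    for service in services:                    # step 2: stand-in for pub/sub fan-out
        service.on_revocation_event(user_id)

services = [ApiService(), ApiService()]
refresh_tokens = {"alice": {"rt-1", "rt-2"}, "bob": {"rt-3"}}
alice_claims = {"sub": "alice"}

print(all(s.accepts(alice_claims) for s in services))  # accepted before the change
change_password("alice", refresh_tokens, services)
print(any(s.accepts(alice_claims) for s in services))  # rejected on every replica
```

One gap this sketch does not solve: keyed on user_id alone, the filter would also reject tokens the user obtains by logging in again after the password change. A fuller design would need some way to tell old tokens from new ones, for instance by expiring filter entries once the 15-minute access-token window has passed, which matches the "short-lived blocklist" wording in the table. That detail is an assumption, not something the source spells out.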
Refresh token rotation: each refresh token is single-use. On use, issue a new access token and a new refresh token, and invalidate the old refresh token. If an old refresh token is reused (stolen-token replay) → invalidate the entire session family (all tokens for that device). This detects token theft.
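
A minimal sketch of rotation with replay detection is shown below. `RefreshStore` is a hypothetical in-memory stand-in for the Redis session DB, and the record layout (a `family` ID shared by every token descended from one login) is an assumption about how a "session family" might be tracked.

```python
import secrets

class RefreshStore:
    """In-memory stand-in for the session DB; tracks refresh tokens by family."""

    def __init__(self):
        self.tokens = {}               # token -> record
        self.revoked_families = set()

    def issue(self, user_id, device_id, family=None):
        token = secrets.token_urlsafe(32)
        self.tokens[token] = {
            "family": family or secrets.token_urlsafe(8),
            "user_id": user_id,
            "device_id": device_id,
            "used": False,
        }
        return token

    def rotate(self, token, device_id):
        """Exchange a refresh token for a new one, or return None on failure."""
        record = self.tokens.get(token)
        if record is None or record["family"] in self.revoked_families:
            return None
        if record["device_id"] != device_id:
            return None  # device binding: wrong device presents the token
        if record["used"]:
            # A single-use token was presented twice: assume theft and
            # kill every token descended from the same login.
            self.revoked_families.add(record["family"])
            return None
        record["used"] = True
        return self.issue(record["user_id"], device_id, family=record["family"])

store = RefreshStore()
first = store.issue("user-1", "phone-abc")
second = store.rotate(first, "phone-abc")       # normal refresh
print(second is not None)
print(store.rotate(first, "phone-abc"))         # replay of the old token -> None
print(store.rotate(second, "phone-abc"))        # family is now dead -> None
```

After the replay, the legitimate client's newest token stops working too, which forces a fresh login. That is the intended trade-off: the server cannot tell which of the two presenters is the thief, so it invalidates both.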
Anti-patterns:
Long-lived JWT (24h+): can't revoke for hours.
Session in cookie only: no server-side revocation.
No device binding: a stolen token works from any device.
Checking the DB on every request: doesn't scale to billions of requests.
Real-world:
Google: short-lived access tokens + refresh via OAuth.
Auth0: rotating refresh tokens with theft detection.
GitHub: fine-grained PATs with expiry.
Netflix: device-bound sessions with concurrent device limits (4 screens).

Interview Cheat Sheet

The 7 things to say for session management design

1. Short-lived JWT (15 min) + long-lived refresh token (30 days): balance stateless verification with revocation
2. Refresh token rotation: single-use, issue new on each refresh, detect replay (theft)
3. Revocation bloom filter: propagated via pub/sub, checked in-memory (<1ms), catches revoked users
4. Device binding: token only valid from the same device fingerprint (prevents cross-device theft)
5. Session DB in Redis: user_id → [active sessions per device], enables "sign out all devices"
6. Critical actions always check the DB: money transfer, password change → verify the session server-side
7. Worst-case revocation window = 15 min: access token expiry is the upper bound for stale sessions
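
Points 5 and 6, together with the sliding expiry and per-device limits from the problem statement, can be sketched as one small session store. `SessionDB` and `authorize` are hypothetical names, a dict stands in for Redis, and the eviction policy (drop the session closest to expiry when the device limit is hit) is an assumption; rejecting the new login would be an equally valid choice.

```python
CRITICAL_ACTIONS = {"transfer_money", "change_password"}

class SessionDB:
    """In-memory stand-in for Redis: user_id -> {device_id: expires_at}."""

    def __init__(self, max_devices=3, idle_ttl_s=30 * 86_400):
        self.max_devices = max_devices
        self.idle_ttl_s = idle_ttl_s
        self.sessions = {}

    def login(self, user_id, device_id, now):
        devices = self.sessions.setdefault(user_id, {})
        for expired in [d for d, exp in devices.items() if exp <= now]:
            del devices[expired]
        if device_id not in devices and len(devices) >= self.max_devices:
            # Assumed policy: evict the session closest to expiry.
            del devices[min(devices, key=devices.get)]
        devices[device_id] = now + self.idle_ttl_s

    def is_active(self, user_id, device_id, now):
        return self.sessions.get(user_id, {}).get(device_id, 0) > now

    def touch(self, user_id, device_id, now):
        """Sliding expiry: each refresh pushes the deadline forward."""
        if not self.is_active(user_id, device_id, now):
            return False
        self.sessions[user_id][device_id] = now + self.idle_ttl_s
        return True

    def sign_out_all(self, user_id):
        self.sessions.pop(user_id, None)

def authorize(db, claims, action, now):
    """Assumes the JWT was already verified locally; adds a DB check when critical."""
    if action in CRITICAL_ACTIONS:
        return db.is_active(claims["sub"], claims["device_id"], now)
    return True

db = SessionDB(max_devices=2, idle_ttl_s=100)
db.login("user-1", "phone", now=0)
db.login("user-1", "laptop", now=10)
db.touch("user-1", "phone", now=50)      # phone now expires at 150
db.login("user-1", "tablet", now=60)     # limit reached: laptop (expires 110) is evicted

claims = {"sub": "user-1", "device_id": "tablet"}
print(authorize(db, claims, "transfer_money", now=61))  # True
db.sign_out_all("user-1")
print(authorize(db, claims, "transfer_money", now=62))  # False: session is gone
print(authorize(db, claims, "read_feed", now=62))       # True until the JWT expires
```

The last line shows the 15-minute window from point 7 in practice: a non-critical request still passes on the strength of the JWT alone until the token expires or the revocation filter catches it.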