How does Google manage sessions for 2B+ users across devices?

🎯 Design a session management system: 2B users, instant revocation, multi-device, sliding expiry

Concepts Involved

Authentication · Encryption · Redis · Consistency · Multi-Region

Problem Statement

How does a platform manage sessions for 2B+ users across multiple devices, supporting instant revocation (password change invalidates all sessions), sliding expiry, and device-specific session limits without checking a central store on every request?

Core challenge: 2B users × 3 devices each = 6B active sessions. Checking a central session store on every request would mean millions of DB lookups per second, yet revocation must be instant (a password change kills all sessions immediately). How do you balance stateless verification with revocation?
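As a rough back-of-envelope check of the numbers above, the sketch below multiplies them out. The figure of 100 requests per session per day is purely an illustrative assumption (it is not given in the text); the point is only that even a modest per-session request rate lands in the millions of lookups per second.

```python
# Back-of-envelope load estimate. REQUESTS_PER_SESSION_PER_DAY is an
# assumed value for illustration only; it does not come from the article.
USERS = 2_000_000_000
DEVICES_PER_USER = 3
REQUESTS_PER_SESSION_PER_DAY = 100
SECONDS_PER_DAY = 86_400

sessions = USERS * DEVICES_PER_USER
lookups_per_sec = sessions * REQUESTS_PER_SESSION_PER_DAY / SECONDS_PER_DAY

print(f"{sessions:,} sessions, ~{lookups_per_sec / 1e6:.1f}M lookups/sec")
# prints: 6,000,000,000 sessions, ~6.9M lookups/sec
```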

2B+ active users · ~6B sessions

Instant revocation · password change → all dead

Multi-device · per-device limits (phone, laptop, tablet)

No central check per request · stateless verification

Architecture · Short-Lived JWT + Refresh Token + Revocation List

Component	Mechanism	Purpose
Access Token (JWT)	Short-lived (15 min), signed, stateless	Verified locally by any service (no DB call). Contains user_id, roles, device_id, exp.
Refresh Token	Long-lived (30 days), opaque, stored server-side	Used to get new access token. Stored in session DB. Rotated on each use.
Session DB	Redis cluster (user_id → [sessions])	Tracks active refresh tokens per user per device. Enables revocation.
Revocation List	Bloom filter / short-lived blocklist	On password change: add user_id to blocklist. Services check blocklist (cached, <1ms).
Device Binding	device_fingerprint in token + session	Token only valid from same device. Prevents token theft across devices.

How instant revocation works without a per-request DB check: access tokens are valid for only 15 minutes, so on a password change:

1. Invalidate all of the user's refresh tokens in the session DB.
2. Add the user_id to the revocation Bloom filter, which is propagated to all services within seconds via pub/sub.
3. Services check the Bloom filter (in-memory, <1ms) before accepting a JWT.
4. Within 15 minutes, all old access tokens expire naturally, so a blocklist entry only needs to live that long.

Worst case, for a service that has not yet received the blocklist update, the revocation window is 15 minutes. For critical actions (such as transferring money), always check the session DB.
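
The revocation check can be sketched roughly as follows. The Bloom filter here is a small hand-rolled one built on `hashlib`. Two details are assumptions added for the example rather than something the text specifies: a Bloom filter hit is confirmed against an exact record (the `revoked_at` dict, standing in for the session DB), and that record stores a cutoff time so that tokens issued after the password change are still accepted. The names `BloomFilter`, `revoke_user`, and `is_token_accepted` are hypothetical.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# Assumption: exact revocation cutoffs, standing in for the session DB.
revoked_at = {}

def revoke_user(bloom, user_id, now):
    """Called on password change; in production the update is fanned out via pub/sub."""
    revoked_at[user_id] = now
    bloom.add(user_id)

def is_token_accepted(bloom, claims):
    """Fast path is the in-memory filter; only filter hits consult the exact record."""
    if not bloom.might_contain(claims["user_id"]):
        return True  # definitely not revoked
    cutoff = revoked_at.get(claims["user_id"])
    # A false positive has no cutoff; a token issued after the cutoff is fine.
    return cutoff is None or claims["iat"] > cutoff
```

Because the filter never produces false negatives, the exact lookup is only paid for users who were actually revoked, plus a small fraction of false positives.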

Refresh token rotation: Each refresh token is single-use. On use, issue a new access token and a new refresh token, and invalidate the old refresh token. If an old refresh token is reused (a stolen-token replay), invalidate the entire session family (all tokens for that device). This detects token theft.
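
A minimal in-memory sketch of rotation with replay detection is shown below. The `RefreshStore` class and its method names are hypothetical, and a dict stands in for the session DB. A real deployment would typically store a hash of each token rather than the token itself.

```python
import secrets

class RefreshStore:
    """In-memory stand-in for the session DB holding refresh tokens."""

    def __init__(self):
        self.tokens = {}               # token -> {"family": str, "used": bool}
        self.revoked_families = set()  # families killed after a detected replay

    def _mint(self, family):
        token = secrets.token_urlsafe(32)  # opaque, unguessable token
        self.tokens[token] = {"family": family, "used": False}
        return token

    def start_session(self, user_id, device_id):
        """Login on a device starts a new token family for that device."""
        family = f"{user_id}:{device_id}:{secrets.token_hex(4)}"
        return self._mint(family)

    def rotate(self, token):
        """Exchange a refresh token for a new one; return None if rejected."""
        record = self.tokens.get(token)
        if record is None or record["family"] in self.revoked_families:
            return None
        if record["used"]:
            # Single-use token presented twice: treat as theft and
            # invalidate every token in this session family.
            self.revoked_families.add(record["family"])
            return None
        record["used"] = True
        return self._mint(record["family"])
```

For example, if `t1` is rotated into `t2` and an attacker later replays `t1`, the replay is rejected and `t2` stops working as well, which forces a fresh login on that device while sessions on other devices are unaffected.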

Anti-patterns: Long-lived JWT (24h+) · can't revoke for hours. Session in cookie only · no server-side revocation. No device binding · stolen token works from any device. Checking DB on every request · doesn't scale to billions of requests.

Real-world: Google · short-lived access tokens + refresh tokens via OAuth 2.0. Auth0 · rotating refresh tokens with theft detection. GitHub · fine-grained PATs with expiry. Netflix · device-bound sessions with concurrent device limits (up to 4 screens, depending on plan).

Interview Cheat Sheet

The 7 things to say for session management design

1. Short-lived JWT (15 min) + long-lived refresh token (30 days) · balance stateless verification with revocation
2. Refresh token rotation · single-use, issue new on each refresh, detect replay (theft)
3. Revocation bloom filter · propagated via pub/sub, checked in-memory (<1ms), catches revoked users
4. Device binding · token only valid from same device fingerprint (prevents cross-device theft)
5. Session DB in Redis · user_id → [active sessions per device], enables "sign out all devices"
6. Critical actions always check DB · money transfer, password change → verify session server-side
7. Worst-case revocation window = 15 min · access token expiry is the upper bound for stale sessions
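
Point 5, together with the sliding expiry and per-device limits from the problem statement, can be sketched as below. A plain dict stands in for the Redis cluster, and the eviction policy (drop the session closest to expiry when the device limit is reached) is an assumption made for this example, as are the names `SessionRegistry`, `touch`, and `MAX_DEVICES`.

```python
import time

SESSION_TTL = 30 * 24 * 3600   # 30 days, matching the refresh token lifetime
MAX_DEVICES = 3                # assumed per-user device limit

class SessionRegistry:
    """user_id -> {device_id: expires_at}; a dict standing in for Redis."""

    def __init__(self):
        self.sessions = {}

    def touch(self, user_id, device_id, now=None):
        """Create or refresh a session; each call slides the expiry forward."""
        now = time.time() if now is None else now
        devices = self.sessions.setdefault(user_id, {})
        # Drop sessions that have already expired.
        for dev in [d for d, exp in devices.items() if exp <= now]:
            del devices[dev]
        # Enforce the device limit by evicting the session closest to expiry.
        if device_id not in devices and len(devices) >= MAX_DEVICES:
            del devices[min(devices, key=devices.get)]
        devices[device_id] = now + SESSION_TTL

    def is_active(self, user_id, device_id, now=None):
        now = time.time() if now is None else now
        return self.sessions.get(user_id, {}).get(device_id, 0) > now

    def sign_out_all(self, user_id):
        """'Sign out all devices': drop every session for the user."""
        self.sessions.pop(user_id, None)
```

In Redis itself the same shape could be kept as one hash or sorted set per user, with the sliding expiry implemented by rewriting the expiry timestamp on each refresh.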
