Store frequently accessed data in a fast layer to improve
latency (serve from cache → faster),
throughput (reduce backend load), and
cost efficiency (fewer DB calls),
with trade-offs in
consistency (stale data risk),
invalidation (hard to update/expire), and
memory usage (extra storage).
Cache-Aside (Lazy)
Pro: App controls the logic; only requested data is cached. Con: A miss costs 3 round trips (cache lookup, DB read, cache write); stale if the DB is updated directly.
Read-Through
Pro: Simple app code; the cache fetches from the DB automatically. Con: Depends on the cache library, which needs a DB plugin.
Write-Around
Pro: DB is the source of truth; writes never put stale data into the cache. Con: The cache may hold a stale copy until it expires and the next read misses.
Write-Back (Write-Behind)
Pro: Lowest write latency; cache and DB are eventually consistent. Con: Risk of data loss if the cache crashes before the DB sync.
Write-Through
Pro: Reads are always fresh; cache and DB stay in sync. Con: Higher write latency (each write waits for the DB); infrequently read data is cached too.
Strategy | Read Path | Write Path | Consistency | Best For
Cache-Aside | App → Cache → DB on miss | App → DB (cache invalidated) | Eventual | General purpose (most common)
Read-Through | App → Cache (auto-fetches DB) | n/a | Eventual | Simpler app code
Write-Around | App → Cache → DB on miss | App → DB directly | Eventual | Write-heavy, read-rarely data
Write-Back | App → Cache | App → Cache → async DB | Eventual | High write throughput
Write-Through | App → Cache | App → Cache → sync DB | Strong | Read-heavy, consistency needed
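The cache-aside row of the table can be sketched in a few lines. This is a minimal illustration only: `CacheAside`, the dict standing in for the database, and the `db_reads` counter are hypothetical names chosen for the example, not part of any library.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CacheAside:
    """Toy cache-aside wrapper; `db` is a plain dict standing in for the database."""

    def __init__(self, db: Dict[str, Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.db = db
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.db_reads = 0                              # counts trips to the "DB"

    def read(self, key: str) -> Optional[Any]:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                  # cache hit
        self.db_reads += 1                   # miss: fall through to the DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def write(self, key: str, value: Any) -> None:
        self.db[key] = value                 # the DB stays the source of truth
        self.cache.pop(key, None)            # invalidate instead of updating


store = CacheAside(db={"user:1": "Ada"})
first = store.read("user:1")     # miss, loaded from the DB
second = store.read("user:1")    # hit
store.write("user:1", "Grace")   # write goes to the DB, cache entry is dropped
third = store.read("user:1")     # miss again, sees the new value
```

Invalidating on write (rather than updating the cached value) keeps the write path simple and avoids caching data nobody reads; the cost is one extra miss after each write.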
Invalidation: TTL (simple, stale until expiry) · Event-driven (CDC/app triggers delete, near real-time) · Version key (new version = automatic miss). Eviction: LRU (most common) · LFU · FIFO.
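As an illustration of LRU eviction, here is a small sketch built on `collections.OrderedDict`. The class name `LRUCache` and its methods are assumptions made for the example; production caches usually add TTLs and thread safety on top.

```python
from collections import OrderedDict
from typing import Any, Hashable, List, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self) -> List[Hashable]:
        return list(self._items)             # oldest first


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now more recent than "b"
cache.put("c", 3)   # over capacity: "b" is evicted
```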
Thundering Herd: Cache expires → thousands of requests hit the DB simultaneously. Fix: mutex on cache miss, probabilistic early expiration, stale-while-revalidate.
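The "mutex on cache miss" fix can be sketched with a per-key lock, so that only one caller runs the expensive load while the others wait and reuse its result. `SingleFlightCache` and `slow_loader` are hypothetical names, and the sketch leaves out TTLs to stay short.

```python
import threading
import time
from typing import Any, Callable, Dict, List


class SingleFlightCache:
    """On a miss, only one thread per key calls the loader; the rest wait."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._values: Dict[str, Any] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()       # protects the lock table

    def get(self, key: str) -> Any:
        if key in self._values:              # fast path, no locking
            return self._values[key]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._values:      # re-check after acquiring the lock
                self._values[key] = self._loader(key)
            return self._values[key]


calls: List[str] = []

def slow_loader(key: str) -> str:
    calls.append(key)                        # record each trip to the "DB"
    time.sleep(0.05)
    return key.upper()

cache = SingleFlightCache(slow_loader)
results: List[str] = []
threads = [threading.Thread(target=lambda: results.append(cache.get("hot")))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With 20 concurrent readers of the same cold key, the loader runs once and every reader gets the same value.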
Real-world: Facebook uses Memcached as a look-aside cache and TAO as a graph cache in front of MySQL. Twitter caches timelines in Redis. Target: cache hit rate >95%.
Redis: Data Structures
In-memory data store: sub-ms latency, 100K-1M ops/sec. Cache + data structures + messaging.
Pipelining: batch multiple commands into one TCP round trip → up to 10x throughput gain.
No Query Planner: commands are direct operations → no SQL parsing, no optimizer.
RAM ~100ns · SSD ~100µs · HDD ~10ms
Single GET/SET → 100K-1M ops/sec
With Pipelining → up to 10x gain
Bottleneck = NETWORK, not CPU → single thread is enough
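A back-of-the-envelope model shows why pipelining helps when the network is the bottleneck. The round-trip and execution times below are made-up illustrative values, not measurements, and `total_time_ms` is a hypothetical helper.

```python
def total_time_ms(commands: int, rtt_ms: float, exec_ms: float,
                  batch_size: int = 1) -> float:
    """Each batch pays one network round trip plus the execution of its commands."""
    batches = -(-commands // batch_size)     # ceiling division
    return batches * rtt_ms + commands * exec_ms


# Assumed numbers: 0.1 ms round trip, 0.01 ms to execute one command.
unpipelined = total_time_ms(1000, rtt_ms=0.1, exec_ms=0.01)
pipelined = total_time_ms(1000, rtt_ms=0.1, exec_ms=0.01, batch_size=100)
speedup = unpipelined / pipelined            # about 10x with these inputs
```

The gain depends entirely on the ratio of round-trip time to execution time: the slower the network relative to the command, the more a pipeline saves.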
Redis 6.0+: I/O threads for network read/write; command execution is still single-threaded. Redis 7.0: functions, multi-part AOF, sharded pub/sub.
Watch out: RAM-bound · avoid KEYS * / SMEMBERS on huge sets (use SCAN / SSCAN) · big keys block the event loop · use Redis Cluster to shard beyond single-node limits.
Redis as Cache
App checks Redis first: a cache hit returns instantly; a cache miss fetches from the DB and populates the cache.
Pattern: cache hit → return immediately · cache miss → fetch from DB → write to Redis with TTL → serve.
Strategies: Cache-aside (most common; the app manages the cache). Write-through (write to cache + DB together). Write-back (write to cache, async flush to DB). Read-through (cache fetches from DB on miss).
Eviction Policies: allkeys-lru (evict least recently used; best for a cache). allkeys-lfu (evict least frequently used). volatile-lru (evict only keys with a TTL). noeviction (return an error on writes when full; for data-store use).
Cache problems: Thundering herd (many requests hit the DB on the same cache miss) → use a SETNX lock or probabilistic early expiry. Cache penetration (queries for non-existent keys always miss) → cache null values with a short TTL. Cache avalanche (many keys expire simultaneously) → add random jitter to TTLs.
Anti-patterns: No TTL → stale data forever. Cache everything → wastes RAM on cold data. No eviction policy → OOM crash. Inconsistent invalidation → cache and DB disagree.
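Two of the fixes listed above, TTL jitter against cache avalanche and null caching against cache penetration, can be sketched without a Redis server. `jittered_ttl`, `NullCachingStore`, and the TTL values are assumptions chosen for illustration; a dict stands in for both Redis and the database.

```python
import random
from typing import Any, Callable, Dict, Optional, Tuple

_MISSING = object()   # sentinel meaning "the DB has no row for this key"


def jittered_ttl(base_seconds: float, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base TTL +/- spread (10% by default) so keys do not expire together."""
    rng = rng if rng is not None else random.Random()
    return base_seconds * (1.0 + rng.uniform(-spread, spread))


class NullCachingStore:
    """Caches misses for a short TTL so repeated lookups of absent keys skip the DB."""

    def __init__(self, db: Dict[str, Any], now: Callable[[], float],
                 base_ttl: float = 300.0, null_ttl: float = 30.0) -> None:
        self.db, self.now = db, now
        self.base_ttl, self.null_ttl = base_ttl, null_ttl
        self.cache: Dict[str, Tuple[Any, float]] = {}   # key -> (value, expires_at)
        self.db_lookups = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.now():
            return None if entry[0] is _MISSING else entry[0]
        self.db_lookups += 1
        if key in self.db:
            self.cache[key] = (self.db[key], self.now() + jittered_ttl(self.base_ttl))
            return self.db[key]
        self.cache[key] = (_MISSING, self.now() + self.null_ttl)  # cache the miss
        return None


clock = [0.0]                                  # fake clock, advanced by hand
store = NullCachingStore(db={"user:1": "Ada"}, now=lambda: clock[0])
store.get("ghost")
store.get("ghost")                             # served from the cached null
lookups_while_cached = store.db_lookups
clock[0] = 31.0                                # past the 30 s null TTL
store.get("ghost")
lookups_after_expiry = store.db_lookups
```

The null TTL is kept much shorter than the normal TTL so that a key created later becomes visible quickly.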
Redis Pub/Sub & Streams
Real-time messaging built into Redis: from fire-and-forget broadcast to durable event logs
▸ Pub/Sub: Real-Time Broadcast
PUBLISH chat:room1 "Hello!" → sends to all current subscribers
SUBSCRIBE chat:room1 → receives "Hello!" instantly
PSUBSCRIBE chat:* → pattern match → all chat channels
Subscriber joins later → NO history → messages already gone
Guarantees: Real-time delivery (<1ms). Fan-out to all subscribers. Pattern matching.
Limitations: No persistence. No replay. No acknowledgment. No consumer groups. Fire-and-forget only.
Use cases: Figma (real-time collaboration signals). Slack (online presence indicators). Cache invalidation across app servers. Chat typing indicators.
Streams guarantees: Persistence (survives restart when AOF/RDB is enabled). Replay (XRANGE from any point). Consumer groups (competing consumers). At-least-once delivery. Blocking reads (XREADGROUP BLOCK).
Streams limitations: Single-node throughput under ~100K events/sec (vs. millions for Kafka). RAM-bound. No cross-cluster replication. Best as a lightweight Kafka when you already have Redis.
▸ Cache vs Pub/Sub vs Streams: When to Use What
Feature | Cache | Pub/Sub | Streams
Purpose | Read acceleration | Real-time broadcast | Durable event log
Persistence | TTL-based | None | Yes (AOF/RDB)
Replay | No | No | Yes (XRANGE)
Fan-out | No | Yes (all subscribers) | Yes (consumer groups)
Acknowledgment | No | No (fire-and-forget) | Yes (XACK)
Best For | DB offload, sessions | Presence, signals, invalidation | Order pipelines, audit, IoT
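The replay and acknowledgment rows of the table can be made concrete with a toy in-memory model. This is not the Redis API: `ToyPubSub` and `ToyStream` are hypothetical classes that only mimic the semantics (a late subscriber misses earlier messages; stream entries stay pending until acknowledged and can be replayed).

```python
from typing import Dict, List, Tuple


class ToyPubSub:
    """Fire-and-forget: a message reaches only the subscribers present right now."""

    def __init__(self) -> None:
        self.inboxes: Dict[str, List[str]] = {}

    def subscribe(self, name: str) -> None:
        self.inboxes[name] = []

    def publish(self, message: str) -> int:
        for inbox in self.inboxes.values():
            inbox.append(message)
        return len(self.inboxes)              # number of receivers


class ToyStream:
    """Append-only log with one consumer group; entries stay pending until acked."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, str]] = []
        self.next_unread = 0                  # the group's read cursor
        self.pending: Dict[int, str] = {}     # delivered but not yet acknowledged

    def add(self, message: str) -> int:
        entry_id = len(self.log) + 1
        self.log.append((entry_id, message))
        return entry_id

    def read_group(self, count: int = 1) -> List[Tuple[int, str]]:
        batch = self.log[self.next_unread:self.next_unread + count]
        self.next_unread += len(batch)
        self.pending.update(batch)
        return batch

    def ack(self, entry_id: int) -> None:
        self.pending.pop(entry_id, None)

    def replay(self, start_id: int = 1) -> List[Tuple[int, str]]:
        return [entry for entry in self.log if entry[0] >= start_id]


bus = ToyPubSub()
bus.subscribe("early")
bus.publish("hello")
bus.subscribe("late")                         # joined after the publish

stream = ToyStream()
stream.add("order-1")
stream.add("order-2")
stream.read_group(count=2)
stream.ack(1)                                 # entry 2 was delivered but never acked
```

An entry left in `pending` is what a real consumer group would redeliver or let another consumer claim, which is where at-least-once delivery comes from.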
Redis Persistence & High Availability
From single-node to sharded cluster: persistence, replication, and failover
▸ Persistence: RDB vs AOF
RDB (Snapshot)
AOF (Append-Only File)
▸ Redis Deployment Modes
Mode | Architecture | Sharding | HA | Use Case
Single Node | One instance | No | No (SPOF) | Dev, small cache, non-critical
Sentinel | Master + replicas + sentinel monitors | No | Yes (auto-failover) | HA cache, sessions, moderate load
Cluster | N masters (16,384 hash slots) + replicas | Yes | Yes | Large datasets, high throughput, horizontal scale
Managed | ElastiCache / MemoryDB / Upstash | Yes | Yes | Production, no ops overhead
Cluster details: 16,384 hash slots distributed across masters. Key → CRC16(key) % 16384 → slot → node. Each master has 1+ replicas. Gossip protocol for node discovery. MOVED/ASK redirects for client routing. Multi-key ops only within the same slot (use hash tags: {user:123}.profile).
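The key → slot mapping can be sketched as follows. The sketch assumes that Python's `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant Redis Cluster uses (the cluster specification lists 0x31C3 as the checksum of "123456789", which this function reproduces). `key_slot` is a hypothetical helper name.

```python
import binascii


def key_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:                  # a closing brace exists and the tag is non-empty
            data = data[start + 1:end]       # hash only the tag contents
    return binascii.crc_hqx(data, 0) % 16384


profile_slot = key_slot("{user:123}.profile")
orders_slot = key_slot("{user:123}.orders")  # same tag, so same slot
```

Because both keys share the tag `user:123`, they land in the same slot and can be used together in a multi-key operation.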
Redlock (distributed lock): Acquire the lock on a majority (N/2+1) of independent Redis nodes. Set a TTL to prevent deadlock. Validate that the lock is still held before entering the critical section. Controversial: Martin Kleppmann argues it is unsafe (clock drift, process pauses). Alternative: use etcd/ZooKeeper for strong locks.
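The majority-acquire step can be illustrated with in-memory stand-ins for the Redis nodes. Everything here is a toy under stated assumptions: `FakeNode`, `acquire`, the injected clock, and the drift allowance (`ttl * 0.01 + 2` ms, a value used by some client implementations) are illustrative, and the sketch does not address the safety objections mentioned above.

```python
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class FakeNode:
    """In-memory stand-in for one independent Redis node."""

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.locks: Dict[str, Tuple[str, float]] = {}   # name -> (token, expires_at_ms)

    def set_if_absent(self, name: str, token: str, ttl_ms: float, now_ms: float) -> bool:
        if not self.up:
            return False
        holder = self.locks.get(name)
        if holder is not None and holder[1] > now_ms:
            return False                      # someone else holds an unexpired lock
        self.locks[name] = (token, now_ms + ttl_ms)
        return True

    def release(self, name: str, token: str) -> None:
        if self.up and self.locks.get(name, (None, 0.0))[0] == token:
            del self.locks[name]              # only the owner's token can release


def acquire(nodes: List[FakeNode], name: str, ttl_ms: float,
            clock_ms: Callable[[], float],
            drift_factor: float = 0.01) -> Optional[Tuple[str, float]]:
    """Return (token, validity_ms) if a majority granted the lock, else None."""
    token = uuid.uuid4().hex
    start = clock_ms()
    granted = sum(node.set_if_absent(name, token, ttl_ms, clock_ms()) for node in nodes)
    elapsed = clock_ms() - start
    validity = ttl_ms - elapsed - (ttl_ms * drift_factor + 2)
    if granted >= len(nodes) // 2 + 1 and validity > 0:
        return token, validity
    for node in nodes:                        # failed: undo any partial grants
        node.release(name, token)
    return None


nodes = [FakeNode(), FakeNode(), FakeNode(), FakeNode(up=False), FakeNode(up=False)]
first = acquire(nodes, "job:42", ttl_ms=10_000, clock_ms=lambda: 0.0)
second = acquire(nodes, "job:42", ttl_ms=10_000, clock_ms=lambda: 0.0)
```

With three of five nodes reachable the first caller reaches quorum; the second caller is refused by those same three nodes and gets nothing.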
Limitations: RAM-bound (all data must fit in memory). Single-threaded core (one slow command blocks everything). Not a primary DB; use as a cache/accelerator. Async replication: data loss is possible on failover (WAIT narrows the window but does not make replication strongly consistent).
Serve content from edge PoPs globally to improve
latency (closer to users),
throughput (offload origin), and
availability (distributed delivery),
with trade-offs in
consistency (cache freshness) and
invalidation (hard to purge).
Pull (lazy) vs Push (proactive).
Guarantees: Low latency (<50ms from edge). DDoS absorption at the edge. Origin offload. Edge computing (Cloudflare Workers) runs logic at the edge.
Limitations: Dynamic/personalized content is harder to cache. Cache invalidation complexity. Cost at high invalidation frequency.
▸ CDN Architecture: Edge PoPs Worldwide
▸ Pull CDN vs Push CDN
Pull CDN (Lazy)
Cache on first request. Cache-Control: max-age=3600
Flow: User → Edge (MISS) → Origin → Edge caches → User
Next: User → Edge (HIT, <10ms)
Pro: No upfront cost; auto-populates on demand. Con: First request is slow (cache miss, cold start). Use: General web assets, images, API responses.
Push CDN (Proactive)
Pre-populate all PoPs before users request
Flow: Origin → push to all PoPs on publish
User: User → Edge (always HIT, <5ms)
Pro: Zero cold starts; predictable latency. Con: Storage cost; must know what to push. Use: Video segments, firmware, known-hot assets.
▸ Scaling with CDN: From 1K to 1B+ Requests/Day
Scaling Principles: Shield layer (an intermediate cache between edge and origin that collapses duplicate misses: 100 PoPs miss → 1 request to origin). Tiered TTLs (edge 60s, shield 5min, origin 1h). Request coalescing (1000 users request the same uncached asset → only 1 request goes to origin). Stale-while-revalidate (serve stale, refresh async).
Pitfalls at Scale: Thundering herd (hot key expires, all PoPs hit origin). Fix: jittered TTL + coalescing. Cache stampede (popular item invalidated during a spike). Fix: lock + stale-while-revalidate. Purge storms (mass invalidation overloads origin). Fix: soft purge (serve stale, refresh async).
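Stale-while-revalidate can be sketched as a three-way decision on the age of a cached entry. `SWRCache`, the fake clock, and the explicit `run_refreshes` step (standing in for a background worker) are assumptions made so the example stays deterministic.

```python
from typing import Any, Callable, Dict, List, Tuple


class SWRCache:
    """Serve fresh entries, serve stale ones while queuing a refresh, else fetch."""

    def __init__(self, fetch: Callable[[str], Any], max_age: float,
                 stale_window: float, now: Callable[[], float]) -> None:
        self.fetch, self.now = fetch, now
        self.max_age, self.stale_window = max_age, stale_window
        self.entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self.refresh_queue: List[str] = []               # keys awaiting refresh

    def get(self, key: str) -> Tuple[Any, str]:
        entry = self.entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = self.now() - stored_at
            if age <= self.max_age:
                return value, "fresh"
            if age <= self.max_age + self.stale_window:
                if key not in self.refresh_queue:        # coalesce duplicate refreshes
                    self.refresh_queue.append(key)
                return value, "stale"
        value = self.fetch(key)                          # cold or too old: block on origin
        self.entries[key] = (value, self.now())
        return value, "miss"

    def run_refreshes(self) -> None:
        """Stand-in for the background worker that revalidates queued keys."""
        while self.refresh_queue:
            key = self.refresh_queue.pop(0)
            self.entries[key] = (self.fetch(key), self.now())


clock = [0.0]
origin_calls: List[str] = []

def origin(key: str) -> str:
    origin_calls.append(key)
    return f"{key}-v{len(origin_calls)}"

cdn = SWRCache(origin, max_age=60.0, stale_window=300.0, now=lambda: clock[0])
r1 = cdn.get("logo.png")      # cold: fetched from origin
clock[0] = 30.0
r2 = cdn.get("logo.png")      # within max_age
clock[0] = 70.0
r3 = cdn.get("logo.png")      # past max_age: stale copy served, refresh queued
cdn.run_refreshes()
clock[0] = 80.0
r4 = cdn.get("logo.png")      # refreshed copy
```

The user at t=70 never waits on the origin; only the queued refresh does.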
Interview tip: Always mention cache hit ratio as the key CDN metric. Raising the hit ratio by one point, from 95% → 96%, cuts the miss rate from 5% to 4%, which is 20% fewer origin requests. At Netflix scale (100B+ req/day), that's on the order of a billion saved origin calls per day.
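The arithmetic behind the tip is easy to check. The 100B requests/day figure is the one quoted above; `origin_requests` is a hypothetical helper.

```python
def origin_requests(total_requests: int, hit_ratio: float) -> float:
    """Requests that miss the CDN and reach the origin."""
    return total_requests * (1.0 - hit_ratio)


daily_requests = 100_000_000_000               # 100B requests/day, as quoted above
before = origin_requests(daily_requests, 0.95)
after = origin_requests(daily_requests, 0.96)
reduction = (before - after) / before          # fraction of origin traffic removed
saved_per_day = before - after
```

Origin load scales with the miss rate, not the hit rate, so a one-point gain matters more the closer the hit ratio already is to 100%.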
Real-world: Netflix Open Connect (custom CDN appliances inside ISPs; serves 95%+ of traffic from ISP-local boxes). Cloudflare (300+ PoPs, serves 20%+ of web traffic). CloudFront (400+ PoPs, Lambda@Edge for compute).
Advanced Caching: Cache Warming (pre-populate the cache before a traffic spike such as a product launch or Black Friday). Multi-Level: L1 (in-process, Caffeine) → L2 (Redis) → L3 (CDN); lower-numbered levels are faster but smaller. CDC Invalidation (DB change → CDC event → invalidate the specific cache key in real time, with no wait for TTL expiry). Stale-While-Revalidate (serve stale, refresh in background).
Content Delivery & Edge: CDN caches static assets at edge PoPs (Cloudflare, CloudFront). Edge Computing: run logic at the edge (Cloudflare Workers, Lambda@Edge, Vercel Edge Functions). Use for: A/B testing, geo-routing, auth token validation, personalization. Reduces origin load and latency. Limitation: limited runtime, no persistent state at the edge.