How does an edge network rate-limit 45M+ requests/sec without adding latency, distributing rate counters across hundreds of PoPs while using fail-open policies to avoid blocking legitimate traffic during sync delays?
Core challenge: Rate limiting at the edge means there is no central counter · each of 300+ PoPs must make local decisions. But a distributed attacker can spread requests across PoPs to bypass per-PoP limits: a 100 req/min limit enforced independently at 300 PoPs admits up to 30,000 req/min in aggregate. How do you coordinate without adding latency?
45M+ requests/second · across all PoPs
300+ edge PoPs · globally distributed
<1ms added latency · rate check overhead
Fail-Open policy on sync failure · never block legit traffic
Architecture · Sliding Window at the Edge
Local counters + async gossip for global coordination
Decision | Choice | Why
Algorithm | Sliding window counter (per-PoP) + approximate global sync | Near-precise locally, eventually consistent globally
Storage | In-memory counters per worker (no DB) | No network or disk I/O on the request path · counters live in process memory (hot counters are often CPU-cache resident)
Coordination | Async gossip every 1-5s between PoPs | Share counts without blocking the request path
Fail mode | Fail-open (allow) when sync is stale | Availability > precision · never block legit users
Key space | IP + API key + path (configurable) | Flexible: per-IP, per-key, per-endpoint rules
Response | 429 + Retry-After + RateLimit-* headers | Standard; clients can back off intelligently
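The Key space and Response rows can be sketched as two small helpers. This is a minimal illustration, not a production implementation: `rate_limit_key` and `too_many_requests` are hypothetical names, and the `RateLimit-Limit` / `RateLimit-Remaining` / `RateLimit-Reset` field names follow earlier IETF drafts of the RateLimit header fields (later drafts changed the format), so treat them as an assumption. `Retry-After` uses its standard delay-seconds form.

```python
import math


def rate_limit_key(rule_fields: tuple[str, ...], request: dict[str, str]) -> str:
    """Hypothetical helper: build a counter key from only the fields a rule uses."""
    return "|".join(f"{name}={request.get(name, '')}" for name in rule_fields)


def too_many_requests(limit: int, remaining: int, reset_in: float) -> tuple[int, dict[str, str]]:
    """Hypothetical helper: 429 status plus back-off headers for a rejected request."""
    seconds = max(0, math.ceil(reset_in))  # round up so clients never retry early
    headers = {
        "Retry-After": str(seconds),  # delay-seconds form
        # Assumption: header names from earlier IETF RateLimit drafts.
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": str(seconds),
    }
    return 429, headers


# A per-IP, per-endpoint rule ignores the API key even if the request carries one.
key = rate_limit_key(("ip", "path"), {"ip": "203.0.113.7", "path": "/v1/charges", "api_key": "k_123"})
status, headers = too_many_requests(limit=100, remaining=0, reset_in=12.3)
```

Rounding `Retry-After` up keeps a client that honors it from arriving a fraction of a second before its window actually frees up.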
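Sliding window: Each PoP maintains a sliding window counter per (key, rule). The estimated count is the current bucket plus the previous bucket, weighted by how much of the previous bucket still overlaps the window: estimate = current + previous × (1 − elapsed/window). Example: 100 req/min limit, 30s into the current minute; the current bucket has 40 req and the previous minute had 80 req → estimate = 40 + 80 × 0.5 = 80 → allow (under 100).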
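A minimal Python sketch of that per-PoP counter, assuming one bucket per window length and that rejected requests are not counted. `SlidingWindowCounter`, `estimate` and `allow` are hypothetical names; the usage at the bottom replays the worked example above.

```python
from dataclasses import dataclass


@dataclass
class SlidingWindowCounter:
    """Approximate sliding window: current bucket + weighted previous bucket."""
    limit: int
    window: float            # seconds; assumption: bucket length == window length
    current_start: float = 0.0
    current_count: int = 0
    previous_count: int = 0

    def _roll(self, now: float) -> None:
        # Advance to the bucket containing `now`.
        elapsed_buckets = int((now - self.current_start) // self.window)
        if elapsed_buckets >= 1:
            # After a gap of more than one window, the previous bucket is empty.
            self.previous_count = self.current_count if elapsed_buckets == 1 else 0
            self.current_count = 0
            self.current_start += elapsed_buckets * self.window

    def estimate(self, now: float) -> float:
        self._roll(now)
        elapsed = now - self.current_start
        weight = 1.0 - elapsed / self.window  # share of previous bucket still in window
        return self.current_count + self.previous_count * weight

    def allow(self, now: float) -> bool:
        # Reject if admitting this request would push the estimate over the limit.
        if self.estimate(now) + 1 > self.limit:
            return False
        self.current_count += 1
        return True


counter = SlidingWindowCounter(limit=100, window=60.0)
for _ in range(80):
    counter.allow(10.0)  # previous minute: 80 requests
for _ in range(40):
    counter.allow(75.0)  # current minute: 40 requests
print(counter.estimate(90.0))  # 30 s into the current minute -> 80.0
```

The whole state is a few integers per (key, rule), which is why it can live in worker memory with no I/O on the request path.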
Global coordination: Every 1-5s, PoPs gossip their local counts to a coordination layer. Global rate = sum of all PoP counts. If the global rate exceeds the threshold, all PoPs tighten their local limits. Tradeoff: during each 1-5s gossip interval, a distributed attack can exceed the global limit by whatever the PoPs admit locally before the next sync.
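One way to picture the "tighten local limits" step, as a hedged sketch: each PoP keeps the last count gossiped by every peer (updated off the request path) and, once the summed rate crosses the global limit, shrinks its own budget to its share of recent traffic. The proportional-share rule and the names `GlobalView`, `on_gossip` and `local_limit` are assumptions for illustration, not a documented protocol.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalView:
    """Hypothetical per-PoP view of one key's traffic, fed by async gossip."""
    pop_id: str
    global_limit: int
    peer_counts: dict[str, int] = field(default_factory=dict)

    def on_gossip(self, pop: str, count: int) -> None:
        # Runs when a gossip message arrives, never on the request path.
        self.peer_counts[pop] = count

    def global_rate(self, local_count: int) -> int:
        # Global rate = this PoP's live count + last known counts of all peers.
        others = sum(c for p, c in self.peer_counts.items() if p != self.pop_id)
        return local_count + others

    def local_limit(self, local_count: int) -> int:
        total = self.global_rate(local_count)
        if total <= self.global_limit:
            return self.global_limit  # under the global threshold: no tightening
        # Assumption: over threshold, budget = global limit scaled by local share.
        return max(1, self.global_limit * local_count // total)


view = GlobalView(pop_id="fra", global_limit=1000)
view.on_gossip("iad", 600)
view.on_gossip("sin", 300)
print(view.global_rate(200))  # 1100
print(view.local_limit(200))  # 181
```

Between gossip rounds `peer_counts` is stale, which is exactly the 1-5s overshoot window described above.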
Anti-patterns: Central Redis counter · a synchronous round trip adds 10-50ms RTT per request (unacceptable at the edge). Fail-closed on sync failure · blocks all traffic during network issues. Fixed window · burst at the window boundary (up to 2× the limit within ~1s straddling the boundary).
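The fixed-window boundary burst is easy to reproduce. This sketch sends 200 requests within one second around t = 60 s against a 100 req/min limit and counts how many each scheme admits; both functions are hypothetical, assume timestamps arrive sorted, and the sliding variant uses the same weighted estimate as the per-PoP counter.

```python
def fixed_window_admitted(timestamps: list[float], limit: int, window: float) -> int:
    counts: dict[int, int] = {}
    admitted = 0
    for t in timestamps:
        bucket = int(t // window)  # counter resets abruptly at each boundary
        if counts.get(bucket, 0) < limit:
            counts[bucket] = counts.get(bucket, 0) + 1
            admitted += 1
    return admitted


def sliding_window_admitted(timestamps: list[float], limit: int, window: float) -> int:
    counts: dict[int, int] = {}
    admitted = 0
    for t in timestamps:
        bucket = int(t // window)
        elapsed = t - bucket * window
        # Previous bucket is weighted by how much of it still overlaps the window.
        estimate = counts.get(bucket, 0) + counts.get(bucket - 1, 0) * (1 - elapsed / window)
        if estimate + 1 <= limit:
            counts[bucket] = counts.get(bucket, 0) + 1
            admitted += 1
    return admitted


burst = [59.5] * 100 + [60.5] * 100  # 200 requests within 1 s, straddling t = 60
print(fixed_window_admitted(burst, limit=100, window=60.0))    # 200
print(sliding_window_admitted(burst, limit=100, window=60.0))  # 100
```

Just after the boundary the previous bucket still carries almost its full weight (about 99.2 of 100), so the sliding estimate refuses the second half of the burst.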
Real-world: Cloudflare · sliding window at 300+ PoPs. Stripe · token bucket per API key (100 req/sec). GitHub · 5,000 req/hour per authenticated user (hourly window with a reset time). AWS API Gateway · token bucket with burst capacity.
Resilience & Edge Cases
Failure | Impact | Recovery
Gossip network down | PoPs can't share counts · a distributed attacker can bypass the global limit | Fail open on global state: keep allowing traffic, but fall back to a conservative local limit (per-PoP limit = global limit / N PoPs).
PoP overloaded | Counter updates delayed | Shed load at L4 before the rate limiter. Pre-filter known-bad IPs via a blocklist.
Clock skew between PoPs | Window boundaries misaligned | Sync clocks with NTP. The sliding window tolerates small skew (the weighted average smooths it).
Hot key (one API key = 90% of traffic) | That key's counter becomes a contention hotspot across workers | Route the hot key to a separate path with a dedicated counter (e.g. sharded per worker). Alert on anomalous single-key volume.
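The gossip-down row combines fail-open with a local fallback. A minimal sketch of that choice, assuming a hypothetical `choose_local_limit` helper and a staleness cutoff of 10 s (a few 1-5s gossip intervals; the real threshold is a tuning decision):

```python
def choose_local_limit(
    global_limit: int,
    num_pops: int,
    gossip_limit: int,
    last_sync: float,
    now: float,
    max_staleness: float = 10.0,  # assumption: a few 1-5 s gossip intervals
) -> int:
    """Pick the limit this PoP enforces for one key.

    Fresh gossip: use the gossip-derived limit.
    Stale gossip: fail open on global state (keep serving) but cap this PoP
    at an even share of the global budget, global limit / N PoPs.
    """
    if now - last_sync <= max_staleness:
        return gossip_limit
    return max(1, global_limit // num_pops)  # never drop to zero: still fail-open


print(choose_local_limit(3000, 300, gossip_limit=2500, last_sync=100.0, now=103.0))  # 2500
print(choose_local_limit(3000, 300, gossip_limit=2500, last_sync=100.0, now=120.0))  # 10
```

The cost of this fallback is that a legitimate client pinned to a single PoP sees only 1/N of its global budget until gossip recovers.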
1. Sliding window · weighted current + previous bucket (no boundary burst)
2. In-memory counters · zero I/O, counters in process memory, <1ms overhead
3. Async gossip between PoPs · share counts every 1-5s without blocking requests
4. Fail-open policy · if sync is stale, allow traffic (availability > precision)
5. 429 + Retry-After + RateLimit headers · standard response, client backs off
6. No central Redis · adds 10-50ms RTT (unacceptable at the edge). Local decision + eventual global sync.