8. Messaging — System Design Concepts

📨 8 · MESSAGING

Message Queues (RabbitMQ / SQS)

Messages wait in a queue for consumers to process — task distribution & load balancing with retry + DLQ

Guarantees: At-least-once delivery (consumer must be idempotent). Ordering per queue (FIFO). Dead Letter Queue — after N retries, message moved to DLQ. Visibility timeout — message invisible to others while being processed.

Limitations: No message replay. Limited throughput for high-volume streaming. Fan-out harder than Kafka. No long-term retention.

RabbitMQ exchanges: Direct (exact key) · Fanout (broadcast) · Topic (pattern: order.*.created) · Headers. SQS: Standard (at-least-once, best-effort order) vs FIFO (exactly-once, strict order, 3K msg/sec).

Message Streams - Apache Kafka

Messages continuously appended to a durable ordered log — 1M+ events/sec, replayable history, multiple independent readers.

▸ How Kafka Works

▸ Top Kafka Use Cases

Event Streaming

Log Aggregation

Message Queuing

Web Activity Tracking

Data Replication (CDC)

Concept	Detail	Guarantee
Partition	Unit of parallelism. Append-only log.	Ordered within partition, not across.
Offset	Sequential ID per message.	Consumer can rewind/replay from any offset.
ISR	In-Sync Replicas caught up with leader.	`acks=all` + `min.insync.replicas=2` = no data loss.
Consumer Group	Each partition → exactly 1 consumer per group.	Multiple groups = independent reads.
Compacted Topic	Keep latest value per key.	State snapshots, changelogs.
KRaft	Replaces ZooKeeper (Kafka 3.3+).	Simpler operations.

Delivery Guarantees: At-most-once (acks=0, lossy). At-least-once (acks=all, may duplicate). Exactly-once (idempotent producer + transactions).

Limitations: Operational complexity. Ordering per partition only. No per-message acking. Overkill for low volume.

Real-world: LinkedIn 4T+ events/day. Uber — trip events, surge pricing. Netflix — recommendations pipeline.

▸ Real-World: Uber Ride Events in Kafka — Complete Flow

▸ Kafka: Producers, Consumers, Brokers

Producers

Acks: 0=fire-forget · 1=leader ACK · all=all ISR ACK (safest)
Retries: On failure, retry with backoff. Idempotent producer (enable.idempotence=true) deduplicates retries via sequence numbers.
Batching: linger.ms (wait to fill batch) + batch.size (max bytes). Bigger batch = higher throughput, slight latency.
Compression: snappy (fast), lz4 (balanced), zstd (best ratio). Compress at producer, decompress at consumer.
Partitioner: Default = hash(key) % partitions. Sticky partitioner for null keys (batch to one partition, then rotate).

Consumers

Consumer Group: Each partition → exactly 1 consumer in group. Add consumers = parallel. Max consumers = partitions.
Offsets: Track position per partition. Committed to __consumer_offsets topic. auto.offset.reset: earliest (replay all) / latest (new only).
Delivery: At-most-once (commit before process) · At-least-once (process then commit, may re-read) · Exactly-once (transactions).
Rebalance: Consumer joins/leaves → partitions reassigned. Cooperative/incremental rebalance (Kafka 2.4+) avoids stop-the-world.
Poll: max.poll.records, max.poll.interval.ms. Too slow = kicked from group.

Brokers & Topics

Broker: Single Kafka server. Cluster = N brokers. Each broker stores subset of partitions.
Replication: RF=3 → 3 copies. Leader handles reads/writes. Followers in ISR (In-Sync Replicas) replicate. min.insync.replicas=2 + acks=all = no data loss.
Segments: Partition = ordered log split into segment files (1GB default). Each segment has .log + .index + .timeindex.
Retention: Time-based (retention.ms=7 days) or size-based (retention.bytes=1TB). Log compaction = keep latest value per key (state snapshots).
KRaft: Replaces ZooKeeper (Kafka 3.3+). Controller quorum via Raft. Simpler ops, faster failover.

Config	Setting	Effect
acks=all + min.insync=2	Producer waits for 2+ replicas	Zero data loss (even if 1 broker dies)
enable.idempotence=true	Sequence numbers per producer	No duplicates on retry
Transactions	Atomic multi-partition writes	Exactly-once end-to-end
Unclean leader election=false	Only ISR members can become leader	No data loss (may reduce availability)
Log compaction	Keep latest value per key	State snapshots, changelogs (KTable)

Pub/Sub (SNS / Google Pub/Sub)

Publishers broadcast (fan-out) to a topic — every subscriber receives its own independent copy

Guarantees: At-least-once delivery to every subscriber. Independent processing. Fully managed. Google Pub/Sub: seek to timestamp, ordering keys, exactly-once delivery. SNS: push to SQS/Lambda/HTTP/email/SMS.

Limitations: No long retention by default. No consumer-side replay in SNS. Ordering not guaranteed by default. Not suited for point-to-point work queues.

Real-world: Spotify — Google Pub/Sub for event-driven microservices. Shopify — SNS+SQS for order event fan-out. Uber — Google Pub/Sub for cross-service event propagation.

Message Queues vs Event Streams vs Pub/Sub

Three fundamentally different messaging patterns

Message Queue (Point-to-Point)

Event Stream (Log-based)

Pub/Sub (Fan-out)

Feature	Message Queue	Event Stream	Pub/Sub
Definition	Messages wait in a queue for a consumer to pick up & process	Messages appended to a durable ordered log — replayable history	Publisher broadcasts to a topic; every subscriber gets its own copy
Core Idea	Queue of tasks / jobs	Durable ordered event log	Broadcast events to subscribers
Concrete Question	"Which worker should process this image upload / email / payment retry?"	"What exactly happened in this system / user journey over time?"	"Which systems should know that an order was placed?"
Why this fits best	Best when exactly one worker should process a task / job	Best when events must be retained as an ordered replayable history	Best when one event must be broadcast (fan-out) to many independent subscribers
Why others don't fit	Pub/Sub broadcasts to many consumers causing duplicate processing; Streams retain/replay history — unnecessary for one-time task execution	Queue removes messages after processing; Pub/Sub focuses on live delivery rather than durable replayable event history	Queue is designed for task distribution where only one consumer gets the message; Streams add retention/replay/ordering overhead when only real-time notification is needed
Model	Competing consumers — task distribution & load balancing	Independent readers — each group reads the full log at its own pace	Fan-out — one publish, N independent deliveries
Consumption	One consumer per message	Multiple groups, each reads all	All subscribers get copy
After read	Deleted	Retained	Gone (SNS) or short retention
Delivery	At-least-once (consumer must be idempotent)	At-least-once · exactly-once (idempotent producer + transactions)	At-least-once to every subscriber
Retention	No long-term retention — gone after ACK	Configurable (days / size / compaction)	Short (SNS: none · Pub/Sub: 7 days)
Replay	✗	✓ (rewind to any offset)	Limited
Ordering	✓ per queue (FIFO)	✓ per partition	✗ by default
Throughput	~10K-100K msg/sec	~1M+ msg/sec	~100K-1M msg/sec
Best for	Task distribution, job queues	Event sourcing, CDC, analytics	Notifications, fan-out, triggers
Example	RabbitMQ, SQS	Kafka, Kinesis, Redis Streams	SNS, Google Pub/Sub

Key Insight: The fundamental difference is the consumption model. Queues are competing consumers (work distribution). Streams are independent consumers (each reads everything). Pub/Sub is broadcast (everyone gets a copy). Many real systems combine them — e.g., SNS → SQS → Lambda.

Dead Letter Queue (DLQ)

Quarantine poison messages so the main queue keeps flowing — critical for fault isolation & observability

▸ DLQ Lifecycle — Retry, Quarantine & Recovery

Concept	Detail	Best Practice
maxReceiveCount	Number of delivery attempts before moving to DLQ	3–5 retries with exponential back-off
Visibility Timeout	Time message is hidden from other consumers during processing	Set to 6× avg processing time
Redrive Policy	Config linking source queue to its DLQ	Always configure — never lose messages silently
Redrive Allow Policy	Controls which source queues can target this DLQ	Restrict to specific source queues
Retention Period	How long DLQ keeps messages before auto-deletion	14 days (max for SQS) — gives time to investigate

▸ DLQ Implementation by Broker

Broker	Mechanism	Configuration	Recovery
SQS	Source queue → redrive policy → DLQ (separate queue)	`RedrivePolicy: {deadLetterTargetArn, maxReceiveCount}`	StartMessageMoveTask API (redrive)
RabbitMQ	Dead-letter exchange (DLX) on TTL/reject/nack	`x-dead-letter-exchange` + `x-dead-letter-routing-key`	Shovel plugin or manual republish
Kafka	App writes failed events to `topic.DLT`	Spring Kafka: `@RetryableTopic(attempts=3)` + `@DltHandler`	Replay from DLT topic (seek to beginning)
Azure SB	Built-in `$DeadLetterQueue` sub-queue	`MaxDeliveryCount` on subscription/queue	Service Bus Explorer — peek & resubmit
GCP Pub/Sub	Dead-letter topic on subscription	`deadLetterPolicy: {deadLetterTopic, maxDeliveryAttempts}`	Pull from dead-letter subscription

▸ Why Messages End Up in DLQ

Poison Messages

Malformed payload (invalid JSON/XML)
Schema mismatch (missing required fields)
Encoding errors (wrong charset)
Message too large for consumer buffer

Transient Failures

Downstream service unavailable
Database connection timeout
Rate limiting / throttling
Network partition (temporary)

Logic Errors

Unhandled exception in consumer code
Business rule violation
Referential integrity failure
Idempotency key collision

Best Practices: Always alarm on DLQ depth > 0 — even 1 message means something is broken. Set up CloudWatch / Prometheus metrics on ApproximateNumberOfMessagesVisible. Include correlation IDs and original timestamps in message headers for debugging. Use separate DLQs per source queue to isolate failure domains.

Recovery Playbook: ① Alert fires → ② Inspect DLQ messages (peek, don't consume) → ③ Identify root cause (schema? downstream? bug?) → ④ Fix the consumer/downstream → ⑤ Redrive messages back to source queue → ⑥ Verify processing succeeds → ⑦ Post-mortem if recurring.

Anti-patterns: Ignoring DLQ — messages expire silently. No alarm — failures go unnoticed for days. Infinite retries without DLQ — blocks the entire queue. Same retention on DLQ as source — messages may expire before investigation. No metadata — can't trace why message failed.

Real-world: Uber — DLQ per microservice, auto-redrive after circuit breaker resets. Netflix — DLQ + S3 archival for compliance audit trail. Stripe — webhook DLQ with exponential backoff (5 retries over 3 days).

Event Sourcing

Persist facts (events) as the source of truth — derive state by replaying the event log. Never mutate, only append.

▸ Event Sourcing Architecture — Append, Replay, Project

Concept	Detail	Implementation
Event	Immutable fact that happened — past tense naming	`OrderCreated`, `PaymentReceived`, `ItemShipped`
Stream	Ordered sequence of events for one aggregate	`order-{orderId}` — one stream per entity instance
Aggregate	Consistency boundary — validates commands, emits events	Load from stream → apply events → check invariants → emit
Projection	Read model built by processing events	Subscribe to stream → update denormalized view (async)
Snapshot	Materialized state at a point in time	Store every N events — replay only from snapshot forward
Idempotency	Processing same event twice must produce same result	Track last processed position / use event ID as dedup key

▸ Event Store Technologies

Store	Type	Strengths	Considerations
EventStoreDB	Purpose-built	Native projections, subscriptions, optimistic concurrency	Smaller community, self-hosted
Kafka	Log-based	High throughput, built-in replication, ecosystem	No per-stream concurrency, compaction ≠ snapshots
PostgreSQL	RDBMS	ACID, familiar, NOTIFY for subscriptions	Manual stream management, polling for projections
DynamoDB	NoSQL	Serverless, DynamoDB Streams for projections	25-item transaction limit, cost at scale
Marten	Library (.NET)	PostgreSQL-backed, projections built-in	.NET ecosystem only

▸ Key Patterns & Challenges

Schema Evolution

Upcasting: transform old events to new schema on read
Versioning: OrderCreated_v2 with migration
Weak schema: add optional fields (backward compatible)
Never delete/modify stored events

Snapshotting Strategy

Snapshot every N events (e.g., 100)
Snapshot on time interval (e.g., daily)
Snapshot on aggregate complexity threshold
Store snapshot + version in separate stream

Handling Side Effects

Process managers / sagas for multi-aggregate workflows
Outbox pattern: event + side-effect in same transaction
Idempotent handlers: safe to replay
Compensating events for "undo" (not delete)

Common Pitfalls

GDPR: "right to erasure" conflicts with immutability → crypto-shredding
Large aggregates: too many events → slow replay → need snapshots
Event granularity: too fine = noise, too coarse = lost info
Projection lag: eventual consistency confuses users

Wins: Perfect audit trail — every state change recorded. Time-travel debugging — reconstruct state at any point. Rebuild projections — fix bugs, replay, get corrected views. Natural fit for financial ledgers, order systems, collaboration tools.

Costs: Schema evolution of events is hard (upcasting, versioning). Eventual consistency on read models. Snapshot/compaction strategy needed for long-lived aggregates. Steeper learning curve — team must think in events, not state.

Real-world: Stripe — payment state machine as events. Datomic — immutable database (event-sourced by design). LMAX Exchange — event-sourced trading engine (6M orders/sec). Git — commits are events, working tree is projection.

CQRS (Command Query Responsibility Segregation)

Separate the write model (commands) from the read model (queries) — scale, optimize, and evolve them independently

▸ CQRS Architecture — Write Path vs Read Path

Concept	Write Side	Read Side
Model	Domain model / aggregates — enforces invariants	Denormalized views — optimized for specific queries
Store	Normalized RDBMS or event store	Redis, Elasticsearch, DynamoDB, materialized views
Scale	Vertical (consistency matters)	Horizontal (read replicas, caches, CDN)
Consistency	Strong (ACID transactions)	Eventual (async projection updates)
Schema	3NF — no redundancy	Denormalized — pre-joined, pre-computed

▸ When to Use (and When NOT to)

✓ Good Fit

Read/write ratio heavily skewed (100:1 reads)
Complex queries need different data shapes
Write model has complex business rules
Multiple read representations needed (search, reports, API)
Event sourcing already in use
Microservices with separate read/write services

✗ Bad Fit

Simple CRUD — overhead not justified
Strong consistency required on reads (banking UI)
Small team — dual model doubles maintenance
Low traffic — no scaling benefit
Read-after-write needed immediately
Domain is simple with no complex queries

▸ CQRS + Event Sourcing (Power Combo)

Pattern: Commands → Aggregate → Events persisted → Projectors subscribe → Read models updated async. The event store IS the write model. Projections ARE the read models. Rebuild any projection by replaying events from the beginning.

Consistency strategies: Pull-based: query handler checks if projection is up-to-date (compare position). Push-based: projector publishes "ready" event. Hybrid: serve stale + indicate "updating" in UI. Inline projection: update synchronously in same transaction (sacrifices scalability for consistency).

Anti-patterns: Querying the write model — defeats the purpose. Bidirectional sync — creates conflicts. Shared database for read/write — coupling returns. Over-engineering — CQRS for a TODO app.

Real-world: Microsoft — Azure architecture patterns (official CQRS guidance). Uber — trip service (write) + rider-facing API (read from cache). Netflix — catalog writes vs personalized read views. Shopify — order writes vs merchant dashboard reads.

Ordering Guarantees

Where order is preserved — and where it is not. Critical for financial transactions, state machines, and causal consistency.

▸ Partition-Based Ordering — How Keys Determine Order

Broker	Order Scope	Mechanism	Throughput Impact
Kafka	Per-partition (key → partition)	`murmur2(key) % numPartitions`	More partitions = more parallelism, less global order
SQS Standard	Best effort — may reorder	Distributed architecture, no ordering guarantee	Unlimited throughput
SQS FIFO	Strict per `MessageGroupId`	Deduplication + sequencing per group	3,000 msg/sec (batching: 30K)
RabbitMQ	Per queue (single consumer)	FIFO within a single queue	Prefetch count affects perceived order
Pub/Sub	Optional ordering keys	`orderingKey` on publish	Ordered messages go to same region
Kinesis	Per shard (partition key)	`MD5(partitionKey)` → shard	1 MB/sec or 1000 records/sec per shard
Azure Event Hubs	Per partition	Partition key → consistent hash	1 MB/sec ingress per throughput unit

▸ Patterns for Maintaining Order

Partition Key Design

userId: all user events ordered (profile, orders, sessions)
orderId: order lifecycle events in sequence
accountId: financial transactions ordered per account
deviceId: IoT telemetry ordered per device
⚠ Hot keys: popular users → partition hotspot

Cross-Partition Ordering

Sequence numbers: embed monotonic seq in payload
Vector clocks: causal ordering across partitions
Lamport timestamps: logical ordering
Single partition: sacrifice parallelism for global order
External sequencer: Redis INCR or DB sequence

Key Insight: Use a stable partition key (userId, accountId, orderId) so all related events land in the same partition and stay ordered. Choose the key based on your consistency boundary — what entity needs its events in order?

Rebalancing risk: When partitions are added/removed, key-to-partition mapping changes. Use sticky partitioning or consistent hashing to minimize disruption. During rebalance, consumers may see temporary out-of-order delivery — handle with buffering + reordering window.

Anti-patterns: Random partition key — destroys ordering. Too few partitions — bottleneck on throughput. Assuming global order in multi-partition topics. Processing out-of-order without idempotency — corrupted state.

Schema Registry

Central contract for event payloads — prevents "it broke prod". Enforces backward/forward compatibility across producer and consumer versions.

▸ Schema Registry — Serialize, Register, Validate

Compatibility	Rule	Producers May	Consumers May	Use Case
Backward	New schema can read old data	Add optional fields, remove fields	Read old data with new code	Most common — consumers upgrade first
Forward	Old schema can read new data	Add fields, remove optional fields	Read new data with old code	Producers upgrade first
Full	Both backward + forward	Only add/remove optional fields	Both directions work	Safest — independent deployments
Transitive	Compatible with ALL previous versions	Strictest constraints	Any version reads any other	Long-lived topics, many consumers
None	No checks	Anything	May break	Development only — never in prod

▸ Schema Registry Implementations

Tool	Formats	Integration	Key Feature
Confluent Schema Registry	Avro, Protobuf, JSON Schema	Kafka native, REST API	De facto standard, subject strategies, schema references
AWS Glue Schema Registry	Avro, JSON Schema, Protobuf	MSK, Kinesis, Lambda	Serverless, IAM integration, auto-registration
Apicurio Registry	Avro, Protobuf, JSON, OpenAPI, GraphQL	Kafka, HTTP, gRPC	Open source, CNCF, multi-format
Azure Schema Registry	Avro	Event Hubs	Azure-native, RBAC, client-side caching
Buf (BSR)	Protobuf	gRPC, Connect	Breaking change detection, linting, code gen

▸ Schema Evolution Best Practices

Safe Changes (Backward Compatible)

Add optional field with default value
Add new enum value (if consumer ignores unknown)
Widen numeric type (int → long)
Add new union member (Avro)
Deprecate field (keep, stop writing)

Breaking Changes (Avoid)

Remove required field
Rename field (without alias)
Change field type (string → int)
Remove enum value
Change field from optional to required

Best Practices: Enforce compatibility in CI — reject PRs that break schema contracts. Use subject naming strategies (TopicNameStrategy, RecordNameStrategy) to control scope. Cache schemas on producer/consumer side (schema ID → schema). Set FULL_TRANSITIVE for critical topics.

Subject Strategies: TopicNameStrategy (default) — one schema per topic. RecordNameStrategy — schema per record type (multiple types in one topic). TopicRecordNameStrategy — schema per topic+record combo (most flexible).

Anti-patterns: No schema registry — "just use JSON" leads to silent breakage. Compatibility = NONE in prod — ticking time bomb. Not versioning schemas — can't roll back. Tight coupling — producer and consumer must deploy simultaneously.

Real-world: LinkedIn — Avro + Confluent SR for all Kafka topics (thousands of schemas). Uber — Protobuf + custom registry for gRPC services. Netflix — Avro schemas with automated compatibility testing in CI. Shopify — Protobuf for event-driven architecture with Buf for linting.