How does a collaborative editor handle 100 users editing the same paragraph simultaneously, resolving conflicting character insertions without a central lock while maintaining convergence across all clients within 50ms?
Core challenge: Two users type at the same position at the same time. Without coordination, their documents diverge permanently. How do you guarantee all clients converge to the same final state without locking?
100+ concurrent editors · same document
<50ms convergence latency · local-first, sync async
1B+ documents · Google Workspace scale
0 data loss · every keystroke preserved
Functional Requirements
What the system must do · core collaborative editing behaviours
Must Have (Core)
Multiple users edit same document simultaneously
All clients converge to identical state (consistency)
No locking · users never blocked from typing
Preserve user intent (insertions don't overwrite each other)
Real-time cursor/selection visibility of other users
Undo/redo works correctly per-user in collaborative context
Should Have
Offline editing with sync on reconnect
Version history with point-in-time restore
Comments and suggestions (non-destructive)
Rich text formatting (bold, headings, lists)
Presence indicators (who's viewing/editing)
Permission levels (view, comment, edit)
Non-Functional Requirements
Requirement | Target | Why
Latency | <50ms local apply, <200ms remote sync | Typing must feel instant; remote changes appear quickly
Consistency | Strong eventual convergence | All clients must reach same state, regardless of operation order
Availability | 99.99% uptime | Users depend on Docs for daily work
Scalability | 100+ concurrent editors per doc | Large team meetings, live editing sessions
Durability | Zero data loss | Every keystroke must be persisted
Bandwidth | <10KB/sec per user | Works on mobile/slow connections
High-Level Architecture
Client-side OT with server as single source of truth
OT vs CRDT · Core Algorithm Choice
Two fundamentally different approaches to conflict resolution
How OT works: Each edit is an operation (insert 'H' at position 5, delete at position 3). When two ops conflict, a transform function adjusts positions so both can apply correctly. Server maintains linear revision history · all ops are serialized through one point.
// OT transform example (0-based positions, document "abcdef" at revision 5):
// User A: insert('X', pos=3) at revision 5
// User B: insert('Y', pos=1) at revision 5 (concurrent!)
// Server receives A first -> applies at pos 3 -> rev 6 ("abcXdef")
// Server receives B (based on rev 5, but rev 6 now exists)
// Transform: B's pos=1 < A's pos=3, so B stays at pos=1 -> rev 7
// A client that already applied B shifts A's pos from 3 to 4
// Result: "aYbcXdef" · both insertions preserved, positions adjusted
transform(insertA, insertB):
    if A.pos <= B.pos: B.pos += len(A.text)   // A lands first (ties go to A), so B shifts right
    else: A.pos += len(B.text)                // B lands first, so A shifts right
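The transform above can be made runnable as a minimal sketch. It assumes insert-only operations and 0-based positions; the `Insert` class, the `site` tie-break field, and the `apply` helper are hypothetical names introduced for illustration, not part of any real OT library.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Insert:
    pos: int    # 0-based index in the document
    text: str
    site: int   # hypothetical client id, used only to break ties

def apply(doc: str, op: Insert) -> str:
    """Insert op.text into doc at op.pos."""
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    lands_first = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site  # lower site id wins ties
    )
    if lands_first:
        return replace(op, pos=op.pos + len(against.text))
    return op

doc = "abcdef"
a = Insert(pos=3, text="X", site=1)
b = Insert(pos=1, text="Y", site=2)

# One replica applies A then transformed B; the other applies B then transformed A.
via_a = apply(apply(doc, a), transform(b, a))
via_b = apply(apply(doc, b), transform(a, b))
print(via_a, via_b)  # aYbcXdef aYbcXdef
```

Both application orders end in the same string, which is the convergence property the transform exists to provide. Deletes and overlapping ranges need additional transform cases that this sketch leaves out.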
CRDT (Conflict-Free Replicated Data Types) · Figma's CRDT-Inspired Approach
How CRDTs work: Each character has a unique ID (not position-based). Operations reference IDs, not indices. Merge is commutative + associative · order doesn't matter, result is always the same. No central server needed for correctness.
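A minimal sketch of the ID-based idea, assuming a simplified Logoot-style scheme in which each character's ID is a fractional position plus a site id and counter. The `Replica` class and its methods are hypothetical; libraries such as Yjs and Automerge use different and more careful ID schemes, and this version orders concurrent inserts at the same spot purely by site id.

```python
from fractions import Fraction

class Replica:
    def __init__(self, site: int):
        self.site = site
        self.chars = {}       # id -> character; id = (position, site, counter)
        self.deleted = set()  # tombstones: ids of deleted characters
        self.counter = 0

    def _visible(self):
        """Ids of live characters in document order."""
        return sorted(i for i in self.chars if i not in self.deleted)

    def text(self) -> str:
        return "".join(self.chars[i] for i in self._visible())

    def insert(self, index: int, ch: str) -> None:
        ids = self._visible()
        left = ids[index - 1][0] if index > 0 else Fraction(0)
        right = ids[index][0] if index < len(ids) else Fraction(1)
        self.counter += 1
        # The new id sits between its neighbours and never changes afterwards.
        self.chars[((left + right) / 2, self.site, self.counter)] = ch

    def delete(self, index: int) -> None:
        self.deleted.add(self._visible()[index])

    def merge(self, other: "Replica") -> None:
        """Set union: commutative, associative, and safe to repeat."""
        self.chars.update(other.chars)
        self.deleted |= other.deleted

a = Replica(site=1)
for i, ch in enumerate("abc"):
    a.insert(i, ch)
b = Replica(site=2)
b.merge(a)

a.insert(1, "X")   # replica a now reads "aXbc"
b.insert(1, "Y")   # replica b now reads "aYbc"
b.delete(3)        # replica b removes "c"

a.merge(b)
b.merge(a)
print(a.text(), b.text())  # aXYb aXYb
```

Because merge is a plain set union, no server has to order the edits. The cost is visible in the data structure: every character carries an ID, and deleted characters remain as tombstones.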
Aspect | OT (Google Docs) | CRDT (Figma, Yjs, Automerge)
Server | Required · serializes ops | Optional · peer-to-peer possible
Complexity | Transform functions (O(n²) worst case) | Unique IDs per character (metadata overhead)
Offline | Limited (must sync through server) | Excellent · merge on reconnect
Memory | Low (ops are small) | Higher (tombstones, unique IDs)
Correctness | Proven for specific transform pairs | Mathematically guaranteed convergence
Latency | Server round-trip for confirmation | Instant local, async merge
Used by | Google Docs, Etherpad | Figma, Apple Notes, Notion (partial), Yjs
Key Design Decisions
Critical choices that determine system behavior
Client-Side OT Pipeline
Local apply: user types -> apply immediately (no network round trip)
Buffer: store pending ops not yet ACKed by server
Send: send op with base revision to server
ACK: server confirms -> remove from buffer
Remote op arrives: transform against pending buffer
Apply transformed: update local doc with remote changes
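The six steps above can be sketched as a small client class. This is an illustration under stated assumptions: insert-only ops, at most one op in flight at a time (a common choice in Jupiter-style protocols, not something the steps above require), and server ops winning position ties. `Client`, `send`, and `transform` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

def apply(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op, against, against_wins_ties):
    """Rewrite `op` so it applies after `against`."""
    if against.pos < op.pos or (against.pos == op.pos and against_wins_ties):
        return Insert(op.pos + len(against.text), op.text)
    return op

class Client:
    def __init__(self, doc, revision, send):
        self.doc, self.revision, self.send = doc, revision, send
        self.pending = []  # ops applied locally but not yet ACKed

    def local_insert(self, pos, text):
        op = Insert(pos, text)
        self.doc = apply(self.doc, op)        # 1. local apply
        self.pending.append(op)               # 2. buffer
        if len(self.pending) == 1:            # assumption: one op in flight
            self.send(op, self.revision)      # 3. send with base revision

    def on_ack(self):
        self.pending.pop(0)                   # 4. ACK: drop from buffer
        self.revision += 1
        if self.pending:                      # send the next buffered op
            self.send(self.pending[0], self.revision)

    def on_remote(self, op):
        for i, p in enumerate(self.pending):  # 5. transform against buffer
            self.pending[i] = transform(p, op, True)   # server op wins ties
            op = transform(op, p, False)
        self.doc = apply(self.doc, op)        # 6. apply transformed op
        self.revision += 1

sent = []
client = Client("abcdef", revision=5, send=lambda op, rev: sent.append((op, rev)))
client.local_insert(3, "X")       # shown immediately: "abcXdef"
client.on_remote(Insert(1, "Y"))  # concurrent remote edit arrives before the ACK
client.on_ack()
print(client.doc, client.revision)  # aYbcXdef 7
```

Pending ops are rewritten whenever a remote op arrives, so an op that is sent later is already expressed against the newest server revision the client knows.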
Server-Side Processing
Receive op: client sends (op, baseRevision)
Transform: against all ops since baseRevision
Apply: to server document state
Assign revision: monotonic increment
Persist: append to revision log (Spanner)
Broadcast: send transformed op to all other clients
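A minimal sketch of the server loop described above, assuming insert-only ops and an in-memory list standing in for the durable revision log. The `Server` class and its `receive` method are hypothetical, and the revision counter starts at 0 for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

def apply(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op, applied):
    """Rewrite `op` to apply after `applied`; ops already in the log win ties."""
    if applied.pos <= op.pos:
        return Insert(op.pos + len(applied.text), op.text)
    return op

class Server:
    def __init__(self, doc=""):
        self.doc = doc
        self.log = []  # log[i] takes the document from revision i to i + 1

    @property
    def revision(self):
        return len(self.log)

    def receive(self, op, base_revision):
        # Transform against every op the sender had not seen yet.
        for applied in self.log[base_revision:]:
            op = transform(op, applied)
        self.doc = apply(self.doc, op)  # apply to server state
        self.log.append(op)             # assign revision + persist (in memory here)
        return self.revision, op        # what would be broadcast to other clients

server = Server("abcdef")
print(server.receive(Insert(3, "X"), base_revision=0))
print(server.receive(Insert(1, "Y"), base_revision=0))  # concurrent with the first
print(server.doc)  # aYbcXdef
```

Because every op passes through this one loop, the log is a single linear history, which is what lets each client transform against the same sequence.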
Cursor & Presence
Cursor positions sent as ephemeral (not persisted)
Throttled to 50ms intervals (avoid flooding)
Transformed alongside document ops
Color-coded per user (up to ~20 visible)
Selection ranges shown as highlights
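Transforming a cursor alongside document ops can be sketched as follows. The op classes and function names are hypothetical, and one convention is assumed: a remote insert exactly at the cursor leaves the cursor in place, so the new text appears after it. Throttling and transport are not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

@dataclass(frozen=True)
class Delete:
    pos: int
    length: int

def transform_index(index, op):
    """Move a cursor index so it points at the same place after `op`."""
    if isinstance(op, Insert):
        # Text inserted strictly before the cursor pushes it right.
        return index + len(op.text) if op.pos < index else index
    if index <= op.pos:
        return index  # deletion starts at or after the cursor
    # Cursor was after or inside the deleted range: pull it left,
    # but never past the start of the deletion.
    return max(op.pos, index - op.length)

def transform_selection(selection, op):
    """A selection is an (anchor, head) pair; both ends move independently."""
    anchor, head = selection
    return transform_index(anchor, op), transform_index(head, op)

print(transform_index(5, Insert(1, "Y")))             # 6
print(transform_index(5, Delete(3, 4)))               # 3
print(transform_selection((2, 6), Insert(0, "ab")))   # (4, 8)
```

A selection that collapses to a single index after a deletion simply becomes a caret at the deletion point.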
Failure Handling
Disconnect: buffer ops locally, resync on reconnect
Server crash: replay from revision log
Conflict: OT guarantees convergence (no manual merge)
Slow client: server compacts ops into snapshots
Version history: periodic snapshots + op replay
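Crash recovery and version history both reduce to "load the nearest snapshot, then replay the log". A minimal sketch, assuming plain-text ops and in-memory structures; `build_snapshots`, `restore`, and the snapshot interval are hypothetical names and parameters chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

@dataclass(frozen=True)
class Delete:
    pos: int
    length: int

def apply(doc, op):
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]

def build_snapshots(log, every):
    """Store the full document text every `every` revisions."""
    snapshots, doc = {0: ""}, ""
    for revision, op in enumerate(log, start=1):
        doc = apply(doc, op)
        if revision % every == 0:
            snapshots[revision] = doc
    return snapshots

def restore(snapshots, log, revision):
    """Rebuild the document at `revision` from the nearest earlier snapshot."""
    base = max(r for r in snapshots if r <= revision)
    doc = snapshots[base]
    for op in log[base:revision]:  # log[i] takes revision i to i + 1
        doc = apply(doc, op)
    return doc

log = [Insert(0, "hello"), Insert(5, " world"), Delete(0, 1), Insert(0, "J")]
snapshots = build_snapshots(log, every=2)
print(restore(snapshots, log, 3))  # ello world
print(restore(snapshots, log, 4))  # Jello world
```

The snapshot interval trades storage for replay time: a restore replays at most `every - 1` ops.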
Scaling & Production Considerations
Challenge | Solution | Detail
Hot documents | Dedicated server per doc | Shard by docId, pin to single server for serialization
100+ editors | Op batching + throttling | Batch rapid keystrokes into single op (debounce 50ms)
Large documents | Chunked loading | Load visible portion, lazy-load rest on scroll
Version history | Periodic snapshots | Snapshot every N ops, replay from nearest snapshot
Undo/Redo | Per-user inverse ops | Undo transforms against subsequent ops (complex!)
Rich text | Structured ops | Ops include formatting attributes, not just text
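Op batching can be sketched as coalescing adjacent keystrokes before they are sent. This is an assumption-laden illustration: it handles only inserts, `coalesce` is a hypothetical helper, and the 50ms debounce window is assumed to be applied by whatever collects the ops, not modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

def coalesce(ops):
    """Merge runs of inserts where each one continues right after the previous."""
    batched = []
    for op in ops:
        last = batched[-1] if batched else None
        if last is not None and op.pos == last.pos + len(last.text):
            # Typing continued at the end of the previous insert: extend it.
            batched[-1] = Insert(last.pos, last.text + op.text)
        else:
            batched.append(op)
    return batched

keystrokes = [Insert(3, "h"), Insert(4, "i"), Insert(5, "!"), Insert(0, "x")]
print(coalesce(keystrokes))
# [Insert(pos=3, text='hi!'), Insert(pos=0, text='x')]
```

Three keystrokes become one op, so the server performs one transform and one broadcast instead of three.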
Google's approach: OT with the server as serialization point. Each document has a single collaboration server (sharded by docId) that maintains a linear revision history. Clients apply edits optimistically; the server transforms and broadcasts them. The client-server transform follows the Jupiter protocol, an OT design from Xerox PARC that Google's OT builds on.
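Real-world numbers: Google Docs handles 1B+ documents, 100+ concurrent editors per doc, <50ms local latency, <200ms sync latency. Revision log stored in Spanner for strong consistency. Presence is ephemeral (not persisted), so in this design it is relayed through the collaboration server instead of being written to durable storage such as Colossus (Google's distributed file system).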
Common mistakes:
Position-based ops without transform · divergence guaranteed
Locking paragraphs · terrible UX
Last-write-wins · loses edits silently
Sending full document on each edit · bandwidth explosion
Interview Cheat Sheet
The 8 things to say for collaborative editing design
1. OT (Operational Transform) · transform concurrent ops against each other to preserve intent
2. Server as serialization point · single source of truth, assigns revision numbers
3. Optimistic local apply · user sees their edit instantly, sync happens async
4. Transform function · insert(pos=5) vs insert(pos=3) -> shift first to pos=6
5. CRDTs as alternative · no server needed (P2P), but larger metadata overhead
6. Cursor/selection sync · broadcast cursor positions via presence channel (ephemeral)
7. Undo = inverse operation · not "restore previous state" (would undo others' edits)
8. Revision log for history · every op stored, enables time-travel and "see changes"
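Point 7 can be illustrated with a minimal sketch: undoing an insert means deleting that text, after shifting the deletion past whatever other users inserted later. It assumes the later ops are all inserts listed in server order; `invert`, `transform_delete`, and `undo` are hypothetical helpers, and later deletes by other users are not handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

@dataclass(frozen=True)
class Delete:
    pos: int
    length: int

def apply(doc, op):
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]

def invert(op):
    """The inverse of an insert is a delete of the same range."""
    return Delete(op.pos, len(op.text))

def transform_delete(d, ins):
    """Rewrite delete `d` to apply after insert `ins`; may split into two."""
    if ins.pos <= d.pos:
        return [Delete(d.pos + len(ins.text), d.length)]
    if ins.pos >= d.pos + d.length:
        return [d]
    head = ins.pos - d.pos  # part of the range before the other user's text
    # Tail first (higher index) so the head's indices stay valid.
    return [Delete(ins.pos + len(ins.text), d.length - head), Delete(d.pos, head)]

def undo(op, later_inserts):
    """Deletes that remove `op`'s text while keeping later inserts intact."""
    deletes = [invert(op)]
    for ins in later_inserts:
        deletes = [t for d in deletes for t in transform_delete(d, ins)]
    return deletes

doc = "abcdef"
mine = Insert(3, "X")
theirs = Insert(1, "Y")
doc = apply(apply(doc, mine), theirs)  # "aYbcXdef"
for d in undo(mine, [theirs]):
    doc = apply(doc, d)
print(doc)  # aYbcdef
```

Restoring the pre-edit snapshot "abcdef" would have discarded the other user's "Y"; the transformed inverse removes only "X".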