Article Review: Group 4 — Distributed Systems (Sharding & Distributed Locking)

Articles Reviewed

  1. Architecture Patterns: Sharding — Pier-Jean Malandrino / Scub-Lab — Sharding types, strategies, and trade-offs
  2. How to do distributed locking — Martin Kleppmann — Deep analysis of distributed lock correctness, fencing tokens, and why Redlock is unsafe

Key Concepts

Sharding (Malandrino)

Types:

  - Horizontal sharding — same schema, different rows per shard (our per-case DB model)
  - Vertical sharding — different tables/columns per shard

Strategies:

| Strategy | Shard Key | Use Case |
|----------|-----------|----------|
| Hash-based | hash(key) → shard | Uniform distribution, web sessions |
| Range-based | key ranges → shards | Time-series, sequential data |
| Directory-based | Lookup table maps key → shard | Complex/non-uniform distribution |
| Geo-based | Geographic location → shard | Data locality requirements |
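As a concrete illustration of the hash-based strategy above, a minimal shard-selection function might look like this (`NUM_SHARDS` and the use of md5 are assumptions for the sketch, not a recommendation for our stack):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count for illustration


def shard_for(key: str) -> int:
    """Hash-based sharding: a stable hash of the key picks the shard.

    hashlib.md5 is used because Python's built-in hash() is randomized
    per process and would route the same key differently across runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

The same key always lands on the same shard, which is exactly why rebalancing is painful: changing `NUM_SHARDS` remaps almost every key.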

Trade-offs:

  - Scalability and performance vs implementation complexity
  - Cross-shard joins are expensive or impossible
  - Cross-shard transactions are complex
  - Reverting from sharded to non-sharded is extremely difficult

Distributed Locking (Kleppmann)

Two reasons for locking:

  1. Efficiency — avoid duplicate work. If the lock fails, the cost increase is minor. A single Redis node is fine.
  2. Correctness — prevent data corruption. If the lock fails, it is a serious problem. Needs proper consensus (ZooKeeper, not Redlock).
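A single-node efficiency lock in the spirit of Redis `SET key value NX PX ttl` can be sketched as follows (`SimpleLock` is a hypothetical in-memory stand-in for a single Redis node; a real implementation would issue the Redis commands directly):

```python
import time
import uuid
from typing import Optional


class SimpleLock:
    """Efficiency lock in the style of a single Redis node (SET NX PX).

    Good enough to avoid duplicate work; NOT safe for correctness,
    since an expired holder gets no fencing token and nothing stops
    its late writes. An in-memory dict stands in for Redis here.
    """

    def __init__(self) -> None:
        self._store = {}  # key -> (owner_token, expiry_timestamp)

    def acquire(self, key: str, ttl_seconds: float) -> Optional[str]:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is None or entry[1] <= now:  # free, or previous holder expired
            token = uuid.uuid4().hex  # random owner value, per the Redis pattern
            self._store[key] = (token, now + ttl_seconds)
            return token
        return None  # someone else holds the lock

    def release(self, key: str, token: str) -> bool:
        entry = self._store.get(key)
        if entry is not None and entry[0] == token:  # only the owner may release
            del self._store[key]
            return True
        return False
```

The owner token prevents one classic bug (releasing someone else's lock), but nothing here prevents an expired holder from continuing to write — which is where fencing tokens come in.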

The fencing token pattern: The critical insight. Even with a perfect lock, a GC pause (or network delay, or process pause) can cause a client to hold an expired lock and make unsafe writes. The fix: every lock acquisition returns a monotonically increasing fencing token. The storage service rejects writes with tokens lower than the highest it has seen. This makes the lock safe regardless of timing.
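The storage-side rejection rule can be sketched as a tiny store (an in-memory stand-in for illustration; a real service would persist the highest token durably):

```python
class FencedStore:
    """Storage that enforces Kleppmann's fencing rule: reject any write
    whose token is lower than the highest token already seen."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value) -> bool:
        if token < self.highest_token:
            return False  # a newer lock holder has written; reject the stale writer
        self.highest_token = token
        self.data[key] = value
        return True
```

Even if a paused client wakes up believing it still holds the lock, its old token is now below the high-water mark and its write is refused.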

Why Redlock is unsafe:

  - Relies on timing assumptions (bounded network delay, bounded process pauses, bounded clock error)
  - Does not generate fencing tokens
  - Clock jumps on Redis nodes can cause two clients to hold the same lock
  - GC pauses can cause a client to use an expired lock

Kleppmann's recommendation:

  - For efficiency locks → single Redis node; document that it's approximate
  - For correctness locks → ZooKeeper or a database with transactional guarantees, plus fencing tokens

Mapping to NGE Architecture

1. VALIDATED: Per-Case Database Is Directory-Based Sharding

Our ADR-003 (per-case database schemas) is a directory-based sharding pattern:

  - Shard key: case_id
  - Directory: the core schema maps case_id → {RDS_DBNAME}_case_{case_id}
  - Each shard (case schema) has an identical table structure
  - Lambda resolves the shard via set_npcase() + engine cache

This aligns with the article's description. Our specific advantages:

  - Natural shard boundary — cases are legally isolated; cross-case queries are never needed
  - No cross-shard joins — each case is self-contained by business requirement (litigation isolation)
  - Easy lifecycle — archive/drop the schema when a case closes
  - No rebalancing needed — case_id is permanent, no data migration between shards

This is the ideal sharding scenario because our shard boundary (case) is also our business isolation boundary. Most sharding pain points (cross-shard joins, rebalancing, hot spots) don't apply.
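Under ADR-003's naming, the resolution step can be sketched like this (the actual logic lives in set_npcase(); `resolve_case_schema` and `engine_for_case` are hypothetical names for the sketch):

```python
def resolve_case_schema(db_name: str, case_id: int) -> str:
    """Directory-based shard resolution: the 'directory' is the core
    schema's mapping from case_id to its dedicated schema name."""
    return f"{db_name}_case_{case_id}"


# Hypothetical per-container cache; Lambda execution environments are
# reused, so caching engines avoids reconnecting on every invocation.
_engine_cache: dict = {}


def engine_for_case(db_name: str, case_id: int, engine_factory):
    """Return (and cache) a DB engine bound to the case's schema."""
    schema = resolve_case_schema(db_name, case_id)
    if schema not in _engine_cache:
        _engine_cache[schema] = engine_factory(schema)
    return _engine_cache[schema]
```

Because the mapping is deterministic and case_id is permanent, there is no directory lookup to keep consistent and no rebalancing step to run.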

2. CRITICAL: Bates Number Sequencing Needs Fencing (ADR-006)

ADR-006 identifies bates number sequencing as a risk: bates numbers must be globally sequential per pattern across concurrent Lambda invocations.

Current mitigation plan: SELECT ... FOR UPDATE on BatesPattern for atomic increment. This is correct for efficiency locking — it prevents duplicate bates numbers under normal conditions. But Kleppmann's analysis reveals the deeper issue:

The risk scenario:

  1. Lambda A acquires the lock (SELECT ... FOR UPDATE) and gets bates number 1001
  2. Lambda A pauses (cold start, network delay, visibility timeout edge case)
  3. MySQL rolls back Lambda A's transaction (connection drop or timeout), releasing the row lock and reverting the increment
  4. Lambda B acquires the lock and gets bates number 1001 (duplicate!)
  5. Lambda A resumes and stamps its document with 1001

Is this a correctness or efficiency concern? For bates numbering, it's correctness — duplicate bates numbers are a legal compliance issue in eDiscovery. A document's bates number is its permanent legal identifier.

Recommendation: Apply the fencing token pattern. The BatesPattern.next_number column is already a natural fencing token (monotonically increasing). The stamp service should:

  1. Acquire the next number via SELECT ... FOR UPDATE + UPDATE (atomic increment)
  2. Include the number as a fencing token when writing the stamp annotation
  3. Make the Nutrient API call idempotent — if an annotation with that bates number already exists, skip it

This is simpler than distributed locks because we're using a single MySQL database (not a distributed lock service). MySQL's SELECT ... FOR UPDATE within a transaction provides the necessary serialization. The keys are keeping the lock-holding transaction short (well under innodb_lock_wait_timeout, so concurrent invocations waiting on the row don't error out) and ensuring the stamp operation is idempotent.
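A sketch of the recommended flow (a threading.Lock stands in for MySQL's row lock on BatesPattern, and an in-memory dict stands in for the idempotency check against existing annotations; `BatesSequencer` and its methods are hypothetical names):

```python
import threading


class BatesSequencer:
    """Models the ADR-006 recommendation: atomic increment plus an
    idempotent stamp keyed by document, so a paused-and-resumed worker
    retrying a stamp cannot assign a second bates number."""

    def __init__(self, start: int = 1001) -> None:
        self._lock = threading.Lock()
        self._next = start
        self._stamped = {}  # doc_id -> bates number already applied

    def acquire_number(self) -> int:
        # Stands in for SELECT ... FOR UPDATE + UPDATE in one transaction.
        with self._lock:
            number = self._next
            self._next += 1
            return number

    def stamp(self, doc_id: str, bates: int) -> int:
        # Idempotent: if the document was already stamped, keep the
        # existing number and skip the duplicate annotation write.
        return self._stamped.setdefault(doc_id, bates)
```

The idempotent `stamp` is what closes the gap in the risk scenario: a resumed worker holding a stale number can no longer overwrite a stamp another invocation already applied.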

3. VALIDATED: No Distributed Lock Needed for Document Processing

Our document processing pipeline doesn't need distributed locks because:

  - Each message is processed by at most one Lambda invocation at a time (SQS visibility timeout)
  - Idempotent handlers protect against duplicate delivery
  - No shared mutable state between Lambda invocations (all state lives in MySQL or S3)

Kleppmann would approve: we use the database as the serialization point (correct approach) rather than a distributed lock service.
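The idempotent-handler pattern over at-least-once delivery can be sketched like this (the in-memory set stands in for a durable dedup record, e.g. a MySQL table keyed by SQS message id; `make_idempotent` is a hypothetical name):

```python
def make_idempotent(handler, seen=None):
    """Wrap a message handler so duplicate deliveries become no-ops.

    The dedup id is recorded only after the handler succeeds, so a
    failed attempt remains retryable on redelivery.
    """
    seen = seen if seen is not None else set()

    def wrapped(message_id: str, body):
        if message_id in seen:
            return None  # duplicate delivery: already processed
        result = handler(body)
        seen.add(message_id)
        return result

    return wrapped
```

With a durable `seen` store, the database is the serialization point and no lock service is involved, which is the approach Kleppmann endorses.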

4. INFO: Sharding Strategy Implications for ADR-008 (Elasticsearch)

ADR-008 proposes an Elasticsearch upgrade. ES uses hash-based sharding internally. Our per-case data model maps to ES indices-per-case or routing-by-case-id. The sharding article's warning about cross-shard joins is relevant: cross-case search in ES requires careful index design (separate indices vs. shared index with routing).
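The two index designs can be contrasted with small request-shaping helpers (index names and parameters are illustrative assumptions, not our actual ES configuration):

```python
def case_index_name(case_id: int) -> str:
    """Index-per-case: strongest isolation, but cross-case search
    becomes a multi-index query and shard overhead grows with cases."""
    return f"case-{case_id}"


def routed_search(case_id: int, query: dict) -> dict:
    """Shared index with routing: passing routing=case_id keeps a
    case's documents on one shard, so per-case searches hit a single
    shard instead of fanning out."""
    return {"index": "documents", "routing": str(case_id), "body": query}
```

Either way, the per-case shard boundary carries over cleanly; the cost only appears if cross-case search is ever required.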

New Backlog Items

| Item | Priority | Related |
|------|----------|---------|
| Ensure bates number increment is idempotent (fencing token pattern) for ADR-006 | HIGH | ADR-006 risk: bates sequencing |
| Verify InnoDB lock wait timeout vs Lambda timeout alignment for stamp service | MEDIUM | ADR-006 |

Summary

The sharding article validates our per-case database as a textbook directory-based sharding pattern with ideal properties (shard boundary = business isolation boundary, no cross-shard joins needed). Kleppmann's distributed locking article is directly relevant to ADR-006 (bates stamping): bates numbers are a correctness concern requiring the fencing token pattern to prevent duplicates under concurrent Lambda execution. Our document processing pipeline correctly avoids distributed locks by using the database as the serialization point with idempotent handlers — exactly Kleppmann's recommendation.
