ADR-008: Elasticsearch Upgrade + Search Service Extraction
Status
Proposed
Date
2026-03-19
Context
Elasticsearch is the backbone of Nextpoint's document and deposition search. The current implementation is deeply embedded in the Rails monolith with significant technical debt.
Current State
Version: ES 7.4.0 (gem pinned to 7.4.0, elasticsearch-model 7.0.0, elasticsearch-rails 7.0.0). ES 7.4 reached end-of-life in 2021.
Index Architecture:
- Multi-tenant via filtered aliases — multiple cases share physical indexes
- Physical index naming: {env}_{model}_{index_identifier}_{sequence}
- Alias naming: {env}_{npcase_id}_{model} (e.g., production_12345_exhibits)
- Filtered alias: { filter: { term: { npcase_id: npcase_id } } }
- Auto-sharding when shard exceeds 30GB (MAX_SHARD_SIZE)
- Two index types: exhibits (parent-child join field with attachments) and deposition_volumes
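The naming conventions and the filtered alias above can be illustrated with a small TypeScript sketch. The helper names (`physicalIndexName`, `caseAliasName`, `addCaseAliasAction`) are hypothetical and are not the Rails code. `indexIdentifier` is treated as an opaque string because its format is not described here. The returned object follows the shape of an `add` action in the `_aliases` API.

```typescript
// Hypothetical helpers that mirror the documented naming conventions.
type Model = "exhibits" | "deposition_volumes";

// Physical index: {env}_{model}_{index_identifier}_{sequence}
function physicalIndexName(env: string, model: Model, indexIdentifier: string, sequence: number): string {
  return `${env}_${model}_${indexIdentifier}_${sequence}`;
}

// Per-case alias: {env}_{npcase_id}_{model}
function caseAliasName(env: string, npcaseId: number, model: Model): string {
  return `${env}_${npcaseId}_${model}`;
}

// One "add" action for POST /_aliases. The term filter makes every search
// through the alias see only this case's documents in the shared index.
interface AddAliasAction {
  add: { index: string; alias: string; filter: { term: { npcase_id: number } } };
}

function addCaseAliasAction(index: string, env: string, npcaseId: number, model: Model): AddAliasAction {
  return {
    add: {
      index,
      alias: caseAliasName(env, npcaseId, model),
      filter: { term: { npcase_id: npcaseId } },
    },
  };
}

// Usage: "idx" and sequence 3 are placeholder values, not real identifiers.
const exhibitsIndex = physicalIndexName("production", "exhibits", "idx", 3);
const aliasRequestBody = { actions: [addCaseAliasAction(exhibitsIndex, "production", 12345, "exhibits")] };
console.log(JSON.stringify(aliasRequestBody));
```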
Search Pipeline (Parslet-based, Legacy):
User query string
→ SmartQuoteReplace (normalize)
→ DocumentSearchStringParser (Parslet PEG grammar)
→ DocumentParsedSearchHashTransform (AST → ES query DSL)
→ 17 sub-transform modules (batch, date, metadata, privilege, etc.)
→ ElasticsearchDocumentListSearch (execute, paginate, highlight)
→ Results
QLE (TypeScript, built but NOT integrated):
User query string
→ POST /parse to query-language-engine service
→ Chevrotain lexer → grammar → visitor (parse tree)
→ Builder factory → ExhibitESQueryBuilder or DepositionESQueryBuilder
→ 16 transformers
→ ES 7.x query DSL
→ (NOT YET) Rails uses this instead of Parslet
Indexing Pipeline:
Model change → request_elasticsearch_indexing (writes ElasticsearchIndexRequest)
→ ElasticsearchIndexer (persistent process) claims request
→ BulkIndexable#bulk_import (fork-based parallelism, batches of 1000)
→ Adaptive batch sizing (halves on RequestEntityTooLarge)
→ Zero-downtime reindexing via alias swap
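The adaptive batch sizing step can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions, not BulkIndexable itself. `send` is a hypothetical callback that performs one bulk request and returns its HTTP status, where 413 means Request Entity Too Large. The fork-based parallelism is omitted.

```typescript
// Hypothetical: performs one bulk request and returns its HTTP status code.
type SendBulk<T> = (docs: T[]) => number;

const HTTP_REQUEST_ENTITY_TOO_LARGE = 413;

// Sends `docs` in batches of `initialBatchSize` (1000 in the current Rails code).
// On a 413 the same documents are retried with half the batch size.
// Returns the batch size in effect when the import finished.
function bulkImportAdaptive<T>(docs: T[], send: SendBulk<T>, initialBatchSize = 1000): number {
  let batchSize = initialBatchSize;
  let offset = 0;
  while (offset < docs.length) {
    const batch = docs.slice(offset, offset + batchSize);
    const status = send(batch);
    if (status === HTTP_REQUEST_ENTITY_TOO_LARGE) {
      // A single document that is still too large cannot be split further.
      if (batch.length === 1) throw new Error(`document at offset ${offset} exceeds the request size limit`);
      // Halve the actual batch (the tail batch may be smaller than batchSize) and retry.
      batchSize = Math.floor(batch.length / 2);
      continue;
    }
    if (status < 200 || status >= 300) throw new Error(`bulk request failed with HTTP ${status}`);
    offset += batch.length;
  }
  return batchSize;
}
```

This document does not say whether the Rails implementation grows the batch back after a run of successes. This sketch keeps the reduced size for the rest of the import.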
Key Files:
- lib/search/ — 25+ files (parsers, transforms, search executors)
- app/models/concerns/ — Searchable, ExhibitSearchable, AttachmentSearchable, BulkIndexable, Reindexable
- app/models/elasticsearch_indexer.rb — persistent indexing process
- lib/search/elasticsearch/indices/mappings/ — YAML mapping definitions
Why Extract?
- ES 7.4 has been past EOL for years: unpatched security vulnerabilities, no bug fixes
- QLE exists but isn't integrated: the TypeScript parser is production-ready (1238 tests), but Rails still uses Parslet
- Parslet parser is hard to maintain: the PEG grammar plus 17 transform modules is fragile, and every new field requires changes in multiple files
- Fork-based indexing is brittle: BulkIndexable monkeypatches elasticsearch-model to fork processes for parallel indexing
- Tight Rails coupling: search logic is spread across concerns, transforms, and controllers
Decision
Adopt a three-phase approach that modernizes search incrementally, without a big-bang migration.
Phase 1: Integrate QLE (Medium effort, high value)
Wire the existing query-language-engine service into Rails, replacing the Parslet parser:
BEFORE: User query → Parslet parser (Ruby, in-process) → ES query DSL → ES
AFTER: User query → HTTP POST to QLE (TypeScript, ECS) → ES query DSL → ES
Steps:
1. Generate and install the QLE Ruby client gem (the client generator already exists in the QLE repo)
2. Add QLE_ENABLED feature flag (per-case or global)
3. In ElasticsearchDocumentListSearch, replace Parslet parse+transform with QLE HTTP call
4. Use QLE's built-in A/B testing framework to compare Parslet vs QLE results in production
5. Once parity is confirmed, remove the Parslet code (~25 files)
This is the highest-value, lowest-risk step. QLE already has 1238 tests and was built specifically to replace the Parslet parser at feature parity, although the remaining parity gaps listed under Risks must be closed before switching.
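Steps 2 to 4 can be sketched as a per-search decision. The code checks the flag, runs the selected parser, and shadow-runs the other parser to record DSL mismatches. This is a hypothetical TypeScript illustration of the logic; the Rails side would be Ruby. `QleFlag`, `buildQuery`, and the `Parser` callbacks are assumptions. The sketch does not model QLE's built-in A/B testing framework.

```typescript
// Hypothetical flag: QLE_ENABLED can be switched on globally or per case.
interface QleFlag {
  global: boolean;
  caseIds: ReadonlySet<number>;
}

type EsQuery = Record<string, unknown>;
// Stand-ins for the Parslet pipeline and the QLE HTTP client (both hypothetical here).
type Parser = (query: string) => EsQuery;

function qleEnabledFor(flag: QleFlag, npcaseId: number): boolean {
  return flag.global || flag.caseIds.has(npcaseId);
}

// Serializes with object keys sorted, so DSL objects that differ only in key order
// compare equal. Array order is kept as-is.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Returns the DSL from the parser selected by the flag; the other parser is
// shadow-run and any difference is reported through onMismatch.
function buildQuery(
  query: string,
  npcaseId: number,
  flag: QleFlag,
  parslet: Parser,
  qle: Parser,
  onMismatch: (query: string) => void,
): EsQuery {
  const useQle = qleEnabledFor(flag, npcaseId);
  const primary = useQle ? qle(query) : parslet(query);
  const shadow = useQle ? parslet(query) : qle(query);
  if (canonicalJson(primary) !== canonicalJson(shadow)) onMismatch(query);
  return primary;
}
```

Running both parsers on every search would add the QLE round trip to each request, even when Parslet is primary. In practice the shadow call would likely be sampled or moved off the request path.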
Phase 2: ES Version Upgrade (High effort, required)
Upgrade from ES 7.4 → OpenSearch 2.x (or ES 8.x):
Steps:
1. Stand up new OpenSearch cluster alongside existing ES 7.4
2. Update mapping definitions (changes from 7.4 → OpenSearch 2.x are expected to be minimal; see Risks)
3. Update QLE to generate OpenSearch-compatible query DSL (QLE already targets 7.x, minor changes)
4. Dual-write indexing: new documents go to both clusters
5. Backfill: reindex existing data to new cluster using ElasticsearchReindexer
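6. Cut search reads over to the new cluster. Aliases are local to a cluster, so the per-case filtered aliases must be created on the new cluster rather than switched across.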
7. Decommission old ES 7.4 cluster
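Step 4 hinges on one policy decision: what happens when a write succeeds on one cluster and fails on the other. The hypothetical `DualWriter` below sketches one option, assuming the ES 7.4 cluster stays the source of truth until cutover. A failure on the old cluster fails the write. A failure on the new cluster is queued for replay instead of blocking production indexing. `ClusterWriter` stands in for whatever performs the bulk request.

```typescript
// Hypothetical: one bulk write to one cluster, returning true on success.
type ClusterWriter<T> = (docs: T[]) => boolean;

class DualWriter<T> {
  // Batches that failed on the new cluster; replay them before cutover.
  readonly pendingReplay: T[][] = [];
  private readonly oldCluster: ClusterWriter<T>;
  private readonly newCluster: ClusterWriter<T>;

  constructor(oldCluster: ClusterWriter<T>, newCluster: ClusterWriter<T>) {
    this.oldCluster = oldCluster;
    this.newCluster = newCluster;
  }

  write(docs: T[]): void {
    // ES 7.4 is still the source of truth, so its failure is a real indexing failure.
    if (!this.oldCluster(docs)) throw new Error("write to the ES 7.4 cluster failed");
    // A failure on the new cluster must not block production indexing.
    if (!this.newCluster(docs)) this.pendingReplay.push(docs);
  }

  // Retries queued batches once; batches that fail again stay queued.
  replay(): void {
    const batches = this.pendingReplay.splice(0);
    for (const batch of batches) {
      if (!this.newCluster(batch)) this.pendingReplay.push(batch);
    }
  }
}
```

Dual-write and backfill overlap, so a backfilled document could overwrite a newer dual-written version. Elasticsearch and OpenSearch support external versioning (`version_type=external`) to reject such stale writes; this sketch does not handle that case.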
Why OpenSearch over ES 8.x: AWS-native managed service (Amazon OpenSearch Service), no license concerns, and largely API-compatible with ES 7.x (OpenSearch was forked from Elasticsearch 7.10.2).
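Before cutover, analyzer parity (see the custom-analyzer risk under Risks) can be spot-checked. Send the same text to `GET /{index}/_analyze` with a body such as `{"analyzer": "nextpoint_analyzer", "text": "..."}` on both clusters, then diff the token streams. The sketch below does only the diffing, on responses that have already been fetched. `diffAnalyzeResponses` is a hypothetical helper, and the interfaces cover the token fields the `_analyze` API returns.

```typescript
// Token fields returned by the _analyze API on both ES 7.x and OpenSearch.
interface AnalyzeToken {
  token: string;
  start_offset: number;
  end_offset: number;
  type: string;
  position: number;
}

interface AnalyzeResponse {
  tokens: AnalyzeToken[];
}

// Compares token text, position, and offsets (ignores `type`).
// Returns one message per differing token; an empty array means the streams match.
function diffAnalyzeResponses(oldResp: AnalyzeResponse, newResp: AnalyzeResponse): string[] {
  const diffs: string[] = [];
  const n = Math.max(oldResp.tokens.length, newResp.tokens.length);
  for (let i = 0; i < n; i++) {
    const a: AnalyzeToken | undefined = oldResp.tokens[i];
    const b: AnalyzeToken | undefined = newResp.tokens[i];
    if (a === undefined || b === undefined) {
      const extra = a !== undefined ? `old cluster only: ${a.token}` : `new cluster only: ${b?.token}`;
      diffs.push(`token ${i}: ${extra}`);
      continue;
    }
    if (a.token !== b.token || a.position !== b.position ||
        a.start_offset !== b.start_offset || a.end_offset !== b.end_offset) {
      diffs.push(`token ${i}: "${a.token}"@${a.position} vs "${b.token}"@${b.position}`);
    }
  }
  return diffs;
}
```

Running this over a sample of real document text for each of nextpoint_analyzer, edge_ngram_analyzer, and custom_path_tree gives a concrete pass/fail signal before the backfill.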
Phase 3: Extract Search Service (Long-term)
Extract indexing and search execution into a standalone service:
Rails App Search Service (Lambda + ECS)
│ │
├── SNS: IndexRequested ├── Lambda: IndexProcessor
│ → case_id, exhibit_ids │ → bulk index to OpenSearch
│ │
├── HTTP: POST /search ├── ECS: SearchAPI
│ → query, case_id, filters │ → QLE parse + OpenSearch execute
│ ← results, highlights │ ← results
│ │
└── SNS: ReindexRequested └── Lambda: ReindexProcessor
→ case_id, full reindex → zero-downtime alias swap
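The ReindexProcessor's zero-downtime alias swap amounts to one `_aliases` request. It removes the case's alias from the old physical index and adds it, with the same `npcase_id` filter, to the new one. Actions in a single `_aliases` request are applied atomically, so searches through the alias never see it missing. The function below is a hypothetical TypeScript sketch of that request body, reusing the alias conventions from Index Architecture.

```typescript
interface AliasAction {
  add?: { index: string; alias: string; filter?: { term: { npcase_id: number } } };
  remove?: { index: string; alias: string };
}

// Builds the POST /_aliases body that moves one case's filtered alias from the
// old physical index to the freshly reindexed one. Both actions travel in one
// request, which the _aliases API applies atomically.
function aliasSwapBody(
  alias: string,
  npcaseId: number,
  oldIndex: string,
  newIndex: string,
): { actions: AliasAction[] } {
  return {
    actions: [
      { remove: { index: oldIndex, alias } },
      { add: { index: newIndex, alias, filter: { term: { npcase_id: npcaseId } } } },
    ],
  };
}

// Usage with placeholder index names following {env}_{model}_{index_identifier}_{sequence}.
console.log(JSON.stringify(aliasSwapBody("production_12345_exhibits", 12345, "production_exhibits_idx_3", "production_exhibits_idx_4")));
```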
This phase is optional. If Phase 1 + 2 address the immediate pain (EOL, parser maintenance), Phase 3 can wait until there's a stronger driver.
Consequences
Positive¶
- Phase 1 is immediately actionable: QLE exists, is tested, and has a Ruby client generator
- Removes ~25 Parslet files: a significant reduction in Rails codebase complexity
- ES upgrade path is clear: OpenSearch 2.x is largely API-compatible with ES 7.x
- A/B testing framework: QLE already supports comparing old vs new parser results
Negative¶
- Phase 1 adds HTTP latency: in-process Parslet (~5ms) vs an HTTP call to QLE (~20-50ms)
- Phase 2 is operationally complex: two clusters running side by side, dual-write, backfill, and cutover
- Phase 3 requires an API layer: Rails currently calls ES directly, so extracting search needs a clean API
Risks¶
- QLE parity gaps: QLE's CLAUDE.md lists 5 remaining gap phases (wildcard validation, unknown field multi-match, fuzzy search, numeric brackets, security fix). These must be verified as complete before switching.
- Index mapping compatibility: ES 7.4 → OpenSearch 2.x mapping changes could affect existing data. Test with production-scale data before cutover.
- Custom analyzers: nextpoint_analyzer, edge_ngram_analyzer, and custom_path_tree must behave identically on the new cluster.
- Fork-based indexing: BulkIndexable uses fork for parallel indexing. This works on EC2/ECS but is a poor fit for Lambda's short-lived, per-invocation execution model. Phase 3 must replace fork-based parallelism with SQS fan-out.
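One way to replace the fork-based workers with SQS fan-out is to split a case's exhibit ids into fixed-size work items, each handled by one IndexProcessor invocation. In this hypothetical sketch, the message shape and the chunk size are assumptions. The message fields (`case_id`, `exhibit_ids`) mirror the IndexRequested fields in the Phase 3 diagram. The 1000-id chunk size is borrowed from the current bulk batch size. Entries are grouped by 10 because `SendMessageBatch` accepts at most 10 messages per call. The real payload must also be checked against the SQS message size limit.

```typescript
// Hypothetical message body for one fan-out work item.
interface IndexWorkItem {
  case_id: number;
  exhibit_ids: number[];
}

// Same field names as an SQS SendMessageBatch request entry.
interface BatchEntry {
  Id: string;
  MessageBody: string;
}

const SQS_MAX_BATCH_ENTRIES = 10;

function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("chunk size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Splits a case's exhibit ids into work items of at most idsPerMessage ids,
// then groups the items into SendMessageBatch-sized calls.
function buildFanOutBatches(caseId: number, exhibitIds: number[], idsPerMessage = 1000): BatchEntry[][] {
  const entries = chunk(exhibitIds, idsPerMessage).map((ids, i): BatchEntry => {
    const item: IndexWorkItem = { case_id: caseId, exhibit_ids: ids };
    return { Id: `chunk-${i}`, MessageBody: JSON.stringify(item) };
  });
  return chunk(entries, SQS_MAX_BATCH_ENTRIES);
}
```

Each Lambda invocation then bulk-indexes one work item. This puts the parallelism in the queue's concurrency rather than in forked child processes.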