
ADR-008: Elasticsearch Upgrade + Search Service Extraction

Status

Proposed

Date

2026-03-19

Context

Elasticsearch is the backbone of Nextpoint's document and deposition search. The current implementation is deeply embedded in the Rails monolith with significant technical debt.

Current State

Version: ES 7.4.0 (elasticsearch gem pinned to 7.4.0, elasticsearch-model 7.0.0, elasticsearch-rails 7.0.0). ES 7.4 reached end of life in April 2021.

Index Architecture:

  • Multi-tenant via filtered aliases: multiple cases share physical indexes
  • Physical index naming: {env}_{model}_{index_identifier}_{sequence}
  • Alias naming: {env}_{npcase_id}_{model} (e.g., production_12345_exhibits)
  • Filtered alias: { filter: { term: { npcase_id: npcase_id } } }
  • Auto-sharding when a shard exceeds 30GB (MAX_SHARD_SIZE)
  • Two index types: exhibits (parent-child join field with attachments) and deposition_volumes
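The following is a minimal Ruby sketch of these conventions. It assumes the names are built by plain string interpolation and reads MAX_SHARD_SIZE as 30 GiB (30 * 1024**3 bytes). The IndexNaming module and its method names are hypothetical and are not the real Searchable code. The sketch also leaves out what happens once a shard crosses the limit, for example a new physical index with the next {sequence}.

```ruby
# Hypothetical helpers mirroring the index naming conventions described above.
# Not the real Searchable code; names and the GiB reading of "30GB" are assumptions.
module IndexNaming
  MAX_SHARD_SIZE_BYTES = 30 * 1024**3 # "30GB", read here as 30 GiB

  module_function

  # {env}_{model}_{index_identifier}_{sequence}
  def physical_index_name(env, model, index_identifier, sequence)
    "#{env}_#{model}_#{index_identifier}_#{sequence}"
  end

  # {env}_{npcase_id}_{model}, e.g. production_12345_exhibits
  def alias_name(env, npcase_id, model)
    "#{env}_#{npcase_id}_#{model}"
  end

  # One _aliases "add" action that exposes a shared physical index to a single case.
  def add_filtered_alias_action(index, env, npcase_id, model)
    {
      add: {
        index: index,
        alias: alias_name(env, npcase_id, model),
        filter: { term: { npcase_id: npcase_id } }
      }
    }
  end

  # True once a shard has grown past MAX_SHARD_SIZE.
  def shard_over_limit?(shard_size_bytes)
    shard_size_bytes > MAX_SHARD_SIZE_BYTES
  end
end
```

Every case alias carries the npcase_id term filter. As a result, many cases can share one physical index, and each alias still returns only its own case's documents.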

Search Pipeline (Parslet-based, Legacy):

User query string
  → SmartQuoteReplace (normalize)
  → DocumentSearchStringParser (Parslet PEG grammar)
  → DocumentParsedSearchHashTransform (AST → ES query DSL)
    → 17 sub-transform modules (batch, date, metadata, privilege, etc.)
  → ElasticsearchDocumentListSearch (execute, paginate, highlight)
  → Results
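The first step can be pictured as a plain character substitution. This is a guess at what SmartQuoteReplace does, based only on its name. Curly quotes pasted from word processors are mapped to ASCII quotes, presumably so the grammar's phrase rules, which match straight quotes, still apply. The SmartQuoteNormalizer module is hypothetical.

```ruby
# Hypothetical stand-in for the SmartQuoteReplace step; the real class may differ.
module SmartQuoteNormalizer
  # Curly quotes that word processors substitute for straight ones.
  REPLACEMENTS = {
    "\u201C" => '"', # left double quotation mark
    "\u201D" => '"', # right double quotation mark
    "\u2018" => "'", # left single quotation mark
    "\u2019" => "'"  # right single quotation mark
  }.freeze

  # Replace every curly quote in the query with its ASCII equivalent.
  def self.call(query)
    query.gsub(Regexp.union(REPLACEMENTS.keys), REPLACEMENTS)
  end
end
```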

QLE (TypeScript, built but NOT integrated):

User query string
  → POST /parse to query-language-engine service
  → Chevrotain lexer → grammar → visitor (parse tree)
  → Builder factory → ExhibitESQueryBuilder or DepositionESQueryBuilder
    → 16 transformers
  → ES 7.x query DSL
  → (NOT YET) Rails uses this instead of Parslet

Indexing Pipeline:

Model change → request_elasticsearch_indexing (writes ElasticsearchIndexRequest)
  → ElasticsearchIndexer (persistent process) claims request
  → BulkIndexable#bulk_import (fork-based parallelism, batches of 1000)
  → Adaptive batch sizing (halves on RequestEntityTooLarge)
  → Zero-downtime reindexing via alias swap
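The adaptive batch sizing step is sketched below under several assumptions. RequestTooLarge stands in for whatever 413 error class the Elasticsearch client raises. The block passed as bulk_send is a hypothetical callable that performs one bulk request. The fork-based parallelism is left out. The offset advances only after a batch succeeds, so halving never drops documents.

```ruby
# Hypothetical stand-in for the client's 413 Request Entity Too Large error.
class RequestTooLarge < StandardError; end

# Sends docs in batches, halving the batch size whenever a batch is rejected as
# too large. bulk_send is a hypothetical block that performs one bulk request.
# Returns the batch size that was in use when indexing finished.
def bulk_import_adaptive(docs, initial_batch_size: 1000, min_batch_size: 1, &bulk_send)
  batch_size = initial_batch_size
  offset = 0
  while offset < docs.length
    batch = docs[offset, batch_size]
    begin
      bulk_send.call(batch)
      offset += batch.length # only advance once the batch was accepted
    rescue RequestTooLarge
      raise if batch_size <= min_batch_size # cannot shrink any further
      batch_size = [batch_size / 2, min_batch_size].max
    end
  end
  batch_size
end
```

This ADR does not say whether the real BulkIndexable grows the batch back after successes. This sketch keeps the smaller size for the rest of the run and re-raises once the minimum batch is still rejected.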

Key Files:

  • lib/search/: 25+ files (parsers, transforms, search executors)
  • app/models/concerns/: Searchable, ExhibitSearchable, AttachmentSearchable, BulkIndexable, Reindexable
  • app/models/elasticsearch_indexer.rb: persistent indexing process
  • lib/search/elasticsearch/indices/mappings/: YAML mapping definitions

Why Extract?

  1. ES 7.4 is years past EOL: no security patches, no bug fixes
  2. QLE exists but isn't integrated: the TypeScript parser has 1238 tests, but Rails still uses Parslet
  3. The Parslet parser is unmaintainable: a PEG grammar plus 17 transform modules is fragile, and every new field requires changes in multiple files
  4. Fork-based indexing is brittle: BulkIndexable monkeypatches elasticsearch-model to fork processes for parallel indexing
  5. Tight Rails coupling: search logic is spread across concerns, transforms, and controllers

Decision

Adopt a three-phase approach that modernizes search incrementally, without a big-bang migration.

Phase 1: Integrate QLE (Medium effort, high value)

Wire the existing query-language-engine service into Rails, replacing the Parslet parser:

BEFORE: User query → Parslet parser (Ruby, in-process) → ES query DSL → ES

AFTER:  User query → HTTP POST to QLE (TypeScript, ECS) → ES query DSL → ES

Steps:

  1. Generate and install the QLE Ruby client gem (the generator already exists in the QLE repo)
  2. Add a QLE_ENABLED feature flag (per-case or global)
  3. In ElasticsearchDocumentListSearch, replace the Parslet parse + transform with a QLE HTTP call
  4. Use QLE's built-in A/B testing framework to compare Parslet and QLE results in production
  5. Once parity is confirmed, remove the Parslet code (~25 files)
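Step 3 might look roughly like the sketch below. Everything in it is an assumption: the SearchQueryBuilder class, the JSON shape sent to and returned from POST /parse, and the callables used for the QLE_ENABLED lookup and the legacy Parslet path. In practice, the generated QLE Ruby client from step 1 would replace the hand-written post_json.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical wiring for step 3: QLE_ENABLED decides whether the query DSL comes
# from the QLE service (POST /parse) or from the existing Parslet path.
# The request/response JSON shapes are assumptions, not QLE's documented contract.
class SearchQueryBuilder
  def initialize(qle_url:, enabled:, parslet:, http: nil)
    @uri = URI.join(qle_url, "/parse")
    @enabled = enabled # callable: npcase_id -> true/false (per-case or global flag)
    @parslet = parslet # callable: query string -> ES query DSL hash (legacy path)
    @http = http || method(:post_json)
  end

  def build(query, npcase_id:, index_type:)
    return @parslet.call(query) unless @enabled.call(npcase_id)

    body = @http.call(@uri, { query: query, index_type: index_type })
    JSON.parse(body) # assumed: QLE responds with the ES query DSL as JSON
  end

  private

  # Plain Net::HTTP POST; the generated QLE client gem would replace this.
  def post_json(uri, payload)
    response = Net::HTTP.post(uri, JSON.generate(payload), "Content-Type" => "application/json")
    raise "QLE parse failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    response.body
  end
end
```

Injecting http keeps the class testable without a running QLE service. Passing the flag as a callable lets the same code serve either a per-case or a global QLE_ENABLED.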

This is the highest-value, lowest-risk step. QLE already has 1238 tests and was built specifically to replace the Parslet parser with feature parity.
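This ADR does not describe QLE's built-in A/B framework, so the sketch below is not that framework. It only shows, on the Rails side, what a parity check involves: run both parsers, keep serving the Parslet result, and record any difference. ShadowParityCheck and its callables are hypothetical.

```ruby
# Hypothetical shadow-mode parity check (not QLE's built-in A/B framework):
# users always get the Parslet result while QLE's output is compared against it.
class ShadowParityCheck
  Mismatch = Struct.new(:query, :parslet_dsl, :qle_dsl)

  attr_reader :mismatches

  def initialize(parslet:, qle:, notify: ->(message) { warn(message) })
    @parslet = parslet # callable: query string -> ES query DSL (current path)
    @qle = qle         # callable: query string -> ES query DSL (via QLE)
    @notify = notify
    @mismatches = []
  end

  def call(query)
    legacy = @parslet.call(query)
    begin
      candidate = @qle.call(query)
      unless candidate == legacy
        @mismatches << Mismatch.new(query, legacy, candidate)
        @notify.call("QLE/Parslet mismatch for #{query.inspect}")
      end
    rescue StandardError => e
      # In shadow mode a QLE failure is reported, never surfaced to the user.
      @notify.call("QLE shadow call failed: #{e.message}")
    end
    legacy
  end
end
```

Comparing the generated DSL hashes is the strictest test. Two structurally different queries can still return the same hits, so comparing result IDs may be a more useful parity signal.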

Phase 2: ES Version Upgrade (High effort, required)

Upgrade from ES 7.4 → OpenSearch 2.x (or ES 8.x):

Steps:

  1. Stand up a new OpenSearch cluster alongside the existing ES 7.4 cluster
  2. Update mapping definitions (minimal changes expected from 7.4 → OpenSearch 2.x)
  3. Update QLE to generate OpenSearch-compatible query DSL (QLE already targets 7.x, so changes should be minor)
  4. Dual-write indexing: new documents go to both clusters
  5. Backfill: reindex existing data into the new cluster using ElasticsearchReindexer
  6. Point search traffic at the new cluster's aliases (aliases cannot span clusters, so they are recreated there during backfill)
  7. Decommission the old ES 7.4 cluster
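Step 4 could be a thin wrapper around the existing bulk call. This sketch assumes the ES 7.4 cluster stays the source of truth until cutover. Its errors still fail the request. Errors from the new cluster are only recorded for replay, or left for the backfill in step 5. DualWriteIndexer and the primary/secondary callables are hypothetical.

```ruby
# Hypothetical dual-write wrapper for Phase 2, step 4. The old ES 7.4 cluster
# stays the source of truth: its failures propagate, while failures on the new
# cluster are recorded for replay instead of failing the indexing request.
class DualWriteIndexer
  attr_reader :failed_on_new

  def initialize(primary:, secondary:)
    @primary = primary     # callable: bulk actions -> response (ES 7.4)
    @secondary = secondary # callable: bulk actions -> response (OpenSearch 2.x)
    @failed_on_new = []
  end

  def bulk(actions)
    response = @primary.call(actions) # an error here keeps today's behaviour
    begin
      @secondary.call(actions)
    rescue StandardError
      @failed_on_new.concat(actions) # replay later, or let the backfill cover it
    end
    response
  end
end
```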

Why OpenSearch over ES 8.x: AWS-native managed service (Amazon OpenSearch Service), no license concerns, and largely API-compatible with ES 7.x, since OpenSearch was forked from Elasticsearch 7.10.

Phase 3: Extract Search Service (Long-term)

Extract indexing and search execution into a standalone service:

Rails App                          Search Service (Lambda + ECS)
  │                                  │
  ├── SNS: IndexRequested            ├── Lambda: IndexProcessor
  │     → case_id, exhibit_ids       │     → bulk index to OpenSearch
  │                                  │
  ├── HTTP: POST /search             ├── ECS: SearchAPI
  │     → query, case_id, filters    │     → QLE parse + OpenSearch execute
  │     ← results, highlights        │     ← results
  │                                  │
  └── SNS: ReindexRequested          └── Lambda: ReindexProcessor
        → case_id, full reindex            → zero-downtime alias swap

This phase is optional. If Phase 1 + 2 address the immediate pain (EOL, parser maintenance), Phase 3 can wait until there's a stronger driver.

Consequences

Positive

  • Phase 1 is immediately actionable — QLE exists, is tested, has a Ruby client generator
  • Removes ~25 Parslet files — significant reduction in Rails codebase complexity
  • ES upgrade path is clear — OpenSearch 2.x is API-compatible with ES 7.x
  • A/B testing framework — QLE already supports comparing old vs new parser results

Negative

  • Phase 1 adds HTTP latency: in-process Parslet (~5ms) vs. an HTTP call to QLE (~20-50ms)
  • Phase 2 is operationally complex: two clusters running, dual-write, backfill, cutover
  • Phase 3 requires an API layer: Rails currently calls ES directly, so extracting search needs a clean API

Risks

  • QLE parity gaps: QLE's CLAUDE.md lists 5 remaining gap phases (wildcard validation, unknown field multi-match, fuzzy search, numeric brackets, security fix). These must be verified complete before switching.
  • Index mapping compatibility: ES 7.4 → OpenSearch 2.x mapping changes could affect existing data. Test with production-scale data before cutover.
  • Custom analyzers: nextpoint_analyzer, edge_ngram_analyzer, and custom_path_tree must behave identically on the new cluster.
  • Fork-based indexing: BulkIndexable uses fork for parallel indexing. This works on EC2/ECS but is a poor fit for Lambda (short-lived invocations, limited vCPUs per function). Phase 3 must replace fork-based parallelism with SQS fan-out.
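Below is a sketch of the SQS fan-out mentioned above. It assumes one message per chunk of exhibit IDs and a chunk size that matches today's 1000-document batches. fan_out_messages, send_batches, and the message fields are hypothetical. The id/message_body pairs follow the entry shape that Aws::SQS::Client#send_message_batch (aws-sdk-sqs) accepts, which allows at most 10 entries per call.

```ruby
require "json"

# Hypothetical fan-out for Phase 3: instead of forking workers, split one
# IndexRequested event into many small messages so each Lambda invocation
# indexes one chunk. Chunk size and message fields are assumptions.
CHUNK_SIZE = 1000           # mirrors today's bulk batch size
SQS_MAX_BATCH_ENTRIES = 10  # SendMessageBatch accepts at most 10 entries

def fan_out_messages(case_id, exhibit_ids, chunk_size: CHUNK_SIZE)
  exhibit_ids.each_slice(chunk_size).with_index.map do |ids, i|
    { id: "#{case_id}-#{i}", message_body: JSON.generate({ case_id: case_id, exhibit_ids: ids }) }
  end
end

# Group messages into SendMessageBatch-sized requests.
def send_batches(messages)
  messages.each_slice(SQS_MAX_BATCH_ENTRIES).to_a
end
```

Each Lambda invocation then indexes one chunk, so parallelism comes from concurrent invocations instead of forked processes.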