
Reference Implementation: query-language-engine

Overview

The Query Language Engine (QLE) is a containerized TypeScript microservice that converts human-readable search query strings into Elasticsearch 7.x query DSL (JSON). It replaces the Legacy Ruby/Parslet-based query parser that lived inside the Rails monolith (lib/search/).

EDRM Stage: 7 (Analysis), search query parsing for document review.
Suite: Common (both Discovery and Litigation; exhibit and deposition searches).

Architecture

query-language-engine/
├── app/                                 # Application code (TypeScript)
│   ├── src/
│   │   ├── server.ts                    # HTTP server (raw Node.js, no Express)
│   │   ├── routes/
│   │   │   └── parse.ts                 # POST /parse — main endpoint
│   │   ├── queryEngine/
│   │   │   ├── parsing/
│   │   │   │   ├── UQLParser.ts         # Parser public API (lexer → grammar → visitor)
│   │   │   │   ├── lexer.ts             # Chevrotain token definitions (AND, OR, NOT, W/, etc.)
│   │   │   │   ├── grammar.ts           # Chevrotain grammar rules (UQL syntax)
│   │   │   │   └── visitor.ts           # CST visitor → intermediate parse tree
│   │   │   └── building/
│   │   │       ├── builders/
│   │   │       │   ├── builderFactory.ts     # Exhibit vs Deposition builder dispatch
│   │   │       │   ├── ExhibitESQueryBuilder.ts  # ES query with has_child for attachments
│   │   │       │   └── DepositionESQueryBuilder.ts # ES query with nested search_pages
│   │   │       ├── transformers/             # 16 specialized transformers
│   │   │       ├── fieldRegistry.ts          # Field mappings and aliases
│   │   │       └── field-definitions.ts      # Field type definitions
│   │   ├── middleware/
│   │   │   └── auth.ts                  # API key auth (timing-safe, SOC 2 logging)
│   │   └── client/
│   │       ├── experiment.ts            # A/B testing framework (Scientist-style)
│   │       └── schemas/                 # Zod request/response schemas
│   ├── scripts/
│   │   └── generate-ruby-client/        # Ruby gem generator for Rails integration
│   └── spec/                            # 1238 Jest tests
├── infrastructure/
│   └── lib/stacks/
│       └── ecs-stack.ts                 # CDK: ECS EC2, ALB, CloudWatch, ECR
├── pipeline/                            # CodePipeline CI/CD
└── CLAUDE.md                            # Project configuration

Pattern Mapping

Pattern        | QLE Implementation                                                         | Standard NGE Pattern
Communication  | Synchronous HTTP (POST /parse)                                             | Async SNS/SQS events
Architecture   | Stateless request/response                                                 | Hexagonal core/shell
Language       | TypeScript (Node.js 20)                                                    | Python 3.10+
Compute        | ECS EC2 behind ALB                                                         | Lambda or ECS Fargate
Database       | None (stateless)                                                           | Per-case MySQL
Infrastructure | AWS CDK (TypeScript)                                                       | AWS CDK (TypeScript) ✓
CI/CD          | CodePipeline                                                               | Bitbucket Pipelines
Auth           | API key (Secrets Manager, timing-safe comparison, SOC 2 audit logging)     | IAM roles / HMAC-SHA1
Testing        | Jest (1238 tests)                                                          | pytest

Key Design Decisions

Two-Stage Pipeline

Query parsing happens in two stages:

Stage 1 — Parsing (string → parse tree):

Search string → Lexer (tokenize) → Grammar (parse) → Visitor (build parse tree)
Uses a Chevrotain LL(k) parser, which replaces the Ruby Parslet PEG parser.

Stage 2 — Building (parse tree → ES query DSL):

Parse tree → BuilderFactory → ExhibitESQueryBuilder or DepositionESQueryBuilder
                                    ↓
                           16 specialized transformers
                           (labels, batches, exports, custom fields,
                            proximity, ranges, boolean, wildcards, etc.)
                                    ↓
                           Elasticsearch 7.x query DSL JSON
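
The two stages can be illustrated with a deliberately small sketch. It is not QLE's code: a hand-written recursive-descent parser stands in for the Chevrotain lexer, grammar, and visitor, only bare terms and AND/OR/NOT are supported, and the ParseNode shape, the function names, and the default field name text are all assumptions made for illustration.

```typescript
// Stage 1 output: a tiny parse tree (hypothetical shape, not QLE's real types).
type ParseNode =
  | { kind: "term"; value: string }
  | { kind: "not"; operand: ParseNode }
  | { kind: "and" | "or"; left: ParseNode; right: ParseNode };

type ESQuery = Record<string, unknown>;

// Lexer stand-in: split on whitespace; AND, OR, NOT are the only operators.
function tokenize(input: string): string[] {
  return input.trim().split(/\s+/).filter((t) => t.length > 0);
}

// Grammar stand-in (recursive descent):
//   or    := and (OR and)*
//   and   := unary (AND unary)*
//   unary := NOT unary | term
function parse(input: string): ParseNode {
  const tokens = tokenize(input);
  let pos = 0;

  const parseUnary = (): ParseNode => {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error("unexpected end of query");
    if (tok === "NOT") return { kind: "not", operand: parseUnary() };
    if (tok === "AND" || tok === "OR") throw new Error(`unexpected operator ${tok}`);
    return { kind: "term", value: tok };
  };
  const parseAnd = (): ParseNode => {
    let left = parseUnary();
    while (tokens[pos] === "AND") {
      pos++;
      left = { kind: "and", left, right: parseUnary() };
    }
    return left;
  };
  const parseOr = (): ParseNode => {
    let left = parseAnd();
    while (tokens[pos] === "OR") {
      pos++;
      left = { kind: "or", left, right: parseAnd() };
    }
    return left;
  };

  const tree = parseOr();
  if (pos < tokens.length) throw new Error(`unexpected token ${tokens[pos]}`);
  return tree;
}

// Stage 2: walk the parse tree and emit Elasticsearch bool/match clauses.
function build(node: ParseNode, field = "text"): ESQuery {
  switch (node.kind) {
    case "term":
      return { match: { [field]: node.value } };
    case "not":
      return { bool: { must_not: [build(node.operand, field)] } };
    case "and":
      return { bool: { must: [build(node.left, field), build(node.right, field)] } };
    case "or":
      return {
        bool: {
          should: [build(node.left, field), build(node.right, field)],
          minimum_should_match: 1,
        },
      };
  }
}

const dsl = build(parse("contract AND NOT draft"));
console.log(JSON.stringify(dsl));
// {"bool":{"must":[{"match":{"text":"contract"}},{"bool":{"must_not":[{"match":{"text":"draft"}}]}}]}}
```

Keeping the parse tree independent of Elasticsearch is what lets a single parser feed more than one builder.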

Service Discovery

The service runs on ECS (EC2 launch type) behind an ALB and is registered with AWS Cloud Map service discovery at query-engine.local:3000. It is reachable on private subnets only; there is no public access.

Dual API Format (Migration)

Supports two request formats during the Rails migration:

  • Legacy: Top-level labels, batches, exports, categories fields (name-to-id dictionaries)
  • New: case_context array-based format from the Ruby client gem

If case_context is provided, it takes precedence and is converted via convertCaseContextToBuildContext().
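
The precedence rule might look roughly like the sketch below. Only the function name convertCaseContextToBuildContext and the four legacy field names come from this page; the entry shape of case_context (type, name, id), the numeric ids, the BuildContext interface, and the helper resolveBuildContext are assumptions for illustration, and the real schemas may differ.

```typescript
// Name-to-id dictionaries, as used by the legacy top-level fields.
type NameToId = Record<string, number>; // numeric ids are an assumption

interface BuildContext {
  labels: NameToId;
  batches: NameToId;
  exports: NameToId;
  categories: NameToId;
}

// Hypothetical array entry; the real case_context schema is not shown here.
interface CaseContextEntry {
  type: keyof BuildContext;
  name: string;
  id: number;
}

interface ParseRequest extends Partial<BuildContext> {
  query: string;
  case_context?: CaseContextEntry[];
}

// Fold the array-based format into the dictionary-based build context.
function convertCaseContextToBuildContext(entries: CaseContextEntry[]): BuildContext {
  const ctx: BuildContext = { labels: {}, batches: {}, exports: {}, categories: {} };
  for (const entry of entries) {
    ctx[entry.type][entry.name] = entry.id;
  }
  return ctx;
}

// case_context wins whenever it is present; otherwise fall back to legacy fields.
function resolveBuildContext(req: ParseRequest): BuildContext {
  if (req.case_context !== undefined) {
    return convertCaseContextToBuildContext(req.case_context);
  }
  return {
    labels: req.labels ?? {},
    batches: req.batches ?? {},
    exports: req.exports ?? {},
    categories: req.categories ?? {},
  };
}
```

In this sketch a request that carries both formats ignores the legacy fields entirely rather than merging them, which is one reading of "takes precedence".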

Exhibit vs Deposition Builders

  • ExhibitESQueryBuilder: Generates ES queries with has_child for attachment-level search (documents have child attachment records in ES)
  • DepositionESQueryBuilder: Generates ES queries with nested for search_pages (deposition transcripts have nested page-level text)
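
To make the difference concrete, here is a minimal sketch of the two query shapes. The has_child and nested clauses are standard Elasticsearch 7.x query DSL, and search_pages is the path named above. The child type "attachment", the field names text and search_pages.text, and the choice to OR the parent match with the child match are assumptions, not details taken from the real builders.

```typescript
type ESQuery = Record<string, unknown>;
type SearchType = "exhibit" | "deposition";

// Exhibit: match the document itself OR any of its child attachment records.
// The child type "attachment" and the field "text" are assumed names.
function buildExhibitQuery(term: string): ESQuery {
  const match: ESQuery = { match: { text: term } };
  return {
    bool: {
      should: [match, { has_child: { type: "attachment", query: match } }],
      minimum_should_match: 1,
    },
  };
}

// Deposition: search page-level text stored as nested objects under search_pages.
// The sub-field "search_pages.text" is an assumed name.
function buildDepositionQuery(term: string): ESQuery {
  return {
    nested: {
      path: "search_pages",
      query: { match: { "search_pages.text": term } },
    },
  };
}

// Dispatch on search type, in the spirit of builderFactory.ts.
function buildQuery(searchType: SearchType, term: string): ESQuery {
  return searchType === "exhibit" ? buildExhibitQuery(term) : buildDepositionQuery(term);
}
```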

A/B Testing Framework

QLE includes a built-in Scientist-style experiment framework (client/experiment.ts) for comparing the Ruby parser (control) against QLE (candidate) during migration. This allows Rails to run both parsers and compare results before fully switching over.
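
The general Scientist pattern can be sketched as follows. This is not the contents of client/experiment.ts: the names runExperiment and ExperimentResult, the synchronous signature, and the JSON-based default comparison are assumptions chosen to keep the example short.

```typescript
interface ExperimentResult<T> {
  value: T;                 // always the control's result
  matched: boolean;         // did the candidate produce an equal result?
  candidateError?: unknown; // set when the candidate threw
}

function runExperiment<T>(
  control: () => T,
  candidate: () => T,
  isEqual: (a: T, b: T) => boolean = (a, b) => JSON.stringify(a) === JSON.stringify(b),
): ExperimentResult<T> {
  // Control errors propagate exactly as they would without the experiment.
  const value = control();
  try {
    const observed = candidate();
    return { value, matched: isEqual(value, observed) };
  } catch (candidateError) {
    // A failing candidate must never affect the caller; record it and move on.
    return { value, matched: false, candidateError };
  }
}

// Usage: the two lambdas stand in for the Ruby parser and QLE.
const outcome = runExperiment(
  () => ({ match: { text: "contract" } }),
  () => ({ match: { text: "contract" } }),
);
console.log(outcome.matched);
```

The caller always receives the control's value, so a mismatch or a candidate failure shows up only in the recorded result, not in user-facing behavior.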

Ruby Client Gem Generator

The project includes tooling (generate-ruby-client/) to produce a Ruby gem that Rails can consume, providing typed request/response objects for the POST /parse API.

Integration with Rails

Request flow:

Rails search request
    ↓
QLE Ruby client gem
    ↓
POST /parse (API key auth via X-API-Key header)
    │  Body: { query, searchType (exhibit|deposition), caseContext, options }
    │  caseContext: { labels, batches, exports, categories, customFields }
    ↓
QLE processes: lexer → grammar → visitor → builder → transformers
    ↓
Response: Elasticsearch 7.x query DSL JSON
    ↓
Rails sends ES query to Elasticsearch
    ↓
Results returned to user
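
For readers calling the service without the Ruby gem, the request in the flow above could be assembled as in this sketch. The body keys and the X-API-Key header follow the flow as written; the value types inside caseContext and options, the http scheme, the Content-Type header, and the helper name buildParseRequest are assumptions.

```typescript
type SearchType = "exhibit" | "deposition";

interface ParseRequestBody {
  query: string;
  searchType: SearchType;
  // Key names come from the request flow; the value types are assumptions.
  caseContext?: {
    labels?: unknown;
    batches?: unknown;
    exports?: unknown;
    categories?: unknown;
    customFields?: unknown;
  };
  options?: Record<string, unknown>;
}

interface PreparedRequest {
  url: string;
  init: {
    method: "POST";
    headers: { "Content-Type": string; "X-API-Key": string };
    body: string;
  };
}

// Build the URL and fetch-style init object without performing any I/O.
function buildParseRequest(
  baseUrl: string,
  apiKey: string,
  body: ParseRequestBody,
): PreparedRequest {
  return {
    url: `${baseUrl}/parse`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
      body: JSON.stringify(body),
    },
  };
}

const prepared = buildParseRequest("http://query-engine.local:3000", "example-key", {
  query: "contract AND NOT draft",
  searchType: "exhibit",
});
// On Node.js 20 this could then be sent with the global fetch:
//   const response = await fetch(prepared.url, prepared.init);
console.log(prepared.url);
```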

What QLE replaces in Legacy:

  • lib/search/nextpoint_controller_search_factory.rb (search orchestration)
  • lib/search/elasticsearch_document_search.rb (ES query building)
  • lib/search/document_parsed_search_hash_transforms/ (query transforms)
  • Parslet grammar definitions in lib/search/

Divergences from Standard NGE Patterns

This is a purpose-built stateless microservice, not a standard NGE service module:

  • Synchronous HTTP — not event-driven (SNS/SQS)
  • No database — pure query transformation with no persistence
  • No hexagonal boundary — no domain logic requiring core/shell separation
  • TypeScript — not Python
  • ECS EC2 — not Lambda (needs persistent HTTP server for low latency)
  • Single function — parse queries. No checkpoint pipeline, no batch lifecycle.

This is appropriate for its use case — a low-latency, stateless parser that serves every search request in the platform.

Key File Locations

File                                                    | Purpose
app/src/server.ts                                       | HTTP server (two routes: /health, /parse)
app/src/routes/parse.ts                                 | Main parse endpoint
app/src/queryEngine/parsing/UQLParser.ts                | Parser public API
app/src/queryEngine/parsing/lexer.ts                    | Chevrotain token definitions
app/src/queryEngine/building/builders/builderFactory.ts | Exhibit vs Deposition dispatch
app/src/middleware/auth.ts                              | API key auth with SOC 2 logging
app/src/client/experiment.ts                            | A/B testing framework
infrastructure/lib/stacks/ecs-stack.ts                  | CDK infrastructure
CLAUDE.md                                               | Project configuration