# Semantic Search: Infrastructure Changes Summary

## Overview

This document catalogs every infrastructure change needed to support the
documentsearch module — from prototype through production, including backfill
of existing data. It is organized by: what exists (no changes), what's new, and
what needs modification.
## Infrastructure That Already Exists (Zero Changes)

| Component | Current State | Role in Semantic Search |
|---|---|---|
| SNS topic | Shared topic; documentextractor publishes DOCUMENT_PROCESSED | Embedding pipeline subscribes via a new filter policy. Topic itself unchanged. |
| Extracted text on S3 | Every processed document has text at s3://{bucket}/case_{id}/documents/{uid}/extracted.txt | Input to chunking. documentextractor already did the extraction. |
| Elasticsearch 7.4 | Shared physical indices with per-case filtered aliases. 290+ field exhibit mapping. Custom nextpoint_analyzer. | BM25 leg of hybrid search queries existing aliases. No mapping or config changes. |
| Aurora MySQL (per-case) | nextpoint_case_{id} databases with exhibits, attachments, tags tables | Chunk text and embedding status stored in new tables (same database). exhibits table provides the document manifest for backfill. |
| PSM (Firehose -> Athena) | Captures all SNS events from NGE modules | Tracks DOCUMENT_EMBEDDED events for progress. No changes. |
| NgeCaseTrackerJob | Rails Sidekiq job that polls Athena for NGE event status | Picks up embedding progress automatically. No changes. |
| VPC, subnets, security groups | Private subnets, NAT gateway for external API calls | CDK stacks deploy into the existing network. NAT gateway needed for the Voyage AI API. |
| Secrets Manager | Stores API keys and credentials for NGE modules | Voyage AI API key stored here. Standard pattern. |
| SSM Parameter Store | NGE modules publish service URLs; Rails reads them | documentsearch publishes its API Gateway URL; Rails reads it. Standard pattern. |
| IAM roles | Rails -> Lambda authentication pattern established | Search API uses the same IAM auth as documentexchanger. |
## Elasticsearch: Current Configuration (For Reference)

Understanding the current ES setup is critical because the BM25 leg of hybrid
search queries these existing indices.
### BM25 Settings

No custom BM25 tuning. Uses Elasticsearch 7.4 defaults:

| Parameter | Value | What It Controls |
|---|---|---|
| k1 | 1.2 (ES default) | Term frequency saturation |
| b | 0.75 (ES default) | Document length normalization (0 = none, 1 = full) |
| search_type | dfs_query_then_fetch | Global IDF scoring (not per-shard) |
Note for hybrid search: Default b=0.75 normalizes scores by document
length. Legal documents vary enormously in length (2-line emails vs 200-page
contracts). Post-prototype, evaluate whether tuning b lower (e.g., 0.5)
improves the BM25 leg for legal text — shorter documents shouldn't be penalized
as heavily in a production corpus with mixed document types.
### Index Architecture

| Aspect | Configuration |
|---|---|
| Index strategy | Shared physical indices with per-case filtered aliases (NOT index-per-case) |
| Alias naming | {environment}_{npcase_id}_{type} (e.g., production_12345_exhibits) |
| Physical index naming | {environment}_{type}_{identifier}_{sequential_number} |
| Max shard size | 30 GB (new physical index created when exceeded) |
| Shard size check | Only at case creation time (known issue: ADR-008) |
| Join field | Parent-child for exhibit -> pages (exhibit_join) |
| Nested fields | es_tags, es_exh_designations, shr_tags, relationships |
| Full-text field | search_text (uses nextpoint_analyzer) |
| Dynamic mapping | Strict (no auto-detected fields) |
| Exhibit mapping | 290+ fields |
### Custom Analyzers

| Analyzer | Purpose |
|---|---|
| nextpoint_analyzer | Email parsing, edge n-gram, path hierarchies (index-time) |
| nextpoint_search_analyzer | Search-time paired analyzer |
| edge_ngram_analyzer | Autocomplete (min_gram: 1, max_gram: 6) |
| custom_path_tree | Folder path hierarchy |
### Implication for Hybrid Search

The BM25 leg of hybrid search queries the existing per-case alias:

```python
# shell/keyword/es_ops.py
def keyword_search(case_id, query, filters, size=100):
    alias = f"{config.ENVIRONMENT}_{case_id}_exhibits"
    # Standard ES query against the existing alias
    # Returns BM25-scored results
```

No ES changes required. The existing index, mapping, and analyzers are used
as-is. The hybrid search module is a READ-ONLY consumer of the existing ES
infrastructure.
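The fusion step that combines this BM25 leg with the k-NN leg can be sketched in plain Python. This is a minimal Reciprocal Rank Fusion (RRF, k=60 per the decision log) over two ranked ID lists; the `bm25_hits`/`knn_hits` inputs are illustrative assumptions, not the module's actual result types.

```python
def rrf_fuse(bm25_hits, knn_hits, k=60, top_n=10):
    """Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank)."""
    scores = {}
    for hits in (bm25_hits, knn_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Example: "B" ranks near the top of both lists, so it fuses first
fused = rrf_fuse(["A", "B", "C"], ["B", "D", "A"])  # -> ["B", "A", "D", "C"]
```

The large k dampens the influence of exact rank positions, which is why RRF works without normalizing the incompatible BM25 and cosine-similarity score scales.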
## New Infrastructure

### 1. Voyage AI Embedding API Access

| Item | Details |
|---|---|
| What | API key for the Voyage AI hosted API (https://api.voyageai.com/v1/embeddings) |
| Model | voyage-law-2 (1024 dimensions, legal-optimized) |
| Storage | Secrets Manager (documentsearch/voyage-api-key) |
| Network | Lambda -> NAT Gateway -> Voyage AI API (HTTPS) |
| Cost | $0.12 per million tokens |
| Alternative | SageMaker endpoint (ml.g6.xlarge, $737/mo) if data-in-VPC is required |
| Provisioning time | Minutes (API key signup) |
### 2. Vector Store (Prototype: pgvector)

| Item | Details |
|---|---|
| What | Aurora PostgreSQL 15 with the pgvector extension |
| Instance | db.t3.medium (prototype) / db.r6g.large (production) |
| Why a new DB engine | pgvector requires PostgreSQL; the current stack is Aurora MySQL |
| Alternative | OpenSearch k-NN for production (see ADR) |
| Schema | One schema per case (mirrors the MySQL _case_{id} pattern) |
| RDS Proxy | Required for Lambda connection pooling (~$87/mo) |
| Cost | ~$150/mo (prototype) / ~$1,000/mo (production) |
| Provisioning time | 15-30 minutes (Aurora cluster creation) |
### 3. Vector Store (Production: OpenSearch)

| Item | Details |
|---|---|
| What | Amazon OpenSearch Service with the k-NN plugin (Faiss HNSW engine) |
| Instance | r6g.large.search x 2 (multi-AZ) |
| Why OpenSearch | Native hybrid search (BM25 + k-NN in one query); can replace ES 7.4 long-term |
| Index strategy | Index-per-case (NOT shared physical indices like the current ES) |
| Cost | $3,900-5,000/mo (on-demand) / $2,700-3,400/mo (1-year RI) |
| Alternative | OpenSearch Serverless ($700-2,000/mo, but 60s refresh interval) |
| Provisioning time | 15-30 minutes (domain creation) |

Decision: The prototype uses pgvector. The production decision (OpenSearch Managed vs
Serverless) is deferred until the prototype validates retrieval quality and query volume.
See adr/adr-vector-store-selection.md.
### 4. documentsearch Lambda Functions (4)

| Lambda | Memory | Timeout | Trigger | Purpose |
|---|---|---|---|---|
| Embedding Lambda | 1024 MB | 900s | SQS (live + backfill queues) | Chunk documents, call Voyage AI, store vectors |
| Search Lambda | 512 MB | 25s | API Gateway | Embed query, parallel BM25 + k-NN, RRF |
| Backfill Lambda | 256 MB | 25s | API Gateway | Trigger backfill for a case |
| Job Processor Lambda | 256 MB | 900s | SQS | Per-batch infrastructure lifecycle |

All Lambdas: Python 3.10+, deployed in private subnets, dependencies packaged as a Lambda layer.
### 5. SQS Queues (4)

| Queue | Purpose | MaximumConcurrency | DLQ |
|---|---|---|---|
| live_embedding_queue | DOCUMENT_PROCESSED events from live ingest | 10 | Yes |
| live_embedding_dlq | Dead letters from live ingest | N/A | N/A |
| backfill_embedding_queue | BACKFILL_REQUESTED events for existing docs | 3 | Yes |
| backfill_embedding_dlq | Dead letters from backfill | N/A | N/A |

Configuration: visibility timeout = 900s (matches the Lambda timeout),
maxReceiveCount = 3, DLQ retention = 14 days.
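Expressed as boto3-style queue attributes, that configuration looks roughly like the sketch below. The queue names and DLQ ARN (account, region) are placeholders, and SQS expects all attribute values as strings.

```python
import json

VISIBILITY_TIMEOUT_S = 900        # matches the 900s Embedding Lambda timeout
DLQ_RETENTION_S = 14 * 24 * 3600  # 14 days, in seconds

queue_attributes = {
    "VisibilityTimeout": str(VISIBILITY_TIMEOUT_S),
    # After 3 failed receives, SQS moves the message to the DLQ
    "RedrivePolicy": json.dumps({
        "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:live_embedding_dlq",
        "maxReceiveCount": 3,
    }),
}
dlq_attributes = {"MessageRetentionPeriod": str(DLQ_RETENTION_S)}

# e.g. boto3.client("sqs").create_queue(QueueName="live_embedding_queue",
#                                       Attributes=queue_attributes)
```

In the actual module these values would come from CDK rather than raw boto3 calls; the point is that the visibility timeout must be at least the Lambda timeout so in-flight messages aren't redelivered mid-invocation.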
### 6. SNS Subscriptions (2)

| Subscription | Source Topic | Filter Policy |
|---|---|---|
| Live ingest | Existing shared SNS topic | eventType: ["DOCUMENT_PROCESSED", "IMPORT_CANCELLED"] |
| Backfill | Existing shared SNS topic | eventType: ["BACKFILL_REQUESTED"] |

Both subscriptions use filterPolicyWithMessageBody and include caseId and
batchId filters for per-batch routing.
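A message-body filter policy for the live-ingest subscription might look like this sketch. The `exists` clauses on caseId/batchId are an assumption about how per-batch routing is expressed; with boto3 the policy is passed as the `FilterPolicy` attribute with `FilterPolicyScope` set to `MessageBody`.

```python
import json

# Filter policy matching the live-ingest subscription described above.
# The "exists" clauses are illustrative: they admit any message carrying
# the per-batch fields, rather than pinning specific IDs.
live_ingest_filter_policy = {
    "eventType": ["DOCUMENT_PROCESSED", "IMPORT_CANCELLED"],
    "caseId": [{"exists": True}],
    "batchId": [{"exists": True}],
}

subscription_attributes = {
    "FilterPolicy": json.dumps(live_ingest_filter_policy),
    "FilterPolicyScope": "MessageBody",  # filter on the payload, not message attributes
}

# e.g. boto3.client("sns").subscribe(TopicArn=..., Protocol="sqs",
#                                    Endpoint=queue_arn, Attributes=subscription_attributes)
```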
### 7. API Gateway

| Item | Details |
|---|---|
| Type | REST API |
| Endpoints | POST /search, POST /backfill, GET /status/{case_id} |
| Auth | IAM authorizer (Rails -> Lambda) |
| Timeout | 29s (API Gateway hard limit); Lambda set to 25s |
| Rate limiting | 100 burst, 50 req/sec per route (default) |
| URL published to | SSM Parameter Store (documentsearch_api_url) |
### 8. MySQL Tables (Per-Case Database)

Two new tables in each nextpoint_case_{id} database:

```sql
CREATE TABLE search_chunks (
    id INT AUTO_INCREMENT PRIMARY KEY,
    document_id VARCHAR(255) NOT NULL,
    chunk_id VARCHAR(255) NOT NULL UNIQUE,
    chunk_index INT NOT NULL,
    chunk_text TEXT NOT NULL,
    metadata_json JSON,
    embedding_model VARCHAR(100),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_document_id (document_id)
);

CREATE TABLE search_embedding_status (
    npcase_id INT NOT NULL,
    document_id VARCHAR(255) NOT NULL,
    checkpoint_id INT NOT NULL DEFAULT 0,
    embedding_model VARCHAR(100),
    chunk_count INT,
    status VARCHAR(20),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (npcase_id, document_id)
);
```

Migration strategy: These tables are created by documentsearch's own schema
migration, run per-case when the module first processes a document for that case.
Same pattern as documentloader's checkpoint tables. No shared migration with Rails.
### 9. CloudWatch Alarms

| Alarm | Threshold | Action |
|---|---|---|
| Search p99 latency | > 2 seconds | SNS notification |
| Search error rate | > 5% | SNS notification |
| Embedding DLQ depth | > 0 for 15 minutes | SNS notification |
| Backfill DLQ depth | > 0 for 30 minutes | SNS notification |
| Voyage AI errors | > 10 in 5 minutes | SNS notification |
### 10. CDK Stacks (3)

| Stack | Resources |
|---|---|
| CommonResourcesStack | Vector store (pgvector or OpenSearch), Secrets Manager refs, shared IAM roles |
| SearchIngestStack | Embedding Lambda, SQS queues/DLQs, SNS subscriptions, Job Processor Lambda |
| SearchApiStack | API Gateway, Search/Backfill/Status Lambdas, CloudWatch alarms |
## Existing Infrastructure Modifications

### Modifications Required: None for T1

The documentsearch module is fully additive. No existing infrastructure is
modified:

| Existing Component | Modification | Why None Needed |
|---|---|---|
| SNS topic | None | New subscriptions are additive; fan-out handles the new consumer. |
| Elasticsearch 7.4 | None | BM25 queries use existing aliases read-only. No mapping changes. |
| Aurora MySQL | None | New tables added to existing per-case databases via the module's migration. |
| Rails application | None for T1 | T1+ adds a new helper + UI. T1 is standalone. |
| documentextractor | None | Already publishes DOCUMENT_PROCESSED events. |
| documentloader | None | Unaffected by the new SNS subscriber. |
| documentuploader | None | Unaffected. |
| PSM / Athena | None | Automatically captures new event types. |
## Backfill: Making Existing Data Searchable

### What Already Exists for Each Document

| Data | Location | Created By | Status |
|---|---|---|---|
| Raw file | S3 (case_{id}/documents/{uid}/(unknown)) | Upload | Available |
| Extracted text | S3 (case_{id}/documents/{uid}/extracted.txt) | documentextractor | Available |
| Metadata (author, date, subject) | MySQL (exhibits table) | documentloader | Available |
| ES index entry | ES alias ({env}_{case_id}_exhibits) | documentloader | Available |
### What Backfill Creates for Each Document

| Data | Location | Created By |
|---|---|---|
| Chunk text + metadata | MySQL (search_chunks table) | documentsearch embedding Lambda |
| Chunk vectors (1024-dim) | Vector store (pgvector or OpenSearch) | documentsearch embedding Lambda |
| Embedding status | MySQL (search_embedding_status table) | documentsearch embedding Lambda |
### Backfill Flow

```
POST /backfill { case_id: 123 }
        |
        v
Backfill Lambda queries MySQL:
    SELECT document_id, s3_text_path FROM exhibits
    WHERE NOT EXISTS in search_embedding_status
        |
        v
For each un-embedded document:
    Publish BACKFILL_REQUESTED to SNS
        |
        v
backfill_embedding_queue (SQS, MaxConcurrency: 3)
        |
        v
Embedding Lambda (same code as live ingest):
    1. Fetch extracted text from S3   (~50ms, $0)
    2. Chunk with metadata prepend    (~10ms, $0)
    3. Embed via Voyage AI            (~400ms, $0.003/doc avg)
    4. Store vectors in vector store  (~50ms)
    5. Store chunks in MySQL          (~20ms)
    6. Publish DOCUMENT_EMBEDDED      (~10ms)
```
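Step 2 ("chunk with metadata prepend") can be sketched in plain Python. This is a simplified token-window chunker assuming whitespace tokenization, a 512-token window, and 64-token overlap; the module's real tokenizer, overlap, and metadata fields may differ.

```python
def chunk_with_metadata(text, metadata, chunk_tokens=512, overlap=64):
    """Split text into overlapping token windows, prepending document
    metadata to each chunk so every embedding carries document context.
    Whitespace 'tokens' stand in for the real tokenizer."""
    header = f"[{metadata.get('author', '')} | {metadata.get('date', '')} | {metadata.get('subject', '')}]"
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        window = tokens[start:start + chunk_tokens]
        chunks.append(header + "\n" + " ".join(window))
        if start + chunk_tokens >= len(tokens):
            break
        start += chunk_tokens - overlap  # slide the window, keeping overlap
    return chunks

meta = {"author": "A. Smith", "date": "2024-01-02", "subject": "Contract"}
chunks = chunk_with_metadata("word " * 1000, meta)  # 1000 tokens -> 3 chunks
```

The metadata header is why ~200 average tokens per chunk (rather than the full 512) appears in the corpus model: many documents are shorter than one window.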
### Backfill Scale Estimates

| Case Size | Documents | Chunks (est.) | Voyage AI Cost | Time (MaxConc=3) |
|---|---|---|---|---|
| Small | 1,000 | 10,000 | ~$0.24 | ~10 minutes |
| Medium | 50,000 | 500,000 | ~$12 | ~8 hours |
| Large | 500,000 | 5,000,000 | ~$120 | ~3.5 days |

Time estimates assume 3 concurrent Lambda invocations, 1 document per
invocation, and ~2 seconds per document (fetch + chunk + embed + store).
Real throughput will vary with document size and Voyage API latency.
### Backfill Does NOT Require

| Not Needed | Why |
|---|---|
| Re-extraction of text | Already on S3 from documentextractor |
| Re-indexing in ES | Existing ES entries are used as-is for BM25 |
| Rails downtime | Backfill is background processing |
| Import re-processing | Documents don't go through the pipeline again |
| Schema migration on the exhibits table | New tables only; exhibits table is read-only |
## Cost Summary

### Prototype (Single Case, 2 Weeks)

| Item | Monthly Cost |
|---|---|
| Aurora PostgreSQL (db.t3.medium) | ~$70 |
| RDS Proxy | ~$87 |
| Lambda compute (minimal) | ~$5 |
| Voyage AI (one case, ~50K docs) | ~$12 (one-time) |
| API Gateway | ~$5 |
| SQS | ~$1 |
| Total | ~$170/mo + $12 one-time |
### Production (1000 Cases)

| Item | Monthly Cost |
|---|---|
| OpenSearch (r6g.large x2, gp3) | $3,900-5,000 |
| OR OpenSearch Serverless | $700-2,000 |
| Lambda compute | ~$200 |
| Voyage AI (ongoing ingest) | Variable (~$0.12/M tokens) |
| Voyage AI (initial backfill of all cases) | ~$5,000-10,000 (one-time) |
| API Gateway | ~$50 |
| SQS | ~$20 |
| CloudWatch | ~$30 |
| Total (Managed) | ~$4,200-5,300/mo + backfill |
| Total (Serverless) | ~$1,000-2,300/mo + backfill |

Note: These costs are IN ADDITION to existing ES 7.4 cluster costs. If
OpenSearch replaces ES 7.4 long-term (consolidation), the net increase is lower.
## Infrastructure Provisioning Sequence

### Prototype (Days 1-2)

1. Create the Secrets Manager entry for the Voyage AI API key
2. Provision the Aurora PostgreSQL instance (db.t3.medium) + RDS Proxy
3. Run CREATE EXTENSION vector; on PostgreSQL
4. Deploy CommonResourcesStack (CDK)
5. Deploy SearchIngestStack (CDK) — creates Lambdas, queues, SNS subscriptions
6. Deploy SearchApiStack (CDK) — creates API Gateway, search Lambda
7. Publish the API URL to SSM Parameter Store
8. Run backfill on a sample case

### Production

1. Provision the OpenSearch domain (or convert from pgvector)
2. Deploy CDK stacks to dev -> staging -> qa -> prod
3. Verify DLQ depth = 0 post-deploy (standard NGE deployment check)
4. Backfill cases in batches (throttled, off-hours for large cases)
5. Enable the search UI toggle in Rails (T1+ phase)
## Production Corpus Analysis and Cost Model

### Base Numbers (End of 2025)

| Metric | Value | Source |
|---|---|---|
| Total documents | 870,000,000 (870M) | Production corpus analysis |
| Total pages | 6,400,000,000 (6.4B) | Production corpus analysis |
| Average pages per document | ~7.4 | 6.4B / 870M |
| Estimated chunks per document | ~15 | ~2 chunks per page (512 tokens, overlap) |
| Total estimated chunks | ~13 billion | 870M x 15 |
| Average tokens per chunk | ~200 | 512-token target; shorter after accounting for small docs |
| Total estimated tokens | ~2.6 trillion | 13B chunks x 200 tokens |
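The base numbers are simple products and are easy to sanity-check; the constants below come straight from the table above.

```python
# Sanity-check the corpus arithmetic
total_docs = 870_000_000
total_pages = 6_400_000_000
chunks_per_doc = 15
tokens_per_chunk = 200

pages_per_doc = total_pages / total_docs          # ~7.4
total_chunks = total_docs * chunks_per_doc        # ~13 billion
total_tokens = total_chunks * tokens_per_chunk    # ~2.6 trillion
embed_cost_full = total_tokens / 1e6 * 0.12       # at $0.12 per million tokens, ~$313K
```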
### Scope Filters

Not all 870M documents need embedding. These filters are cumulative:

| Filter | Documents After Filter | Reduction | Rationale |
|---|---|---|---|
| Full corpus | 870M | -- | Starting point |
| NGE-enabled cases only | ~87M | ~90% out | Only ~10% of cases are NGE-enabled; legacy cases are the vast majority. |
| Discovery suite only | ~78M | ~10% out | ~90% of documents are Discovery (documents); ~10% are Litigation (depositions/transcripts). Litigation is T2+ (later phase). |
| Active cases only | ~78M | Minimal | Counts already represent active cases; closed/archived cases excluded. |
| Realistic backfill scope | ~78M | ~91% reduction | Active NGE Discovery cases |

Key insight: The NGE filter is the big one. Only ~10% of cases are
NGE-enabled, which reduces the corpus from 870M to ~87M. Discovery-only
further trims it to ~78M. This is the realistic Phase 1 scope.
### Voyage AI Embedding Cost

| Scope | Documents | Est. Tokens | Direct API ($0.12/M) | SageMaker ($0.22/M) |
|---|---|---|---|---|
| Prototype (1 case) | ~50K | ~150M | ~$18 | ~$33 |
| Pilot (10 cases) | ~500K | ~1.5B | ~$180 | ~$330 |
| Phase 1 (top 100 active NGE) | ~10M | ~30B | ~$3,600 | ~$6,600 |
| Phase 2 (all active NGE Discovery) | ~78M | ~234B | ~$28,000 | ~$51,500 |
| Full corpus (if ever needed) | ~870M | ~2.6T | ~$312,000 | ~$572,000 |
### Backfill Time Estimates

Per-document throughput: ~2 seconds (S3 fetch + chunk + embed + store).

#### By Lambda Concurrency

| Scope | Documents | MaxConc=10 | MaxConc=50 | MaxConc=100 |
|---|---|---|---|---|
| Prototype (1 case) | 50K | ~3 hours | ~30 min | ~15 min |
| Pilot (10 cases) | 500K | ~28 hours | ~6 hours | ~3 hours |
| Phase 1 (100 cases) | 10M | ~23 days | ~5 days | ~2.3 days |
| Phase 2 (all NGE Discovery) | 78M | ~180 days | ~36 days | ~18 days |

At MaximumConcurrency=10, Phase 2 takes ~180 days. Too slow.
#### By Voyage AI API Rate Limits

The real constraint is Voyage API throughput, not Lambda concurrency. At
enterprise tier (~10,000 requests/min, 128 texts/request):

| Metric | Value |
|---|---|
| Max throughput | 10,000 req/min x 128 texts/req = 1.28M chunks/min |
| At 15 chunks/doc | ~85,000 documents/minute |
| Phase 1 (10M docs) | ~2 hours |
| Phase 2 (78M docs) | ~15 hours |

With enterprise API rate limits, Phase 2 completes in under a day.
Lambda concurrency should match API throughput: ~50-100 concurrent invocations
during backfill.
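The rate-limit arithmetic, spelled out (the enterprise-tier request and batch limits are this document's assumptions, not published Voyage figures):

```python
# Back-of-envelope: API-bound backfill throughput
req_per_min = 10_000
texts_per_req = 128
chunks_per_doc = 15

chunks_per_min = req_per_min * texts_per_req      # 1,280,000 chunks/min
docs_per_min = chunks_per_min / chunks_per_doc    # ~85,000 docs/min

phase1_hours = 10_000_000 / docs_per_min / 60     # ~2 hours
phase2_hours = 78_000_000 / docs_per_min / 60     # ~15 hours
```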
#### With SageMaker Endpoints (In-VPC Alternative)

Each ml.g6.xlarge: 12.6M tokens/hr, ~$1.01/hr.

| Instances | Throughput | Phase 2 Time | Instance Cost (duration) |
|---|---|---|---|
| 5 | 63M tokens/hr | ~155 days | ~$3,685/mo |
| 20 | 252M tokens/hr | ~39 days | ~$14,740/mo |
| 50 | 630M tokens/hr | ~15 days | ~$36,850/mo |

The direct API with enterprise rate limits is faster and cheaper for burst backfill.
SageMaker makes sense only if data-in-VPC is required.
### Vector Storage Cost

At Phase 2 scope (78M documents, ~1.17B chunks):

| Storage Component | Calculation | Size |
|---|---|---|
| Raw vector data | 1.17B x 1024 dims x 4 bytes | ~4.8 TB |
| HNSW overhead (~1.5x) | Graph structure | ~7.2 TB |
| Total vector storage | | ~7.2 TB |

| Vector Store | Monthly Storage Cost | Notes |
|---|---|---|
| OpenSearch Managed (gp3) | ~$864/mo | Part of cluster cost |
| OpenSearch Serverless | ~$173/mo | $0.024/GB |
| pgvector (Aurora) | ~$720-1,440/mo | $0.10-0.20/GB |

At full corpus (870M docs, ~13B chunks): ~80 TB. Significant. This is
why the phased rollout matters.
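The storage figures follow directly from the vector dimensions: float32 at 1024 dims, with an assumed ~1.5x HNSW graph overhead.

```python
# Vector storage back-of-envelope for Phase 2 (~1.17B chunks)
chunks = 1_170_000_000
dims = 1024
bytes_per_float = 4          # float32
hnsw_overhead = 1.5          # assumed graph-structure multiplier

raw_tb = chunks * dims * bytes_per_float / 1e12    # ~4.8 TB
total_tb = raw_tb * hnsw_overhead                  # ~7.2 TB
serverless_monthly = total_tb * 1000 * 0.024       # at $0.024/GB -> ~$173/mo
```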
### Chunk Text Storage (MySQL)

| Scope | Chunks | Avg Chunk Text | Total Size | Notes |
|---|---|---|---|---|
| Phase 1 (10M docs) | ~150M | ~400 bytes | ~60 GB | Spread across per-case DBs |
| Phase 2 (78M docs) | ~1.17B | ~400 bytes | ~468 GB | Spread across per-case DBs |
| Full corpus | ~13B | ~400 bytes | ~5.2 TB | Meaningful increase to Aurora storage |

Stored in per-case databases (search_chunks table). Aurora storage is elastic,
but this is a real increase to track.
### Recommended Phased Rollout

| Phase | Scope | Docs | Embedding Cost | Vector Storage | Timeline | Gate |
|---|---|---|---|---|---|---|
| Prototype | 1 known case | ~50K | ~$18 | ~50 GB | 1 day | Validate retrieval quality |
| Pilot | 10 active NGE cases | ~500K | ~$180 | ~500 GB | 1 day | Attorney feedback |
| Phase 1 | Top 100 active NGE cases | ~10M | ~$3,600 | ~700 GB | 2-3 days | Confirm production stability |
| Phase 2 | All active NGE Discovery | ~78M | ~$28,000 | ~7.2 TB | 1-2 weeks | Business case approved |
| On-demand | Remaining cases | Per-case | Per-case | Per-case | On search | Embed when an attorney searches |

Phase 1 is the sweet spot: it covers the most-searched cases, costs ~$3,600,
and is done in 2-3 days. It proves value before committing ~$28K for Phase 2.

On-demand backfill is the right strategy for the long tail. Most of the
870M documents are in legacy cases that will rarely be searched again. Only
embed when an attorney actually runs a semantic search on that case.
## Cost Breakdown: One-Time Backfill vs Ongoing Monthly

### One-Time Backfill Costs (Embedding Existing Documents)

These costs are incurred ONCE, when existing documents are embedded for the
first time. After backfill completes, they do not recur.

| Phase | Documents | Voyage AI Embedding | Compute (Lambda) | Total One-Time |
|---|---|---|---|---|
| Prototype | 50K | $18 | ~$2 | ~$20 |
| Pilot (10 cases) | 500K | $180 | ~$15 | ~$195 |
| Phase 1 (100 cases) | 10M | $3,600 | ~$300 | ~$3,900 |
| Phase 2 (all NGE Discovery) | 78M | $28,000 | ~$2,300 | ~$30,300 |

Lambda compute estimate: ~2 seconds per document at 1024 MB memory,
$0.0000166667/GB-second = ~$0.000034/document.
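The per-document Lambda figure works out as follows, using the 2s duration above and AWS's published $0.0000166667 per GB-second compute price (request charges ignored as negligible):

```python
# Lambda compute cost per embedded document
duration_s = 2.0
memory_gb = 1.0                    # 1024 MB
price_per_gb_s = 0.0000166667      # AWS Lambda GB-second price

cost_per_doc = duration_s * memory_gb * price_per_gb_s   # ~$0.000033/document
phase1_lambda = cost_per_doc * 10_000_000                # ~$333; the table rounds to ~$300
```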
### Ongoing Monthly Costs: New Document Ingest (Embedding)

New documents imported into NGE cases flow through the embedding pipeline
automatically (DOCUMENT_PROCESSED events). These are recurring costs.

| Metric | Estimate | Notes |
|---|---|---|
| New documents per month (estimate) | ~2-5M | Active NGE imports across all cases |
| Chunks per document | ~15 | |
| Tokens per month | ~6-15B | 2-5M docs x 15 chunks x 200 tokens |
| Voyage AI cost/month | ~$720-1,800 | $0.12/M tokens |
| Lambda compute/month | ~$70-170 | Embedding Lambda invocations |
| Total ingest/month | ~$790-1,970 | |

Note: The 2-5M new docs/month estimate needs validation against actual import
volumes; it can be derived from PSM/Athena event counts.
### Ongoing Monthly Costs: Search Queries

| Metric | Estimate | Notes |
|---|---|---|
| Search queries per month (estimate) | ~30K-100K | Assumes 100-300 searches/day across all users |
| Voyage AI query embedding cost | ~$0.18-0.60/month | ~50 tokens/query x $0.12/M tokens. Negligible. |
| Lambda compute (search) | ~$5-15 | 512 MB, ~200ms per search |
| API Gateway | ~$10-35 | $3.50 per million requests |
| Total search/month | ~$15-50 | |

Search is cheap. Ingest is the cost driver.

Query embedding cost is effectively zero — well under $1/month even at 100K
searches. The embedding model call adds ~50ms latency but negligible cost.
### Ongoing Monthly Costs: Infrastructure (Always-On)

| Component | Managed OpenSearch | Serverless OpenSearch | Notes |
|---|---|---|---|
| Vector store | $3,900-5,000 | $700-2,000 | Cluster or OCU cost |
| Vector storage (7.2 TB, Phase 2) | Included | ~$173 | gp3 or $0.024/GB |
| SQS (4 queues) | ~$20 | ~$20 | |
| CloudWatch | ~$30 | ~$30 | |
| Total infra/month | ~$3,950-5,050 | ~$923-2,223 | |
### Total Monthly Cost Summary (Post-Phase 2, Steady State)

| Cost Category | Managed OpenSearch | Serverless OpenSearch |
|---|---|---|
| Infrastructure (always-on) | $3,950-5,050 | $923-2,223 |
| New document embedding | $790-1,970 | $790-1,970 |
| Search queries | $15-50 | $15-50 |
| Total monthly | $4,755-7,070 | $1,728-4,243 |

Plus one-time backfill:

- Phase 1: ~$3,900
- Phase 2: ~$30,300
### Cost Per Search Query

| Component | Cost Per Query |
|---|---|
| Voyage AI (query embedding, ~50 tokens) | ~$0.000006 |
| Lambda (search, 200ms) | ~$0.0000017 |
| API Gateway | $0.0000035 |
| Vector store (amortized) | ~$0.04-0.17 (infra / queries) |
| Total per query | ~$0.04-0.17 |

The per-query cost is dominated by infrastructure amortization. At higher
query volumes, the per-query cost drops significantly.
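At the stated ~50 tokens per query, the marginal per-query costs pencil out like this; the amortized vector-store line is fixed cost divided by query volume, which is why it dominates at low volume:

```python
# Marginal cost per search query, from the stated assumptions
tokens_per_query = 50
embed = tokens_per_query / 1e6 * 0.12            # Voyage: ~$0.000006
lambda_cost = 0.512 * 0.2 * 0.0000166667         # 512 MB x 200ms: ~$0.0000017
apigw = 3.50 / 1e6                               # $3.50 per million requests

marginal = embed + lambda_cost + apigw           # ~$0.00001 per query
# Amortized infra (~$4,000-5,000/mo over 30K-100K queries) adds $0.04-0.17,
# dwarfing the marginal cost by several orders of magnitude.
```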
### Cost Comparison: Semantic Search vs Status Quo

| | Current (keyword only) | With semantic search (Managed OS) | With semantic search (Serverless OS) |
|---|---|---|---|
| ES 7.4 cluster | $X/mo (existing) | $X/mo (unchanged) | $X/mo (unchanged) |
| Vector store | $0 | +$3,950-5,050/mo | +$923-2,223/mo |
| Embedding (ongoing) | $0 | +$790-1,970/mo | +$790-1,970/mo |
| Search | Included in ES | +$15-50/mo | +$15-50/mo |
| Net new monthly | $0 | +$4,755-7,070 | +$1,728-4,243 |

If OpenSearch eventually replaces ES 7.4 (consolidation), the existing ES
cluster cost offsets the OpenSearch vector store cost — potentially making
the net increase just the embedding cost (~$790-1,970/mo).
## Decision Log

| Decision | Choice | Alternative | ADR |
|---|---|---|---|
| Embedding model | Voyage AI voyage-law-2 via direct API | SageMaker endpoint, Bedrock Titan V2 | adr/adr-vector-store-selection.md |
| Vector store (prototype) | Aurora PostgreSQL + pgvector | OpenSearch, FAISS | adr/adr-vector-store-selection.md |
| Vector store (production) | OpenSearch Managed or Serverless | pgvector at scale | adr/adr-vector-store-selection.md |
| BM25 source | Existing ES 7.4 indices (read-only) | Duplicate in OpenSearch | N/A — reuse existing |
| Hybrid fusion | RRF (k=60) in application code | OpenSearch native hybrid (if consolidated) | reference-implementations/documentsearch.md |
| Multi-tenancy | Index-per-case (vectors) + per-case DB (chunks) | Shared index with filtering | reference-implementations/documentsearch.md |
| Backfill trigger | API + on-first-search auto-trigger | Bulk migration script only | reference-implementations/documentsearch.md |
| Search determinism | Audit logging + exact mode | Accept non-determinism | reference-implementations/documentsearch.md |