# Semantic Search: Infrastructure Changes Summary

## Overview

This document catalogs every infrastructure change needed to support the
documentsearch module — from prototype through production, including backfill
of existing data. It is organized by: what exists (no changes), what's new, and
what needs modification.
## Infrastructure That Already Exists (Zero Changes)

| Component | Current State | Role in Semantic Search |
|---|---|---|
| SNS topic | Shared topic; documentextractor publishes DOCUMENT_PROCESSED | Embedding pipeline subscribes via a new filter policy. Topic itself unchanged. |
| Extracted text on S3 | Every processed document has text at s3://{bucket}/case_{id}/documents/{uid}/extracted.txt | Input to chunking. documentextractor already did the extraction. |
| Elasticsearch 7.4 | Shared physical indices with per-case filtered aliases. 290+ field exhibit mapping. Custom nextpoint_analyzer. | BM25 leg of hybrid search queries existing aliases. No mapping or config changes. |
| Aurora MySQL (per-case) | nextpoint_case_{id} databases with exhibits, attachments, tags tables | Chunk text and embedding status stored in new tables (same database). exhibits table provides the document manifest for backfill. |
| PSM (Firehose -> Athena) | Captures all SNS events from NGE modules | Tracks DOCUMENT_EMBEDDED events for progress. No changes. |
| NgeCaseTrackerJob | Rails Sidekiq job that polls Athena for NGE event status | Picks up embedding progress automatically. No changes. |
| VPC, subnets, security groups | Private subnets, NAT gateway for external API calls | CDK stacks deploy into the existing network. NAT gateway needed for the Voyage AI API. |
| Secrets Manager | Stores API keys and credentials for NGE modules | Voyage AI API key stored here. Standard pattern. |
| SSM Parameter Store | NGE modules publish service URLs; Rails reads them | documentsearch publishes its API Gateway URL; Rails reads it. Standard pattern. |
| IAM roles | Rails -> Lambda authentication pattern established | Search API uses the same IAM auth as documentexchanger. |
## Elasticsearch: Current Configuration (For Reference)

Understanding the current ES setup is critical because the BM25 leg of hybrid
search queries these existing indices.
### BM25 Settings

No custom BM25 tuning. Uses Elasticsearch 7.4 defaults:

| Parameter | Value | What It Controls |
|---|---|---|
| k1 | 1.2 (ES default) | Term frequency saturation |
| b | 0.75 (ES default) | Document length normalization (0 = none, 1 = full) |
| search_type | dfs_query_then_fetch | Global IDF scoring (not per-shard) |
Note for hybrid search: Default b=0.75 normalizes scores by document
length. Legal documents vary enormously in length (2-line emails vs 200-page
contracts). Post-prototype, evaluate whether tuning b lower (e.g., 0.5)
improves the BM25 leg for legal text — shorter documents shouldn't be penalized
as heavily in a production corpus with mixed document types.
### Index Architecture

| Aspect | Configuration |
|---|---|
| Index strategy | Shared physical indices with per-case filtered aliases (NOT index-per-case) |
| Alias naming | {environment}_{npcase_id}_{type} (e.g., production_12345_exhibits) |
| Physical index naming | {environment}_{type}_{identifier}_{sequential_number} |
| Max shard size | 30 GB (new physical index created when exceeded) |
| Shard size check | Only at case creation time (known issue: ADR-008) |
| Join field | Parent-child for exhibit -> pages (exhibit_join) |
| Nested fields | es_tags, es_exh_designations, shr_tags, relationships |
| Full-text field | search_text (uses nextpoint_analyzer) |
| Dynamic mapping | Strict (no auto-detected fields) |
| Exhibit mapping | 290+ fields |
### Custom Analyzers

| Analyzer | Purpose |
|---|---|
| nextpoint_analyzer | Email parsing, edge n-gram, path hierarchies (index-time) |
| nextpoint_search_analyzer | Search-time paired analyzer |
| edge_ngram_analyzer | Autocomplete (min_gram: 1, max_gram: 6) |
| custom_path_tree | Folder path hierarchy |
### Implication for Hybrid Search

The BM25 leg of hybrid search queries the existing per-case alias:

```python
# shell/keyword/es_ops.py
def keyword_search(case_id, query, filters, size=100):
    alias = f"{config.ENVIRONMENT}_{case_id}_exhibits"
    # Standard ES query against the existing alias
    # Returns BM25-scored results
```

No ES changes required. The existing index, mapping, and analyzers are used
as-is. The hybrid search module is a READ-ONLY consumer of the existing ES
infrastructure.
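The fusion step that combines this BM25 leg with the k-NN leg can be sketched in plain Python. This is a minimal Reciprocal Rank Fusion (RRF, k=60 per the decision log) over two ranked ID lists; the `bm25_hits`/`knn_hits` inputs are illustrative assumptions, not the module's actual result types.

```python
def rrf_fuse(bm25_hits, knn_hits, k=60, top_n=10):
    """Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank)."""
    scores = {}
    for hits in (bm25_hits, knn_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Example: "B" ranks near the top of both lists, so it fuses first
fused = rrf_fuse(["A", "B", "C"], ["B", "D", "A"])  # -> ["B", "A", "D", "C"]
```

The large k dampens the influence of exact rank positions, which is why RRF works without normalizing the incompatible BM25 and cosine-similarity score scales.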
## New Infrastructure

### 1. Voyage AI Embedding API Access

| Item | Details |
|---|---|
| What | API key for the Voyage AI hosted API (https://api.voyageai.com/v1/embeddings) |
| Model | voyage-law-2 (1024 dimensions, legal-optimized) |
| Storage | Secrets Manager (documentsearch/voyage-api-key) |
| Network | Lambda -> NAT Gateway -> Voyage AI API (HTTPS) |
| Cost | $0.12 per million tokens |
| Alternative | SageMaker endpoint (ml.g6.xlarge, $737/mo) if data-in-VPC is required |
| Provisioning time | Minutes (API key signup) |
### 2. Vector Store (Prototype: pgvector)

| Item | Details |
|---|---|
| What | Aurora PostgreSQL 15 with the pgvector extension |
| Instance | db.t3.medium (prototype) / db.r6g.large (production) |
| Why a new DB engine | pgvector requires PostgreSQL; the current stack is Aurora MySQL |
| Alternative | OpenSearch k-NN for production (see ADR) |
| Schema | One schema per case (mirrors the MySQL _case_{id} pattern) |
| RDS Proxy | Required for Lambda connection pooling (~$87/mo) |
| Cost | ~$150/mo (prototype) / ~$1,000/mo (production) |
| Provisioning time | 15-30 minutes (Aurora cluster creation) |
### 3. Vector Store (Production: OpenSearch)

| Item | Details |
|---|---|
| What | Amazon OpenSearch Service with the k-NN plugin (Faiss HNSW engine) |
| Instance | r6g.large.search x 2 (multi-AZ) |
| Why OpenSearch | Native hybrid search (BM25 + k-NN in one query); can replace ES 7.4 long-term |
| Index strategy | Index-per-case (NOT shared physical indices like the current ES) |
| Cost | $3,900-5,000/mo (on-demand) / $2,700-3,400/mo (1-year RI) |
| Alternative | OpenSearch Serverless ($700-2,000/mo, but 60s refresh interval) |
| Provisioning time | 15-30 minutes (domain creation) |

Decision: The prototype uses pgvector. The production decision (OpenSearch Managed vs
Serverless) is deferred until the prototype validates retrieval quality and query volume.
See adr/adr-vector-store-selection.md.
### 4. documentsearch Lambda Functions (4)

| Lambda | Memory | Timeout | Trigger | Purpose |
|---|---|---|---|---|
| Embedding Lambda | 1024 MB | 900s | SQS (live + backfill queues) | Chunk documents, call Voyage AI, store vectors |
| Search Lambda | 512 MB | 25s | API Gateway | Embed query, parallel BM25 + k-NN, RRF |
| Backfill Lambda | 256 MB | 25s | API Gateway | Trigger backfill for a case |
| Job Processor Lambda | 256 MB | 900s | SQS | Per-batch infrastructure lifecycle |

All Lambdas: Python 3.10+, deployed in private subnets, dependencies packaged as a Lambda layer.
### 5. SQS Queues (4)

| Queue | Purpose | MaximumConcurrency | DLQ |
|---|---|---|---|
| live_embedding_queue | DOCUMENT_PROCESSED events from live ingest | 10 | Yes |
| live_embedding_dlq | Dead letters from live ingest | N/A | N/A |
| backfill_embedding_queue | BACKFILL_REQUESTED events for existing docs | 3 | Yes |
| backfill_embedding_dlq | Dead letters from backfill | N/A | N/A |

Configuration: visibility timeout = 900s (matches the Lambda timeout),
maxReceiveCount = 3, DLQ retention = 14 days.
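Expressed as boto3-style queue attributes, that configuration looks roughly like the sketch below. The queue names and DLQ ARN (account, region) are placeholders, and SQS expects all attribute values as strings.

```python
import json

VISIBILITY_TIMEOUT_S = 900        # matches the 900s Embedding Lambda timeout
DLQ_RETENTION_S = 14 * 24 * 3600  # 14 days, in seconds

queue_attributes = {
    "VisibilityTimeout": str(VISIBILITY_TIMEOUT_S),
    # After 3 failed receives, SQS moves the message to the DLQ
    "RedrivePolicy": json.dumps({
        "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:live_embedding_dlq",
        "maxReceiveCount": 3,
    }),
}
dlq_attributes = {"MessageRetentionPeriod": str(DLQ_RETENTION_S)}

# e.g. boto3.client("sqs").create_queue(QueueName="live_embedding_queue",
#                                       Attributes=queue_attributes)
```

In the actual module these values would come from CDK rather than raw boto3 calls; the point is that the visibility timeout must be at least the Lambda timeout so in-flight messages aren't redelivered mid-invocation.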
### 6. SNS Subscriptions (2)

| Subscription | Source Topic | Filter Policy |
|---|---|---|
| Live ingest | Existing shared SNS topic | eventType: ["DOCUMENT_PROCESSED", "IMPORT_CANCELLED"] |
| Backfill | Existing shared SNS topic | eventType: ["BACKFILL_REQUESTED"] |

Both subscriptions use filterPolicyWithMessageBody and include caseId and
batchId filters for per-batch routing.
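A message-body filter policy for the live-ingest subscription might look like this sketch. The `exists` clauses on caseId/batchId are an assumption about how per-batch routing is expressed; with boto3 the policy is passed as the `FilterPolicy` attribute with `FilterPolicyScope` set to `MessageBody`.

```python
import json

# Filter policy matching the live-ingest subscription described above.
# The "exists" clauses are illustrative: they admit any message carrying
# the per-batch fields, rather than pinning specific IDs.
live_ingest_filter_policy = {
    "eventType": ["DOCUMENT_PROCESSED", "IMPORT_CANCELLED"],
    "caseId": [{"exists": True}],
    "batchId": [{"exists": True}],
}

subscription_attributes = {
    "FilterPolicy": json.dumps(live_ingest_filter_policy),
    "FilterPolicyScope": "MessageBody",  # filter on the payload, not message attributes
}

# e.g. boto3.client("sns").subscribe(TopicArn=..., Protocol="sqs",
#                                    Endpoint=queue_arn, Attributes=subscription_attributes)
```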
### 7. API Gateway

| Item | Details |
|---|---|
| Type | REST API |
| Endpoints | POST /search, POST /backfill, GET /status/{case_id} |
| Auth | IAM authorizer (Rails -> Lambda) |
| Timeout | 29s (API Gateway hard limit); Lambda set to 25s |
| Rate limiting | 100 burst, 50 req/sec per route (default) |
| URL published to | SSM Parameter Store (documentsearch_api_url) |
### 8. MySQL Tables (Per-Case Database)

Two new tables in each nextpoint_case_{id} database:

```sql
CREATE TABLE search_chunks (
    id INT AUTO_INCREMENT PRIMARY KEY,
    document_id VARCHAR(255) NOT NULL,
    chunk_id VARCHAR(255) NOT NULL UNIQUE,
    chunk_index INT NOT NULL,
    chunk_text TEXT NOT NULL,
    metadata_json JSON,
    embedding_model VARCHAR(100),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_document_id (document_id)
);

CREATE TABLE search_embedding_status (
    npcase_id INT NOT NULL,
    document_id VARCHAR(255) NOT NULL,
    checkpoint_id INT NOT NULL DEFAULT 0,
    embedding_model VARCHAR(100),
    chunk_count INT,
    status VARCHAR(20),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (npcase_id, document_id)
);
```

Migration strategy: These tables are created by documentsearch's own schema
migration, run per-case when the module first processes a document for that case.
Same pattern as documentloader's checkpoint tables. No shared migration with Rails.
### 9. CloudWatch Alarms

| Alarm | Threshold | Action |
|---|---|---|
| Search p99 latency | > 2 seconds | SNS notification |
| Search error rate | > 5% | SNS notification |
| Embedding DLQ depth | > 0 for 15 minutes | SNS notification |
| Backfill DLQ depth | > 0 for 30 minutes | SNS notification |
| Voyage AI errors | > 10 in 5 minutes | SNS notification |
### 10. CDK Stacks (3)

| Stack | Resources |
|---|---|
| CommonResourcesStack | Vector store (pgvector or OpenSearch), Secrets Manager refs, shared IAM roles |
| SearchIngestStack | Embedding Lambda, SQS queues/DLQs, SNS subscriptions, Job Processor Lambda |
| SearchApiStack | API Gateway, Search/Backfill/Status Lambdas, CloudWatch alarms |
## Existing Infrastructure Modifications

### Modifications Required: None for T1

The documentsearch module is fully additive. No existing infrastructure is
modified:

| Existing Component | Modification | Why None Needed |
|---|---|---|
| SNS topic | None | New subscriptions are additive; fan-out handles the new consumer. |
| Elasticsearch 7.4 | None | BM25 queries use existing aliases read-only. No mapping changes. |
| Aurora MySQL | None | New tables added to existing per-case databases via the module's migration. |
| Rails application | None for T1 | T1+ adds a new helper + UI. T1 is standalone. |
| documentextractor | None | Already publishes DOCUMENT_PROCESSED events. |
| documentloader | None | Unaffected by the new SNS subscriber. |
| documentuploader | None | Unaffected. |
| PSM / Athena | None | Automatically captures new event types. |
## Backfill: Making Existing Data Searchable

### What Already Exists for Each Document

| Data | Location | Created By | Status |
|---|---|---|---|
| Raw file | S3 (case_{id}/documents/{uid}/(unknown)) | Upload | Available |
| Extracted text | S3 (case_{id}/documents/{uid}/extracted.txt) | documentextractor | Available |
| Metadata (author, date, subject) | MySQL (exhibits table) | documentloader | Available |
| ES index entry | ES alias ({env}_{case_id}_exhibits) | documentloader | Available |
### What Backfill Creates for Each Document

| Data | Location | Created By |
|---|---|---|
| Chunk text + metadata | MySQL (search_chunks table) | documentsearch embedding Lambda |
| Chunk vectors (1024-dim) | Vector store (pgvector or OpenSearch) | documentsearch embedding Lambda |
| Embedding status | MySQL (search_embedding_status table) | documentsearch embedding Lambda |
### Backfill Flow

```
POST /backfill { case_id: 123 }
        |
        v
Backfill Lambda queries MySQL:
    SELECT document_id, s3_text_path FROM exhibits
    WHERE NOT EXISTS in search_embedding_status
        |
        v
For each un-embedded document:
    Publish BACKFILL_REQUESTED to SNS
        |
        v
backfill_embedding_queue (SQS, MaxConcurrency: 3)
        |
        v
Embedding Lambda (same code as live ingest):
    1. Fetch extracted text from S3   (~50ms, $0)
    2. Chunk with metadata prepend    (~10ms, $0)
    3. Embed via Voyage AI            (~400ms, $0.003/doc avg)
    4. Store vectors in vector store  (~50ms)
    5. Store chunks in MySQL          (~20ms)
    6. Publish DOCUMENT_EMBEDDED      (~10ms)
```
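Step 2 ("chunk with metadata prepend") can be sketched in plain Python. This is a simplified token-window chunker assuming whitespace tokenization, a 512-token window, and 64-token overlap; the module's real tokenizer, overlap, and metadata fields may differ.

```python
def chunk_with_metadata(text, metadata, chunk_tokens=512, overlap=64):
    """Split text into overlapping token windows, prepending document
    metadata to each chunk so every embedding carries document context.
    Whitespace 'tokens' stand in for the real tokenizer."""
    header = f"[{metadata.get('author', '')} | {metadata.get('date', '')} | {metadata.get('subject', '')}]"
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        window = tokens[start:start + chunk_tokens]
        chunks.append(header + "\n" + " ".join(window))
        if start + chunk_tokens >= len(tokens):
            break
        start += chunk_tokens - overlap  # slide the window, keeping overlap
    return chunks

meta = {"author": "A. Smith", "date": "2024-01-02", "subject": "Contract"}
chunks = chunk_with_metadata("word " * 1000, meta)  # 1000 tokens -> 3 chunks
```

The metadata header is why ~200 average tokens per chunk (rather than the full 512) appears in the corpus model: many documents are shorter than one window.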
### Backfill Scale Estimates

| Case Size | Documents | Chunks (est.) | Voyage AI Cost | Time (MaxConc=3) |
|---|---|---|---|---|
| Small | 1,000 | 10,000 | ~$0.24 | ~10 minutes |
| Medium | 50,000 | 500,000 | ~$12 | ~8 hours |
| Large | 500,000 | 5,000,000 | ~$120 | ~3.5 days |

Time estimates assume 3 concurrent Lambda invocations, 1 document per
invocation, and ~2 seconds per document (fetch + chunk + embed + store).
Real throughput will vary with document size and Voyage API latency.
### Backfill Does NOT Require

| Not Needed | Why |
|---|---|
| Re-extraction of text | Already on S3 from documentextractor |
| Re-indexing in ES | Existing ES entries are used as-is for BM25 |
| Rails downtime | Backfill is background processing |
| Import re-processing | Documents don't go through the pipeline again |
| Schema migration on the exhibits table | New tables only; exhibits table is read-only |
## Cost Summary

### Prototype (Single Case, 2 Weeks)

| Item | Monthly Cost |
|---|---|
| Aurora PostgreSQL (db.t3.medium) | ~$70 |
| RDS Proxy | ~$87 |
| Lambda compute (minimal) | ~$5 |
| Voyage AI (one case, ~50K docs) | ~$12 (one-time) |
| API Gateway | ~$5 |
| SQS | ~$1 |
| Total | ~$170/mo + $12 one-time |
### Production (1000 Cases)

| Item | Monthly Cost |
|---|---|
| OpenSearch (r6g.large x2, gp3) | $3,900-5,000 |
| OR OpenSearch Serverless | $700-2,000 |
| Lambda compute | ~$200 |
| Voyage AI (ongoing ingest) | Variable (~$0.12/M tokens) |
| Voyage AI (initial backfill of all cases) | ~$5,000-10,000 (one-time) |
| API Gateway | ~$50 |
| SQS | ~$20 |
| CloudWatch | ~$30 |
| Total (Managed) | ~$4,200-5,300/mo + backfill |
| Total (Serverless) | ~$1,000-2,300/mo + backfill |

Note: These costs are IN ADDITION to existing ES 7.4 cluster costs. If
OpenSearch replaces ES 7.4 long-term (consolidation), the net increase is lower.
## Infrastructure Provisioning Sequence

### Prototype (Days 1-2)

1. Create the Secrets Manager entry for the Voyage AI API key
2. Provision the Aurora PostgreSQL instance (db.t3.medium) + RDS Proxy
3. Run CREATE EXTENSION vector; on PostgreSQL
4. Deploy CommonResourcesStack (CDK)
5. Deploy SearchIngestStack (CDK) — creates Lambdas, queues, SNS subscriptions
6. Deploy SearchApiStack (CDK) — creates API Gateway, search Lambda
7. Publish the API URL to SSM Parameter Store
8. Run backfill on a sample case

### Production

1. Provision the OpenSearch domain (or convert from pgvector)
2. Deploy CDK stacks to dev -> staging -> qa -> prod
3. Verify DLQ depth = 0 post-deploy (standard NGE deployment check)
4. Backfill cases in batches (throttled, off-hours for large cases)
5. Enable the search UI toggle in Rails (T1+ phase)
## Production Corpus Analysis and Cost Model

### Base Numbers (End of 2025)

| Metric | Value | Source |
|---|---|---|
| Total documents | 870,000,000 (870M) | Production corpus analysis |
| Total pages | 6,400,000,000 (6.4B) | Production corpus analysis |
| Average pages per document | ~7.4 | 6.4B / 870M |
| Estimated chunks per document | ~15 | ~2 chunks per page (512 tokens, overlap) |
| Total estimated chunks | ~13 billion | 870M x 15 |
| Average tokens per chunk | ~200 | 512-token target; shorter after accounting for small docs |
| Total estimated tokens | ~2.6 trillion | 13B chunks x 200 tokens |
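The base numbers are simple products and are easy to sanity-check; the constants below come straight from the table above.

```python
# Sanity-check the corpus arithmetic
total_docs = 870_000_000
total_pages = 6_400_000_000
chunks_per_doc = 15
tokens_per_chunk = 200

pages_per_doc = total_pages / total_docs          # ~7.4
total_chunks = total_docs * chunks_per_doc        # ~13 billion
total_tokens = total_chunks * tokens_per_chunk    # ~2.6 trillion
embed_cost_full = total_tokens / 1e6 * 0.12       # at $0.12 per million tokens, ~$313K
```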
### Scope Filters

Not all 870M documents need embedding. These filters are cumulative:

| Filter | Documents After Filter | Reduction | Rationale |
|---|---|---|---|
| Full corpus | 870M | -- | Starting point |
| NGE-enabled cases only | ~87M | ~90% out | Only ~10% of cases are NGE-enabled; legacy cases are the vast majority. |
| Discovery suite only | ~78M | ~10% out | ~90% of documents are Discovery (documents); ~10% are Litigation (depositions/transcripts). Litigation is T2+ (later phase). |
| Active cases only | ~78M | Minimal | Counts already represent active cases; closed/archived cases excluded. |
| Realistic backfill scope | ~78M | ~91% reduction | Active NGE Discovery cases |

Key insight: The NGE filter is the big one. Only ~10% of cases are
NGE-enabled, which reduces the corpus from 870M to ~87M. Discovery-only
further trims it to ~78M. This is the realistic Phase 1 scope.
### Voyage AI Embedding Cost

| Scope | Documents | Est. Tokens | Direct API ($0.12/M) | SageMaker ($0.22/M) |
|---|---|---|---|---|
| Prototype (1 case) | ~50K | ~150M | ~$18 | ~$33 |
| Pilot (10 cases) | ~500K | ~1.5B | ~$180 | ~$330 |
| Phase 1 (top 100 active NGE) | ~10M | ~30B | ~$3,600 | ~$6,600 |
| Phase 2 (all active NGE Discovery) | ~78M | ~234B | ~$28,000 | ~$51,500 |
| Full corpus (if ever needed) | ~870M | ~2.6T | ~$312,000 | ~$572,000 |
### Backfill Time Estimates

Per-document throughput: ~2 seconds (S3 fetch + chunk + embed + store).

#### By Lambda Concurrency

| Scope | Documents | MaxConc=10 | MaxConc=50 | MaxConc=100 |
|---|---|---|---|---|
| Prototype (1 case) | 50K | ~3 hours | ~30 min | ~15 min |
| Pilot (10 cases) | 500K | ~28 hours | ~6 hours | ~3 hours |
| Phase 1 (100 cases) | 10M | ~23 days | ~5 days | ~2.3 days |
| Phase 2 (all NGE Discovery) | 78M | ~180 days | ~36 days | ~18 days |

At MaximumConcurrency=10, Phase 2 takes ~180 days. Too slow.
#### By Voyage AI API Rate Limits

The real constraint is Voyage API throughput, not Lambda concurrency. At
enterprise tier (~10,000 requests/min, 128 texts/request):

| Metric | Value |
|---|---|
| Max throughput | 10,000 req/min x 128 texts/req = 1.28M chunks/min |
| At 15 chunks/doc | ~85,000 documents/minute |
| Phase 1 (10M docs) | ~2 hours |
| Phase 2 (78M docs) | ~15 hours |

With enterprise API rate limits, Phase 2 completes in under a day.
Lambda concurrency should match API throughput: ~50-100 concurrent invocations
during backfill.
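The rate-limit arithmetic, spelled out (the enterprise-tier request and batch limits are this document's assumptions, not published Voyage figures):

```python
# Back-of-envelope: API-bound backfill throughput
req_per_min = 10_000
texts_per_req = 128
chunks_per_doc = 15

chunks_per_min = req_per_min * texts_per_req      # 1,280,000 chunks/min
docs_per_min = chunks_per_min / chunks_per_doc    # ~85,000 docs/min

phase1_hours = 10_000_000 / docs_per_min / 60     # ~2 hours
phase2_hours = 78_000_000 / docs_per_min / 60     # ~15 hours
```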
#### With SageMaker Endpoints (In-VPC Alternative)

Each ml.g6.xlarge: 12.6M tokens/hr, ~$1.01/hr.

| Instances | Throughput | Phase 2 Time | Instance Cost (duration) |
|---|---|---|---|
| 5 | 63M tokens/hr | ~155 days | ~$3,685/mo |
| 20 | 252M tokens/hr | ~39 days | ~$14,740/mo |
| 50 | 630M tokens/hr | ~15 days | ~$36,850/mo |

The direct API with enterprise rate limits is faster and cheaper for burst backfill.
SageMaker makes sense only if data-in-VPC is required.
### Vector Storage Cost

At Phase 2 scope (78M documents, ~1.17B chunks):

| Storage Component | Calculation | Size |
|---|---|---|
| Raw vector data | 1.17B x 1024 dims x 4 bytes | ~4.8 TB |
| HNSW overhead (~1.5x) | Graph structure | ~7.2 TB |
| Total vector storage | | ~7.2 TB |

| Vector Store | Monthly Storage Cost | Notes |
|---|---|---|
| OpenSearch Managed (gp3) | ~$864/mo | Part of cluster cost |
| OpenSearch Serverless | ~$173/mo | $0.024/GB |
| pgvector (Aurora) | ~$720-1,440/mo | $0.10-0.20/GB |

At full corpus (870M docs, ~13B chunks): ~80 TB. Significant. This is
why the phased rollout matters.
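The storage figures follow directly from the vector dimensions: float32 at 1024 dims, with an assumed ~1.5x HNSW graph overhead.

```python
# Vector storage back-of-envelope for Phase 2 (~1.17B chunks)
chunks = 1_170_000_000
dims = 1024
bytes_per_float = 4          # float32
hnsw_overhead = 1.5          # assumed graph-structure multiplier

raw_tb = chunks * dims * bytes_per_float / 1e12    # ~4.8 TB
total_tb = raw_tb * hnsw_overhead                  # ~7.2 TB
serverless_monthly = total_tb * 1000 * 0.024       # at $0.024/GB -> ~$173/mo
```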
### Chunk Text Storage (MySQL)

| Scope | Chunks | Avg Chunk Text | Total Size | Notes |
|---|---|---|---|---|
| Phase 1 (10M docs) | ~150M | ~400 bytes | ~60 GB | Spread across per-case DBs |
| Phase 2 (78M docs) | ~1.17B | ~400 bytes | ~468 GB | Spread across per-case DBs |
| Full corpus | ~13B | ~400 bytes | ~5.2 TB | Meaningful increase to Aurora storage |

Stored in per-case databases (search_chunks table). Aurora storage is elastic,
but this is a real increase to track.
### Recommended Phased Rollout

| Phase | Scope | Docs | Embedding Cost | Vector Storage | Timeline | Gate |
|---|---|---|---|---|---|---|
| Prototype | 1 known case | ~50K | ~$18 | ~50 GB | 1 day | Validate retrieval quality |
| Pilot | 10 active NGE cases | ~500K | ~$180 | ~500 GB | 1 day | Attorney feedback |
| Phase 1 | Top 100 active NGE cases | ~10M | ~$3,600 | ~700 GB | 2-3 days | Confirm production stability |
| Phase 2 | All active NGE Discovery | ~78M | ~$28,000 | ~7.2 TB | 1-2 weeks | Business case approved |
| On-demand | Remaining cases | Per-case | Per-case | Per-case | On search | Embed when an attorney searches |

Phase 1 is the sweet spot: it covers the most-searched cases, costs ~$3,600,
and is done in 2-3 days. It proves value before committing ~$28K for Phase 2.

On-demand backfill is the right strategy for the long tail. Most of the
870M documents are in legacy cases that will rarely be searched again. Only
embed when an attorney actually runs a semantic search on that case.
## Cost Breakdown: One-Time Backfill vs Ongoing Monthly

### One-Time Backfill Costs (Embedding Existing Documents)

These costs are incurred ONCE, when existing documents are embedded for the
first time. After backfill completes, they do not recur.

| Phase | Documents | Voyage AI Embedding | Compute (Lambda) | Total One-Time |
|---|---|---|---|---|
| Prototype | 50K | $18 | ~$2 | ~$20 |
| Pilot (10 cases) | 500K | $180 | ~$15 | ~$195 |
| Phase 1 (100 cases) | 10M | $3,600 | ~$300 | ~$3,900 |
| Phase 2 (all NGE Discovery) | 78M | $28,000 | ~$2,300 | ~$30,300 |

Lambda compute estimate: ~2 seconds per document at 1024 MB memory,
$0.0000166667/GB-second = ~$0.000034/document.
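The per-document Lambda figure works out as follows, using the 2s duration above and AWS's published $0.0000166667 per GB-second compute price (request charges ignored as negligible):

```python
# Lambda compute cost per embedded document
duration_s = 2.0
memory_gb = 1.0                    # 1024 MB
price_per_gb_s = 0.0000166667      # AWS Lambda GB-second price

cost_per_doc = duration_s * memory_gb * price_per_gb_s   # ~$0.000033/document
phase1_lambda = cost_per_doc * 10_000_000                # ~$333; the table rounds to ~$300
```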
### Ongoing Monthly Costs: New Document Ingest (Embedding)

New documents imported into NGE cases flow through the embedding pipeline
automatically (DOCUMENT_PROCESSED events). These are recurring costs.

| Metric | Estimate | Notes |
|---|---|---|
| New documents per month (estimate) | ~2-5M | Active NGE imports across all cases |
| Chunks per document | ~15 | |
| Tokens per month | ~6-15B | 2-5M docs x 15 chunks x 200 tokens |
| Voyage AI cost/month | ~$720-1,800 | $0.12/M tokens |
| Lambda compute/month | ~$70-170 | Embedding Lambda invocations |
| Total ingest/month | ~$790-1,970 | |

Note: The 2-5M new docs/month estimate needs validation against actual import
volumes; it can be derived from PSM/Athena event counts.
### Ongoing Monthly Costs: Search Queries

| Metric | Estimate | Notes |
|---|---|---|
| Search queries per month (estimate) | ~30K-100K | Assumes 100-300 searches/day across all users |
| Voyage AI query embedding cost | ~$0.18-0.60/month | ~50 tokens/query x $0.12/M tokens. Negligible. |
| Lambda compute (search) | ~$5-15 | 512 MB, ~200ms per search |
| API Gateway | ~$10-35 | $3.50 per million requests |
| Total search/month | ~$15-50 | |

Search is cheap. Ingest is the cost driver.

Query embedding cost is effectively zero — well under $1/month even at 100K
searches. The embedding model call adds ~50ms latency but negligible cost.
### Ongoing Monthly Costs: Infrastructure (Always-On)

| Component | Managed OpenSearch | Serverless OpenSearch | Notes |
|---|---|---|---|
| Vector store | $3,900-5,000 | $700-2,000 | Cluster or OCU cost |
| Vector storage (7.2 TB, Phase 2) | Included | ~$173 | gp3 or $0.024/GB |
| SQS (4 queues) | ~$20 | ~$20 | |
| CloudWatch | ~$30 | ~$30 | |
| Total infra/month | ~$3,950-5,050 | ~$923-2,223 | |
### Total Monthly Cost Summary (Post-Phase 2, Steady State)

| Cost Category | Managed OpenSearch | Serverless OpenSearch |
|---|---|---|
| Infrastructure (always-on) | $3,950-5,050 | $923-2,223 |
| New document embedding | $790-1,970 | $790-1,970 |
| Search queries | $15-50 | $15-50 |
| Total monthly | $4,755-7,070 | $1,728-4,243 |

Plus one-time backfill:

- Phase 1: ~$3,900
- Phase 2: ~$30,300
### Cost Per Search Query

| Component | Cost Per Query |
|---|---|
| Voyage AI (query embedding, ~50 tokens) | ~$0.000006 |
| Lambda (search, 200ms) | ~$0.0000017 |
| API Gateway | $0.0000035 |
| Vector store (amortized) | ~$0.04-0.17 (infra / queries) |
| Total per query | ~$0.04-0.17 |

The per-query cost is dominated by infrastructure amortization. At higher
query volumes, the per-query cost drops significantly.
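At the stated ~50 tokens per query, the marginal per-query costs pencil out like this; the amortized vector-store line is fixed cost divided by query volume, which is why it dominates at low volume:

```python
# Marginal cost per search query, from the stated assumptions
tokens_per_query = 50
embed = tokens_per_query / 1e6 * 0.12            # Voyage: ~$0.000006
lambda_cost = 0.512 * 0.2 * 0.0000166667         # 512 MB x 200ms: ~$0.0000017
apigw = 3.50 / 1e6                               # $3.50 per million requests

marginal = embed + lambda_cost + apigw           # ~$0.00001 per query
# Amortized infra (~$4,000-5,000/mo over 30K-100K queries) adds $0.04-0.17,
# dwarfing the marginal cost by several orders of magnitude.
```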
### Cost Comparison: Semantic Search vs Status Quo

| | Current (keyword only) | With semantic search (Managed OS) | With semantic search (Serverless OS) |
|---|---|---|---|
| ES 7.4 cluster | $X/mo (existing) | $X/mo (unchanged) | $X/mo (unchanged) |
| Vector store | $0 | +$3,950-5,050/mo | +$923-2,223/mo |
| Embedding (ongoing) | $0 | +$790-1,970/mo | +$790-1,970/mo |
| Search | Included in ES | +$15-50/mo | +$15-50/mo |
| Net new monthly | $0 | +$4,755-7,070 | +$1,728-4,243 |

If OpenSearch eventually replaces ES 7.4 (consolidation), the existing ES
cluster cost offsets the OpenSearch vector store cost — potentially making
the net increase just the embedding cost (~$790-1,970/mo).
## Decision Log

| Decision | Choice | Alternative | ADR |
|---|---|---|---|
| Embedding model | Voyage AI voyage-law-2 via direct API | SageMaker endpoint, Bedrock Titan V2 | adr/adr-vector-store-selection.md |
| Vector store (prototype) | Aurora PostgreSQL + pgvector | OpenSearch, FAISS | adr/adr-vector-store-selection.md |
| Vector store (production) | OpenSearch Managed or Serverless | pgvector at scale | adr/adr-vector-store-selection.md |
| BM25 source | Existing ES 7.4 indices (read-only) | Duplicate in OpenSearch | N/A — reuse existing |
| Hybrid fusion | RRF (k=60) in application code | OpenSearch native hybrid (if consolidated) | reference-implementations/documentsearch.md |
| Multi-tenancy | Index-per-case (vectors) + per-case DB (chunks) | Shared index with filtering | reference-implementations/documentsearch.md |
| Backfill trigger | API + on-first-search auto-trigger | Bulk migration script only | reference-implementations/documentsearch.md |
| Search determinism | Audit logging + exact mode | Accept non-determinism | reference-implementations/documentsearch.md |