Semantic Search: Product-Architecture Alignment¶
Context¶
This document maps the AI feature roadmap (Rakesh's tier analysis, March 2026) to the architecture work already completed. The purpose is to show that the majority of proposed AI features are served by a single architectural investment: the documentsearch module.
The Key Insight¶
8 of 10 proposed AI features run on the same infrastructure: vector embeddings + hybrid search + LLM synthesis. They are not separate builds. They are different queries against the same search endpoint.
Feature-to-Architecture Mapping¶
Tier 1: Deployable in 4-8 Weeks¶
| Feature | Architecture Status | What It Actually Is |
|---|---|---|
| Natural language search | Fully designed | documentsearch module: POST /search with hybrid BM25 + vector + RRF. The core prototype. |
| Document set summarization / ECA | Partially exists | See detailed ECA section below. |
| Privilege and PII/PHI flagging | Covered by T1 | See detailed Privilege and PII/PHI section below. |
All three Tier 1 features are served by the documentsearch prototype + existing nextpoint-ai.
Privilege and PII/PHI Flagging — Deep Dive¶
Two Problems, Often Confused¶
There are TWO privilege problems in eDiscovery — the traditional one and a new one created by AI:
- Finding privileged documents in a production (our use case) — using AI to detect attorney-client privilege before inadvertent production
- Whether AI-generated documents ARE privileged (new legal question) — courts ruling on whether GenAI outputs are protected. Not our use case but creates market context.
The Privilege Detection Problem¶
Why keyword search fails: Traditional privilege review searches for "privilege," "attorney," "counsel," "legal advice." This produces:
- Massive false positives: HR documents about "attorney fees," vendor contracts referencing "legal department," newsletters mentioning "legal"
- Critical misses: conversational privilege where nobody uses formal terms — "I'd suggest we get Sarah's input on the antitrust exposure before responding" (Sarah is in-house counsel)
Traditional privilege logging averages ~7 documents per hour. AI-assisted privilege logging reaches ~35 documents per hour — a 5x throughput increase. Reviewers QC AI output instead of drafting from scratch.
How documentsearch Addresses Privilege (T1)¶
Semantic search understands the CONCEPT of privilege:
POST /search
{ "query": "communications seeking or providing legal advice about the transaction",
"case_id": 123 }
POST /search
{ "query": "discussions about legal exposure or regulatory risk",
"case_id": 123 }
POST /search
{ "query": "messages where someone asks for in-house counsel's opinion before responding",
"case_id": 123 }
The embedding model (trained on legal text) recognizes that "let's get Sarah's take on the liability question before we respond" is semantically similar to "seeking legal advice" — zero keyword overlap with "privilege."
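The three concept queries above can be issued as a batch from client code. A minimal sketch, assuming the request shape shown in the POST /search examples (the helper name and query list constant are illustrative, not part of the documentsearch API):

```python
# Illustrative client-side helper: builds one POST /search body per
# privilege-concept query, matching the request shape shown above.
PRIVILEGE_QUERIES = [
    "communications seeking or providing legal advice about the transaction",
    "discussions about legal exposure or regulatory risk",
    "messages where someone asks for in-house counsel's opinion before responding",
]

def build_privilege_searches(case_id):
    """One request body per concept query, all scoped to a single case."""
    return [{"query": q, "case_id": case_id} for q in PRIVILEGE_QUERIES]
```

Each body would then be POSTed to /search; the three result sets together form the privilege-candidate pool for Layer 2.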
The Three-Layer Privilege Workflow¶
| Layer | What It Does | Architecture | Status |
|---|---|---|---|
| 1. Semantic search | Surface privilege candidates across full corpus | documentsearch T1: POST /search | Designed |
| 2. AI classification | Auto-classify each candidate as likely/unlikely privilege | T2: LLM reads document, assesses privilege indicators | Not yet designed |
| 3. Attorney review | Final privilege determination — human judgment required | Existing Rails privilege workflow | Exists |
Layer 1 is documentsearch. Layer 2 is a T2 enhancement. Layer 3 exists. Defensibility: AI assists and surfaces candidates; attorneys make the final call. Courts have accepted this — the requirement is documentation, validation, and consistency.
Privilege Log Automation (T2/T3, Highest Manual-Hour Impact)¶
The full privilege workflow with automation:
1. FIND privilege candidates ← documentsearch T1 (semantic search)
2. CLASSIFY each candidate ← T2 agent (LLM reads doc, assesses)
3. REVIEW and confirm ← attorney (existing Rails workflow)
4. GENERATE privilege log ← T2/T3 agent (LLM formats into log)
Step 4 is one of the most hated manual tasks in discovery — junior associates spend days formatting privilege log entries. An AI-generated first draft that attorneys review and approve is a concrete, quantifiable hour-saver. This is the feature Rakesh identified as Tier 3.
All four steps depend on step 1 (semantic search) existing first.
The PII/PHI Detection Problem¶
| Regulation | Penalty for Failure |
|---|---|
| GDPR | 4% of global revenue |
| CCPA/CPRA | $7,500 per intentional violation |
| HIPAA | Up to $2.1M per violation category |
| Court sanctions | Case-specific |
Auditable PII/PHI redaction is a prerequisite for responsible eDiscovery.
What Must Be Detected¶
| Category | Examples | Detection Difficulty |
|---|---|---|
| Structured PII | SSNs (XXX-XX-XXXX), credit cards, phone numbers | Easy — regex works |
| Unstructured PII | Names + addresses in context, composite identifiers | Hard — requires NLP |
| PHI | Patient names, diagnoses, treatment histories, insurance | Hard — medical context |
| Composite PII | Job title + department + date = identifies one person | Hardest — regex misses entirely |
Traditional tools catch structured patterns. AI catches unstructured and composite PII/PHI that regex misses — "the patient discussed her treatment plan with Dr. Martinez on March 15" contains PHI but no regex pattern.
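The gap is easy to demonstrate: a structured-PII regex fires on an SSN but sees nothing in the PHI sentence quoted above. A minimal Python sketch:

```python
import re

# Structured-PII pattern: US Social Security numbers (XXX-XX-XXXX).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

structured = "Employee SSN: 123-45-6789"
unstructured = ("the patient discussed her treatment plan "
                "with Dr. Martinez on March 15")

print(bool(SSN_RE.search(structured)))    # True  — regex catches this
print(bool(SSN_RE.search(unstructured)))  # False — PHI, but no pattern to match
```

The second sentence is exactly the composite/contextual case where NER or LLM classification has to take over.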
Two Complementary Approaches for PII/PHI¶
Approach 1: Semantic search for PII/PHI topics (documentsearch T1)
POST /search
{ "query": "documents containing personal health information",
"case_id": 123 }
POST /search
{ "query": "communications discussing employee medical conditions",
"case_id": 123 }
Finds documents that DISCUSS PII/PHI topics. Useful for identifying document types that need redaction review. But doesn't detect specific entities (the actual SSN, the actual name).
Approach 2: Entity detection (T2, not yet designed)
Entity-level detection requires NER (Named Entity Recognition) or regex+NLP hybrid:
| Method | What It Finds | Accuracy |
|---|---|---|
| Regex patterns | SSNs, credit cards, phone numbers | High (structured), zero (unstructured) |
| NER models | Person names, organizations, locations | 85-95% |
| LLM classification | Contextual PII/PHI | 90%+ but expensive per doc |
| Hybrid (regex + NER + LLM) | All categories | Best accuracy, highest cost |
Architecture options:
Option A: Pipeline step (flag everything at ingest)
documentextractor → DOCUMENT_PROCESSED
├── documentsearch (embeddings for search)
└── PII/PHI detector (NER + regex for entity flagging)
Option B: Post-search scan (flag only reviewed documents)
Attorney searches → results returned → PII/PHI scan on those results
Option A is better for production workflows (comprehensive). Option B is cheaper (scan only what's being reviewed). Both depend on documentsearch infrastructure.
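Under either option, the hybrid method table above implies a layered dispatch: regex always runs, NER and LLM layers escalate only when wired in. A sketch with the regex layer real and the NER/LLM layers as injected stubs (function names and patterns are illustrative; a real build would call a model such as spaCy or Bedrock in those slots):

```python
import re

# Layer 1: cheap, high-precision regex for structured identifiers.
PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def detect_entities(text, ner=None, llm=None):
    """Run regex always; escalate to NER/LLM layers only if provided."""
    findings = [{"type": t, "span": m.group()}
                for t, rx in PATTERNS.items() for m in rx.finditer(text)]
    if ner:   # Layer 2: unstructured names/orgs (stubbed model call)
        findings += ner(text)
    if llm:   # Layer 3: contextual/composite PII (stubbed, most expensive)
        findings += llm(text)
    return findings
```

Option A would run `detect_entities` on every document at ingest; Option B would run it only over a search result set.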
Competitive Landscape¶
| Competitor | PII/PHI Approach |
|---|---|
| Nebula Legal | AI text analysis + pattern recognizers. Names, SSNs, health data, financial info. Integrated into ECA and review. |
| Redactable | 30+ PII categories, 90%+ accuracy, 98% processing time reduction. |
| VIDIZMO | 255+ formats, 40+ PII types including spoken PII in audio. |
| iCONECT | Predictive PII detection trained on case-specific data. Continuous monitoring. |
| Nextpoint | No automated PII/PHI detection yet. Manual review only. |
The Court Context: AI-Generated Documents and Privilege¶
Two competing Feb 2026 rulings create market awareness (not our use case, but context attorneys will ask about):
| Case | Ruling |
|---|---|
| US v. Heppner (S.D.N.Y.) | NOT privileged — docs generated with consumer Claude, sent to attorney. AI is not an attorney. |
| Warner v. Gilbarco (E.D. Mich.) | Work product PROTECTED — AI-assisted analysis done at counsel's direction was protected. |
Why this matters for Nextpoint: Clients need assurance that their AI interactions within the platform are defensible. Nextpoint's architecture — data stays in VPC or goes to SOC 2-compliant API, enterprise-grade data isolation — is the right answer. Consumer AI tools (ChatGPT free tier) are the risk; enterprise platforms are not.
Build Path¶
| Capability | Tier | Timeline | Depends On |
|---|---|---|---|
| Privilege candidate detection (semantic) | T1 | Prototype (2 weeks) | documentsearch |
| PII/PHI topic detection (semantic) | T1 | Prototype (2 weeks) | documentsearch |
| PII/PHI entity detection (NER + regex) | T2 | +4-6 weeks after T1 | Pipeline integration or post-search |
| Privilege classification (LLM) | T2 | +4-6 weeks after T1 | documentsearch + Bedrock |
| Privilege log generation | T2/T3 | +4-6 weeks after T2 privilege | Privilege classification |
T1 (documentsearch) is the foundation for all five capabilities. Semantic search finds the candidates; entity detection, classification, and log generation layer on top.
Automated Privilege Log Generation — Deep Dive¶
Why This Is One of the Most Hated Tasks in Discovery¶
Privilege logging is the process of documenting every withheld document with enough detail for opposing counsel to assess the privilege claim — without revealing the privileged content itself. Under FRCP 26(b)(5)(A), parties must "describe the nature" of withheld materials so the opposing party can evaluate the claim.
In practice: junior associates spend DAYS creating spreadsheet entries for thousands of documents. Each entry requires identifying the date, author, all recipients, the privilege type (attorney-client, work product, common interest, etc.), and a brief description of the privileged content — all without disclosing the actual privileged communication. At ~7 documents per hour with traditional methods, a 2,000-document privilege set takes ~285 hours of associate time.
Document review accounts for 80%+ of total litigation spend (~$42 billion per year per ABA). Privilege logging is the most manual, least enjoyable portion of that spend.
New FRCP Rules (December 1, 2025) Change the Landscape¶
The most significant privilege log reforms in decades took effect December 1, 2025. Amended Rules 26(f) and 16(b) now require:
- Parties must discuss privilege log procedures at the START of discovery (Rule 26(f) conference), not at the end
- The discovery plan must include "views and proposals" on HOW and WHEN privilege claims will be made, including log format
- Courts may adopt these terms in the scheduling order and enforce them
What this means: Teams must be prepared to negotiate privilege log format (categorical vs document-by-document, metadata fields vs narrative descriptions) at the outset — before review even begins. AI-generated logs become a strategic advantage in this negotiation: "We propose metadata-based categorical logging with AI-assisted descriptions, validated by attorney review."
Privilege Log Field Requirements¶
| Field | Required? | Source | AI-Generable? |
|---|---|---|---|
| Document date | Yes (minimum) | Document metadata (exhibits table) | Yes — auto-extract |
| Author | Yes (minimum) | Document metadata / email headers | Yes — auto-extract |
| Recipients (To, CC, BCC) | Yes (minimum) | Email headers | Yes — auto-extract |
| Privilege type asserted | Yes (minimum) | Attorney determination | Partially — AI suggests, attorney confirms |
| Brief description of content | Yes (minimum) | Requires reading the document | Yes — LLM generates without revealing privilege |
| Document type | Negotiable | File metadata | Yes — auto-extract |
| Bates number / unique ID | Negotiable | Production metadata | Yes — auto-extract |
| Purpose of communication | Negotiable | Context analysis | Yes — LLM infers from content |
| Subject line | Negotiable | Email headers | Yes — auto-extract |
6 of 9 fields are auto-extractable from metadata. 2 more are LLM-generable. Only the privilege type assertion requires attorney judgment (and even there, AI can suggest).
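That split can be expressed directly in code: a draft entry takes six fields straight from metadata, one from the LLM, and leaves the privilege type for the attorney. A sketch (field and function names are illustrative):

```python
def draft_log_entry(meta, llm_description, privilege_type=None):
    """Draft one privilege-log row; the attorney confirms the privilege type."""
    return {
        # Auto-extracted from metadata (6 of 9 fields):
        "date": meta["date"],
        "author": meta["author"],
        "recipients": meta["recipients"],
        "document_type": meta["document_type"],
        "bates": meta["bates"],
        "subject": meta["subject"],
        # LLM-generated without revealing privileged content:
        "description": llm_description,
        # Attorney judgment (AI may suggest; human confirms):
        "privilege_type": privilege_type or "PENDING ATTORNEY REVIEW",
    }
```

The default sentinel makes unconfirmed rows impossible to miss in the export.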
Three Types of Privilege Logs¶
| Type | Description | AI Impact |
|---|---|---|
| Traditional (document-by-document) | One entry per document. Most detailed, most burdensome. | AI generates draft entry for each doc. Attorney reviews and edits. 5x throughput. |
| Categorical | Groups documents by shared characteristics (e.g., all emails between counsel and client in date range). | AI identifies categories automatically from document clusters. Most efficient. |
| Metadata | Auto-generated from metadata fields. May include human-coded privilege type column. | Almost fully automated. AI adds privilege type suggestion. |
The Dec 2025 rule changes encourage early negotiation of log format. If parties agree to categorical or metadata logs, automation handles 90%+ of the work. If traditional logs are required, AI still provides 5x throughput improvement.
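A categorical log can be derived mechanically from per-document entries by grouping on shared characteristics. A sketch, assuming privilege type plus participant set as the grouping key (one plausible key among several the parties might negotiate):

```python
from collections import defaultdict

def categorical_log(entries):
    """Collapse per-document entries into categorical log rows."""
    groups = defaultdict(list)
    for e in entries:
        # Documents sharing privilege type + participants form one category.
        groups[(e["privilege_type"], frozenset(e["participants"]))].append(e)
    rows = []
    for (ptype, parts), docs in groups.items():
        dates = sorted(d["date"] for d in docs)
        rows.append({
            "privilege_type": ptype,
            "participants": sorted(parts),
            "date_range": (dates[0], dates[-1]),
            "doc_count": len(docs),
        })
    return rows
```

This is why categorical logging is the "most efficient" row in the table: the grouping itself is pure metadata aggregation, no LLM required.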
How It Builds on documentsearch¶
Step 1: FIND privilege candidates (documentsearch T1)
POST /search { query: "communications seeking legal advice" }
→ Returns 2,000 candidate documents ranked by relevance
|
Step 2: CLASSIFY each candidate (T2 agent)
For each candidate, LLM reads document and determines:
- Is this actually privileged? (yes/no/uncertain)
- What privilege type? (attorney-client / work product / common interest)
- Who are the attorneys involved?
- What is the privileged subject matter (without revealing content)?
→ Classified set: 1,400 confirmed, 400 uncertain, 200 not privileged
|
Step 3: ATTORNEY REVIEW (existing Rails workflow)
Attorney reviews the 400 uncertain documents
Spot-checks a sample of the 1,400 confirmed
Makes final privilege calls
|
Step 4: GENERATE PRIVILEGE LOG (T2/T3 agent)
For each confirmed-privilege document:
|
├── Auto-extract from metadata:
│ Date, Author, Recipients, Document type, Bates number, Subject
|
├── LLM generates (without revealing privilege):
│ Brief description: "Email from in-house counsel to VP Engineering
│ regarding legal analysis of product liability exposure"
│ Purpose: "Seeking legal advice on regulatory compliance"
|
├── Attorney-confirmed privilege type:
│ "Attorney-Client Privilege"
|
└── Format into court-compliant privilege log:
- Traditional (one row per document)
- Categorical (grouped by type/participants/date range)
- Metadata (auto-generated with privilege type column)
- Format negotiated per Rule 26(f) agreement
The LLM Description Challenge¶
The hardest part of privilege logging is writing the brief description — it must describe the document's nature without revealing the privileged content. This is exactly what LLMs are good at: summarizing at a high level while respecting constraints.
LLM Prompt (simplified):
"Read this document. It has been identified as attorney-client privileged.
Write a brief description suitable for a privilege log entry that:
1. Identifies the general subject matter
2. Identifies the type of legal advice sought or provided
3. Does NOT reveal the specific content of the legal advice
4. Does NOT quote from the document
5. Is 1-2 sentences maximum
Example format: 'Email from [role] to [role] regarding legal analysis
of [general topic].'"
LLM Output:
"Email from in-house counsel to VP Engineering providing legal advice
regarding potential product liability exposure and regulatory
compliance obligations."
This is the same "summarize existing text" pattern that nextpoint-ai already handles for transcripts — constrained summarization anchored in the source document. The prompt just has a different constraint (don't reveal privileged content instead of don't exceed length).
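The constraints in the prompt above can be assembled programmatically so every description request carries the same guardrails. A sketch (the function name is illustrative and the constraint strings paraphrase the prompt shown):

```python
# Guardrails from the privilege-log description prompt above.
CONSTRAINTS = [
    "Identify the general subject matter",
    "Identify the type of legal advice sought or provided",
    "Do NOT reveal the specific content of the legal advice",
    "Do NOT quote from the document",
    "Limit the description to 1-2 sentences",
]

def build_description_prompt(document_text):
    """Wrap the document in the constrained-summarization instructions."""
    rules = "\n".join(f"{i}. {c}" for i, c in enumerate(CONSTRAINTS, 1))
    return (
        "Read this document. It has been identified as attorney-client "
        "privileged. Write a brief description suitable for a privilege "
        f"log entry that:\n{rules}\n\nDocument:\n{document_text}"
    )
```

Centralizing the constraints means a court-negotiated format change edits one list, not every call site.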
Name Normalization (Critical for Log Quality)¶
One practical requirement: privilege logs must normalize names consistently. "John Smith," "J. Smith," "jsmith@company.com," and "Smith, John" must all resolve to the same person. AI handles this via entity resolution against the exhibits table (which already has custodian metadata).
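A minimal resolver for the four variants above, using a (last name, first initial) key against a roster. This is an illustrative heuristic only; a production build would resolve against the exhibits table's custodian metadata as described:

```python
import re

# Illustrative roster: in practice this comes from the exhibits table.
ROSTER = {("smith", "j"): "John Smith"}

def normalize_name(raw):
    """Resolve a name variant to the canonical custodian, or None."""
    raw = raw.lower().strip()
    m = re.match(r"([a-z])([a-z]+)@", raw)        # jsmith@company.com
    if m:
        return ROSTER.get((m.group(2), m.group(1)))
    if "," in raw:                                # "Smith, John"
        last, first = [p.strip() for p in raw.split(",", 1)]
    else:                                         # "John Smith" / "J. Smith"
        parts = raw.replace(".", "").split()
        first, last = parts[0], parts[-1]
    return ROSTER.get((last, first[0]))
```

Real-world resolution needs more care (middle names, shared initials, nicknames), but the shape — variants funneled through one canonical key — is the same.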
Competitive Context¶
| Competitor | Privilege Log Capability |
|---|---|
| eDiscovery AI | Full privilege logging: privilege call, privilege type, attorney hits, privilege elements, log entry generation. Customizable to court/client standards. |
| KLDiscovery | Integrated privilege log builder with automated features and name standardization. |
| Epiq AI | Auto-classify for privilege + update classification as matter evolves. Up to 90% reviewer reduction. |
| Nextpoint | Manual privilege logging only. No automated generation. |
Build Estimate¶
| Component | Effort | Depends On |
|---|---|---|
| Metadata auto-extraction for log fields | 1-2 weeks | Exhibits table (exists) |
| LLM description generation prompt | 1-2 weeks | Bedrock Claude (exists via nextpoint-ai) |
| Name normalization / entity resolution | 1-2 weeks | Exhibits table custodian data |
| Log formatting (traditional / categorical / metadata) | 1-2 weeks | Rails export capability |
| Attorney review workflow integration | 1-2 weeks | Rails privilege workflow (exists) |
| Total | 4-8 weeks after T2 privilege classification ships | T1 search + T2 classification |
The estimate is 4-8 weeks because it depends on privilege classification (T2) existing first — you can't generate a log for documents that haven't been classified. But the metadata extraction and name normalization can start in parallel with T2 classification development.
Document Set Summarization / ECA — Deep Dive¶
What ECA Is in the EDRM¶
Early Case Assessment is not a formal EDRM stage — it spans identification through review. It sits at the far left of the EDRM model where decisions have the highest cost leverage: $1 spent on ECA saves $10-100 on downstream review. The core question ECA answers: settle or litigate? And if litigate, what's the scope, who are the key players, and where are the risks?
In 2026, courts increasingly enforce proportionality — discovery costs must align with actual case value. AI-powered ECA provides the quantitative evidence needed to argue for limited scope within the first 48 hours of receiving a production.
What Competitors Ship Today¶
The market has converged on a specific AI-powered ECA workflow. This is no longer differentiation — it's table stakes:
| Competitor | ECA Capability |
|---|---|
| HaystackID / eDiscovery AI (Case Insight, Mar 2025) | GenAI case memo: key issues, critical documents, risk areas. Real-time classification + summarization. |
| Reveal | AI-driven ECA across full EDRM. Integrated TAR + GenAI summarization. |
| Epiq Discover | Factsheets: people/events/evidence relationships + AI text summarization. |
| DISCO | LLM-powered ECA with document classification and summarization. |
| OpenText | Automated document summarization for review prioritization and strategic decisions. |
| Nextpoint | Deposition transcript summarization only (nextpoint-ai). No document set summarization / ECA yet. |
This is the gap Rakesh identified. Competitors demo an AI-generated case briefing in the first 5 minutes of an eval. We demo transcript summarization — valuable but narrower.
What the AI-Generated Case Briefing Looks Like¶
When a firm receives a 47,000-document production, the AI produces this within hours (not weeks):
CASE BRIEFING — Matter #2023-CV-04517
Generated: 2026-04-01 | 47,231 documents | 12 custodians
OVERVIEW
Production spans Jan 2021 - Dec 2023.
Central topic: product liability, suspension system defect.
KEY CUSTODIANS (ranked by relevance)
1. VP Engineering (3,400 docs, 70% of safety-related communications)
2. Director QA (2,100 docs, primary escalation recipient)
3. Outside Counsel - Baker McKenzie (890 docs, privilege review needed)
CRITICAL TIMELINE
2022-03: First internal safety report filed
2022-09: Engineering review committee formed
2023-03: Communication spike (4x normal volume, 2,847 docs in 6 weeks)
2023-06: Recall announcement
KEY THEMES
- Safety test results and internal escalation (4,200 docs)
- Regulatory compliance discussions (1,800 docs)
- Cost-benefit analysis of recall vs continued production (340 docs)
RISK AREAS
- Privilege: 847 documents flagged for attorney-client review
- PHI: 124 documents contain employee health records (redaction needed)
- Gap: VP Engineering has 0 documents in March-April 2023 despite
being the primary decision-maker (potential spoliation issue)
RECOMMENDED REVIEW SCOPE
Priority set: 4,200 documents (9% of total)
Estimated review time: 3-5 days (vs 2-3 weeks for full linear review)
That's the demo that wins evaluations. A partner gets a case briefing the same day the production arrives, not 3 weeks later after junior associates finish first-pass review.
What Nextpoint Already Has¶
| Capability | Component | Status |
|---|---|---|
| Document text extraction | documentextractor | Shipped — text on S3 |
| Document metadata (author, date, subject, custodian) | documentloader → MySQL | Shipped |
| ES keyword indexing | documentloader → ES 7.4 | Shipped |
| Transcript summarization (narrative, chronological, TOC) | nextpoint-ai (Bedrock Claude) | Shipped |
| Semantic search (hot docs, privilege, custodian-scoped) | documentsearch | Designed, prototype next |
What ECA Requires Beyond T1¶
| ECA Capability | What T1 Search Provides | What's New (T2 Agent) |
|---|---|---|
| Hot document identification | POST /search ranked results | Nothing — T1 does this |
| Key custodian identification | Search results grouped by custodian | Aggregation + ranking logic |
| Privilege/PHI flagging | Conceptual search queries | Nothing — T1 does this |
| Case briefing memo | Search results as raw material | LLM synthesis into structured memo |
| Timeline construction | Date-filtered search results | LLM extracts date + event pairs, sorts chronologically |
| Communication pattern analysis | Custodian-filtered results | Metadata aggregation (sender/recipient frequency) |
| Theme extraction | Semantic search by topic | LLM clusters results into themes |
| Recommended review scope | Relevance-ranked document set | Threshold-based scope recommendation |
T1 provides the retrieval layer. The T2 ECA agent orchestrates multiple searches and synthesizes results into the case briefing.
The ECA Agent Architecture¶
Attorney triggers ECA on a case (or auto-triggered on first import completion)
|
v
ECA Agent Lambda (orchestrator — extends nextpoint-ai pattern)
|
├── Run 5-10 semantic searches across key topics:
│ "safety concerns", "legal advice", "financial analysis",
│ "regulatory communications", "personnel issues"
│ → Each returns ranked documents with custodian + date metadata
|
├── Aggregate metadata from exhibits table:
│ Custodian document counts, date distribution,
│ sender-recipient pairs, document types
|
├── Run privilege + PHI detection searches:
│ "communications seeking legal advice"
│ "documents containing personal health information"
│ → Flag counts per custodian
|
├── Feed aggregated data to Bedrock Claude:
│ "Given these search results and metadata, generate a
│ structured case briefing covering: overview, key custodians,
│ timeline, themes, risk areas, and recommended review scope."
│ → LLM summarizes what it FOUND, not what it imagines
│ → Every statement traces to a specific document from search
|
└── Deliver case briefing:
- In-app view (Rails)
- Downloadable PDF
- Searchable — attorney can click any cited document
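The orchestration above reduces to a small loop once search and synthesis are injectable. A sketch with stubbed callables (the topic list mirrors the diagram; function names are illustrative, not the nextpoint-ai API):

```python
# Topics from the ECA agent diagram above.
ECA_TOPICS = [
    "safety concerns", "legal advice", "financial analysis",
    "regulatory communications", "personnel issues",
]

def run_eca(case_id, search, summarize):
    """search(body) -> ranked docs with metadata; summarize(evidence) -> memo."""
    evidence = {}
    for topic in ECA_TOPICS:
        # Each semantic search returns docs with custodian + date metadata.
        evidence[topic] = search({"query": topic, "case_id": case_id})
    # The LLM summarizes what the searches FOUND — every memo statement
    # stays traceable to a document in the evidence dict.
    return {"case_id": case_id,
            "briefing": summarize(evidence),
            "evidence": evidence}
```

Passing `search` and `summarize` in keeps the agent a thin orchestration layer over documentsearch and nextpoint-ai, which is the whole argument of this section.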
The Defensibility Argument¶
Nextpoint's own blog makes the key point: AI summarization is most reliable when it summarizes existing text, not generating from scratch. The ECA memo is anchored in actual documents returned by semantic search — every claim in the briefing traces back to specific exhibits. This is the difference between a defensible ECA tool and a hallucination risk.
The audit trail: search queries logged → search results logged → LLM prompt logged → generated memo logged → attorney review and approval. The full chain is traceable from memo statement to source document.
Build Path¶
| Phase | What Ships | Timeline | Depends On |
|---|---|---|---|
| T1 prototype | Hot docs, privilege, custodian search — manual ECA | 2 weeks | Nothing |
| T1 production | Same at scale, all NGE cases | +1 quarter | Prototype validates quality |
| T2 ECA agent | Automated case briefing memo | +4-6 weeks after T1 production | documentsearch T1 + nextpoint-ai |
The 4-6 week estimate for the T2 ECA agent is realistic because:
- nextpoint-ai already handles LLM orchestration (Bedrock Claude, chunking, prompt assembly, token tracking, FIFO + Standard SQS pattern)
- documentsearch provides the retrieval (multiple POST /search calls)
- The new code is: aggregation logic, memo prompt template, and formatting
This is not a from-scratch build — it's an orchestration layer connecting two existing systems.
Tier 2: 8-16 Week Builds¶
| Feature | Architecture Status | What It Actually Is |
|---|---|---|
| AI-assisted first-pass review (TAR 2.0) | Covered by T1+ | See detailed TAR 2.0 section below. |
| Deposition prep assistant | Covered by T1+ | See detailed Deposition Prep section below. |
| Audio/video transcription + search | Not in scope | Requires speech-to-text (AWS Transcribe or Whisper) as a pre-processing step. Text output feeds into documentsearch's existing pipeline. The embedding/search infrastructure handles it — the gap is audio extraction, not search. |
Two of three Tier 2 features are covered by documentsearch + minor Rails integration.
AI-Assisted First-Pass Review (TAR 2.0) — Deep Dive¶
Why This Is the Highest-Impact GDR Feature¶
Document review is 80%+ of total litigation spend — $42 billion per year (ABA). It's also the task attorneys and paralegals do every single day on every single matter. If Nextpoint makes this faster, it becomes the platform firms can't leave.
The current review workflow: reviewer opens a document, reads it, codes it as responsive/non-responsive/privileged, moves to the next document. Repeat 50,000 times. The order is random or chronological — the reviewer might spend 3 days on irrelevant documents before reaching the critical ones.
The TAR Evolution¶
| Generation | How It Works | Improvement |
|---|---|---|
| Manual review | Reviewer reads every document in linear order | Baseline (~50-80 docs/hr) |
| TAR 1.0 (Predictive Coding) | Train on seed set, machine classifies the rest, reviewer validates sample | 40-60% volume reduction |
| TAR 2.0 (Continuous Active Learning / CAL) | Model learns from EVERY reviewer decision, continuously re-ranks remaining documents | Identifies 90% of relevant docs in the first 10% of review |
| TAR + GenAI (emerging, 2025-2026) | GenAI summarization + rationale on top of CAL ranking. Reviewer reads a summary paragraph instead of a full document. | Up to 90% reviewer reduction (Epiq claims) |
TAR 2.0 with CAL is the current industry standard. The key insight: the model trains on every coding decision, one document at a time. As the reviewer codes documents, the model re-ranks all remaining documents so the most likely relevant ones are next in the queue. This front-loads relevant documents — reviewers stop finding new relevant documents after reviewing a fraction of the total set.
Proven results: 90% of responsive documents identified in the first 10% of review. 40-90% cost reduction depending on matter complexity.
What Competitors Offer¶
| Competitor | TAR/Review Capability |
|---|---|
| Relativity aiR | 250+ customers, 200M+ predictions. GenAI coding rationale + relevance detection. 50%+ increase in linear review speed. |
| Epiq AI | Auto-classify for relevance, PII, privilege. Update classification as matter evolves. TAR + CAL + GenAI combined. Up to 90% reviewer reduction. |
| Everlaw | EverlawAI Assistant: summarize, classify, explain. CAL for continuous learning. |
| DISCO | AI-powered review with continuous learning. Cloud-native. |
| Nextpoint | Linear review with keyword search. No TAR, no CAL, no AI-assisted review. |
This is a significant competitive gap. TAR 2.0/CAL has been industry-standard for several years. Adding it is not innovation — it's catching up. GenAI on top of CAL is where the 2026 differentiation lies.
How documentsearch Enables TAR 2.0¶
The path from semantic search to TAR 2.0 is shorter than building TAR from scratch. Here's why:
Traditional TAR/CAL trains a classifier from scratch on reviewer decisions. It has no prior knowledge of document relevance — the seed set and ongoing coding are the only inputs.
Semantic search + reviewer feedback starts with a MUCH better baseline. The embedding model already knows which documents are conceptually related. When a reviewer codes a document as responsive, the system knows which un-reviewed documents have similar embeddings — and can prioritize them immediately, without waiting for hundreds of training examples.
Traditional CAL:
Start with zero knowledge
→ Reviewer codes 100 docs (seed set)
→ Model learns initial relevance signal
→ Reviewer codes next batch
→ Model improves
→ After ~500-1000 coded docs, model is reasonably accurate
Semantic search + feedback (our approach):
Start with embedding model's understanding of document similarity
→ Reviewer codes FIRST document as responsive
→ System immediately finds 50 documents with similar embeddings
→ Those 50 are boosted in the review queue
→ Reviewer codes second document
→ Relevance signal compounds with embedding similarity
→ After ~50-100 coded docs, system is already highly accurate
The embedding model provides a warm start. Traditional CAL is cold — it knows nothing until the reviewer trains it. Our approach starts warm because the embeddings already capture semantic relationships between documents.
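The warm start is visible in a few lines: one responsive coding decision immediately re-orders the queue by embedding similarity. A sketch with toy 2-d vectors (real embeddings would come from documentsearch):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rerank_after_coding(responsive_vec, queue):
    """Sort unreviewed docs so those most similar to the doc just coded
    responsive come first — no training step, just vector similarity."""
    return sorted(queue,
                  key=lambda d: cosine(d["vec"], responsive_vec),
                  reverse=True)
```

A production version would blend this similarity signal with the hybrid relevance score and accumulate it across coding decisions rather than reranking on the latest one alone.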
The Three-Layer Review Architecture¶
| Layer | What | Tier | Status |
|---|---|---|---|
| 1. Semantic relevance ranking | POST /search { query: "[RFP request language]" } returns documents ranked by relevance to the production request | T1 | Designed |
| 2. Review queue integration | Rails review queue sorts by documentsearch relevance score instead of chronological/random | T1+ | Designed (Rails change) |
| 3. Active learning feedback loop | Each reviewer coding decision updates relevance scores for remaining un-reviewed documents in near-real-time | T2+ | Architectural sketch (see Stickiness Moat section above) |
Layer 1 is documentsearch. Layer 2 is a Rails UI change. Layer 3 is the active learning enhancement that creates the stickiness moat.
Even without Layer 3, Layers 1+2 are a massive improvement. Sorting the review queue by semantic relevance (instead of random) means reviewers see the most-likely-responsive documents first. The "back half" of the queue — documents that would never be reached in a time-boxed review — is now prioritized by relevance rather than luck.
The Workflow (Layers 1+2, Shipping with T1+)¶
Paralegal receives RFP Request #3:
"All documents relating to the design and testing of the XYZ component"
|
v
POST /search { query: "documents relating to the design and testing
of the XYZ component", case_id: 123, limit: 5000 }
|
v
documentsearch returns 5,000 documents ranked by hybrid relevance score
|
v
Rails review queue populated in relevance order (highest score first)
|
v
Reviewer opens queue:
Document #1: Relevance score 0.94 — engineering test report for XYZ
Document #2: Relevance score 0.91 — email about XYZ design review
Document #3: Relevance score 0.89 — meeting notes on XYZ component
...
Document #4,800: Relevance score 0.12 — unrelated HR document
Document #5,000: Relevance score 0.08 — vendor invoice
|
v
Reviewer codes top documents rapidly (high hit rate)
Hit rate declines as relevance scores decrease
Reviewer/attorney makes informed decision to stop at a threshold
The hit rate curve is the key metric. With random ordering, the hit rate is roughly constant (e.g., 15% relevance across the set). With relevance-ranked ordering, the hit rate starts at 80-90% at the top and declines. Reviewers code 10x faster in the high-relevance zone because they're not wading through irrelevant documents.
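The curve itself is trivial to compute from coding decisions taken in review order. A sketch (bucket size is arbitrary; this would feed the hit-rate analytics line item in the build table below):

```python
def hit_rate_curve(decisions, bucket=100):
    """decisions: booleans in review order (True = coded responsive).
    Returns the responsive rate per bucket. Under relevance-ranked
    ordering the curve starts high and declines; under random ordering
    it stays roughly flat."""
    return [sum(chunk) / len(chunk)
            for chunk in (decisions[i:i + bucket]
                          for i in range(0, len(decisions), bucket))]
```

A declining curve is also the evidence an attorney can point to when deciding where to stop review.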
The Active Learning Upgrade (Layer 3, Future)¶
See the "Stickiness Moat" section above for the full architectural sketch. The key addition: each reviewer coding decision feeds back into the ranking model, boosting similar un-reviewed documents. The review queue re-sorts in near-real-time. This is the feature that creates per-firm lock-in.
Build Path¶
| Component | Effort | Depends On |
|---|---|---|
| Semantic relevance ranking for RFP requests | 0 (T1 search already does this) | documentsearch T1 |
| Review queue sort-by-relevance (Rails) | 1-2 weeks | Rails review workflow + documentsearch API |
| Hit rate analytics (relevance score vs coding decision) | 1-2 weeks | Rails review coding data |
| Active learning feedback loop | 4-8 weeks | documentsearch embeddings + review coding data |
| Total for Layers 1+2 | 1-2 weeks after T1 | documentsearch T1 |
| Total for Layer 3 (active learning) | +4-8 weeks after Layers 1+2 | Layer 2 + feedback pipeline |
Layers 1+2 ship with T1+ (weeks of Rails work). Layer 3 is a separate build but depends on the same embedding infrastructure.
Deposition Prep Assistant — Deep Dive¶
What Attorneys Do Today (Manual)¶
Deposition prep is one of the most labor-intensive tasks in litigation. For each witness, an attorney must:
- Identify relevant documents: Search the production for every document connected to this witness and the topics they'll be deposed on
- Organize by topic: Group documents into deposition topics (what the witness knew about X, when they knew about Y, who they communicated with about Z)
- Build a chronology: Arrange key documents in date order to construct the timeline of the witness's knowledge and actions
- Flag contradictions: Identify documents that contradict or support what the witness is expected to say
- Create the depo binder: Compile the final set of documents for the deposition, organized for quick reference during questioning
This takes a senior associate 10-20 hours per witness per topic. For a case with 8 witnesses and 5 topics each, that's 400-800 hours of prep.
What Competitors Offer¶
| Competitor | Depo Prep Capability |
|---|---|
| Epiq Assist | Prepares deposition memos and questions up to 5x faster. Identifies key topics, communications, events, and documents. |
| Epiq Narrate | AI Transcript Analysis extracts facts, people, events. Cross-references against case documents. Surfaces contradictions. Exports chronologies with hyperlinks to evidence. |
| DISCO | AI-powered chronology builder. Links documents to timeline events. |
| U.S. Legal Support (DepoSummary Pro) | Chronological case timeline from testimony. Exhibit cross-referencing. Deponent insights. |
| Everlaw | Batch summarize up to 1,000 docs. Create chronology timelines. Investigative outlines. |
| Nextpoint | Transcript summarization (nextpoint-ai). Manual document search and folder assembly. |
Nextpoint's gap: transcript summarization is strong, but document gathering, chronology building, and contradiction detection are manual.
How Nextpoint's Existing Systems Combine for Depo Prep¶
The depo prep assistant doesn't require a new system — it orchestrates three systems that already exist (or are designed):
┌────────────────────┐ ┌───────────────────┐ ┌─────────────────┐
│ documentsearch │ │ nextpoint-ai │ │ Rails │
│ (T1) │ │ (shipped) │ │ (existing) │
│ │ │ │ │ │
│ Semantic search │ │ Transcript │ │ Folders/tags │
│ Custodian filter │ │ summarization │ │ Document viewer │
│ Date filter │ │ Chronology │ │ Export │
│ Relevance ranking │ │ Narrative │ │ Review workflow │
└────────┬───────────┘ └────────┬───────────┘ └────────┬─────────┘
│ │ │
└──────────────────────────┬───────────────────────────┘
│
v
┌─────────────────────┐
│ Depo Prep │
│ Orchestration │
│ (new Rails + │
│ T2 agent) │
└─────────────────────┘
The Depo Prep Workflow (Automated)¶
Attorney specifies:
Witness: "Jeff Skilling"
Topics: ["Special Purpose Entities", "Board communications", "Regulatory response"]
Date range: Jan 2001 - Dec 2001
|
v
Step 1: DOCUMENT GATHERING (documentsearch T1)
For each topic, run:
POST /search { query: "Jeff Skilling discussions about Special Purpose Entities",
filters: { custodians: ["jeff.skilling@enron.com"],
date_range: { start: "2001-01-01", end: "2001-12-31" } } }
→ 3 topics × top 50 results each = ~100-150 unique documents
|
v
Step 2: ORGANIZE BY TOPIC (T2 agent)
LLM reads each document's snippet and assigns to topics:
- SPE-related: 67 documents
- Board communications: 43 documents
- Regulatory: 28 documents
- Cross-topic (multiple): 18 documents
|
v
Step 3: BUILD CHRONOLOGY (T2 agent + nextpoint-ai pattern)
Extract date + event pairs from each document:
2001-01-15: Skilling receives first SPE performance report (Exhibit #42)
2001-02-28: Board presentation mentions Raptor structure (Exhibit #67)
2001-03-14: Fastow proposes equity infusion for Raptor (Exhibit #89)
2001-03-15: Skilling replies "stay the course" (Exhibit #91)
...
→ Chronological timeline with exhibit links
|
v
Step 4: FLAG CONTRADICTIONS (T2 agent)
If prior deposition transcript exists:
Compare deposition claims against documents:
"Skilling testified he was unaware of SPE risks until Q3 2001.
However, Exhibit #42 (Jan 15, 2001) shows he received the
SPE performance report 6 months earlier."
|
v
Step 5: GENERATE DEPO BINDER (Rails)
Create folder: "Skilling Depo Prep — SPEs / Board / Regulatory"
Sub-folders by topic
Documents sorted by chronology within each topic
Cover sheet with:
- Chronological timeline
- Key documents highlighted
- Contradictions flagged
- Suggested question areas
|
v
Step 6: EXPORT
Depo binder available in Nextpoint viewer
Exportable as PDF with hyperlinks to exhibits
Shareable with co-counsel
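Steps 1 through 3 of the workflow above can be sketched as one orchestration function. The stubs stand in for documentsearch, the T2 topic classifier, and the chronology extractor; every name here is hypothetical:

```python
def prep_depo_binder(witness, topics, search, classify, extract_events):
    """Orchestrate Steps 1-3: gather, organize by topic, build chronology."""
    # Step 1: one semantic search per topic, deduped across topics
    docs = {}
    for topic in topics:
        for doc in search(f"{witness} discussions about {topic}"):
            docs[doc["id"]] = doc
    # Step 2: LLM-style topic assignment (a doc may hit several topics)
    by_topic = {t: [] for t in topics}
    for doc in docs.values():
        for topic in classify(doc, topics):
            by_topic[topic].append(doc)
    # Step 3: chronology as (date, event, doc id) tuples sorted by date
    timeline = sorted(ev for doc in docs.values() for ev in extract_events(doc))
    return by_topic, timeline

# Toy stand-ins for the real services:
def search(query):
    return [
        {"id": "d42", "date": "2001-01-15", "text": "SPE performance report"},
        {"id": "d67", "date": "2001-02-28", "text": "Board presentation on Raptor"},
    ]

def classify(doc, topics):
    return [topics[0]] if "SPE" in doc["text"] else [topics[1]]

def extract_events(doc):
    return [(doc["date"], doc["text"], doc["id"])]

by_topic, timeline = prep_depo_binder(
    "Jeff Skilling", ["Special Purpose Entities", "Board communications"],
    search, classify, extract_events)
# timeline[0] is the earliest event (ISO dates sort lexically)
```

Steps 4-6 (contradiction detection, binder generation, export) layer onto the same outputs: the timeline feeds the cover sheet and the per-topic lists feed the sub-folders.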
The Time Savings¶
| Task | Manual (Today) | With Depo Prep Assistant |
|---|---|---|
| Document gathering (per witness/topic) | 3-5 hours | ~5 minutes (semantic search) |
| Organization by topic | 2-3 hours | ~2 minutes (LLM classification) |
| Chronology building | 3-5 hours | ~5 minutes (date extraction + sort) |
| Contradiction detection | 2-4 hours (if prior depo exists) | ~10 minutes (cross-reference) |
| Binder assembly | 1-2 hours | Automatic |
| Total per witness per topic | 10-20 hours | ~30 minutes + attorney review |
At $350/hour associate rate, saving 15 hours per witness = $5,250 per deposition. A case with 8 witnesses = $42,000 in associate time saved on a single matter.
Why This Is a GDR Feature¶
Deposition prep happens on every litigation matter. An attorney who builds depo binders in Nextpoint in 30 minutes (vs 15 hours manually) has a daily workflow inside the product. That's the stickiness: the attorney's deposition process lives in Nextpoint.
The combination matters more than any single feature. Semantic search alone is useful. Transcript summarization alone is useful. Depo prep combines them into a workflow that is greater than the sum of parts: find the documents (search), summarize the transcript (nextpoint-ai), build the binder (Rails), flag the contradictions (T2 agent). No competitor integrates all four in one platform for mid-market firms.
Build Path¶
| Component | Effort | Depends On |
|---|---|---|
| Custodian + topic + date search (T1) | 0 (documentsearch already does this) | documentsearch T1 |
| "Save search results to depo folder" (Rails) | 1-2 weeks | Rails folders + documentsearch API |
| Topic classification (LLM assigns docs to topics) | 2-3 weeks | Bedrock Claude |
| Chronology extraction (date + event pairs) | 2-3 weeks | Bedrock Claude + document metadata |
| Contradiction detection (docs vs prior testimony) | 3-4 weeks | Transcript text + document search results |
| Depo binder generation (folder + cover sheet) | 1-2 weeks | Rails export |
| Total (basic: search + folder) | 1-2 weeks (T1+) | documentsearch T1 |
| Total (full assistant with chronology + contradictions) | 8-12 weeks (T2) | T1 + nextpoint-ai + Bedrock |
The basic version (search + save to folder) ships with T1+ in weeks. The full assistant (chronology, contradictions, suggested questions) is a T2 build. Both are GDR features — even the basic version transforms the depo prep workflow.
Tier 3: 16-30 Week Builds¶
| Feature | Architecture Status | What It Actually Is |
|---|---|---|
| Chat with your case data (RAG) | This IS documentsearch + T2 agents | See detailed conversational RAG section below. |
| Automated privilege log generation | Detailed above | See Privilege and PII/PHI section. T1 search → T2 classification → T2/T3 log generation. 4-8 weeks after T2 classification. |
The "Tier 3" RAG capability IS the documentsearch module. It's not a 16-30 week build from scratch — it's a 2-week prototype + 1 quarter to production for the search infrastructure, plus agent service for the conversational layer.
Chat with Your Case Data (Conversational RAG) — Deep Dive¶
What It Is¶
An attorney asks open-ended questions of their entire document set in natural language and gets answers with citations to specific documents:
Attorney: "What did the CFO know about the accounting irregularities
and when did he first learn about them?"
System: "Based on the documents in this case:
The CFO first received information about the accounting
irregularities on March 12, 2022, when the Controller
forwarded an internal audit report (Exhibit #234, p.3).
Between March and June 2022, the CFO participated in 14
email threads discussing the irregularities (Exhibits #234,
#267, #301, #345, #378, #402, #445, #467, #489, #512,
#534, #567, #589, #612).
On June 3, 2022, the CFO directed the Controller to
'hold off on flagging this to the audit committee until
we have a complete picture' (Exhibit #445, p.2).
The CFO's first communication with outside counsel about
the issue was July 15, 2022 (Exhibit #612)."
This is the feature Harvey AI built at the top of the market ($11B valuation, March 2026). No mid-market eDiscovery platform does it well yet.
What Competitors Offer¶
| Competitor | Capability | Limitation |
|---|---|---|
| Harvey AI | Vault: upload docs, ask questions via RAG. Custom models with OpenAI on all US case law. | 10,000 document cap per Vault. $11B valuation reflects legal AI focus, not eDiscovery. Transactional focus (M&A, due diligence), not litigation-first. |
| Everlaw Project Query | Conversational search across terabytes of eDiscovery. Facts + references in seconds. Reasoning models for nuanced conclusions. | Closed beta (announced Legalweek 2025, GA expected 2025). Refinement process compensates for legal's complex corpus. |
| Lexis+ AI | Conversational search over case law + Shepard's validation. Highest accuracy in Stanford testing (65%). | Case law research, not case-specific document sets. Different use case. |
| Relativity aiR | AI-powered coding rationale and relevance detection at scale. 250+ customers, 200M+ predictions. | Classification-focused, not open-ended Q&A over documents. |
| Nextpoint | Natural language search (documentsearch, designed). No conversational Q&A yet. | Gap. |
Why This Is Different from Search¶
Search returns a ranked list of documents. Conversational RAG returns an answer with citations. The attorney doesn't review 20 documents — they read a synthesized response that tells them what happened, when, and points to the evidence.
| | Search (documentsearch T1) | Conversational RAG (T2) |
|---|---|---|
| Input | Query string | Natural language question |
| Output | Ranked document list with snippets | Synthesized answer with document citations |
| Attorney effort | Review 20 documents to extract the answer | Read the answer, verify 2-3 key citations |
| Latency | ~170ms | ~5-30 seconds (LLM synthesis) |
| Cost per query | ~$0.000001 (embedding only) | ~$0.01-0.10 (LLM invocation) |
| Hallucination risk | None (returns real documents) | Present (LLM may misinterpret or fabricate connections) |
The hallucination risk is the critical difference. Search returns actual documents — no hallucination possible. Conversational RAG has the LLM synthesize an answer, which can misstate facts or fabricate connections between documents. Stanford's benchmarking found legal AI tools hallucinating on at least 1 in 6 queries.
How to Mitigate Hallucination in Legal RAG¶
The architecture must make hallucination detectable and verifiable:
1. Every claim must cite a specific document. The LLM is prompted to never make a statement without citing an exhibit number and page. Uncited claims are flagged as "unverified."
2. Citations must be clickable. The attorney can click any citation and see the actual document passage. If the passage doesn't support the claim, the attorney knows immediately.
3. Confidence indicators. The system distinguishes between:
   - "Found in document" (direct quote or close paraphrase)
   - "Inferred from documents" (synthesized across multiple sources)
   - "Not found in documents" (question cannot be answered from this corpus)
4. Retrieval transparency. Show which documents the LLM was given as context. If the answer is wrong, the attorney can see whether the relevant document was in the context or missed by search.
5. ABA Formal Opinion 512 compliance. Mandates human-in-the-loop review. The system presents answers for attorney verification, not as final determinations.
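The first safeguard, mandatory citations, is mechanically checkable. A toy check, assuming the synthesized answer cites exhibits in the "(Exhibit #N, p.M)" format used in the example above; the regex and function name are illustrative, not an existing component:

```python
import re

CITATION = re.compile(r"\(Exhibit #\d+(?:, p\.\d+)?\)")

def flag_uncited_claims(answer: str) -> list[str]:
    """Return sentences in a synthesized answer that carry no exhibit
    citation, so the UI can mark them 'unverified'."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not CITATION.search(s)]

answer = ("The CFO received the audit report on March 12, 2022 "
          "(Exhibit #234, p.3). He then discussed it widely with staff.")
print(flag_uncited_claims(answer))
# → ['He then discussed it widely with staff.']
```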
Architecture: How It Builds on documentsearch¶
Attorney asks: "What did the CFO know about the irregularities?"
|
v
Conversational RAG Agent (T2)
|
├── Step 1: Query decomposition (LLM)
│ Break complex question into search queries:
│ - "CFO communications about accounting irregularities"
│ - "CFO first awareness of audit findings"
│ - "CFO direction to delay disclosure"
│
├── Step 2: Multiple semantic searches (documentsearch T1)
│ POST /search for each sub-query
│ Collect top 10-20 results per sub-query
│ Deduplicate across sub-queries
│ → 30-50 unique documents
│
├── Step 3: Passage retrieval
│ Fetch chunk text for each result from MySQL
│ (search_chunks table — already stored by embedding pipeline)
│ → 100-200 relevant passages
│
├── Step 4: LLM synthesis (Bedrock Claude)
│ Prompt: "Based on these document passages, answer the
│ attorney's question. Cite specific exhibit numbers and
│ pages for every factual claim. If a claim cannot be
│ supported by the provided passages, say so explicitly."
│ → Synthesized answer with citations
│
├── Step 5: Citation verification
│ For each citation in the answer, verify that the cited
│ passage actually supports the claim (automated check)
│ Flag unverifiable citations
│
└── Step 6: Response with transparency
- Synthesized answer
- Cited documents (clickable links to exhibits)
- Confidence level per claim
- List of all documents in context (transparency)
- "This is an AI-generated analysis. Attorney review required."
This is NOT a simple "send all docs to the LLM" approach. You can't feed 500K documents into a context window. The architecture is:
1. documentsearch narrows 500K docs to 30-50 relevant ones (semantic search)
2. Chunk text provides the specific passages (already stored in MySQL)
3. LLM synthesizes from those passages only (bounded context)
4. Citations link back to source documents (verifiable)
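Steps 1-3 of the agent flow (decompose, search, assemble a bounded context) can be sketched as follows. `decompose`, `search`, and `fetch_chunks` are injected stand-ins for the LLM decomposer, documentsearch, and the search_chunks lookup; all names and limits are assumptions:

```python
def build_rag_context(question, decompose, search, fetch_chunks,
                      per_query=20, max_passages=200):
    """Decompose the question, run one search per sub-query, dedupe
    doc IDs, and assemble a bounded passage context for the LLM."""
    doc_ids, seen = [], set()
    for sub_query in decompose(question):              # Step 1
        for doc_id in search(sub_query)[:per_query]:   # Step 2
            if doc_id not in seen:
                seen.add(doc_id)
                doc_ids.append(doc_id)
    passages = []
    for doc_id in doc_ids:                             # Step 3
        passages.extend(fetch_chunks(doc_id))
        if len(passages) >= max_passages:
            break
    return passages[:max_passages]

# Toy stand-ins for the real services:
decompose = lambda q: ["CFO communications about irregularities",
                       "CFO first awareness of audit findings"]
search = lambda q: (["doc_1", "doc_2"] if "awareness" in q
                    else ["doc_2", "doc_3"])
fetch_chunks = lambda doc_id: [f"{doc_id}:chunk_0", f"{doc_id}:chunk_1"]

context = build_rag_context("What did the CFO know?",
                            decompose, search, fetch_chunks)
```

Steps 4-6 then pass `context` to the synthesis prompt; the bounded `max_passages` cap is what keeps the LLM call affordable and the context verifiable.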
The Context Window Challenge¶
Even with 1M-token context windows (Claude), you can't feed an entire case:
| Case Size | Documents | Estimated Text | Fits in Context? |
|---|---|---|---|
| Small (1K docs) | 1,000 | ~50M tokens | No (50x too large) |
| Medium (50K docs) | 50,000 | ~2.5B tokens | No (2,500x too large) |
| Large (500K docs) | 500,000 | ~25B tokens | No (25,000x too large) |
This is why search is the prerequisite. You MUST narrow the corpus before the LLM can synthesize. documentsearch reduces 500K documents to 30-50 relevant passages that fit in the LLM context. Without this retrieval step, conversational RAG is impossible at eDiscovery scale.
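The table's arithmetic, written out as a check, assuming the ~50K tokens per document that its rows imply:

```python
TOKENS_PER_DOC = 50_000      # rough average implied by the table above
CONTEXT_WINDOW = 1_000_000   # 1M-token context

def corpus_tokens(docs: int) -> int:
    """Total tokens for a case of `docs` documents."""
    return docs * TOKENS_PER_DOC

for docs in (1_000, 50_000, 500_000):
    total = corpus_tokens(docs)
    print(f"{docs:>7} docs ~ {total:,} tokens "
          f"({total // CONTEXT_WINDOW}x the context window)")
```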
Conversational Features (Multi-Turn)¶
Unlike single-query search, conversational RAG supports follow-up:
Attorney: "What did the CFO know about the irregularities?"
System: [answer with citations]
Attorney: "When did he first communicate with outside counsel about this?"
System: [follow-up answer — knows "he" = CFO, "this" = irregularities
from conversation context]
Attorney: "Are there any documents that contradict his deposition testimony
that he wasn't aware until July?"
System: [searches for contradicting evidence, references the deposition
claim, surfaces documents showing earlier awareness]
This multi-turn capability requires:
- Conversation memory: Track what's been discussed, who "he" refers to
- Query refinement: Each follow-up is a new search informed by prior context
- Cross-reference: Connect new search results to previous answers
This is the T2 agent pattern — an orchestrator that maintains conversation state and issues multiple searches per turn.
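The conversation-memory requirement can be illustrated with a toy query rewriter. In the real T2 agent the rewriting would itself be LLM-driven; the lookup table here is a deliberately crude stand-in to show the shape of the step:

```python
import re

def rewrite_followup(question: str, memory: dict[str, str]) -> str:
    """Expand pronouns in a follow-up question from conversation memory
    before issuing it as a fresh search (the query-refinement step)."""
    for pronoun, referent in memory.items():
        # \b guards keep "he" from matching inside "When" or "the"
        question = re.sub(rf"\b{re.escape(pronoun)}\b", referent, question)
    return question

memory = {"he": "the CFO", "this": "the accounting irregularities"}
rewritten = rewrite_followup(
    "When did he first communicate with outside counsel about this?", memory)
# → "When did the CFO first communicate with outside counsel
#    about the accounting irregularities?"
```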
Why This Is Harvey's Moat (and How Nextpoint Competes)¶
Harvey's $11B valuation is built on conversational legal AI. But Harvey serves general legal knowledge (case law, statutes) — not case-specific document sets. The attorney uploads documents to Harvey's Vault (10K doc cap) and asks questions.
Nextpoint's advantage: The documents are ALREADY in the platform. There is no upload step. An attorney working a case in Nextpoint can switch from document review to conversational Q&A without leaving the platform or re-uploading anything. The embedding pipeline processes documents as they're imported — by the time the attorney is ready to ask questions, the infrastructure is ready.
Harvey competes on general legal knowledge. Nextpoint competes on case-specific intelligence on the firm's own documents. These are different markets with different moats.
Build Path¶
| Component | Effort | Depends On |
|---|---|---|
| Query decomposition (break question into sub-queries) | 1-2 weeks | Bedrock Claude (exists) |
| Multi-query search orchestration | 1-2 weeks | documentsearch T1 |
| Passage retrieval and assembly | 1 week | search_chunks table (exists via T1) |
| LLM synthesis with citation prompting | 2-3 weeks | Bedrock Claude + prompt engineering |
| Citation verification | 1-2 weeks | Passage-to-claim matching |
| Multi-turn conversation memory | 2-3 weeks | Session state management |
| Rails UI (chat interface) | 2-3 weeks | Rails frontend |
| Total | 10-16 weeks after T1 production | documentsearch T1 + Bedrock |
This is Rakesh's "Tier 3, 16-30 weeks." Our estimate: 10-16 weeks after T1 production ships. The difference: we're not building RAG from scratch — documentsearch provides the retrieval layer, search_chunks provides the passage store, and Bedrock provides the LLM. The new code is orchestration, prompting, citation verification, and UI.
Hallucination Safeguards for Legal¶
| Safeguard | Implementation |
|---|---|
| Mandatory citations | Prompt constraint: every factual claim must cite exhibit + page |
| Clickable verification | UI links each citation to the actual document passage |
| Confidence levels | "Found in document" vs "Inferred" vs "Not found" |
| Context transparency | Show attorney which documents were in the LLM context |
| Attorney verification | All answers marked as AI-generated, requiring review |
| ABA Opinion 512 | Human-in-the-loop; system assists, attorney decides |
| Audit logging | Full prompt, context, and answer logged for defensibility |
Revised Timeline (Architecture-Informed)¶
The original tier estimates assume building each feature independently. Since they share infrastructure, the actual timeline is compressed:
| What | Timeline | Investment | Features Enabled |
|---|---|---|---|
| Prototype (1 case, validate quality) | 2 weeks | ~$23 | Natural language search demo, privilege review demo, PII/PHI flagging demo |
| Pilot (10 cases, attorney feedback) | +1 week | ~$227 | Same features on real cases with real attorneys |
| Production T1 (multi-tenant, backfill) | +1 quarter | $3,900 backfill + $1-2.3K/mo | Natural language search, privilege flagging, PII/PHI detection, responsive review, redaction ID, clawback, depo prep (basic), settlement prep |
| T1+ Rails integration | +2-3 weeks | Rails eng time | Review queue sort, save-to-folder, depo binder workflow |
| Document summarization | +2-4 weeks | Marginal (nextpoint-ai exists) | Early case assessment, document set briefing |
| T2 Agent service | +1 quarter | Bedrock costs | Gap analysis, pattern ID, "chat with case data", privilege log generation |
Total time from start to "chat with your case data": ~6-7 months. Not 16-30 weeks per feature — 6-7 months for ALL features on shared infrastructure.
Demo Impact (What Changes in Sales)¶
The 5-Minute Demo Script¶
This is the sequence that changes a buyer's perception:
Minute 1: Run a keyword search for "Special Purpose Entities." Both NXP and competitors return similar results. Attorney is unimpressed.
Minute 2: Run a natural language search: "internal discussions about Special Purpose Entities." Semantic search surfaces documents mentioning "Raptor," "JEDI," and "stay the course" — documents keyword search missed entirely. Attorney leans forward.
Minute 3: Filter by custodian: "show me only what Jeff Skilling discussed." Instant results scoped to one person. This is depo prep in one query.
Minute 4: Run a privilege query: "communications seeking or providing legal advice about the transaction." No keyword overlap with "privilege" — but the right documents surface. Privilege review without false positives.
Minute 5: Show the highlighted passage explaining WHY each document ranked. Transparency that no competitor's "AI search" provides.
That sequence demos 4 features in 5 minutes. All from one endpoint.
The GDR Story¶
Features that drive daily usage and prevent churn:
| Feature | Daily Workflow? | Why It Retains |
|---|---|---|
| Natural language search | Yes — every search session | Attorneys stop switching to other tools for conceptual queries |
| Responsive review (ranked) | Yes — every review session | 50%+ time savings on review, compounding with every matter |
| Privilege flagging | Yes — every production | Catches privilege documents keyword search misses |
| Depo prep | Per-deposition | Attorneys build depo binders in minutes, not hours |
| Document summarization / ECA | Per-matter | First hours after receiving production become productive. Partner gets case briefing same day, not week 3. |
An attorney saving 3+ hours per week on search and review does not cancel at renewal. These are not "nice to have" features — they become the daily workflow.
The Feature Customers Are Already Asking For: Interrogating Productions¶
Customers repeatedly ask for one specific capability: using AI to interrogate productions, identify gaps, and generate summaries. This is not a hypothetical use case — it's the feature attorneys describe when they talk about what AI should do for them.
This maps directly to our T2 gap analysis architecture:
Attorney: "Show me what the VP of Engineering discussed about the safety
defect between March 1 and April 15."
System: Runs semantic search scoped to VP's documents in that window.
Result: 0 documents.
Attorney: "Now show me what every other senior engineer discussed."
System: Runs same query across 5 other custodians.
Result: 18, 23, 15, 21, 8 documents respectively.
Attorney: "The VP has zero documents on this topic during a period when
every peer was actively discussing it. That's my deposition."
That's gap analysis — and it's what attorneys are asking for by name. The T2 agent automates this (run the same query across all custodians, compare result counts, surface anomalies). But even at T1, an attorney can do it manually with custodian-filtered searches. T1 makes it possible. T2 makes it automatic.
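The automated version the T2 agent would run can be sketched as a per-custodian comparison. `search_count` stands in for a custodian-filtered documentsearch call; the anomaly threshold is an illustrative assumption:

```python
from statistics import median

def find_gaps(query, custodians, search_count, min_ratio=0.25):
    """Run the same semantic query per custodian and flag anomalously
    low hit counts (the absence-as-evidence pattern described above)."""
    counts = {c: search_count(query, c) for c in custodians}
    baseline = median(counts.values())
    flagged = [c for c, n in counts.items()
               if baseline > 0 and n < baseline * min_ratio]
    return counts, flagged

# Counts mirroring the example above: the VP has zero, peers do not.
raw = {"vp_eng": 0, "eng_a": 18, "eng_b": 23,
       "eng_c": 15, "eng_d": 21, "eng_e": 8}
search_count = lambda q, c: raw[c]
counts, flagged = find_gaps("safety defect discussions", list(raw), search_count)
print(flagged)  # → ['vp_eng']
```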
The summaries complement gap analysis: once the relevant documents are found, nextpoint-ai summarizes them into a chronology or briefing. The combination — find the gaps, surface the evidence, summarize what it means — is the workflow attorneys describe as transformative.
This is the strongest argument for prioritizing documentsearch: the feature customers praise most requires the search infrastructure as its foundation. Gap analysis without semantic search is manual keyword iteration. Gap analysis with semantic search is the absence-as-evidence capability that no competitor offers.
The Stickiness Moat: Active Learning from Reviewer Decisions¶
Beyond search, the highest-impact GDR feature is continuous active learning from reviewer decisions. The concept: the model learns from each reviewer's responsive/non-responsive coding and reclassifies remaining documents in real time. The longer a firm uses the platform, the smarter it gets on their matter types and their attorneys' review patterns.
This is the feature that converts Nextpoint from "a tool attorneys use" to "a system the firm depends on." It creates a stickiness loop:
Reviewer codes document as responsive
→ Model updates relevance weights for similar documents
→ Remaining documents re-ranked by predicted responsiveness
→ Reviewer sees more-likely-responsive documents next
→ Model gets better with each decision
→ Switching to a competitor means starting from zero
How this builds on documentsearch:
| Layer | What It Provides | Status |
|---|---|---|
| T1: Semantic search | Initial relevance ranking via embeddings + BM25 | Designed |
| T1+: Review queue integration | Ranked review order in Rails | Designed |
| Active learning (new) | Feedback loop from reviewer decisions to re-ranking | Not yet architected |
Active learning requires:
1. documentsearch T1 as the base relevance signal (vector similarity + BM25)
2. Review coding data from Rails (responsive/non-responsive decisions per document)
3. A re-ranking model that combines base relevance with reviewer feedback
4. Real-time re-scoring of un-reviewed documents as new decisions come in
The architecture for this is NOT in the current documentsearch design — it's a layer on top. But it depends entirely on the embedding infrastructure. Without vector representations of documents, there's no notion of "similar to the documents the reviewer marked responsive." The embeddings are the prerequisite.
Architectural sketch (not fully designed):
T1: documentsearch provides base relevance scores (vector + BM25)
↓
T1+: Review queue sorted by base relevance
↓
Active learning layer (future):
Reviewer codes document as responsive
→ System identifies the document's vector embedding
→ Finds un-reviewed documents with similar embeddings
→ Boosts their predicted responsiveness score
→ Review queue re-sorts in near-real-time
→ Reviewer sees the next-most-likely-responsive document
This is a T2+ capability that should be designed after T1 proves retrieval quality. It has the highest GDR impact because it creates per-firm, per-matter lock-in — a competitor would need to rebuild the learned relevance model from scratch. But it requires the embedding infrastructure first.
Key decision for later: Should active learning use the same vector embeddings as search, or fine-tune a separate model per matter? Using the same embeddings is simpler (find similar documents by vector proximity). Fine-tuning per matter is more accurate but more expensive. This decision should be made after T1 production data reveals how well base embeddings correlate with reviewer decisions.
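The simpler of the two options, same-embedding re-ranking, can be sketched as blending base relevance with similarity to the centroid of responsive-coded documents. Function names, the 2-D toy embeddings, and the 0.5 blend weight are all assumptions for illustration:

```python
import math

def boosted_scores(base, embeddings, responsive_ids, weight=0.5):
    """Blend each un-reviewed document's base relevance with its cosine
    similarity to the centroid of reviewer-coded responsive documents."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    vecs = [embeddings[i] for i in responsive_ids]
    centroid = [sum(dims) / len(vecs) for dims in zip(*vecs)]
    return {doc_id: (1 - weight) * score + weight * cos(embeddings[doc_id], centroid)
            for doc_id, score in base.items()}

# Toy 2-D embeddings; un-reviewed "u1" sits near responsive doc "r1":
embeddings = {"r1": [1.0, 0.0], "u1": [0.9, 0.1], "u2": [0.0, 1.0]}
scores = boosted_scores({"u1": 0.4, "u2": 0.6}, embeddings,
                        responsive_ids=["r1"])
# u1 now outranks u2 despite its lower base relevance score
```

Re-running this after each coding decision is the near-real-time re-sort in the sketch above; the per-matter fine-tuning alternative would replace the centroid with a trained model.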
Cost Summary (For Leadership)¶
One-Time Investment¶
| Phase | Documents | Cost | Timeline |
|---|---|---|---|
| Prototype (both embedding models tested) | 50K | $23 | 2 weeks |
| Pilot (10 cases) | 500K | $227 | 1 day |
| Phase 1 (100 active cases) | 10M | $3,900 | 2-3 days |
| Phase 2 (all NGE Discovery) | 78M | $28,000-$30,300 | 1-2 weeks |
Ongoing Monthly¶
| | Managed OpenSearch | Serverless OpenSearch |
|---|---|---|
| Infrastructure | $4,000-5,100 | $973-2,273 |
| New document embedding | $860-2,140 | $860-2,140 |
| Search queries | $15-50 | $15-50 |
| Total monthly | $4,875-7,290 | $1,848-4,463 |
Why the Prototype Must Validate the Full Production Path¶
A common trap with AI prototypes: build something impressive on a laptop with a curated dataset, get stakeholder buy-in, then spend the next quarter discovering it doesn't fit the existing infrastructure, can't handle the real data, and costs 5-10x what leadership approved. The prototype worked; the product doesn't.
We avoid this by designing the prototype to validate FOUR things, not just retrieval quality:
1. Retrieval quality — Does semantic search produce the "wow" moment? Does it find documents keyword search misses?
2. Production cost model — Which embedding model can we afford at 78M documents? The prototype tests both Voyage AI voyage-law-2 ($0.12/M, legal-tuned) and Bedrock Titan V2 ($0.02/M, general-purpose) on the same case for $23. That $3 for Titan V2 answers a $100K question before committing a quarter of build work.
| Model | Prototype Cost | If Chosen: Year 1 |
|---|---|---|
| Voyage AI voyage-law-2 | $20 | $56K-122K |
| Amazon Bedrock Titan V2 | $3 | $15K-32K |
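The cost model behind these comparisons is a one-line multiplication. The ~3K embedded tokens per document is an assumption chosen here to illustrate; it roughly reproduces the backfill figures elsewhere in this document, but the real figure comes from the prototype's chunking output:

```python
def backfill_cost(docs, tokens_per_doc, price_per_million_tokens):
    """One-time embedding cost: documents x tokens/doc x $/M tokens."""
    return docs * tokens_per_doc * price_per_million_tokens / 1_000_000

# Assumed ~3K embedded tokens/doc; prices per the comparison above.
for model, price in [("voyage-law-2", 0.12), ("Titan V2", 0.02)]:
    print(f"{model}: ${backfill_cost(78_000_000, 3_000, price):,.0f}")
```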
3. Infrastructure fit — Does it plug into what we already have?
The prototype is NOT a standalone app. It's built on existing Nextpoint infrastructure from day 1:
| Concern | How the Prototype Validates It |
|---|---|
| Pipeline integration | Subscribes to the real DOCUMENT_PROCESSED SNS events (same events documentloader consumes) |
| Existing data | Uses extracted text already on S3 from documentextractor — no re-extraction |
| Existing search | BM25 leg queries real ES 7.4 per-case aliases — validates hybrid search against real keyword infrastructure |
| Per-case isolation | Uses real per-case MySQL database for chunk storage — validates multi-tenant pattern |
| Backfill path | Embeds a real case with existing documents — validates the backfill pipeline on actual data |
| SQS batching | Uses batchSize: 10 with maxBatchingWindow: 60s — same pattern as documentloader |
If ANY of these fails on the prototype case, we find out in week 1, not month 4.
4. Data reality — Does it work on real legal documents, not curated demos?
The prototype runs on an actual Nextpoint case — with real email threads, real attachments, real metadata inconsistencies, real document types. The chunking strategy faces real legal documents (not academic papers or blog posts). If email-aware chunking breaks on real MBOX extractions, or metadata is missing from real exhibits, we find out immediately.
| Data Validation | What Could Go Wrong | When We Find Out |
|---|---|---|
| Extracted text quality | Some documents have OCR errors, garbled text | Week 1 |
| Metadata completeness | Some exhibits missing author/date/subject | Week 1 |
| Email thread parsing | Reply chains not cleanly separated | Week 1 |
| Large documents (500+ pages) | Chunking produces too many vectors, Voyage API times out | Week 1 |
| Mixed document types | Spreadsheets, images-as-PDFs don't have useful text | Week 1 |
The prototype costs $23 and 2 weeks. It answers: does this work with our data, our infrastructure, our cost constraints, and our quality bar? All four answers before committing to a quarter of production build.
Cost-Optimized Alternative (If Titan V2 Passes Quality Test)¶
| | Standard (voyage-law-2) | Cost-Optimized (Titan V2) |
|---|---|---|
| Backfill (78M NGE Discovery docs) | $30,300 | $4,700 |
| Monthly ongoing | $4,875-7,290 | $945-2,320 |
| Year 1 total | $56K-122K | $15K-32K |
Production Corpus¶
| Metric | Value |
|---|---|
| Total documents | 870M |
| NGE-enabled Discovery (backfill scope) | ~78M |
| Backfill cost (NGE Discovery, standard model) | ~$30,300 |
| Backfill cost (NGE Discovery, cost-optimized) | ~$4,700 |
What's Already Done¶
| Deliverable | Status | Document |
|---|---|---|
| Module architecture (hexagonal, event-driven) | Complete | documentsearch.md |
| 14 use cases mapped to tiers | Complete | semantic-search-use-cases.md |
| Infrastructure inventory (new + existing) | Complete | semantic-search-infrastructure.md |
| Executive cost summary with cost-optimized alternative | Complete | semantic-search-cost-summary.md |
| Vector store evaluation (5 options) | Complete | adr/adr-vector-store-selection.md |
| Embedding pattern (asymmetric, AWS deployment) | Complete | patterns/asymmetric-embeddings.md |
| BM25 current state analysis | Complete | documentsearch.md (ES section) |
| Legal defensibility (determinism, audit logs) | Complete | documentsearch.md (defensibility section) |
| Non-functional requirements | Complete | documentsearch.md (NFR section) |
| Backfill design (existing 870M documents) | Complete | documentsearch.md + semantic-search-infrastructure.md |
| Production corpus cost model | Complete | semantic-search-cost-summary.md |
| PageIndex T2 evaluation plan | Complete | semantic-search-use-cases.md (Appendix C) |
The architecture is ready. The next step is the 2-week prototype.
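The core of the T1 primitive everything else calls is hybrid retrieval fused with Reciprocal Rank Fusion. A minimal RRF sketch, assuming each retriever returns an ordered list of doc IDs; `k=60` is the conventional default from the original RRF paper, not necessarily what the documentsearch module ships with:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1/(k + rank).

    `rankings` are ordered doc-ID lists from each retriever
    (e.g. BM25 and vector search).
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]
vec  = ["d1", "d9", "d3"]
print(rrf_fuse([bm25, vec]))  # ['d1', 'd3', 'd9', 'd7']
```

Documents ranked well by both retrievers (d1, d3) surface first, which is the property that lets BM25 precision and vector recall reinforce each other without score normalization.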
What's NOT Done (And Doesn't Need to Be Before Prototype)¶
| Item | When Needed | Why Not Now | Depends On |
|---|---|---|---|
| Gap analysis agent (automated) | T2 agent service | T1 search is the primitive it calls | documentsearch T1 |
| Document summarization / ECA agent | +4-6 weeks after T1 production | Orchestration layer connecting documentsearch (retrieval) + nextpoint-ai (LLM synthesis). Not from scratch — both systems exist. | documentsearch T1 + nextpoint-ai |
| Active learning / continuous TAR | After responsive review ships (T1+) | Requires embeddings + reviewer feedback loop | documentsearch T1 + Rails review coding data |
| Automated privilege log generation | T2 agent service | Needs search results + LLM formatting | documentsearch T1 + Bedrock |
| "Chat with case data" conversational layer | T2 agent service | Conversational UX on top of search | documentsearch T1 + T2 agent |
| Audio/video transcription | Separate workstream | Different pre-processing pipeline (AWS Transcribe) | Text output feeds into existing search infra |
All of these build ON TOP of the documentsearch module. None can ship before it. The module is the foundation — everything else is a layer.
Dependency Chain¶
```
documentsearch T1 (hybrid search)
│
├─→ Gap analysis (T2 agent) ← customers already asking for this
│
├─→ Responsive review (T1+ Rails integration)
│   │
│   └─→ Active learning / continuous TAR ← highest GDR stickiness
│
├─→ Document summarization (nextpoint-ai extension)
│
├─→ Privilege log generation (T2 agent)
│
└─→ "Chat with case data" (T2 agent + conversational UX)
```
The two highest-value features — gap analysis (customer demand) and active learning (GDR stickiness) — both require documentsearch embeddings as their foundation. Neither can exist without the vector infrastructure. This is why the prototype is the critical first step.