Semantic Search: Product-Architecture Alignment¶
Context¶
This document maps the AI feature roadmap (Rakesh's tier analysis, March 2026) to the architecture work already completed. The purpose is to show that the majority of proposed AI features are served by a single architectural investment: the documentsearch module.
The Key Insight¶
8 of 10 proposed AI features run on the same infrastructure: vector embeddings + hybrid search + LLM synthesis. They are not separate builds. They are different queries against the same search endpoint.
Feature-to-Architecture Mapping¶
Tier 1: Deployable in 4-8 Weeks¶
| Feature | Architecture Status | What It Actually Is |
|---|---|---|
| Natural language search | Fully designed | documentsearch module: POST /search with hybrid BM25 + vector + RRF. The core prototype. |
| Document set summarization / ECA | Partially exists | See detailed ECA section below. |
| Privilege and PII/PHI flagging | Covered by T1 | See detailed Privilege and PII/PHI section below. |
All three Tier 1 features are served by the documentsearch prototype + existing nextpoint-ai.
Privilege and PII/PHI Flagging — Deep Dive¶
Two Problems, Often Confused¶
There are TWO privilege problems in eDiscovery — the traditional one and a new one created by AI:
- Finding privileged documents in a production (our use case) — using AI to detect attorney-client privilege before inadvertent production
- Whether AI-generated documents ARE privileged (new legal question) — courts ruling on whether GenAI outputs are protected. Not our use case but creates market context.
The Privilege Detection Problem¶
Why keyword search fails: Traditional privilege review searches for "privilege," "attorney," "counsel," "legal advice." This produces:
- Massive false positives: HR documents about "attorney fees," vendor contracts referencing "legal department," newsletters mentioning "legal"
- Critical misses: conversational privilege where nobody uses formal terms — "I'd suggest we get Sarah's input on the antitrust exposure before responding" (Sarah is in-house counsel)
Traditional privilege logging averages ~7 documents per hour. AI-assisted privilege logging reaches ~35 documents per hour — a 5x throughput increase. Reviewers QC AI output instead of drafting from scratch.
How documentsearch Addresses Privilege (T1)¶
Semantic search understands the CONCEPT of privilege:
POST /search
{ "query": "communications seeking or providing legal advice about the transaction",
"case_id": 123 }
POST /search
{ "query": "discussions about legal exposure or regulatory risk",
"case_id": 123 }
POST /search
{ "query": "messages where someone asks for in-house counsel's opinion before responding",
"case_id": 123 }
The embedding model (trained on legal text) recognizes that "let's get Sarah's take on the liability question before we respond" is semantically similar to "seeking legal advice" — zero keyword overlap with "privilege."
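The three concept queries above can be issued as a batch from client code. A minimal sketch, assuming the request shape shown in the POST /search examples (the helper name and query list constant are illustrative, not part of the documentsearch API):

```python
# Illustrative client-side helper: builds one POST /search body per
# privilege-concept query, matching the request shape shown above.
PRIVILEGE_QUERIES = [
    "communications seeking or providing legal advice about the transaction",
    "discussions about legal exposure or regulatory risk",
    "messages where someone asks for in-house counsel's opinion before responding",
]

def build_privilege_searches(case_id):
    """One request body per concept query, all scoped to a single case."""
    return [{"query": q, "case_id": case_id} for q in PRIVILEGE_QUERIES]
```

Each body would then be POSTed to /search; the three result sets together form the privilege-candidate pool for Layer 2.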
The Three-Layer Privilege Workflow¶
| Layer | What It Does | Architecture | Status |
|---|---|---|---|
| 1. Semantic search | Surface privilege candidates across full corpus | documentsearch T1: POST /search | Designed |
| 2. AI classification | Auto-classify each candidate as likely/unlikely privilege | T2: LLM reads document, assesses privilege indicators | Not yet designed |
| 3. Attorney review | Final privilege determination — human judgment required | Existing Rails privilege workflow | Exists |
Layer 1 is documentsearch. Layer 2 is a T2 enhancement. Layer 3 exists. Defensibility: AI assists and surfaces candidates; attorneys make the final call. Courts have accepted this — the requirement is documentation, validation, and consistency.
Privilege Log Automation (T2/T3, Highest Manual-Hour Impact)¶
The full privilege workflow with automation:
1. FIND privilege candidates ← documentsearch T1 (semantic search)
2. CLASSIFY each candidate ← T2 agent (LLM reads doc, assesses)
3. REVIEW and confirm ← attorney (existing Rails workflow)
4. GENERATE privilege log ← T2/T3 agent (LLM formats into log)
Step 4 is one of the most hated manual tasks in discovery — junior associates spend days formatting privilege log entries. An AI-generated first draft that attorneys review and approve is a concrete, quantifiable hour-saver. This is the feature Rakesh identified as Tier 3.
All four steps depend on step 1 (semantic search) existing first.
The PII/PHI Detection Problem¶
| Regulation | Penalty for Failure |
|---|---|
| GDPR | 4% of global revenue |
| CCPA/CPRA | $7,500 per intentional violation |
| HIPAA | Up to $2.1M per violation category |
| Court sanctions | Case-specific |
Auditable PII/PHI redaction is a prerequisite for responsible eDiscovery.
What Must Be Detected¶
| Category | Examples | Detection Difficulty |
|---|---|---|
| Structured PII | SSNs (XXX-XX-XXXX), credit cards, phone numbers | Easy — regex works |
| Unstructured PII | Names + addresses in context, composite identifiers | Hard — requires NLP |
| PHI | Patient names, diagnoses, treatment histories, insurance | Hard — medical context |
| Composite PII | Job title + department + date = identifies one person | Hardest — regex misses entirely |
Traditional tools catch structured patterns. AI catches unstructured and composite PII/PHI that regex misses — "the patient discussed her treatment plan with Dr. Martinez on March 15" contains PHI but no regex pattern.
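The gap is easy to demonstrate: a structured-PII regex fires on an SSN but sees nothing in the PHI sentence quoted above. A minimal Python sketch:

```python
import re

# Structured-PII pattern: US Social Security numbers (XXX-XX-XXXX).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

structured = "Employee SSN: 123-45-6789"
unstructured = ("the patient discussed her treatment plan "
                "with Dr. Martinez on March 15")

print(bool(SSN_RE.search(structured)))    # True  — regex catches this
print(bool(SSN_RE.search(unstructured)))  # False — PHI, but no pattern to match
```

The second sentence is exactly the composite/contextual case where NER or LLM classification has to take over.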
Two Complementary Approaches for PII/PHI¶
Approach 1: Semantic search for PII/PHI topics (documentsearch T1)
POST /search
{ "query": "documents containing personal health information",
"case_id": 123 }
POST /search
{ "query": "communications discussing employee medical conditions",
"case_id": 123 }
Finds documents that DISCUSS PII/PHI topics. Useful for identifying document types that need redaction review. But doesn't detect specific entities (the actual SSN, the actual name).
Approach 2: Entity detection (T2, not yet designed)
Entity-level detection requires NER (Named Entity Recognition) or regex+NLP hybrid:
| Method | What It Finds | Accuracy |
|---|---|---|
| Regex patterns | SSNs, credit cards, phone numbers | High (structured), zero (unstructured) |
| NER models | Person names, organizations, locations | 85-95% |
| LLM classification | Contextual PII/PHI | 90%+ but expensive per doc |
| Hybrid (regex + NER + LLM) | All categories | Best accuracy, highest cost |
Architecture options:
Option A: Pipeline step (flag everything at ingest)
documentextractor → DOCUMENT_PROCESSED
├── documentsearch (embeddings for search)
└── PII/PHI detector (NER + regex for entity flagging)
Option B: Post-search scan (flag only reviewed documents)
Attorney searches → results returned → PII/PHI scan on those results
Option A is better for production workflows (comprehensive). Option B is cheaper (scan only what's being reviewed). Both depend on documentsearch infrastructure.
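Under either option, the hybrid method table above implies a layered dispatch: regex always runs, NER and LLM layers escalate only when wired in. A sketch with the regex layer real and the NER/LLM layers as injected stubs (function names and patterns are illustrative; a real build would call a model such as spaCy or Bedrock in those slots):

```python
import re

# Layer 1: cheap, high-precision regex for structured identifiers.
PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def detect_entities(text, ner=None, llm=None):
    """Run regex always; escalate to NER/LLM layers only if provided."""
    findings = [{"type": t, "span": m.group()}
                for t, rx in PATTERNS.items() for m in rx.finditer(text)]
    if ner:   # Layer 2: unstructured names/orgs (stubbed model call)
        findings += ner(text)
    if llm:   # Layer 3: contextual/composite PII (stubbed, most expensive)
        findings += llm(text)
    return findings
```

Option A would run `detect_entities` on every document at ingest; Option B would run it only over a search result set.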
Competitive Landscape¶
| Competitor | PII/PHI Approach |
|---|---|
| Nebula Legal | AI text analysis + pattern recognizers. Names, SSNs, health data, financial info. Integrated into ECA and review. |
| Redactable | 30+ PII categories, 90%+ accuracy, 98% processing time reduction. |
| VIDIZMO | 255+ formats, 40+ PII types including spoken PII in audio. |
| iCONECT | Predictive PII detection trained on case-specific data. Continuous monitoring. |
| Nextpoint | No automated PII/PHI detection yet. Manual review only. |
The Court Context: AI-Generated Documents and Privilege¶
Two competing Feb 2026 rulings create market awareness (not our use case, but context attorneys will ask about):
| Case | Ruling |
|---|---|
| US v. Heppner (S.D.N.Y.) | NOT privileged — docs generated with consumer Claude, sent to attorney. AI is not an attorney. |
| Warner v. Gilbarco (E.D. Mich.) | Work product PROTECTED — AI-assisted analysis done at counsel's direction was protected. |
Why this matters for Nextpoint: Clients need assurance that their AI interactions within the platform are defensible. Nextpoint's architecture — data stays in VPC or goes to SOC 2-compliant API, enterprise-grade data isolation — is the right answer. Consumer AI tools (ChatGPT free tier) are the risk; enterprise platforms are not.
Build Path¶
| Capability | Tier | Timeline | Depends On |
|---|---|---|---|
| Privilege candidate detection (semantic) | T1 | Prototype (2 weeks) | documentsearch |
| PII/PHI topic detection (semantic) | T1 | Prototype (2 weeks) | documentsearch |
| PII/PHI entity detection (NER + regex) | T2 | +4-6 weeks after T1 | Pipeline integration or post-search |
| Privilege classification (LLM) | T2 | +4-6 weeks after T1 | documentsearch + Bedrock |
| Privilege log generation | T2/T3 | +4-6 weeks after T2 privilege | Privilege classification |
T1 (documentsearch) is the foundation for all five capabilities. Semantic search finds the candidates; entity detection, classification, and log generation layer on top.
Automated Privilege Log Generation — Deep Dive¶
Why This Is One of the Most Hated Tasks in Discovery¶
Privilege logging is the process of documenting every withheld document with enough detail for opposing counsel to assess the privilege claim — without revealing the privileged content itself. Under FRCP 26(b)(5)(A), parties must "describe the nature" of withheld materials so the opposing party can evaluate the claim.
In practice: junior associates spend DAYS creating spreadsheet entries for thousands of documents. Each entry requires identifying the date, author, all recipients, the privilege type (attorney-client, work product, common interest, etc.), and a brief description of the privileged content — all without disclosing the actual privileged communication. At ~7 documents per hour with traditional methods, a 2,000-document privilege set takes ~285 hours of associate time.
Document review accounts for 80%+ of total litigation spend (~$42 billion per year per ABA). Privilege logging is the most manual, least enjoyable portion of that spend.
New FRCP Rules (December 1, 2025) Change the Landscape¶
The most significant privilege log reforms in decades took effect December 1, 2025. Amended Rules 26(f) and 16(b) now require:
- Parties must discuss privilege log procedures at the START of discovery (Rule 26(f) conference), not at the end
- The discovery plan must include "views and proposals" on HOW and WHEN privilege claims will be made, including log format
- Courts may adopt these terms in the scheduling order and enforce them
What this means: Teams must be prepared to negotiate privilege log format (categorical vs document-by-document, metadata fields vs narrative descriptions) at the outset — before review even begins. AI-generated logs become a strategic advantage in this negotiation: "We propose metadata-based categorical logging with AI-assisted descriptions, validated by attorney review."
Privilege Log Field Requirements¶
| Field | Required? | Source | AI-Generable? |
|---|---|---|---|
| Document date | Yes (minimum) | Document metadata (exhibits table) | Yes — auto-extract |
| Author | Yes (minimum) | Document metadata / email headers | Yes — auto-extract |
| Recipients (To, CC, BCC) | Yes (minimum) | Email headers | Yes — auto-extract |
| Privilege type asserted | Yes (minimum) | Attorney determination | Partially — AI suggests, attorney confirms |
| Brief description of content | Yes (minimum) | Requires reading the document | Yes — LLM generates without revealing privilege |
| Document type | Negotiable | File metadata | Yes — auto-extract |
| Bates number / unique ID | Negotiable | Production metadata | Yes — auto-extract |
| Purpose of communication | Negotiable | Context analysis | Yes — LLM infers from content |
| Subject line | Negotiable | Email headers | Yes — auto-extract |
6 of 9 fields are auto-extractable from metadata. 2 more are LLM-generable. Only the privilege type assertion requires attorney judgment (and even there, AI can suggest).
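That split can be expressed directly in code: a draft entry takes six fields straight from metadata, one from the LLM, and leaves the privilege type for the attorney. A sketch (field and function names are illustrative):

```python
def draft_log_entry(meta, llm_description, privilege_type=None):
    """Draft one privilege-log row; the attorney confirms the privilege type."""
    return {
        # Auto-extracted from metadata (6 of 9 fields):
        "date": meta["date"],
        "author": meta["author"],
        "recipients": meta["recipients"],
        "document_type": meta["document_type"],
        "bates": meta["bates"],
        "subject": meta["subject"],
        # LLM-generated without revealing privileged content:
        "description": llm_description,
        # Attorney judgment (AI may suggest; human confirms):
        "privilege_type": privilege_type or "PENDING ATTORNEY REVIEW",
    }
```

The default sentinel makes unconfirmed rows impossible to miss in the export.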
Three Types of Privilege Logs¶
| Type | Description | AI Impact |
|---|---|---|
| Traditional (document-by-document) | One entry per document. Most detailed, most burdensome. | AI generates draft entry for each doc. Attorney reviews and edits. 5x throughput. |
| Categorical | Groups documents by shared characteristics (e.g., all emails between counsel and client in date range). | AI identifies categories automatically from document clusters. Most efficient. |
| Metadata | Auto-generated from metadata fields. May include human-coded privilege type column. | Almost fully automated. AI adds privilege type suggestion. |
The Dec 2025 rule changes encourage early negotiation of log format. If parties agree to categorical or metadata logs, automation handles 90%+ of the work. If traditional logs are required, AI still provides 5x throughput improvement.
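A categorical log can be derived mechanically from per-document entries by grouping on shared characteristics. A sketch, assuming privilege type plus participant set as the grouping key (one plausible key among several the parties might negotiate):

```python
from collections import defaultdict

def categorical_log(entries):
    """Collapse per-document entries into categorical log rows."""
    groups = defaultdict(list)
    for e in entries:
        # Documents sharing privilege type + participants form one category.
        groups[(e["privilege_type"], frozenset(e["participants"]))].append(e)
    rows = []
    for (ptype, parts), docs in groups.items():
        dates = sorted(d["date"] for d in docs)
        rows.append({
            "privilege_type": ptype,
            "participants": sorted(parts),
            "date_range": (dates[0], dates[-1]),
            "doc_count": len(docs),
        })
    return rows
```

This is why categorical logging is the "most efficient" row in the table: the grouping itself is pure metadata aggregation, no LLM required.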
How It Builds on documentsearch¶
Step 1: FIND privilege candidates (documentsearch T1)
POST /search { query: "communications seeking legal advice" }
→ Returns 2,000 candidate documents ranked by relevance
|
Step 2: CLASSIFY each candidate (T2 agent)
For each candidate, LLM reads document and determines:
- Is this actually privileged? (yes/no/uncertain)
- What privilege type? (attorney-client / work product / common interest)
- Who are the attorneys involved?
- What is the privileged subject matter (without revealing content)?
→ Classified set: 1,400 confirmed, 400 uncertain, 200 not privileged
|
Step 3: ATTORNEY REVIEW (existing Rails workflow)
Attorney reviews the 400 uncertain documents
Spot-checks a sample of the 1,400 confirmed
Makes final privilege calls
|
Step 4: GENERATE PRIVILEGE LOG (T2/T3 agent)
For each confirmed-privilege document:
|
├── Auto-extract from metadata:
│ Date, Author, Recipients, Document type, Bates number, Subject
|
├── LLM generates (without revealing privilege):
│ Brief description: "Email from in-house counsel to VP Engineering
│ regarding legal analysis of product liability exposure"
│ Purpose: "Seeking legal advice on regulatory compliance"
|
├── Attorney-confirmed privilege type:
│ "Attorney-Client Privilege"
|
└── Format into court-compliant privilege log:
- Traditional (one row per document)
- Categorical (grouped by type/participants/date range)
- Metadata (auto-generated with privilege type column)
- Format negotiated per Rule 26(f) agreement
The LLM Description Challenge¶
The hardest part of privilege logging is writing the brief description — it must describe the document's nature without revealing the privileged content. This is exactly what LLMs are good at: summarizing at a high level while respecting constraints.
LLM Prompt (simplified):
"Read this document. It has been identified as attorney-client privileged.
Write a brief description suitable for a privilege log entry that:
1. Identifies the general subject matter
2. Identifies the type of legal advice sought or provided
3. Does NOT reveal the specific content of the legal advice
4. Does NOT quote from the document
5. Is 1-2 sentences maximum
Example format: 'Email from [role] to [role] regarding legal analysis
of [general topic].'"
LLM Output:
"Email from in-house counsel to VP Engineering providing legal advice
regarding potential product liability exposure and regulatory
compliance obligations."
This is the same "summarize existing text" pattern that nextpoint-ai already handles for transcripts — constrained summarization anchored in the source document. The prompt just has a different constraint (don't reveal privileged content instead of don't exceed length).
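The constraints in the prompt above can be assembled programmatically so every description request carries the same guardrails. A sketch (the function name is illustrative and the constraint strings paraphrase the prompt shown):

```python
# Guardrails from the privilege-log description prompt above.
CONSTRAINTS = [
    "Identify the general subject matter",
    "Identify the type of legal advice sought or provided",
    "Do NOT reveal the specific content of the legal advice",
    "Do NOT quote from the document",
    "Limit the description to 1-2 sentences",
]

def build_description_prompt(document_text):
    """Wrap the document in the constrained-summarization instructions."""
    rules = "\n".join(f"{i}. {c}" for i, c in enumerate(CONSTRAINTS, 1))
    return (
        "Read this document. It has been identified as attorney-client "
        "privileged. Write a brief description suitable for a privilege "
        f"log entry that:\n{rules}\n\nDocument:\n{document_text}"
    )
```

Centralizing the constraints means a court-negotiated format change edits one list, not every call site.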
Name Normalization (Critical for Log Quality)¶
One practical requirement: privilege logs must normalize names consistently. "John Smith," "J. Smith," "jsmith@company.com," and "Smith, John" must all resolve to the same person. AI handles this via entity resolution against the exhibits table (which already has custodian metadata).
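A minimal resolver for the four variants above, using a (last name, first initial) key against a roster. This is an illustrative heuristic only; a production build would resolve against the exhibits table's custodian metadata as described:

```python
import re

# Illustrative roster: in practice this comes from the exhibits table.
ROSTER = {("smith", "j"): "John Smith"}

def normalize_name(raw):
    """Resolve a name variant to the canonical custodian, or None."""
    raw = raw.lower().strip()
    m = re.match(r"([a-z])([a-z]+)@", raw)        # jsmith@company.com
    if m:
        return ROSTER.get((m.group(2), m.group(1)))
    if "," in raw:                                # "Smith, John"
        last, first = [p.strip() for p in raw.split(",", 1)]
    else:                                         # "John Smith" / "J. Smith"
        parts = raw.replace(".", "").split()
        first, last = parts[0], parts[-1]
    return ROSTER.get((last, first[0]))
```

Real-world resolution needs more care (middle names, shared initials, nicknames), but the shape — variants funneled through one canonical key — is the same.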
Competitive Context¶
| Competitor | Privilege Log Capability |
|---|---|
| eDiscovery AI | Full privilege logging: privilege call, privilege type, attorney hits, privilege elements, log entry generation. Customizable to court/client standards. |
| KLDiscovery | Integrated privilege log builder with automated features and name standardization. |
| Epiq AI | Auto-classify for privilege + update classification as matter evolves. Up to 90% reviewer reduction. |
| Nextpoint | Manual privilege logging only. No automated generation. |
Build Estimate¶
| Component | Effort | Depends On |
|---|---|---|
| Metadata auto-extraction for log fields | 1-2 weeks | Exhibits table (exists) |
| LLM description generation prompt | 1-2 weeks | Bedrock Claude (exists via nextpoint-ai) |
| Name normalization / entity resolution | 1-2 weeks | Exhibits table custodian data |
| Log formatting (traditional / categorical / metadata) | 1-2 weeks | Rails export capability |
| Attorney review workflow integration | 1-2 weeks | Rails privilege workflow (exists) |
| Total | 4-8 weeks after T2 privilege classification ships | T1 search + T2 classification |
The estimate is 4-8 weeks because it depends on privilege classification (T2) existing first — you can't generate a log for documents that haven't been classified. But the metadata extraction and name normalization can start in parallel with T2 classification development.
Document Set Summarization / ECA — Deep Dive¶
What ECA Is in the EDRM¶
Early Case Assessment is not a formal EDRM stage — it spans identification through review. It sits at the far left of the EDRM model where decisions have the highest cost leverage: $1 spent on ECA saves $10-100 on downstream review. The core question ECA answers: settle or litigate? And if litigate, what's the scope, who are the key players, and where are the risks?
In 2026, courts increasingly enforce proportionality — discovery costs must align with actual case value. AI-powered ECA provides the quantitative evidence needed to argue for limited scope within the first 48 hours of receiving a production.
What Competitors Ship Today¶
The market has converged on a specific AI-powered ECA workflow. This is no longer differentiation — it's table stakes:
| Competitor | ECA Capability |
|---|---|
| HaystackID / eDiscovery AI (Case Insight, Mar 2025) | GenAI case memo: key issues, critical documents, risk areas. Real-time classification + summarization. |
| Reveal | AI-driven ECA across full EDRM. Integrated TAR + GenAI summarization. |
| Epiq Discover | Factsheets: people/events/evidence relationships + AI text summarization. |
| DISCO | LLM-powered ECA with document classification and summarization. |
| OpenText | Automated document summarization for review prioritization and strategic decisions. |
| Nextpoint | Deposition transcript summarization only (nextpoint-ai). No document set summarization / ECA yet. |
This is the gap Rakesh identified. Competitors demo an AI-generated case briefing in the first 5 minutes of an eval. We demo transcript summarization — valuable but narrower.
What the AI-Generated Case Briefing Looks Like¶
When a firm receives a 47,000-document production, the AI produces this within hours (not weeks):
CASE BRIEFING — Matter #2023-CV-04517
Generated: 2026-04-01 | 47,231 documents | 12 custodians
OVERVIEW
Production spans Jan 2021 - Dec 2023.
Central topic: product liability, suspension system defect.
KEY CUSTODIANS (ranked by relevance)
1. VP Engineering (3,400 docs, 70% of safety-related communications)
2. Director QA (2,100 docs, primary escalation recipient)
3. Outside Counsel - Baker McKenzie (890 docs, privilege review needed)
CRITICAL TIMELINE
2022-03: First internal safety report filed
2022-09: Engineering review committee formed
2023-03: Communication spike (4x normal volume, 2,847 docs in 6 weeks)
2023-06: Recall announcement
KEY THEMES
- Safety test results and internal escalation (4,200 docs)
- Regulatory compliance discussions (1,800 docs)
- Cost-benefit analysis of recall vs continued production (340 docs)
RISK AREAS
- Privilege: 847 documents flagged for attorney-client review
- PHI: 124 documents contain employee health records (redaction needed)
- Gap: VP Engineering has 0 documents in March-April 2023 despite
being the primary decision-maker (potential spoliation issue)
RECOMMENDED REVIEW SCOPE
Priority set: 4,200 documents (9% of total)
Estimated review time: 3-5 days (vs 2-3 weeks for full linear review)
That's the demo that wins evaluations. A partner gets a case briefing the same day the production arrives, not 3 weeks later after junior associates finish first-pass review.
What Nextpoint Already Has¶
| Capability | Component | Status |
|---|---|---|
| Document text extraction | documentextractor | Shipped — text on S3 |
| Document metadata (author, date, subject, custodian) | documentloader → MySQL | Shipped |
| ES keyword indexing | documentloader → ES 7.4 | Shipped |
| Transcript summarization (narrative, chronological, TOC) | nextpoint-ai (Bedrock Claude) | Shipped |
| Semantic search (hot docs, privilege, custodian-scoped) | documentsearch | Designed, prototype next |
What ECA Requires Beyond T1¶
| ECA Capability | What T1 Search Provides | What's New (T2 Agent) |
|---|---|---|
| Hot document identification | POST /search ranked results | Nothing — T1 does this |
| Key custodian identification | Search results grouped by custodian | Aggregation + ranking logic |
| Privilege/PHI flagging | Conceptual search queries | Nothing — T1 does this |
| Case briefing memo | Search results as raw material | LLM synthesis into structured memo |
| Timeline construction | Date-filtered search results | LLM extracts date + event pairs, sorts chronologically |
| Communication pattern analysis | Custodian-filtered results | Metadata aggregation (sender/recipient frequency) |
| Theme extraction | Semantic search by topic | LLM clusters results into themes |
| Recommended review scope | Relevance-ranked document set | Threshold-based scope recommendation |
T1 provides the retrieval layer. The T2 ECA agent orchestrates multiple searches and synthesizes results into the case briefing.
The ECA Agent Architecture¶
Attorney triggers ECA on a case (or auto-triggered on first import completion)
|
v
ECA Agent Lambda (orchestrator — extends nextpoint-ai pattern)
|
├── Run 5-10 semantic searches across key topics:
│ "safety concerns", "legal advice", "financial analysis",
│ "regulatory communications", "personnel issues"
│ → Each returns ranked documents with custodian + date metadata
|
├── Aggregate metadata from exhibits table:
│ Custodian document counts, date distribution,
│ sender-recipient pairs, document types
|
├── Run privilege + PHI detection searches:
│ "communications seeking legal advice"
│ "documents containing personal health information"
│ → Flag counts per custodian
|
├── Feed aggregated data to Bedrock Claude:
│ "Given these search results and metadata, generate a
│ structured case briefing covering: overview, key custodians,
│ timeline, themes, risk areas, and recommended review scope."
│ → LLM summarizes what it FOUND, not what it imagines
│ → Every statement traces to a specific document from search
|
└── Deliver case briefing:
- In-app view (Rails)
- Downloadable PDF
- Searchable — attorney can click any cited document
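The orchestration above reduces to a small loop once search and synthesis are injectable. A sketch with stubbed callables (the topic list mirrors the diagram; function names are illustrative, not the nextpoint-ai API):

```python
# Topics from the ECA agent diagram above.
ECA_TOPICS = [
    "safety concerns", "legal advice", "financial analysis",
    "regulatory communications", "personnel issues",
]

def run_eca(case_id, search, summarize):
    """search(body) -> ranked docs with metadata; summarize(evidence) -> memo."""
    evidence = {}
    for topic in ECA_TOPICS:
        # Each semantic search returns docs with custodian + date metadata.
        evidence[topic] = search({"query": topic, "case_id": case_id})
    # The LLM summarizes what the searches FOUND — every memo statement
    # stays traceable to a document in the evidence dict.
    return {"case_id": case_id,
            "briefing": summarize(evidence),
            "evidence": evidence}
```

Passing `search` and `summarize` in keeps the agent a thin orchestration layer over documentsearch and nextpoint-ai, which is the whole argument of this section.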
The Defensibility Argument¶
Nextpoint's own blog makes the key point: AI summarization is most reliable when it summarizes existing text, not generating from scratch. The ECA memo is anchored in actual documents returned by semantic search — every claim in the briefing traces back to specific exhibits. This is the difference between a defensible ECA tool and a hallucination risk.
The audit trail: search queries logged → search results logged → LLM prompt logged → generated memo logged → attorney review and approval. The full chain is traceable from memo statement to source document.
Build Path¶
| Phase | What Ships | Timeline | Depends On |
|---|---|---|---|
| T1 prototype | Hot docs, privilege, custodian search — manual ECA | 2 weeks | Nothing |
| T1 production | Same at scale, all NGE cases | +1 quarter | Prototype validates quality |
| T2 ECA agent | Automated case briefing memo | +4-6 weeks after T1 production | documentsearch T1 + nextpoint-ai |
The 4-6 week estimate for the T2 ECA agent is realistic because:
- nextpoint-ai already handles LLM orchestration (Bedrock Claude, chunking, prompt assembly, token tracking, FIFO + Standard SQS pattern)
- documentsearch provides the retrieval (multiple POST /search calls)
- The new code is: aggregation logic, memo prompt template, and formatting
This is not a from-scratch build — it's an orchestration layer connecting two existing systems.
Tier 2: 8-16 Week Builds¶
| Feature | Architecture Status | What It Actually Is |
|---|---|---|
| AI-assisted first-pass review (TAR 2.0) | Covered by T1+ | See detailed TAR 2.0 section below. |
| Deposition prep assistant | Covered by T1+ | See detailed Deposition Prep section below. |
| Audio/video transcription + search | Not in scope | Requires speech-to-text (AWS Transcribe or Whisper) as a pre-processing step. Text output feeds into documentsearch's existing pipeline. The embedding/search infrastructure handles it — the gap is audio extraction, not search. |
Two of three Tier 2 features are covered by documentsearch + minor Rails integration.
AI-Assisted First-Pass Review (TAR 2.0) — Deep Dive¶
Why This Is the Highest-Impact GDR Feature¶
Document review is 80%+ of total litigation spend — $42 billion per year (ABA). It's also the task attorneys and paralegals do every single day on every single matter. If Nextpoint makes this faster, it becomes the platform firms can't leave.
The current review workflow: reviewer opens a document, reads it, codes it as responsive/non-responsive/privileged, moves to the next document. Repeat 50,000 times. The order is random or chronological — the reviewer might spend 3 days on irrelevant documents before reaching the critical ones.
The TAR Evolution¶
| Generation | How It Works | Improvement |
|---|---|---|
| Manual review | Reviewer reads every document in linear order | Baseline (~50-80 docs/hr) |
| TAR 1.0 (Predictive Coding) | Train on seed set, machine classifies the rest, reviewer validates sample | 40-60% volume reduction |
| TAR 2.0 (Continuous Active Learning / CAL) | Model learns from EVERY reviewer decision, continuously re-ranks remaining documents | Identifies 90% of relevant docs in the first 10% of review |
| TAR + GenAI (emerging, 2025-2026) | GenAI summarization + rationale on top of CAL ranking. Reviewer reads a summary paragraph instead of a full document. | Up to 90% reviewer reduction (Epiq claims) |
TAR 2.0 with CAL is the current industry standard. The key insight: the model trains on every coding decision, one document at a time. As the reviewer codes documents, the model re-ranks all remaining documents so the most likely relevant ones are next in the queue. This front-loads relevant documents — reviewers stop finding new relevant documents after reviewing a fraction of the total set.
Proven results: 90% of responsive documents identified in the first 10% of review. 40-90% cost reduction depending on matter complexity.
What Competitors Offer¶
| Competitor | TAR/Review Capability |
|---|---|
| Relativity aiR | 250+ customers, 200M+ predictions. GenAI coding rationale + relevance detection. 50%+ increase in linear review speed. |
| Epiq AI | Auto-classify for relevance, PII, privilege. Update classification as matter evolves. TAR + CAL + GenAI combined. Up to 90% reviewer reduction. |
| Everlaw | EverlawAI Assistant: summarize, classify, explain. CAL for continuous learning. |
| DISCO | AI-powered review with continuous learning. Cloud-native. |
| Nextpoint | Linear review with keyword search. No TAR, no CAL, no AI-assisted review. |
This is a significant competitive gap. TAR 2.0/CAL has been industry-standard for several years. Adding it is not innovation — it's catching up. GenAI on top of CAL is where the 2026 differentiation lies.
How documentsearch Enables TAR 2.0¶
The path from semantic search to TAR 2.0 is shorter than building TAR from scratch. Here's why:
Traditional TAR/CAL trains a classifier from scratch on reviewer decisions. It has no prior knowledge of document relevance — the seed set and ongoing coding are the only inputs.
Semantic search + reviewer feedback starts with a MUCH better baseline. The embedding model already knows which documents are conceptually related. When a reviewer codes a document as responsive, the system knows which un-reviewed documents have similar embeddings — and can prioritize them immediately, without waiting for hundreds of training examples.
Traditional CAL:
Start with zero knowledge
→ Reviewer codes 100 docs (seed set)
→ Model learns initial relevance signal
→ Reviewer codes next batch
→ Model improves
→ After ~500-1000 coded docs, model is reasonably accurate
Semantic search + feedback (our approach):
Start with embedding model's understanding of document similarity
→ Reviewer codes FIRST document as responsive
→ System immediately finds 50 documents with similar embeddings
→ Those 50 are boosted in the review queue
→ Reviewer codes second document
→ Relevance signal compounds with embedding similarity
→ After ~50-100 coded docs, system is already highly accurate
The embedding model provides a warm start. Traditional CAL is cold — it knows nothing until the reviewer trains it. Our approach starts warm because the embeddings already capture semantic relationships between documents.
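The warm start is visible in a few lines: one responsive coding decision immediately re-orders the queue by embedding similarity. A sketch with toy 2-d vectors (real embeddings would come from documentsearch):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rerank_after_coding(responsive_vec, queue):
    """Sort unreviewed docs so those most similar to the doc just coded
    responsive come first — no training step, just vector similarity."""
    return sorted(queue,
                  key=lambda d: cosine(d["vec"], responsive_vec),
                  reverse=True)
```

A production version would blend this similarity signal with the hybrid relevance score and accumulate it across coding decisions rather than reranking on the latest one alone.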
The Three-Layer Review Architecture¶
| Layer | What | Tier | Status |
|---|---|---|---|
| 1. Semantic relevance ranking | POST /search { query: "[RFP request language]" } returns documents ranked by relevance to the production request | T1 | Designed |
| 2. Review queue integration | Rails review queue sorts by documentsearch relevance score instead of chronological/random | T1+ | Designed (Rails change) |
| 3. Active learning feedback loop | Each reviewer coding decision updates relevance scores for remaining un-reviewed documents in near-real-time | T2+ | Architectural sketch (see Stickiness Moat section above) |
Layer 1 is documentsearch. Layer 2 is a Rails UI change. Layer 3 is the active learning enhancement that creates the stickiness moat.
Even without Layer 3, Layers 1+2 are a massive improvement. Sorting the review queue by semantic relevance (instead of random) means reviewers see the most-likely-responsive documents first. The "back half" of the queue — documents that would never be reached in a time-boxed review — is now prioritized by relevance rather than luck.
The Workflow (Layers 1+2, Shipping with T1+)¶
Paralegal receives RFP Request #3:
"All documents relating to the design and testing of the XYZ component"
|
v
POST /search { query: "documents relating to the design and testing
of the XYZ component", case_id: 123, limit: 5000 }
|
v
documentsearch returns 5,000 documents ranked by hybrid relevance score
|
v
Rails review queue populated in relevance order (highest score first)
|
v
Reviewer opens queue:
Document #1: Relevance score 0.94 — engineering test report for XYZ
Document #2: Relevance score 0.91 — email about XYZ design review
Document #3: Relevance score 0.89 — meeting notes on XYZ component
...
Document #4,800: Relevance score 0.12 — unrelated HR document
Document #5,000: Relevance score 0.08 — vendor invoice
|
v
Reviewer codes top documents rapidly (high hit rate)
Hit rate declines as relevance scores decrease
Reviewer/attorney makes informed decision to stop at a threshold
The hit rate curve is the key metric. With random ordering, the hit rate is roughly constant (e.g., 15% relevance across the set). With relevance-ranked ordering, the hit rate starts at 80-90% at the top and declines. Reviewers code 10x faster in the high-relevance zone because they're not wading through irrelevant documents.
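The curve itself is trivial to compute from coding decisions taken in review order. A sketch (bucket size is arbitrary; this would feed the hit-rate analytics line item in the build table below):

```python
def hit_rate_curve(decisions, bucket=100):
    """decisions: booleans in review order (True = coded responsive).
    Returns the responsive rate per bucket. Under relevance-ranked
    ordering the curve starts high and declines; under random ordering
    it stays roughly flat."""
    return [sum(chunk) / len(chunk)
            for chunk in (decisions[i:i + bucket]
                          for i in range(0, len(decisions), bucket))]
```

A declining curve is also the evidence an attorney can point to when deciding where to stop review.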
The Active Learning Upgrade (Layer 3, Future)¶
See the "Stickiness Moat" section above for the full architectural sketch. The key addition: each reviewer coding decision feeds back into the ranking model, boosting similar un-reviewed documents. The review queue re-sorts in near-real-time. This is the feature that creates per-firm lock-in.
Build Path¶
| Component | Effort | Depends On |
|---|---|---|
| Semantic relevance ranking for RFP requests | 0 (T1 search already does this) | documentsearch T1 |
| Review queue sort-by-relevance (Rails) | 1-2 weeks | Rails review workflow + documentsearch API |
| Hit rate analytics (relevance score vs coding decision) | 1-2 weeks | Rails review coding data |
| Active learning feedback loop | 4-8 weeks | documentsearch embeddings + review coding data |
| Total for Layers 1+2 | 1-2 weeks after T1 | documentsearch T1 |
| Total for Layer 3 (active learning) | +4-8 weeks after Layers 1+2 | Layer 2 + feedback pipeline |
Layers 1+2 ship with T1+ (weeks of Rails work). Layer 3 is a separate build but depends on the same embedding infrastructure.
Deposition Prep Assistant — Deep Dive¶
What Attorneys Do Today (Manual)¶
Deposition prep is one of the most labor-intensive tasks in litigation. For each witness, an attorney must:
- Identify relevant documents: Search the production for every document connected to this witness and the topics they'll be deposed on
- Organize by topic: Group documents into deposition topics (what the witness knew about X, when they knew about Y, who they communicated with about Z)
- Build a chronology: Arrange key documents in date order to construct the timeline of the witness's knowledge and actions
- Flag contradictions: Identify documents that contradict or support what the witness is expected to say
- Create the depo binder: Compile the final set of documents for the deposition, organized for quick reference during questioning
This takes a senior associate 10-20 hours per witness per topic. For a case with 8 witnesses and 5 topics each, that's 400-800 hours of prep.
What Competitors Offer¶
| Competitor | Depo Prep Capability |
|---|---|
| Epiq Assist | Prepares deposition memos and questions up to 5x faster. Identifies key topics, communications, events, and documents. |
| Epiq Narrate | AI Transcript Analysis extracts facts, people, events. Cross-references against case documents. Surfaces contradictions. Exports chronologies with hyperlinks to evidence. |
| DISCO | AI-powered chronology builder. Links documents to timeline events. |
| U.S. Legal Support (DepoSummary Pro) | Chronological case timeline from testimony. Exhibit cross-referencing. Deponent insights. |
| Everlaw | Batch summarize up to 1,000 docs. Create chronology timelines. Investigative outlines. |
| Nextpoint | Transcript summarization (nextpoint-ai). Manual document search and folder assembly. |
Nextpoint's gap: transcript summarization is strong, but document gathering, chronology building, and contradiction detection are manual.
How Nextpoint's Existing Systems Combine for Depo Prep¶
The depo prep assistant doesn't require a new system — it orchestrates three systems that already exist (or are designed):
┌────────────────────┐ ┌───────────────────┐ ┌─────────────────┐
│ documentsearch │ │ nextpoint-ai │ │ Rails │
│ (T1) │ │ (shipped) │ │ (existing) │
│ │ │ │ │ │
│ Semantic search │ │ Transcript │ │ Folders/tags │
│ Custodian filter │ │ summarization │ │ Document viewer │
│ Date filter │ │ Chronology │ │ Export │
│ Relevance ranking │ │ Narrative │ │ Review workflow │
└────────┬───────────┘ └────────┬───────────┘ └────────┬─────────┘
│ │ │
└──────────────────────────┬───────────────────────────┘
│
v
┌─────────────────────┐
│ Depo Prep │
│ Orchestration │
│ (new Rails + │
│ T2 agent) │
└─────────────────────┘
The Depo Prep Workflow (Automated)¶
Attorney specifies:
Witness: "Jeff Skilling"
Topics: ["Special Purpose Entities", "Board communications", "Regulatory response"]
Date range: Jan 2001 - Dec 2001
|
v
Step 1: DOCUMENT GATHERING (documentsearch T1)
For each topic, run:
POST /search { query: "Jeff Skilling discussions about Special Purpose Entities",
filters: { custodians: ["jeff.skilling@enron.com"],
date_range: { start: "2001-01-01", end: "2001-12-31" } } }
→ 3 topics × top 50 results each = ~100-150 unique documents
|
v
Step 2: ORGANIZE BY TOPIC (T2 agent)
LLM reads each document's snippet and assigns to topics:
- SPE-related: 67 documents
- Board communications: 43 documents
- Regulatory: 28 documents
- Cross-topic (multiple): 18 documents
|
v
Step 3: BUILD CHRONOLOGY (T2 agent + nextpoint-ai pattern)
Extract date + event pairs from each document:
2001-01-15: Skilling receives first SPE performance report (Exhibit #42)
2001-02-28: Board presentation mentions Raptor structure (Exhibit #67)
2001-03-14: Fastow proposes equity infusion for Raptor (Exhibit #89)
2001-03-15: Skilling replies "stay the course" (Exhibit #91)
...
→ Chronological timeline with exhibit links
|
v
Step 4: FLAG CONTRADICTIONS (T2 agent)
If prior deposition transcript exists:
Compare deposition claims against documents:
"Skilling testified he was unaware of SPE risks until Q3 2001.
However, Exhibit #42 (Jan 15, 2001) shows he received the
SPE performance report 6 months earlier."
|
v
Step 5: GENERATE DEPO BINDER (Rails)
Create folder: "Skilling Depo Prep — SPEs / Board / Regulatory"
Sub-folders by topic
Documents sorted by chronology within each topic
Cover sheet with:
- Chronological timeline
- Key documents highlighted
- Contradictions flagged
- Suggested question areas
|
v
Step 6: EXPORT
Depo binder available in Nextpoint viewer
Exportable as PDF with hyperlinks to exhibits
Shareable with co-counsel
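Steps 1 through 3 of the workflow above can be sketched as one orchestration function. The stubs stand in for documentsearch, the T2 topic classifier, and the chronology extractor; every name here is hypothetical:

```python
def prep_depo_binder(witness, topics, search, classify, extract_events):
    """Orchestrate Steps 1-3: gather, organize by topic, build chronology."""
    # Step 1: one semantic search per topic, deduped across topics
    docs = {}
    for topic in topics:
        for doc in search(f"{witness} discussions about {topic}"):
            docs[doc["id"]] = doc
    # Step 2: LLM-style topic assignment (a doc may hit several topics)
    by_topic = {t: [] for t in topics}
    for doc in docs.values():
        for topic in classify(doc, topics):
            by_topic[topic].append(doc)
    # Step 3: chronology as (date, event, doc id) tuples sorted by date
    timeline = sorted(ev for doc in docs.values() for ev in extract_events(doc))
    return by_topic, timeline

# Toy stand-ins for the real services:
def search(query):
    return [
        {"id": "d42", "date": "2001-01-15", "text": "SPE performance report"},
        {"id": "d67", "date": "2001-02-28", "text": "Board presentation on Raptor"},
    ]

def classify(doc, topics):
    return [topics[0]] if "SPE" in doc["text"] else [topics[1]]

def extract_events(doc):
    return [(doc["date"], doc["text"], doc["id"])]

by_topic, timeline = prep_depo_binder(
    "Jeff Skilling", ["Special Purpose Entities", "Board communications"],
    search, classify, extract_events)
# timeline[0] is the earliest event (ISO dates sort lexically)
```

Steps 4-6 (contradiction detection, binder generation, export) layer onto the same outputs: the timeline feeds the cover sheet and the per-topic lists feed the sub-folders.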
The Time Savings¶
| Task | Manual (Today) | With Depo Prep Assistant |
|---|---|---|
| Document gathering (per witness/topic) | 3-5 hours | ~5 minutes (semantic search) |
| Organization by topic | 2-3 hours | ~2 minutes (LLM classification) |
| Chronology building | 3-5 hours | ~5 minutes (date extraction + sort) |
| Contradiction detection | 2-4 hours (if prior depo exists) | ~10 minutes (cross-reference) |
| Binder assembly | 1-2 hours | Automatic |
| Total per witness per topic | 10-20 hours | ~30 minutes + attorney review |
At $350/hour associate rate, saving 15 hours per witness = $5,250 per deposition. A case with 8 witnesses = $42,000 in associate time saved on a single matter.
Why This Is a GDR Feature¶
Deposition prep happens on every litigation matter. An attorney who builds depo binders in Nextpoint in 30 minutes (vs 15 hours manually) has a daily workflow inside the product. That's the stickiness: the attorney's deposition process lives in Nextpoint.
The combination matters more than any single feature. Semantic search alone is useful. Transcript summarization alone is useful. Depo prep combines them into a workflow that is greater than the sum of parts: find the documents (search), summarize the transcript (nextpoint-ai), build the binder (Rails), flag the contradictions (T2 agent). No competitor integrates all four in one platform for mid-market firms.
Build Path¶
| Component | Effort | Depends On |
|---|---|---|
| Custodian + topic + date search (T1) | 0 (documentsearch already does this) | documentsearch T1 |
| "Save search results to depo folder" (Rails) | 1-2 weeks | Rails folders + documentsearch API |
| Topic classification (LLM assigns docs to topics) | 2-3 weeks | Bedrock Claude |
| Chronology extraction (date + event pairs) | 2-3 weeks | Bedrock Claude + document metadata |
| Contradiction detection (docs vs prior testimony) | 3-4 weeks | Transcript text + document search results |
| Depo binder generation (folder + cover sheet) | 1-2 weeks | Rails export |
| Total (basic: search + folder) | 1-2 weeks (T1+) | documentsearch T1 |
| Total (full assistant with chronology + contradictions) | 8-12 weeks (T2) | T1 + nextpoint-ai + Bedrock |
The basic version (search + save to folder) ships with T1+ in weeks. The full assistant (chronology, contradictions, suggested questions) is a T2 build. Both are GDR features — even the basic version transforms the depo prep workflow.
Tier 3: 16-30 Week Builds¶
| Feature | Architecture Status | What It Actually Is |
|---|---|---|
| Chat with your case data (RAG) | This IS documentsearch + T2 agents | See detailed conversational RAG section below. |
| Automated privilege log generation | Detailed above | See Privilege and PII/PHI section. T1 search → T2 classification → T2/T3 log generation. 4-8 weeks after T2 classification. |
The "Tier 3" RAG capability IS the documentsearch module. It's not a 16-30 week build from scratch — it's a 2-week prototype + 1 quarter to production for the search infrastructure, plus agent service for the conversational layer.
Chat with Your Case Data (Conversational RAG) — Deep Dive¶
What It Is¶
An attorney asks open-ended questions of their entire document set in natural language and gets answers with citations to specific documents:
Attorney: "What did the CFO know about the accounting irregularities
and when did he first learn about them?"
System: "Based on the documents in this case:
The CFO first received information about the accounting
irregularities on March 12, 2022, when the Controller
forwarded an internal audit report (Exhibit #234, p.3).
Between March and June 2022, the CFO participated in 14
email threads discussing the irregularities (Exhibits #234,
#267, #301, #345, #378, #402, #445, #467, #489, #512,
#534, #567, #589, #612).
On June 3, 2022, the CFO directed the Controller to
'hold off on flagging this to the audit committee until
we have a complete picture' (Exhibit #445, p.2).
The CFO's first communication with outside counsel about
the issue was July 15, 2022 (Exhibit #612)."
This is the feature Harvey AI built at the top of the market ($11B valuation, March 2026). No mid-market eDiscovery platform does it well yet.
What Competitors Offer¶
| Competitor | Capability | Limitation |
|---|---|---|
| Harvey AI | Vault: upload docs, ask questions via RAG. Custom models with OpenAI on all US case law. | 10,000 document cap per Vault. $11B valuation reflects legal AI focus, not eDiscovery. Transactional focus (M&A, due diligence), not litigation-first. |
| Everlaw Project Query | Conversational search across terabytes of eDiscovery. Facts + references in seconds. Reasoning models for nuanced conclusions. | Closed beta (announced Legalweek 2025, GA expected 2025). Refinement process compensates for legal's complex corpus. |
| Lexis+ AI | Conversational search over case law + Shepard's validation. Highest accuracy in Stanford testing (65%). | Case law research, not case-specific document sets. Different use case. |
| Relativity aiR | AI-powered coding rationale and relevance detection at scale. 250+ customers, 200M+ predictions. | Classification-focused, not open-ended Q&A over documents. |
| Nextpoint | Natural language search (documentsearch, designed). No conversational Q&A yet. | Gap. |
Why This Is Different from Search¶
Search returns a ranked list of documents. Conversational RAG returns an answer with citations. The attorney doesn't review 20 documents — they read a synthesized response that tells them what happened, when, and points to the evidence.
| | Search (documentsearch T1) | Conversational RAG (T2) |
|---|---|---|
| Input | Query string | Natural language question |
| Output | Ranked document list with snippets | Synthesized answer with document citations |
| Attorney effort | Review 20 documents to extract the answer | Read the answer, verify 2-3 key citations |
| Latency | ~170ms | ~5-30 seconds (LLM synthesis) |
| Cost per query | ~$0.000001 (embedding only) | ~$0.01-0.10 (LLM invocation) |
| Hallucination risk | None (returns real documents) | Present (LLM may misinterpret or fabricate connections) |
The hallucination risk is the critical difference. Search returns actual documents — no hallucination possible. Conversational RAG has the LLM synthesize an answer, which can misstate facts or fabricate connections between documents. Stanford's benchmarking found legal AI tools hallucinating on at least 1 in 6 queries.
How to Mitigate Hallucination in Legal RAG¶
The architecture must make hallucination detectable and verifiable:
1. Every claim must cite a specific document. The LLM is prompted to never make a statement without citing an exhibit number and page. Uncited claims are flagged as "unverified."
2. Citations must be clickable. The attorney can click any citation and see the actual document passage. If the passage doesn't support the claim, the attorney knows immediately.
3. Confidence indicators. The system distinguishes between:
   - "Found in document" (direct quote or close paraphrase)
   - "Inferred from documents" (synthesized across multiple sources)
   - "Not found in documents" (question cannot be answered from this corpus)
4. Retrieval transparency. Show which documents the LLM was given as context. If the answer is wrong, the attorney can see whether the relevant document was in the context or missed by search.
5. ABA Formal Opinion 512 compliance. Mandates human-in-the-loop review. The system presents answers for attorney verification, not as final determinations.
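The first safeguard, mandatory citations, is mechanically checkable. A toy check, assuming the synthesized answer cites exhibits in the "(Exhibit #N, p.M)" format used in the example above; the regex and function name are illustrative, not an existing component:

```python
import re

CITATION = re.compile(r"\(Exhibit #\d+(?:, p\.\d+)?\)")

def flag_uncited_claims(answer: str) -> list[str]:
    """Return sentences in a synthesized answer that carry no exhibit
    citation, so the UI can mark them 'unverified'."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not CITATION.search(s)]

answer = ("The CFO received the audit report on March 12, 2022 "
          "(Exhibit #234, p.3). He then discussed it widely with staff.")
print(flag_uncited_claims(answer))
# → ['He then discussed it widely with staff.']
```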
Architecture: How It Builds on documentsearch¶
Attorney asks: "What did the CFO know about the irregularities?"
|
v
Conversational RAG Agent (T2)
|
├── Step 1: Query decomposition (LLM)
│ Break complex question into search queries:
│ - "CFO communications about accounting irregularities"
│ - "CFO first awareness of audit findings"
│ - "CFO direction to delay disclosure"
│
├── Step 2: Multiple semantic searches (documentsearch T1)
│ POST /search for each sub-query
│ Collect top 10-20 results per sub-query
│ Deduplicate across sub-queries
│ → 30-50 unique documents
│
├── Step 3: Passage retrieval
│ Fetch chunk text for each result from MySQL
│ (search_chunks table — already stored by embedding pipeline)
│ → 100-200 relevant passages
│
├── Step 4: LLM synthesis (Bedrock Claude)
│ Prompt: "Based on these document passages, answer the
│ attorney's question. Cite specific exhibit numbers and
│ pages for every factual claim. If a claim cannot be
│ supported by the provided passages, say so explicitly."
│ → Synthesized answer with citations
│
├── Step 5: Citation verification
│ For each citation in the answer, verify that the cited
│ passage actually supports the claim (automated check)
│ Flag unverifiable citations
│
└── Step 6: Response with transparency
- Synthesized answer
- Cited documents (clickable links to exhibits)
- Confidence level per claim
- List of all documents in context (transparency)
- "This is an AI-generated analysis. Attorney review required."
This is NOT a simple "send all docs to the LLM" approach. You can't feed 500K documents into a context window. The architecture is:
1. documentsearch narrows 500K docs to 30-50 relevant ones (semantic search)
2. Chunk text provides the specific passages (already stored in MySQL)
3. LLM synthesizes from those passages only (bounded context)
4. Citations link back to source documents (verifiable)
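Steps 1-3 of the agent flow (decompose, search, assemble a bounded context) can be sketched as follows. `decompose`, `search`, and `fetch_chunks` are injected stand-ins for the LLM decomposer, documentsearch, and the search_chunks lookup; all names and limits are assumptions:

```python
def build_rag_context(question, decompose, search, fetch_chunks,
                      per_query=20, max_passages=200):
    """Decompose the question, run one search per sub-query, dedupe
    doc IDs, and assemble a bounded passage context for the LLM."""
    doc_ids, seen = [], set()
    for sub_query in decompose(question):              # Step 1
        for doc_id in search(sub_query)[:per_query]:   # Step 2
            if doc_id not in seen:
                seen.add(doc_id)
                doc_ids.append(doc_id)
    passages = []
    for doc_id in doc_ids:                             # Step 3
        passages.extend(fetch_chunks(doc_id))
        if len(passages) >= max_passages:
            break
    return passages[:max_passages]

# Toy stand-ins for the real services:
decompose = lambda q: ["CFO communications about irregularities",
                       "CFO first awareness of audit findings"]
search = lambda q: (["doc_1", "doc_2"] if "awareness" in q
                    else ["doc_2", "doc_3"])
fetch_chunks = lambda doc_id: [f"{doc_id}:chunk_0", f"{doc_id}:chunk_1"]

context = build_rag_context("What did the CFO know?",
                            decompose, search, fetch_chunks)
```

Steps 4-6 then pass `context` to the synthesis prompt; the bounded `max_passages` cap is what keeps the LLM call affordable and the context verifiable.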
The Context Window Challenge¶
Even with 1M-token context windows (Claude), you can't feed an entire case:
| Case Size | Documents | Estimated Text | Fits in Context? |
|---|---|---|---|
| Small (1K docs) | 1,000 | ~50M tokens | No (50x too large) |
| Medium (50K docs) | 50,000 | ~2.5B tokens | No (2,500x too large) |
| Large (500K docs) | 500,000 | ~25B tokens | No (25,000x too large) |
This is why search is the prerequisite. You MUST narrow the corpus before the LLM can synthesize. documentsearch reduces 500K documents to 30-50 relevant passages that fit in the LLM context. Without this retrieval step, conversational RAG is impossible at eDiscovery scale.
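The table's arithmetic, written out as a check, assuming the ~50K tokens per document that its rows imply:

```python
TOKENS_PER_DOC = 50_000      # rough average implied by the table above
CONTEXT_WINDOW = 1_000_000   # 1M-token context

def corpus_tokens(docs: int) -> int:
    """Total tokens for a case of `docs` documents."""
    return docs * TOKENS_PER_DOC

for docs in (1_000, 50_000, 500_000):
    total = corpus_tokens(docs)
    print(f"{docs:>7} docs ~ {total:,} tokens "
          f"({total // CONTEXT_WINDOW}x the context window)")
```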
Conversational Features (Multi-Turn)¶
Unlike single-query search, conversational RAG supports follow-up:
Attorney: "What did the CFO know about the irregularities?"
System: [answer with citations]
Attorney: "When did he first communicate with outside counsel about this?"
System: [follow-up answer — knows "he" = CFO, "this" = irregularities
from conversation context]
Attorney: "Are there any documents that contradict his deposition testimony
that he wasn't aware until July?"
System: [searches for contradicting evidence, references the deposition
claim, surfaces documents showing earlier awareness]
This multi-turn capability requires:
- Conversation memory: Track what's been discussed, who "he" refers to
- Query refinement: Each follow-up is a new search informed by prior context
- Cross-reference: Connect new search results to previous answers
This is the T2 agent pattern — an orchestrator that maintains conversation state and issues multiple searches per turn.
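The conversation-memory requirement can be illustrated with a toy query rewriter. In the real T2 agent the rewriting would itself be LLM-driven; the lookup table here is a deliberately crude stand-in to show the shape of the step:

```python
import re

def rewrite_followup(question: str, memory: dict[str, str]) -> str:
    """Expand pronouns in a follow-up question from conversation memory
    before issuing it as a fresh search (the query-refinement step)."""
    for pronoun, referent in memory.items():
        # \b guards keep "he" from matching inside "When" or "the"
        question = re.sub(rf"\b{re.escape(pronoun)}\b", referent, question)
    return question

memory = {"he": "the CFO", "this": "the accounting irregularities"}
rewritten = rewrite_followup(
    "When did he first communicate with outside counsel about this?", memory)
# → "When did the CFO first communicate with outside counsel
#    about the accounting irregularities?"
```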
Why This Is Harvey's Moat (and How Nextpoint Competes)¶
Harvey's $11B valuation is built on conversational legal AI. But Harvey serves general legal knowledge (case law, statutes) — not case-specific document sets. The attorney uploads documents to Harvey's Vault (10K doc cap) and asks questions.
Nextpoint's advantage: The documents are ALREADY in the platform. There is no upload step. An attorney working a case in Nextpoint can switch from document review to conversational Q&A without leaving the platform or re-uploading anything. The embedding pipeline processes documents as they're imported — by the time the attorney is ready to ask questions, the infrastructure is ready.
Harvey competes on general legal knowledge. Nextpoint competes on case-specific intelligence on the firm's own documents. These are different markets with different moats.
Build Path¶
| Component | Effort | Depends On |
|---|---|---|
| Query decomposition (break question into sub-queries) | 1-2 weeks | Bedrock Claude (exists) |
| Multi-query search orchestration | 1-2 weeks | documentsearch T1 |
| Passage retrieval and assembly | 1 week | search_chunks table (exists via T1) |
| LLM synthesis with citation prompting | 2-3 weeks | Bedrock Claude + prompt engineering |
| Citation verification | 1-2 weeks | Passage-to-claim matching |
| Multi-turn conversation memory | 2-3 weeks | Session state management |
| Rails UI (chat interface) | 2-3 weeks | Rails frontend |
| Total | 10-16 weeks after T1 production | documentsearch T1 + Bedrock |
This is Rakesh's "Tier 3, 16-30 weeks." Our estimate: 10-16 weeks after T1 production ships. The difference: we're not building RAG from scratch — documentsearch provides the retrieval layer, search_chunks provides the passage store, and Bedrock provides the LLM. The new code is orchestration, prompting, citation verification, and UI.
Hallucination Safeguards for Legal¶
| Safeguard | Implementation |
|---|---|
| Mandatory citations | Prompt constraint: every factual claim must cite exhibit + page |
| Clickable verification | UI links each citation to the actual document passage |
| Confidence levels | "Found in document" vs "Inferred" vs "Not found" |
| Context transparency | Show attorney which documents were in the LLM context |
| Attorney verification | All answers marked as AI-generated, requiring review |
| ABA Opinion 512 | Human-in-the-loop; system assists, attorney decides |
| Audit logging | Full prompt, context, and answer logged for defensibility |
Revised Timeline (Architecture-Informed)¶
The original tier estimates assume building each feature independently. Since they share infrastructure, the actual timeline is compressed:
| What | Timeline | Investment | Features Enabled |
|---|---|---|---|
| Prototype (1 case, validate quality) | 2 weeks | ~$23 | Natural language search demo, privilege review demo, PII/PHI flagging demo |
| Pilot (10 cases, attorney feedback) | +1 week | ~$227 | Same features on real cases with real attorneys |
| Production T1 (multi-tenant, backfill) | +1 quarter | $3,900 backfill + $1-2.3K/mo | Natural language search, privilege flagging, PII/PHI detection, responsive review, redaction ID, clawback, depo prep (basic), settlement prep |
| T1+ Rails integration | +2-3 weeks | Rails eng time | Review queue sort, save-to-folder, depo binder workflow |
| Document summarization | +2-4 weeks | Marginal (nextpoint-ai exists) | Early case assessment, document set briefing |
| T2 Agent service | +1 quarter | Bedrock costs | Gap analysis, pattern ID, "chat with case data", privilege log generation |
Total time from start to "chat with your case data": ~6-7 months. Not 16-30 weeks per feature — 6-7 months for ALL features on shared infrastructure.
Demo Impact (What Changes in Sales)¶
The 5-Minute Demo Script¶
This is the sequence that changes a buyer's perception:
Minute 1: Run a keyword search for "Special Purpose Entities." Both NXP and competitors return similar results. Attorney is unimpressed.
Minute 2: Run a natural language search: "internal discussions about Special Purpose Entities." Semantic search surfaces documents mentioning "Raptor," "JEDI," and "stay the course" — documents keyword search missed entirely. Attorney leans forward.
Minute 3: Filter by custodian: "show me only what Jeff Skilling discussed." Instant results scoped to one person. This is depo prep in one query.
Minute 4: Run a privilege query: "communications seeking or providing legal advice about the transaction." No keyword overlap with "privilege" — but the right documents surface. Privilege review without false positives.
Minute 5: Show the highlighted passage explaining WHY each document ranked. Transparency that no competitor's "AI search" provides.
That sequence demos 4 features in 5 minutes. All from one endpoint.
The GDR Story¶
Features that drive daily usage and prevent churn:
| Feature | Daily Workflow? | Why It Retains |
|---|---|---|
| Natural language search | Yes — every search session | Attorneys stop switching to other tools for conceptual queries |
| Responsive review (ranked) | Yes — every review session | 50%+ time savings on review, compounding with every matter |
| Privilege flagging | Yes — every production | Catches privilege documents keyword search misses |
| Depo prep | Per-deposition | Attorneys build depo binders in minutes, not hours |
| Document summarization / ECA | Per-matter | First hours after receiving production become productive. Partner gets case briefing same day, not week 3. |
An attorney saving 3+ hours per week on search and review does not cancel at renewal. These are not "nice to have" features — they become the daily workflow.
The Feature Customers Are Already Asking For: Interrogating Productions¶
Customers repeatedly ask for one specific capability: using AI to interrogate productions, identify gaps, and generate summaries. This is not a hypothetical use case — it's the feature attorneys describe when they talk about what AI should do for them.
This maps directly to our T2 gap analysis architecture:
Attorney: "Show me what the VP of Engineering discussed about the safety
defect between March 1 and April 15."
System: Runs semantic search scoped to VP's documents in that window.
Result: 0 documents.
Attorney: "Now show me what every other senior engineer discussed."
System: Runs same query across 5 other custodians.
Result: 18, 23, 15, 21, 8 documents respectively.
Attorney: "The VP has zero documents on this topic during a period when
every peer was actively discussing it. That's my deposition."
That's gap analysis — and it's what attorneys are asking for by name. The T2 agent automates this (run the same query across all custodians, compare result counts, surface anomalies). But even at T1, an attorney can do it manually with custodian-filtered searches. T1 makes it possible. T2 makes it automatic.
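The automated version the T2 agent would run can be sketched as a per-custodian comparison. `search_count` stands in for a custodian-filtered documentsearch call; the anomaly threshold is an illustrative assumption:

```python
from statistics import median

def find_gaps(query, custodians, search_count, min_ratio=0.25):
    """Run the same semantic query per custodian and flag anomalously
    low hit counts (the absence-as-evidence pattern described above)."""
    counts = {c: search_count(query, c) for c in custodians}
    baseline = median(counts.values())
    flagged = [c for c, n in counts.items()
               if baseline > 0 and n < baseline * min_ratio]
    return counts, flagged

# Counts mirroring the example above: the VP has zero, peers do not.
raw = {"vp_eng": 0, "eng_a": 18, "eng_b": 23,
       "eng_c": 15, "eng_d": 21, "eng_e": 8}
search_count = lambda q, c: raw[c]
counts, flagged = find_gaps("safety defect discussions", list(raw), search_count)
print(flagged)  # → ['vp_eng']
```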
The summaries complement gap analysis: once the relevant documents are found, nextpoint-ai summarizes them into a chronology or briefing. The combination — find the gaps, surface the evidence, summarize what it means — is the workflow attorneys describe as transformative.
This is the strongest argument for prioritizing documentsearch: the feature customers praise most requires the search infrastructure as its foundation. Gap analysis without semantic search is manual keyword iteration. Gap analysis with semantic search is the absence-as-evidence capability that no competitor offers.
The Stickiness Moat: Active Learning from Reviewer Decisions¶
Beyond search, the highest-impact GDR feature is continuous active learning from reviewer decisions. The concept: the model learns from each reviewer's responsive/non-responsive coding and reclassifies remaining documents in real time. The longer a firm uses the platform, the smarter it gets on their matter types and their attorneys' review patterns.
This is the feature that converts Nextpoint from "a tool attorneys use" to "a system the firm depends on." It creates a stickiness loop:
Reviewer codes document as responsive
→ Model updates relevance weights for similar documents
→ Remaining documents re-ranked by predicted responsiveness
→ Reviewer sees more-likely-responsive documents next
→ Model gets better with each decision
→ Switching to a competitor means starting from zero
How this builds on documentsearch:
| Layer | What It Provides | Status |
|---|---|---|
| T1: Semantic search | Initial relevance ranking via embeddings + BM25 | Designed |
| T1+: Review queue integration | Ranked review order in Rails | Designed |
| Active learning (new) | Feedback loop from reviewer decisions to re-ranking | Not yet architected |
Active learning requires:
1. documentsearch T1 as the base relevance signal (vector similarity + BM25)
2. Review coding data from Rails (responsive/non-responsive decisions per document)
3. A re-ranking model that combines base relevance with reviewer feedback
4. Real-time re-scoring of un-reviewed documents as new decisions come in
The architecture for this is NOT in the current documentsearch design — it's a layer on top. But it depends entirely on the embedding infrastructure. Without vector representations of documents, there's no notion of "similar to the documents the reviewer marked responsive." The embeddings are the prerequisite.
Architectural sketch (not fully designed):
T1: documentsearch provides base relevance scores (vector + BM25)
↓
T1+: Review queue sorted by base relevance
↓
Active learning layer (future):
Reviewer codes document as responsive
→ System identifies the document's vector embedding
→ Finds un-reviewed documents with similar embeddings
→ Boosts their predicted responsiveness score
→ Review queue re-sorts in near-real-time
→ Reviewer sees the next-most-likely-responsive document
This is a T2+ capability that should be designed after T1 proves retrieval quality. It has the highest GDR impact because it creates per-firm, per-matter lock-in — a competitor would need to rebuild the learned relevance model from scratch. But it requires the embedding infrastructure first.
Key decision for later: Should active learning use the same vector embeddings as search, or fine-tune a separate model per matter? Using the same embeddings is simpler (find similar documents by vector proximity). Fine-tuning per matter is more accurate but more expensive. This decision should be made after T1 production data reveals how well base embeddings correlate with reviewer decisions.
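The simpler of the two options, same-embedding re-ranking, can be sketched as blending base relevance with similarity to the centroid of responsive-coded documents. Function names, the 2-D toy embeddings, and the 0.5 blend weight are all assumptions for illustration:

```python
import math

def boosted_scores(base, embeddings, responsive_ids, weight=0.5):
    """Blend each un-reviewed document's base relevance with its cosine
    similarity to the centroid of reviewer-coded responsive documents."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    vecs = [embeddings[i] for i in responsive_ids]
    centroid = [sum(dims) / len(vecs) for dims in zip(*vecs)]
    return {doc_id: (1 - weight) * score + weight * cos(embeddings[doc_id], centroid)
            for doc_id, score in base.items()}

# Toy 2-D embeddings; un-reviewed "u1" sits near responsive doc "r1":
embeddings = {"r1": [1.0, 0.0], "u1": [0.9, 0.1], "u2": [0.0, 1.0]}
scores = boosted_scores({"u1": 0.4, "u2": 0.6}, embeddings,
                        responsive_ids=["r1"])
# u1 now outranks u2 despite its lower base relevance score
```

Re-running this after each coding decision is the near-real-time re-sort in the sketch above; the per-matter fine-tuning alternative would replace the centroid with a trained model.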
Cost Summary (For Leadership)¶
One-Time Investment¶
| Phase | Documents | Cost | Timeline |
|---|---|---|---|
| Prototype (both embedding models tested) | 50K | $23 | 2 weeks |
| Pilot (10 cases) | 500K | $227 | 1 day |
| Phase 1 (100 active cases) | 10M | $3,900 | 2-3 days |
| Phase 2 (all NGE Discovery) | 78M | $28,000-$30,300 | 1-2 weeks |
Ongoing Monthly¶
| | Managed OpenSearch | Serverless OpenSearch |
|---|---|---|
| Infrastructure | $4,000-5,100 | $973-2,273 |
| New document embedding | $860-2,140 | $860-2,140 |
| Search queries | $15-50 | $15-50 |
| Total monthly | $4,875-7,290 | $1,848-4,463 |
Why the Prototype Must Validate the Full Production Path¶
A common trap with AI prototypes: build something impressive on a laptop with a curated dataset, get stakeholder buy-in, then spend the next quarter discovering it doesn't fit the existing infrastructure, can't handle the real data, and costs 5-10x what leadership approved. The prototype worked; the product doesn't.
We avoid this by designing the prototype to validate FOUR things, not just retrieval quality:
1. Retrieval quality — Does semantic search produce the "wow" moment? Does it find documents keyword search misses?
2. Production cost model — Which embedding model can we afford at 78M documents? The prototype tests both Voyage AI voyage-law-2 ($0.12/M, legal-tuned) and Bedrock Titan V2 ($0.02/M, general-purpose) on the same case for $23. That $3 for Titan V2 answers a $100K question before committing a quarter of build work.
| Model | Prototype Cost | If Chosen: Year 1 |
|---|---|---|
| Voyage AI voyage-law-2 | $20 | $56K-122K |
| Amazon Bedrock Titan V2 | $3 | $15K-32K |
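The cost model behind these comparisons is a one-line multiplication. The ~3K embedded tokens per document is an assumption chosen here to illustrate; it roughly reproduces the backfill figures elsewhere in this document, but the real figure comes from the prototype's chunking output:

```python
def backfill_cost(docs, tokens_per_doc, price_per_million_tokens):
    """One-time embedding cost: documents x tokens/doc x $/M tokens."""
    return docs * tokens_per_doc * price_per_million_tokens / 1_000_000

# Assumed ~3K embedded tokens/doc; prices per the comparison above.
for model, price in [("voyage-law-2", 0.12), ("Titan V2", 0.02)]:
    print(f"{model}: ${backfill_cost(78_000_000, 3_000, price):,.0f}")
```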
3. Infrastructure fit — Does it plug into what we already have?
The prototype is NOT a standalone app. It's built on existing Nextpoint infrastructure from day 1:
| Concern | How the Prototype Validates It |
|---|---|
| Pipeline integration | Subscribes to the real DOCUMENT_PROCESSED SNS events (same events documentloader consumes) |
| Existing data | Uses extracted text already on S3 from documentextractor — no re-extraction |
| Existing search | BM25 leg queries real ES 7.4 per-case aliases — validates hybrid search against real keyword infrastructure |
| Per-case isolation | Uses real per-case MySQL database for chunk storage — validates multi-tenant pattern |
| Backfill path | Embeds a real case with existing documents — validates the backfill pipeline on actual data |
| SQS batching | Uses batchSize: 10 with maxBatchingWindow: 60s — same pattern as documentloader |
If ANY of these fails on the prototype case, we find out in week 1, not month 4.
4. Data reality — Does it work on real legal documents, not curated demos?
The prototype runs on an actual Nextpoint case — with real email threads, real attachments, real metadata inconsistencies, real document types. The chunking strategy faces real legal documents (not academic papers or blog posts). If email-aware chunking breaks on real MBOX extractions, or metadata is missing from real exhibits, we find out immediately.
| Data Validation | What Could Go Wrong | When We Find Out |
|---|---|---|
| Extracted text quality | Some documents have OCR errors, garbled text | Week 1 |
| Metadata completeness | Some exhibits missing author/date/subject | Week 1 |
| Email thread parsing | Reply chains not cleanly separated | Week 1 |
| Large documents (500+ pages) | Chunking produces too many vectors, Voyage API times out | Week 1 |
| Mixed document types | Spreadsheets, images-as-PDFs don't have useful text | Week 1 |
The prototype costs $23 and 2 weeks. It answers: does this work with our data, our infrastructure, our cost constraints, and our quality bar? All four answers before committing to a quarter of production build.
Cost-Optimized Alternative (If Titan V2 Passes Quality Test)¶
| | Standard (voyage-law-2) | Cost-Optimized (Titan V2) |
|---|---|---|
| Backfill (78M NGE Discovery docs) | $30,300 | $4,700 |
| Monthly ongoing | $4,875-7,290 | $945-2,320 |
| Year 1 total | $56K-122K | $15K-32K |
Production Corpus¶
| Metric | Value |
|---|---|
| Total documents | 870M |
| NGE-enabled Discovery (backfill scope) | ~78M |
| Backfill cost (NGE Discovery, standard model) | ~$30,300 |
| Backfill cost (NGE Discovery, cost-optimized) | ~$4,700 |
What's Already Done¶
| Deliverable | Status | Document |
|---|---|---|
| Module architecture (hexagonal, event-driven) | Complete | documentsearch.md |
| 14 use cases mapped to tiers | Complete | semantic-search-use-cases.md |
| Infrastructure inventory (new + existing) | Complete | semantic-search-infrastructure.md |
| Executive cost summary with cost-optimized alternative | Complete | semantic-search-cost-summary.md |
| Vector store evaluation (5 options) | Complete | adr/adr-vector-store-selection.md |
| Embedding pattern (asymmetric, AWS deployment) | Complete | patterns/asymmetric-embeddings.md |
| BM25 current state analysis | Complete | documentsearch.md (ES section) |
| Legal defensibility (determinism, audit logs) | Complete | documentsearch.md (defensibility section) |
| Non-functional requirements | Complete | documentsearch.md (NFR section) |
| Backfill design (existing 870M documents) | Complete | documentsearch.md + semantic-search-infrastructure.md |
| Production corpus cost model | Complete | semantic-search-cost-summary.md |
| PageIndex T2 evaluation plan | Complete | semantic-search-use-cases.md (Appendix C) |
The architecture is ready. The next step is the 2-week prototype.
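The core of the T1 primitive everything else calls is hybrid retrieval fused with Reciprocal Rank Fusion. A minimal RRF sketch, assuming each retriever returns an ordered list of doc IDs; `k=60` is the conventional default from the original RRF paper, not necessarily what the documentsearch module ships with:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1/(k + rank).

    `rankings` are ordered doc-ID lists from each retriever
    (e.g. BM25 and vector search).
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]
vec  = ["d1", "d9", "d3"]
print(rrf_fuse([bm25, vec]))  # ['d1', 'd3', 'd9', 'd7']
```

Documents ranked well by both retrievers (d1, d3) surface first, which is the property that lets BM25 precision and vector recall reinforce each other without score normalization.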
What's NOT Done (And Doesn't Need to Be Before Prototype)¶
| Item | When Needed | Why Not Now | Depends On |
|---|---|---|---|
| Gap analysis agent (automated) | T2 agent service | T1 search is the primitive it calls | documentsearch T1 |
| Document summarization / ECA agent | +4-6 weeks after T1 production | Orchestration layer connecting documentsearch (retrieval) + nextpoint-ai (LLM synthesis). Not from scratch — both systems exist. | documentsearch T1 + nextpoint-ai |
| Active learning / continuous TAR | After responsive review ships (T1+) | Requires embeddings + reviewer feedback loop | documentsearch T1 + Rails review coding data |
| Automated privilege log generation | T2 agent service | Needs search results + LLM formatting | documentsearch T1 + Bedrock |
| "Chat with case data" conversational layer | T2 agent service | Conversational UX on top of search | documentsearch T1 + T2 agent |
| Audio/video transcription | Separate workstream | Different pre-processing pipeline (AWS Transcribe) | Text output feeds into existing search infra |
All of these build ON TOP of the documentsearch module. None can ship before it. The module is the foundation — everything else is a layer.
Dependency Chain¶
```
documentsearch T1 (hybrid search)
│
├─→ Gap analysis (T2 agent) ← customers already asking for this
│
├─→ Responsive review (T1+ Rails integration)
│   │
│   └─→ Active learning / continuous TAR ← highest GDR stickiness
│
├─→ Document summarization (nextpoint-ai extension)
│
├─→ Privilege log generation (T2 agent)
│
└─→ "Chat with case data" (T2 agent + conversational UX)
```
The two highest-value features — gap analysis (customer demand) and active learning (GDR stickiness) — both require documentsearch embeddings as their foundation. Neither can exist without the vector infrastructure. This is why the prototype is the critical first step.