Reference Implementation: nextpoint-ai¶
Overview¶
nextpoint-ai is an AI-powered transcript summarization service for the Nextpoint Litigation suite. It processes deposition transcripts using Amazon Bedrock (Claude models) to generate structured summaries in multiple formats: narrative, chronological, table of contents, and custom user-defined formats.
EDRM Stage: 7 (Analysis) — AI-powered analysis of deposition transcripts. Suite: Litigation (depositions), but operates as an independent service.
Architecture¶
nextpoint-ai/
├── services/
│ ├── transcript-summary/
│ │ └── src/
│ │ ├── orchestrator.py # Orchestrator Lambda — fair queuing, rate limiting,
│ │ │ chunking, event routing (6 event types)
│ │ ├── processor.py # Processor Lambda — Bedrock calls, idempotent
│ │ │ chunk processing, final combining
│ │ ├── orchestrator_config.py # Rate limits, timeouts, model IDs, all config
│ │ ├── prompts.py # Prompt assembly for Bedrock API
│ │ ├── prompt_config_loader.py # S3-based prompt config with TTL caching
│ │ └── qa_eval.py # Inline QA evaluation for eval-user summaries
│ ├── transcript-chat/ # Placeholder for future chat service
│ └── shared/python/
│ ├── database/core/
│ │ ├── db_models.py # SQLAlchemy ORM (AIJob, AIJobChunk, etc.)
│ │ ├── db_operations.py # Database operations layer
│ │ └── schema.sql # MySQL schema
│ ├── common/constants.py # Enums (JobType, JobStatus, ChunkStatus, SummaryType)
│ └── utils/ # aws_utils, s3_utils, error_classifier, etc.
├── cdk/
│ ├── bin/nextpoint-ai-app.ts # CDK app — VPC, Security, Monitoring, TranscriptSummary stacks
│ ├── lib/transcript-summary/
│ │ └── transcript-summary-stack.ts # SQS queues, Lambdas, EventBridge rules, IAM
│ └── config/infrastructure-config.ts # Multi-env, multi-region config
└── docs/ # Architecture docs (DB structure, RDS proxy sharing)
Pattern Mapping¶
| Pattern | nextpoint-ai Implementation | Standard NGE Pattern |
|---|---|---|
| Event transport | EventBridge (nextpoint.rails source) | SNS (standard for NGE modules) |
| Job queue | SQS FIFO (orchestrator) + SQS Standard (processor) | SQS Standard with SNS fan-out |
| Concurrency | Orchestrator: 1 concurrent (FIFO); Processor: 20 concurrent | Lambda concurrency per SQS |
| Database | Shared Aurora MySQL (ai_jobs, ai_job_chunks tables) | Per-case MySQL databases |
| Idempotency | Chunk-level, summary-type-level, final-generation-level | Checkpoint-based composite PK |
| Error handling | Custom error_classifier.py | Exception hierarchy (Recoverable/Permanent/Silent) |
| Rate limiting | 100 RPM global, per-case limits, fair queuing (HIGH/MEDIUM/LOW) | No rate limiting in NGE |
| Infrastructure | AWS CDK (TypeScript) | AWS CDK (TypeScript) ✓ |
| ORM | SQLAlchemy 2.x + PyMySQL | SQLAlchemy 2.x + PyMySQL ✓ |
| Multi-region | c2 (us-east-1), c4 (us-west-1), c5 (ca-central-1) | Same regions ✓ |
Key Design Decisions¶
EventBridge Instead of SNS¶
nextpoint-ai uses EventBridge for Rails integration because:
- Rails publishes summary.requested events to a shared EventBridge bus
- EventBridge provides content-based filtering (rule matching) without SNS topic management
- Natural fit for cross-service integration (Rails → AI) vs intra-module communication (SNS)
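As a sketch of the publishing side, the `summary.requested` entry Rails puts on the shared bus might be assembled like this (bus name and helper are illustrative; the detail fields follow the inbound-event example later in this doc):

```python
import json

def build_summary_requested(npcase_id, deposition_id, s3_path, summary_types):
    """Assemble one EventBridge put_events entry for a summary request.
    The EventBusName value is a placeholder, not the service's actual bus."""
    return {
        "Source": "nextpoint.rails",
        "DetailType": "summary.requested",
        "EventBusName": "shared-bus",  # placeholder name
        "Detail": json.dumps({
            "npcase_id": npcase_id,
            "deposition_id": deposition_id,
            "s3_path": s3_path,
            "summary_types": summary_types,
        }),
    }

# Publishing requires AWS credentials:
# import boto3
# boto3.client("events").put_events(Entries=[build_summary_requested(...)])
```

An EventBridge rule matching `source = nextpoint.rails` and `detail-type = summary.requested` then routes the event into the orchestrator's FIFO queue without any SNS topic management.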
Two-Lambda Pipeline¶
Rails → EventBridge → SQS FIFO → Orchestrator (1 concurrent)
                                      │
                                      ▼
                         SQS Standard → Processor (20 concurrent) → Bedrock
                                      │
                                      ▼
                         Aurora MySQL (state) + S3 (output)
Orchestrator (single instance via FIFO):
- Receives 6 event types: summary.requested, job.completed, pass1.completed,
retry.scheduled, queue.process, monitor.check
- Fair queuing with priority levels (HIGH/MEDIUM/LOW)
- Rate limiting: 100 RPM global, per-case limits
- Chunks transcripts and stores chunk text in S3 (avoids EventBridge 256KB limit)
- Stuck-processing recovery (60min job timeout, 20min chunk timeout)
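The fair-queuing step above can be sketched as a strict-priority dequeue (a simplification — the real orchestrator also enforces per-case limits; class and method names here are illustrative):

```python
from collections import deque

PRIORITIES = ("HIGH", "MEDIUM", "LOW")

class FairQueue:
    """Toy priority scheduler: drain HIGH before MEDIUM before LOW.
    A stand-in for the orchestrator's fair-queuing logic, not its actual code."""

    def __init__(self):
        self.queues = {p: deque() for p in PRIORITIES}

    def enqueue(self, job_id, priority="MEDIUM"):
        self.queues[priority].append(job_id)

    def next_job(self):
        # Highest non-empty priority level wins; None means nothing queued.
        for p in PRIORITIES:
            if self.queues[p]:
                return self.queues[p].popleft()
        return None
```

Because the orchestrator runs single-instance behind a FIFO queue, this selection logic never races with itself.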
Processor (20 concurrent via Standard SQS):
- Processes individual chunks in parallel
- Calls Bedrock converse() API directly (Claude 3.5 Sonnet / Claude 4.5 models)
- Multi-level idempotency
- Supports 1-pass or 2-pass AI summarization
- Tracks input/output tokens per chunk for billing
- Handles final.process for pass completion and AI-powered final combining
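The chunk-level idempotency check can be sketched as a status guard before the model call, so SQS redeliveries never trigger duplicate Bedrock requests (the dict stands in for an ai_job_chunks row; `call_model` stands in for the Bedrock converse call):

```python
def process_chunk(chunk, call_model):
    """Idempotency sketch: a chunk already marked COMPLETED is returned
    from its stored output instead of being reprocessed. Field names are
    illustrative, not the actual ai_job_chunks columns."""
    if chunk["status"] == "COMPLETED":
        return chunk["output"]  # redelivered message — no second model call
    chunk["output"] = call_model(chunk["text"])
    chunk["status"] = "COMPLETED"
    return chunk["output"]
```

The real service applies the same guard at the summary-type and final-generation levels, with status persisted in Aurora rather than in memory.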
S3-Based Chunk Storage¶
Chunk text goes to S3 rather than EventBridge/SQS payloads to avoid the 256KB message size limit. Chunks are stored at predictable S3 paths and referenced by ID in messages.
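A minimal sketch of the predictable-path idea (the key layout below is illustrative, not the service's actual scheme):

```python
def chunk_key(job_id, chunk_number, pass_number=1):
    """Deterministic S3 key for a chunk's text, so messages can carry
    only identifiers and any worker can reconstruct the path."""
    return f"jobs/{job_id}/pass{pass_number}/chunk_{chunk_number:04d}.txt"

# Upload the text, then send only the reference in the SQS message:
# import boto3
# s3 = boto3.client("s3")
# s3.put_object(Bucket=BUCKET, Key=chunk_key(jid, n), Body=text.encode())
# message = {"job_id": jid, "chunk_number": n, "s3_key": chunk_key(jid, n)}
```

Messages stay a few hundred bytes regardless of transcript size, comfortably under the 256KB EventBridge/SQS limit.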
Prompt Hot-Reload¶
Prompt templates are loaded from S3 with TTL caching, allowing runtime updates without redeployment. Custom summary types can provide fully custom prompts, filenames, and processing parameters via S3 JSON config files.
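The TTL-caching pattern behind prompt_config_loader.py can be sketched like this (a generic cache, not the module's actual implementation; `fetch` stands in for an S3 get_object call):

```python
import time

class TTLCache:
    """Re-invoke `fetch` only when the cached value is older than `ttl`
    seconds. Injecting `clock` keeps the sketch testable."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self.fetch, self.ttl, self.clock = fetch, ttl, clock
        self._value, self._loaded_at = None, None

    def get(self):
        now = self.clock()
        if self._loaded_at is None or now - self._loaded_at >= self.ttl:
            self._value, self._loaded_at = self.fetch(), now
        return self._value
```

Updating the JSON config in S3 therefore takes effect within one TTL window on every warm Lambda, with no redeployment.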
Configuration¶
Key defaults (all overridable via env vars):
| Setting | Default | Notes |
|---|---|---|
| Default model | Claude 4.5 Haiku | Also supports Claude 3.5 Sonnet, Claude 4.5 Sonnet |
| Max concurrent requests | 5 | Global |
| Requests per minute | 100 | Global (× 0.9 safety buffer) |
| Max concurrent per case | 1 | Per-case isolation |
| Max RPM per case | 10 | Per-case throttle |
| Max chunk size | 60,000 chars | Transcript chunking |
| Job processing timeout | 20 min | Stuck recovery |
| Chunk processing timeout | 14 min | Must exceed Bedrock read_timeout (13 min) |
| Number of passes | 2 | Supports 1 or 2 pass summarization |
| Temperature | 0.1 | Low for factual accuracy |
| Max tokens | 3,000 (orchestrator), 2,000-8,000 (processor) | Per-call limits |
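The env-var override pattern for these defaults might look like the following (variable names are illustrative, not the service's actual names):

```python
import os

def int_setting(name, default):
    """Read an integer setting from the environment, falling back to the
    coded default when the variable is unset."""
    return int(os.environ.get(name, default))

MAX_CONCURRENT = int_setting("MAX_CONCURRENT_REQUESTS", 5)
REQUESTS_PER_MINUTE = int_setting("REQUESTS_PER_MINUTE", 100)
# The table's 0.9 safety buffer applied to the global RPM:
EFFECTIVE_RPM = int(REQUESTS_PER_MINUTE * 0.9)
```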
Database Schema¶
Uses shared Aurora MySQL (not per-case databases). Shares VPC, RDS cluster, and security groups with documentloader.
| Table | Purpose |
|---|---|
| ai_jobs | Central job table — job_id, status (QUEUED/PROCESSING/COMPLETED/FAILED), priority, npcase_id, deposition_id, user_id, summary_types (JSON), model_id, summary_outputs (JSON: summary_type → s3_path/metadata/preview), token counts |
| ai_job_chunks | Per-chunk tracking — unique (job_id, chunk_number, summary_type, pass_number), S3 input/output paths, status, tokens, retries |
| ai_job_operations | Full Bedrock operation history — operation type (chunk/final), pass number, tokens, processing time, errors |
| ai_processing_metrics | Derived analytics per (job_id, summary_type) — processing time, queue wait, token usage, estimated cost USD, throughput |
| ai_rate_limits | Sliding-window rate limiting — limit_type (global/case/user), request_count, window_start/window_end |
Integration with Rails¶
Inbound: Rails publishes summary.requested to EventBridge:
{
"source": "nextpoint.rails",
"detail-type": "summary.requested",
"detail": {
"npcase_id": 123,
"deposition_id": 456,
"deposition_volume_id": 789,
"user_id": 42,
"s3_path": "case_123/transcripts/...",
"summary_types": ["narrative", "chronological", "toc"],
"custom_config": { ... }
}
}
Outbound: Summaries are stored in S3 at predictable paths; the orchestrator handles the job.completed event for cleanup.
Shared infrastructure: Same VPC, Aurora cluster, security groups as Rails and NGE modules.
Divergences from Standard NGE Patterns¶
| Aspect | nextpoint-ai | Standard NGE |
|---|---|---|
| Event transport | EventBridge | SNS |
| Database scope | Shared Aurora (not per-case) | Per-case MySQL databases |
| Architecture boundary | No hexagonal core/shell | core/ + shell/ separation |
| Error handling | Custom error classifier | Recoverable/Permanent/Silent hierarchy |
| Concurrency model | FIFO (1) + Standard (20) | SQS batch processing |
| Rate limiting | Built-in (100 RPM, per-case) | None |
| Billing tracking | Token counting per chunk | N/A |
Key File Locations¶
| File | Purpose |
|---|---|
| services/transcript-summary/src/orchestrator.py | Orchestrator Lambda |
| services/transcript-summary/src/processor.py | Processor Lambda |
| services/transcript-summary/src/orchestrator_config.py | All configuration |
| services/transcript-summary/src/prompts.py | Prompt assembly |
| services/shared/python/database/core/db_models.py | SQLAlchemy models |
| services/shared/python/common/constants.py | Enums and constants |
| cdk/lib/transcript-summary/transcript-summary-stack.ts | CDK infrastructure |