
Reference Implementation: nextpoint-ai

Overview

nextpoint-ai is an AI-powered transcript summarization service for the Nextpoint Litigation suite. It processes deposition transcripts using Amazon Bedrock (Claude models) to generate structured summaries in multiple formats: narrative, chronological, table of contents, and custom user-defined formats.

EDRM Stage: 7 (Analysis) — AI-powered analysis of deposition transcripts. Suite: Litigation (depositions), but operates as an independent service.

Architecture

nextpoint-ai/
├── services/
│   ├── transcript-summary/
│   │   └── src/
│   │       ├── orchestrator.py          # Orchestrator Lambda — fair queuing, rate limiting,
│   │       │                              chunking, event routing (6 event types)
│   │       ├── processor.py             # Processor Lambda — Bedrock calls, idempotent
│   │       │                              chunk processing, final combining
│   │       ├── orchestrator_config.py   # Rate limits, timeouts, model IDs, all config
│   │       ├── prompts.py               # Prompt assembly for Bedrock API
│   │       ├── prompt_config_loader.py  # S3-based prompt config with TTL caching
│   │       └── qa_eval.py               # Inline QA evaluation for eval-user summaries
│   ├── transcript-chat/                 # Placeholder for future chat service
│   └── shared/python/
│       ├── database/core/
│       │   ├── db_models.py             # SQLAlchemy ORM (AIJob, AIJobChunk, etc.)
│       │   ├── db_operations.py         # Database operations layer
│       │   └── schema.sql               # MySQL schema
│       ├── common/constants.py          # Enums (JobType, JobStatus, ChunkStatus, SummaryType)
│       └── utils/                       # aws_utils, s3_utils, error_classifier, etc.
├── cdk/
│   ├── bin/nextpoint-ai-app.ts          # CDK app — VPC, Security, Monitoring, TranscriptSummary stacks
│   ├── lib/transcript-summary/
│   │   └── transcript-summary-stack.ts  # SQS queues, Lambdas, EventBridge rules, IAM
│   └── config/infrastructure-config.ts  # Multi-env, multi-region config
└── docs/                                # Architecture docs (DB structure, RDS proxy sharing)

Pattern Mapping

| Pattern | nextpoint-ai Implementation | Standard NGE Pattern |
|---|---|---|
| Event transport | EventBridge (nextpoint.rails source) | SNS (standard for NGE modules) |
| Job queue | SQS FIFO (orchestrator) + SQS Standard (processor) | SQS Standard with SNS fan-out |
| Concurrency | Orchestrator: 1 concurrent (FIFO); Processor: 20 concurrent | Lambda concurrency per SQS |
| Database | Shared Aurora MySQL (ai_jobs, ai_job_chunks tables) | Per-case MySQL databases |
| Idempotency | Chunk-level, summary-type-level, final-generation-level | Checkpoint-based composite PK |
| Error handling | Custom error_classifier.py | Exception hierarchy (Recoverable/Permanent/Silent) |
| Rate limiting | 100 RPM global, per-case limits, fair queuing (HIGH/MEDIUM/LOW) | No rate limiting in NGE |
| Infrastructure | AWS CDK (TypeScript) | AWS CDK (TypeScript) ✓ |
| ORM | SQLAlchemy 2.x + PyMySQL | SQLAlchemy 2.x + PyMySQL ✓ |
| Multi-region | c2 (us-east-1), c4 (us-west-1), c5 (ca-central-1) | Same regions ✓ |

Key Design Decisions

EventBridge Instead of SNS

nextpoint-ai uses EventBridge for Rails integration because:

- Rails publishes summary.requested events to a shared EventBridge bus
- EventBridge provides content-based filtering (rule matching) without SNS topic management
- It is a natural fit for cross-service integration (Rails → AI), whereas SNS is used for intra-module communication
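As a sketch of this integration point, the summary.requested publication can be expressed as a boto3 PutEvents entry. The helper name and bus name below are illustrative assumptions, not taken from the repo; the source, detail-type, and detail fields match the inbound payload documented under "Integration with Rails".

```python
import json

def build_summary_requested_entry(detail: dict, bus_name: str = "nextpoint-shared-bus") -> dict:
    """Build an EventBridge PutEvents entry for a summary.requested event.

    The bus name is a placeholder; Source and DetailType match what Rails
    publishes to the shared bus.
    """
    return {
        "EventBusName": bus_name,
        "Source": "nextpoint.rails",
        "DetailType": "summary.requested",
        "Detail": json.dumps(detail),  # EventBridge carries the detail as a JSON string
    }

# Publishing is then a single call (requires AWS credentials):
#   import boto3
#   boto3.client("events").put_events(Entries=[build_summary_requested_entry({...})])
```

An EventBridge rule matching on `source = nextpoint.rails` and `detail-type = summary.requested` routes these entries to the orchestrator's FIFO queue without any SNS topic management.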

Two-Lambda Pipeline

Rails → EventBridge → SQS FIFO → Orchestrator (1 concurrent)
                                      ↓
                                SQS Standard → Processor (20 concurrent) → Bedrock
                                      ↓
                                Aurora MySQL (state) + S3 (output)

Orchestrator (single instance via FIFO):

- Receives 6 event types: summary.requested, job.completed, pass1.completed, retry.scheduled, queue.process, monitor.check
- Fair queuing with priority levels (HIGH/MEDIUM/LOW)
- Rate limiting: 100 RPM global, per-case limits
- Chunks transcripts and stores chunk text in S3 (avoids EventBridge 256KB limit)
- Stuck-processing recovery (60 min job timeout, 20 min chunk timeout)
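The orchestrator's sliding-window rate limiting can be sketched in-memory; the class, parameter names, and clock injection below are illustrative assumptions (the real service persists its windows in the ai_rate_limits MySQL table rather than in process memory).

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """In-memory sketch of sliding-window rate limiting; the service keeps
    its windows in MySQL (ai_rate_limits), not in process memory."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests  # e.g. 100 for the global RPM limit
        self.window = window_seconds      # e.g. 60.0 for a per-minute window
        self.clock = clock                # injectable for testing
        self._stamps = deque()

    def allow(self) -> bool:
        """Admit and record a request if the window has capacity, else refuse."""
        now = self.clock()
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()        # drop requests older than the window
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False
```

A global limiter (100/minute) and per-case limiters (10/minute) compose naturally: a request proceeds only when every applicable limiter returns True.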

Processor (20 concurrent via Standard SQS):

- Processes individual chunks in parallel
- Calls the Bedrock converse() API directly (Claude 3.5 Sonnet / Claude 4.5 models)
- Multi-level idempotency
- Supports 1-pass or 2-pass AI summarization
- Tracks input/output tokens per chunk for billing
- Handles final.process for pass completion and AI-powered final combining
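A single chunk call can be sketched against the Bedrock Runtime converse API. The function name and prompt handling are simplified assumptions (the real assembly lives in prompts.py); the client is injected so the sketch stays testable without AWS access.

```python
def summarize_chunk(client, model_id: str, prompt: str, chunk_text: str,
                    max_tokens: int = 2000, temperature: float = 0.1) -> dict:
    """Call the Bedrock converse API on one chunk; return text plus token counts.

    Token counts come from the response's usage block, which is what feeds
    the per-chunk billing tracking.
    """
    resp = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": f"{prompt}\n\n{chunk_text}"}]}],
        inferenceConfig={"maxTokens": max_tokens, "temperature": temperature},
    )
    usage = resp["usage"]
    return {
        "text": resp["output"]["message"]["content"][0]["text"],
        "input_tokens": usage["inputTokens"],
        "output_tokens": usage["outputTokens"],
    }

# In production the client would be:
#   import boto3
#   client = boto3.client("bedrock-runtime")
```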

S3-Based Chunk Storage

Chunk text goes to S3 rather than EventBridge/SQS payloads to avoid the 256KB message size limit. Chunks are stored at predictable S3 paths and referenced by ID in messages.
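As a sketch of the reference-passing scheme (the actual key layout is internal to the service, so the path format below is an assumption), a predictable chunk key plus a small message carrying only the reference might look like:

```python
def chunk_key(job_id: str, summary_type: str, pass_number: int, chunk_number: int) -> str:
    """Build a predictable S3 key for one chunk's text (illustrative layout)."""
    return f"jobs/{job_id}/{summary_type}/pass{pass_number}/chunk_{chunk_number:04d}.txt"

def chunk_message(job_id: str, summary_type: str, pass_number: int, chunk_number: int) -> dict:
    """SQS message body that stays far under the 256KB limit: IDs and an S3
    reference, never the chunk text itself."""
    return {
        "job_id": job_id,
        "summary_type": summary_type,
        "chunk_number": chunk_number,
        "s3_input_path": chunk_key(job_id, summary_type, pass_number, chunk_number),
    }
```

The processor then fetches the chunk body from S3 by key, so message size is constant regardless of transcript length.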

Prompt Hot-Reload

Prompt templates are loaded from S3 with TTL caching, allowing runtime updates without redeployment. Custom summary types can provide fully custom prompts, filenames, and processing parameters via S3 JSON config files.
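The TTL-caching behavior can be sketched with an injected fetch function standing in for the S3 GetObject call. Class and parameter names here are illustrative, not the prompt_config_loader.py API.

```python
import time

class TTLCachedLoader:
    """Cache a fetched config for ttl_seconds, then refetch on next access.

    `fetch` stands in for reading the JSON config from S3; `clock` is
    injectable for testing.
    """

    def __init__(self, fetch, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._fetch()   # cache miss or expired: reload from S3
            self._loaded_at = now
        return self._value
```

Because the cache lives in the Lambda execution environment, an updated prompt file in S3 takes effect on each warm instance within one TTL, with no redeployment.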

Configuration

Key defaults (all overridable via env vars):

| Setting | Default | Notes |
|---|---|---|
| Default model | Claude 4.5 Haiku | Also supports Claude 3.5 Sonnet, Claude 4.5 Sonnet |
| Max concurrent requests | 5 | Global |
| Requests per minute | 100 | Global (× 0.9 safety buffer) |
| Max concurrent per case | 1 | Per-case isolation |
| Max RPM per case | 10 | Per-case throttle |
| Max chunk size | 60,000 chars | Transcript chunking |
| Job processing timeout | 20 min | Stuck recovery |
| Chunk processing timeout | 14 min | Must exceed Bedrock read_timeout (13 min) |
| Number of passes | 2 | Supports 1- or 2-pass summarization |
| Temperature | 0.1 | Low for factual accuracy |
| Max tokens | 3,000 (orchestrator), 2,000-8,000 (processor) | Per-call limits |
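The env-var override pattern can be sketched as follows; the helper and the environment variable names are illustrative assumptions (the real names live in orchestrator_config.py).

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to the default."""
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

# Illustrative settings mirroring the defaults above; env var names are assumptions.
REQUESTS_PER_MINUTE = env_int("NP_AI_REQUESTS_PER_MINUTE", 100)
EFFECTIVE_RPM = int(REQUESTS_PER_MINUTE * 0.9)  # apply the 0.9 safety buffer
MAX_CHUNK_SIZE = env_int("NP_AI_MAX_CHUNK_SIZE", 60_000)
```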

Database Schema

Uses shared Aurora MySQL (not per-case databases). Shares VPC, RDS cluster, and security groups with documentloader.

| Table | Purpose |
|---|---|
| ai_jobs | Central job table — job_id, status (QUEUED/PROCESSING/COMPLETED/FAILED), priority, npcase_id, deposition_id, user_id, summary_types (JSON), model_id, summary_outputs (JSON: summary_type → s3_path/metadata/preview), token counts |
| ai_job_chunks | Per-chunk tracking — unique (job_id, chunk_number, summary_type, pass_number), S3 input/output paths, status, tokens, retries |
| ai_job_operations | Full Bedrock operation history — operation type (chunk/final), pass number, tokens, processing time, errors |
| ai_processing_metrics | Derived analytics per (job_id, summary_type) — processing time, queue wait, token usage, estimated cost USD, throughput |
| ai_rate_limits | Sliding-window rate limiting — limit_type (global/case/user), request_count, window_start/window_end |
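The unique (job_id, chunk_number, summary_type, pass_number) key on ai_job_chunks is what makes chunk redelivery safe. As a sketch, with a plain dict standing in for that table:

```python
def process_chunk_idempotently(store: dict, key: tuple, work):
    """Run `work` once per chunk key; repeats return the recorded output.

    `store` stands in for the ai_job_chunks table and `key` for its unique
    (job_id, chunk_number, summary_type, pass_number) constraint.
    """
    record = store.get(key)
    if record is not None and record["status"] == "COMPLETED":
        return record["output"]   # already done: SQS redelivery becomes a no-op
    output = work()               # e.g. the Bedrock call for this chunk
    store[key] = {"status": "COMPLETED", "output": output}
    return output
```

Since Standard SQS delivers at-least-once, this check-before-work step is what keeps duplicate deliveries from producing duplicate Bedrock calls (and duplicate token charges).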

Integration with Rails

Inbound: Rails publishes summary.requested to EventBridge:

{
  "source": "nextpoint.rails",
  "detail-type": "summary.requested",
  "detail": {
    "npcase_id": 123,
    "deposition_id": 456,
    "deposition_volume_id": 789,
    "user_id": 42,
    "s3_path": "case_123/transcripts/...",
    "summary_types": ["narrative", "chronological", "toc"],
    "custom_config": { ... }
  }
}

Outbound: Summaries are stored in S3 at predictable paths; the orchestrator handles the job.completed event for cleanup.

Shared infrastructure: Same VPC, Aurora cluster, security groups as Rails and NGE modules.

Divergences from Standard NGE Patterns

| Aspect | nextpoint-ai | Standard NGE |
|---|---|---|
| Event transport | EventBridge | SNS |
| Database scope | Shared Aurora (not per-case) | Per-case MySQL databases |
| Architecture boundary | No hexagonal core/shell | core/ + shell/ separation |
| Error handling | Custom error classifier | Recoverable/Permanent/Silent hierarchy |
| Concurrency model | FIFO (1) + Standard (20) | SQS batch processing |
| Rate limiting | Built-in (100 RPM, per-case) | None |
| Billing tracking | Token counting per chunk | N/A |

Key File Locations

| File | Purpose |
|---|---|
| services/transcript-summary/src/orchestrator.py | Orchestrator Lambda |
| services/transcript-summary/src/processor.py | Processor Lambda |
| services/transcript-summary/src/orchestrator_config.py | All configuration |
| services/transcript-summary/src/prompts.py | Prompt assembly |
| services/shared/python/database/core/db_models.py | SQLAlchemy models |
| services/shared/python/common/constants.py | Enums and constants |
| cdk/lib/transcript-summary/transcript-summary-stack.ts | CDK infrastructure |