Nextpoint Architecture Repository

The single source of truth for the architectural knowledge behind the Nextpoint eDiscovery and Litigation platform. This repo captures how our systems are built, why decisions were made, and what the rules are -- so every engineer and AI coding tool operates from the same playbook.

By the Numbers

| Category | Count |
| --- | --- |
| Reference implementations | 28 (every NGE module + legacy platform + AI services) |
| Reusable patterns | 28 (retry logic, checkpoint pipelines, circuit breakers, etc.) |
| Enforcement rules | 18 (boundaries, events, errors, testing, security, deployment) |
| Architecture Decision Records | 11 active ADRs |
| Engineering backlog items | 26 tracked and prioritized |
| Research article reviews | 25 groups |
| Total | ~27,500 lines across 120 files |

Purpose

Define -- what our patterns are and why (principles/, adr/)
Enforce -- AI tools pick these up automatically across all repos (rules/)
Evolve -- as we learn, patterns update in one place (patterns/)

Quick Start

| I want to... | Do this |
| --- | --- |
| Start a new NGE module | Copy `templates/service-module/` and follow the README inside |
| Work in any Nextpoint repo | Reference this repo's `rules/` in your project's CLAUDE.md |
| Make an architectural decision | Add an ADR in `adr/` using the established template |
| Add a reusable pattern | Add it to `patterns/` with a real code example |
| Review a PR | The `pr-review` service injects these rules into automated review agents |
| Understand a module | Read its entry in `reference-implementations/` |
| Check for stale docs | Run `/sync-repos` to check all Bitbucket repos for drift (runs weekly automatically) |
| Add a new repo to tracking | Add an entry to `sync-config.yml` with repo slug, doc path(s), and watch paths |
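A minimal sketch of the kind of check `/sync-repos` performs, assuming "drift" means a file matching one of an entry's watch paths changed after its docs were last synced. The `SyncEntry` fields mirror the three values a `sync-config.yml` entry holds, but the class, its field names, the example doc path, and the matching rule are hypothetical; the real command and `scripts/bb-api.sh` may work differently.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class SyncEntry:
    """Hypothetical in-memory form of one sync-config.yml entry."""

    repo_slug: str
    doc_paths: list[str]
    watch_paths: list[str] = field(default_factory=list)


def drifted_files(entry: SyncEntry, changed_files: list[str]) -> list[str]:
    """Return the changed files that match any of the entry's watch-path globs.

    `changed_files` is assumed to be the paths touched in the repo since the
    docs were last synced (for example, gathered from the Bitbucket commits API).
    A non-empty result means the docs in `entry.doc_paths` may be stale.
    """
    return [
        path
        for path in changed_files
        if any(fnmatch(path, pattern) for pattern in entry.watch_paths)
    ]


entry = SyncEntry(
    repo_slug="documentloader",
    doc_paths=["reference-implementations/documentloader.md"],  # hypothetical path
    watch_paths=["src/*.py", "cdk/*"],
)
stale = drifted_files(entry, ["src/core/steps.py", "README.md", "cdk/stack.ts"])
```

Because `fnmatch` lets `*` match across `/`, a pattern such as `src/*.py` also covers files in nested directories.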

Core Architectural Principles

Event-driven service modules communicating via AWS SNS (not microservices)
Hexagonal boundaries: core/ (domain logic) and shell/ (infrastructure)
Multi-tenant per-case databases -- each case gets its own MySQL schema
All handlers are idempotent -- SNS/SQS deliver messages at least once, so a duplicate delivery must be safe to process
Exception types control message flow -- Recoverable, Permanent, Silent
Resumable checkpoint pipelines for long-running document processing
Events are facts -- past tense naming (DocumentLoaded, PaymentProcessed)
Shared vocabulary -- see principles/glossary.md for canonical terminology
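The idempotency and exception-flow principles above can be sketched together. This is a hypothetical illustration, not the modules' actual API: the exception class names, the outcome strings, the assumed meaning of "Silent" (acknowledge and drop without alerting), and the in-memory `seen` set are all assumptions. In practice the idempotency key would be a stable ID carried by the event and the set would be a durable store such as a database table.

```python
from collections.abc import Callable


class RecoverableError(Exception):
    """Assumed semantics: transient failure (throttling, timeout); retry the message."""


class PermanentError(Exception):
    """Assumed semantics: retrying can never succeed; send to a dead-letter queue."""


class SilentError(Exception):
    """Assumed semantics: expected no-op (e.g. target already gone); drop quietly."""


def handle(
    message_id: str,
    payload: dict,
    process: Callable[[dict], None],
    seen: set[str],
) -> str:
    """Process one message idempotently and map exception types to an outcome.

    Returns "duplicate", "ok", "retry", "dead_letter" or "dropped".
    Any other exception propagates to the caller unchanged.
    """
    if message_id in seen:
        # At-least-once delivery: a redelivered message must not be processed twice.
        return "duplicate"
    try:
        process(payload)
    except RecoverableError:
        # Not marked as seen, so the redelivered message runs again.
        return "retry"
    except PermanentError:
        seen.add(message_id)
        return "dead_letter"
    except SilentError:
        seen.add(message_id)
        return "dropped"
    seen.add(message_id)
    return "ok"
```

Marking a message as seen only after a terminal outcome is what keeps retries possible while still suppressing duplicates of work that already finished.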

What's Documented

NGE Platform (7 Modules)

| Module | Purpose |
| --- | --- |
| documentloader | Document ingestion pipeline with 11-step checkpoint state machine |
| documentextractor | Text, PDF, and metadata extraction from raw files |
| documentuploader | PDF processing via Nutrient (PSPDFKit) engine |
| documentexporter | Document export via Step Functions + ECS Fargate |
| documentexchanger | Batch transfer between eDiscovery cases |
| documentpageservice | PDF page manipulation (reorder, rotate, split) |
| unzipservice | Archive extraction (ZIP, RAR, 7Z, TAR) |
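The resumable checkpoint idea behind `documentloader` can be illustrated with a small runner that records each finished step and skips it on the next invocation. This is a sketch under assumptions: the step names in the usage are invented, and the `checkpoint` dict stands in for whatever durable store the real 11-step state machine uses.

```python
from collections.abc import Callable

# A step is a (name, function) pair; the function mutates shared pipeline state.
Step = tuple[str, Callable[[dict], None]]


def run_pipeline(steps: list[Step], state: dict, checkpoint: dict) -> list[str]:
    """Run steps in order, skipping any already recorded in `checkpoint`.

    `checkpoint` stands in for durable storage. It is updated after each step,
    so a crashed or timed-out run resumes at the first unfinished step.
    Returns the names of the steps executed during this invocation.
    """
    completed = checkpoint.setdefault("completed", [])
    ran = []
    for name, fn in steps:
        if name in completed:
            continue  # finished in an earlier invocation
        fn(state)
        completed.append(name)  # record progress before moving on
        ran.append(name)
    return ran
```

Recording progress only after a step finishes means a step interrupted midway runs again on resume, so each step must itself be idempotent, which is the same requirement placed on the handlers.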

Legacy Platform (6 Areas)

| Area | Purpose |
| --- | --- |
| rails-monolith | Core Rails 7.0.8 application (140+ models, 130+ controllers) |
| shared-libs | 77+ shared Ruby modules used by Rails and workers |
| workers | Document processing engine (29 specialized Ruby workers) |
| rails-frontend | Dual JS pipeline, ~105 React components + 156 legacy jQuery files |
| legacy-nge-integration | Complete integration map between Legacy and NGE |
| nge-legacy-divergence-map | 85+ code divergence points on `nge_enabled?` |

AI and Search Services

| Service | Purpose |
| --- | --- |
| documentsearch | Semantic search: voyage-law-2 embeddings + OpenSearch hybrid search |
| nextpoint-ai | AI transcript summarization (Bedrock Claude) |
| pr-review | Automated multi-agent PR review (5 parallel agents + verifier) |
| query-language-engine | Search query parser (TypeScript, Chevrotain) |
| neardupe | Near-duplicate detection (PySpark LSH on EMR) |
| search-hit-report-backend | Search hit report generator |
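`documentsearch` combines keyword and vector retrieval. One common way to merge the two result lists, and the approach behind OpenSearch's normalization processor (min-max normalization followed by a weighted arithmetic mean), is sketched below. Whether `documentsearch` uses this exact scheme, the 0.5 default weight, and treating a document missing from one list as scoring 0 there are assumptions of this sketch; it works on plain dicts of document ID to score rather than calling OpenSearch.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result list's scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal (or a single hit): treat every document as a top match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    knn: dict[str, float],
    knn_weight: float = 0.5,
) -> list[tuple[str, float]]:
    """Merge keyword (BM25) and vector (kNN) hits with a weighted arithmetic mean.

    Each list is normalized first because BM25 and vector similarity scores
    live on different scales. A document absent from one list scores 0 there.
    """
    kw, vec = min_max(bm25), min_max(knn)
    combined = {
        doc: (1 - knn_weight) * kw.get(doc, 0.0) + knn_weight * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

Normalizing per list before mixing keeps a single very high BM25 score from swamping the semantic signal, which is the main reason to prefer this over adding raw scores.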

Semantic Search Strategy

14 attorney use cases mapped to implementation tiers, from hot doc identification to multi-deposition analysis:

Use case mapping -- tiers, personas, build sequence
Infrastructure inventory -- new + existing components, costs
Product alignment -- competitive landscape, value metrics
Cost summary -- per-phase cost estimates

Data Mining (Separate Product)

| Service | Purpose |
| --- | --- |
| eda | Data Mining backend (Ruby, Lambda + AWS Batch + Glue + Athena) |
| eda-front-end | Data Mining SPA frontend (TypeScript Web Components) |

Technology Stack

NGE Modules

Language: Python 3.10+, type hints required
AWS: Lambda, SNS, SQS, RDS (Aurora MySQL), S3, Elasticsearch, ECS Fargate, Step Functions
ORM: SQLAlchemy 2.x with PyMySQL
Infrastructure: AWS CDK (TypeScript)
Testing: pytest, moto, pytest-mock, pytest-cov
Formatting: black (line-length=100), isort (profile=black)
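Since each case has its own MySQL schema, an NGE module needs a way to point SQLAlchemy at the right one. A minimal sketch, assuming a hypothetical `case_<id>` schema naming convention and one cached engine per connection URL; the actual naming and connection management in the modules may differ. `case_db_url` runs without SQLAlchemy or PyMySQL installed because the `sqlalchemy` import is deferred into `engine_for_url`.

```python
from functools import lru_cache
from urllib.parse import quote_plus


def case_schema(case_id: int) -> str:
    """Hypothetical naming convention: one MySQL schema per case."""
    if case_id <= 0:
        raise ValueError(f"invalid case id: {case_id}")
    return f"case_{case_id}"


def case_db_url(case_id: int, user: str, password: str, host: str, port: int = 3306) -> str:
    """Build a SQLAlchemy URL for the PyMySQL driver that targets the case schema.

    Credentials are URL-encoded so characters such as '@' do not break the URL.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{case_schema(case_id)}"
    )


@lru_cache(maxsize=128)
def engine_for_url(url: str):
    """Create and cache one engine per case URL so its connection pool is reused."""
    from sqlalchemy import create_engine  # deferred: needs SQLAlchemy 2.x + PyMySQL

    return create_engine(url, pool_pre_ping=True)
```

Caching engines lets warm Lambda invocations reuse connection pools; `maxsize` bounds how many engines stay cached, and this sketch does not explicitly dispose of evicted ones.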

Legacy Platform

Language: Ruby 3.1.4 (Rails 7.0.8)
Database: MySQL (per-case via PerCaseModel), Elasticsearch 7.4, Redis
Jobs: Sidekiq 7.3.9 (Rails), custom polling daemon (workers)
Testing: Minitest, Test::Unit, mocha

Repo Structure

nextpoint-architecture/
├── README.md                    # This file -- human-readable entry point
├── CLAUDE.md                    # AI tool entry point (master index for Claude Code / Kiro)
├── BACKLOG.md                   # Engineering backlog (26 items, prioritized)
├── sync-config.yml              # Repo-to-doc mapping for automated drift detection
├── .claude/
│   ├── settings.json            # Hooks configuration
│   ├── commands/                # Slash commands (check-boundaries, new-adr, sync-repos, etc.)
│   └── skills/                  # Auto-loaded skills (exploring-module, reviewing-architecture, etc.)
├── scripts/
│   └── bb-api.sh                # Bitbucket REST API helper (auth, commits, drift check)
├── principles/                  # The "why" -- architectural decisions explained
├── patterns/                    # The "how" -- 28 reusable implementation patterns
├── rules/                       # The "must" -- 18 enforcement rules for AI tools
├── templates/                   # The "start here" -- scaffolding for new modules
├── reference-implementations/   # The "proof" -- 28 documented systems
├── adr/                         # Architecture Decision Records (11 active)
└── article-reviews/             # Industry research evaluated against our architecture (25 groups)

Architecture Decision Records

| ADR | Decision |
| --- | --- |
| ADR-001 | SNS for all inter-module and intra-module communication |
| ADR-002 | Service modules over microservices |
| ADR-003 | Per-case database schemas for multi-tenancy |
| ADR-004 | Incremental frontend modernization (no big rewrite) |
| ADR-005 | Extract bulk operations to Lambda service |
| ADR-006 | Extract bates/stamps to standalone service |
| ADR-007 | Extract custom reports to Lambda/Step Functions |
| ADR-008 | Elasticsearch upgrade + search service extraction |
| ADR-009 | Video processing modernization (Litigation suite) |
| ADR-010 | Deposition/transcript processing modernization |
| ADR-011 | Custom field storage modernization (S3 + ES + async audit) |
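ADR-001 and the "events are facts" principle meet at the point where a module publishes. A hedged sketch: the `Event` shape, the `eventType` message attribute, the `caseId` field, and the past-tense check (a crude heuristic with a hypothetical allow-list for irregular verbs) are illustrative assumptions. `client` is anything with boto3's SNS `publish(TopicArn=..., Message=..., MessageAttributes=...)` method, so a `boto3.client("sns")` or a test double both work.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Protocol

# Hypothetical allow-list for irregular past-tense verbs that do not end in "ed".
IRREGULAR_PAST_SUFFIXES = ("Sent", "Split", "Built")


class SnsClient(Protocol):
    """The subset of boto3's SNS client used here."""

    def publish(self, **kwargs: Any) -> dict: ...


@dataclass
class Event:
    name: str
    case_id: int
    payload: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not (self.name[:1].isupper() and self.name.isalnum()):
            raise ValueError(f"event name must be PascalCase: {self.name!r}")
        # Crude heuristic for "events are facts": regular past-tense verbs end in "ed".
        if not (self.name.endswith("ed") or self.name.endswith(IRREGULAR_PAST_SUFFIXES)):
            raise ValueError(f"event name should be past tense: {self.name!r}")


def publish(client: SnsClient, topic_arn: str, event: Event) -> str:
    """Publish the event body as JSON and return the SNS MessageId."""
    response = client.publish(
        TopicArn=topic_arn,
        Message=json.dumps({"caseId": event.case_id, **event.payload}),
        MessageAttributes={
            "eventType": {"DataType": "String", "StringValue": event.name},
        },
    )
    return response["MessageId"]
```

Carrying the event name as a message attribute lets each subscribed SQS queue use an SNS filter policy to receive only the event types it handles.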

Current Focus Areas

Semantic search -- Phase 1 prototype, 14 use cases, voyage-law-2 + OpenSearch hybrid search
Legacy modernization -- ADRs 005-011 queued for extraction from Rails monolith
AI tooling -- pr-review multi-agent service live, skill eval testing, harness pattern research

For AI Tools

This repo is designed to be consumed by AI coding tools. The master index for Claude Code and Kiro is CLAUDE.md, which references all rules, patterns, and reference implementations. Each Nextpoint module's own CLAUDE.md points back here for architectural context.
