Nextpoint Architecture Repository

The single source of truth for the architectural knowledge behind the Nextpoint eDiscovery and Litigation platform. This repo captures how our systems are built, why decisions were made, and what the rules are -- so every engineer and AI coding tool operates from the same playbook.

By the Numbers

| Category | Count |
| --- | --- |
| Reference implementations | 28 (every NGE module + legacy platform + AI services) |
| Reusable patterns | 28 (retry logic, checkpoint pipelines, circuit breakers, etc.) |
| Enforcement rules | 18 (boundaries, events, errors, testing, security, deployment) |
| Architecture Decision Records | 11 active ADRs |
| Engineering backlog items | 26 tracked and prioritized |
| Research article reviews | 25 groups |
| Total | ~27,500 lines across 120 files |

Purpose

  1. Define -- what are our patterns and why (principles/, adr/)
  2. Enforce -- AI tools pick these up automatically across all repos (rules/)
  3. Evolve -- as we learn, patterns update in one place (patterns/)

Quick Start

| I want to... | Do this |
| --- | --- |
| Start a new NGE module | Copy templates/service-module/ and follow the README inside |
| Work in any Nextpoint repo | Reference this repo's rules/ in your project's CLAUDE.md |
| Make an architectural decision | Add an ADR in adr/ using the established template |
| Add a reusable pattern | Add it to patterns/ with a real code example |
| Review a PR | The pr-review service injects these rules into automated review agents |
| Understand a module | Read its entry in reference-implementations/ |
| Check for stale docs | Run /sync-repos to check all Bitbucket repos for drift (runs weekly automatically) |
| Add a new repo to tracking | Add an entry to sync-config.yml with repo slug, doc path(s), and watch paths |

Core Architectural Principles

  1. Event-driven service modules communicating via AWS SNS (not microservices)
  2. Hexagonal boundaries: core/ (domain logic) and shell/ (infrastructure)
  3. Multi-tenant per-case databases -- each case gets its own MySQL schema
  4. All handlers are idempotent -- SNS/SQS delivery is at-least-once, so a handler may receive the same message more than once
  5. Exception types control message flow -- Recoverable, Permanent, Silent
  6. Resumable checkpoint pipelines for long-running document processing
  7. Events are facts -- past tense naming (DocumentLoaded, PaymentProcessed)
  8. Shared vocabulary -- see principles/glossary.md for canonical terminology
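
Principles 4, 5, and 7 can be sketched together in a few lines of Python. This is a minimal illustration, not the code used in any NGE module. The class names `RecoverableError`, `PermanentError`, and `SilentError`, the `Outcome` enum, the `InMemoryLedger`, and the `handle` function are all hypothetical. The mapping of each exception type to retry, dead-letter, or acknowledge is an assumption about what "control message flow" means here; the patterns/ and rules/ directories hold the authoritative definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class RecoverableError(Exception):
    """Assumed meaning: transient failure, so the message should be redelivered."""


class PermanentError(Exception):
    """Assumed meaning: retrying cannot help, so the message is set aside."""


class SilentError(Exception):
    """Assumed meaning: an expected no-op, so the message is simply acknowledged."""


class Outcome(Enum):
    ACK = "ack"
    RETRY = "retry"
    DEAD_LETTER = "dead_letter"


@dataclass(frozen=True)
class DocumentLoaded:
    """Past-tense event name: it records a fact that has already happened."""

    event_id: str
    case_id: int
    document_id: int


class InMemoryLedger:
    """Stand-in for a durable record of processed event ids (hypothetical)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def already_processed(self, event_id: str) -> bool:
        return event_id in self._seen

    def mark_processed(self, event_id: str) -> None:
        self._seen.add(event_id)


def handle(
    event: DocumentLoaded,
    ledger: InMemoryLedger,
    process: Callable[[DocumentLoaded], None],
) -> Outcome:
    """Run `process` at most once per event id and map exceptions to an outcome."""
    # At-least-once delivery means duplicates are normal: skip work already done.
    if ledger.already_processed(event.event_id):
        return Outcome.ACK
    try:
        process(event)
    except SilentError:
        return Outcome.ACK
    except RecoverableError:
        return Outcome.RETRY
    except PermanentError:
        return Outcome.DEAD_LETTER
    # Only a successful run is recorded, so a retried event is processed again.
    ledger.mark_processed(event.event_id)
    return Outcome.ACK
```

In this sketch the exception type is the only signal the caller needs: the handler never inspects error messages, and a redelivered event that already succeeded is acknowledged without repeating its side effects.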

What's Documented

NGE Platform (7 Modules)

| Module | Purpose |
| --- | --- |
| documentloader | Document ingestion pipeline with 11-step checkpoint state machine |
| documentextractor | Text, PDF, and metadata extraction from raw files |
| documentuploader | PDF processing via Nutrient (PSPDFKit) engine |
| documentexporter | Document export via Step Functions + ECS Fargate |
| documentexchanger | Batch transfer between eDiscovery cases |
| documentpageservice | PDF page manipulation (reorder, rotate, split) |
| unzipservice | Archive extraction (ZIP, RAR, 7Z, TAR) |
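
The checkpoint state machine mentioned for documentloader follows the resumable-pipeline idea from the core principles. The sketch below shows the general technique only. The step names, the `InMemoryCheckpointStore`, and `run_pipeline` are hypothetical, and the real pipeline's 11 steps and storage are not described in this README.

```python
from typing import Callable

Step = Callable[[dict], None]


class InMemoryCheckpointStore:
    """Stand-in for durable checkpoint storage (hypothetical)."""

    def __init__(self) -> None:
        self._next_step: dict[str, int] = {}

    def load(self, job_id: str) -> int:
        """Return the index of the next step to run (0 for a new job)."""
        return self._next_step.get(job_id, 0)

    def save(self, job_id: str, next_step: int) -> None:
        self._next_step[job_id] = next_step


def run_pipeline(
    job_id: str,
    steps: list[tuple[str, Step]],
    store: InMemoryCheckpointStore,
    context: dict,
) -> list[str]:
    """Run steps in order, resuming after the last completed checkpoint.

    Returns the names of the steps executed during this invocation.
    """
    executed: list[str] = []
    start = store.load(job_id)
    for index in range(start, len(steps)):
        name, step = steps[index]
        step(context)  # If this raises, the checkpoint still points at this step.
        store.save(job_id, index + 1)
        executed.append(name)
    return executed
```

Because the checkpoint advances only after a step returns, a retried invocation reruns the failed step and skips everything before it. Each step should therefore be safe to run more than once.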

Legacy Platform (6 Areas)

| Area | Purpose |
| --- | --- |
| rails-monolith | Core Rails 7.0.8 application (140+ models, 130+ controllers) |
| shared-libs | 77+ shared Ruby modules used by Rails and workers |
| workers | Document processing engine (29 specialized Ruby workers) |
| rails-frontend | Dual JS pipeline, ~105 React components + 156 legacy jQuery files |
| legacy-nge-integration | Complete integration map between Legacy and NGE |
| nge-legacy-divergence-map | 85+ code divergence points on nge_enabled? |

AI and Search Services

| Service | Purpose |
| --- | --- |
| documentsearch | Semantic search: voyage-law-2 embeddings + OpenSearch hybrid search |
| nextpoint-ai | AI transcript summarization (Bedrock Claude) |
| pr-review | Automated multi-agent PR review (5 parallel agents + verifier) |
| query-language-engine | Search query parser (TypeScript, Chevrotain) |
| neardupe | Near-duplicate detection (PySpark LSH on EMR) |
| search-hit-report-backend | Search hit report generator |

Semantic Search Strategy

14 attorney use cases mapped to implementation tiers, from hot doc identification to multi-deposition analysis.

Data Mining (Separate Product)

| Service | Purpose |
| --- | --- |
| eda | Data Mining backend (Ruby, Lambda + AWS Batch + Glue + Athena) |
| eda-front-end | Data Mining SPA frontend (TypeScript Web Components) |

Technology Stack

NGE Modules

  • Language: Python 3.10+, type hints required
  • AWS: Lambda, SNS, SQS, RDS (Aurora MySQL), S3, Elasticsearch, ECS Fargate, Step Functions
  • ORM: SQLAlchemy 2.x with PyMySQL
  • Infrastructure: AWS CDK (TypeScript)
  • Testing: pytest, moto, pytest-mock, pytest-cov
  • Formatting: black (line-length=100), isort (profile=black)

Legacy Platform

  • Language: Ruby 3.1.4 (Rails 7.0.8)
  • Database: MySQL (per-case via PerCaseModel), Elasticsearch 7.4, Redis
  • Jobs: Sidekiq 7.3.9 (Rails), custom polling daemon (workers)
  • Testing: Minitest, Test::Unit, mocha

Repo Structure

nextpoint-architecture/
├── README.md                    # This file -- human-readable entry point
├── CLAUDE.md                    # AI tool entry point (master index for Claude Code / Kiro)
├── BACKLOG.md                   # Engineering backlog (26 items, prioritized)
├── sync-config.yml              # Repo-to-doc mapping for automated drift detection
├── .claude/
│   ├── settings.json            # Hooks configuration
│   ├── commands/                # Slash commands (check-boundaries, new-adr, sync-repos, etc.)
│   └── skills/                  # Auto-loaded skills (exploring-module, reviewing-architecture, etc.)
├── scripts/
│   └── bb-api.sh                # Bitbucket REST API helper (auth, commits, drift check)
├── principles/                  # The "why" -- architectural decisions explained
├── patterns/                    # The "how" -- 28 reusable implementation patterns
├── rules/                       # The "must" -- 18 enforcement rules for AI tools
├── templates/                   # The "start here" -- scaffolding for new modules
├── reference-implementations/   # The "proof" -- 28 documented systems
├── adr/                         # Architecture Decision Records (11 active)
└── article-reviews/             # Industry research evaluated against our architecture (25 groups)

Architecture Decision Records

| ADR | Decision |
| --- | --- |
| ADR-001 | SNS for all inter-module and intra-module communication |
| ADR-002 | Service modules over microservices |
| ADR-003 | Per-case database schemas for multi-tenancy |
| ADR-004 | Incremental frontend modernization (no big rewrite) |
| ADR-005 | Extract bulk operations to Lambda service |
| ADR-006 | Extract bates/stamps to standalone service |
| ADR-007 | Extract custom reports to Lambda/Step Functions |
| ADR-008 | Elasticsearch upgrade + search service extraction |
| ADR-009 | Video processing modernization (Litigation suite) |
| ADR-010 | Deposition/transcript processing modernization |
| ADR-011 | Custom field storage modernization (S3 + ES + async audit) |
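
As an illustration of ADR-003, the sketch below shows one way a case id could be resolved to its own MySQL schema before a query is built. The `case_` prefix and the functions `schema_for_case` and `qualified_table` are assumptions made for this example. The actual naming convention and routing code live in the modules themselves and are not stated in this README.

```python
import re

# Hypothetical naming convention; the real schema names may differ.
_CASE_SCHEMA_PREFIX = "case_"
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def schema_for_case(case_id: int) -> str:
    """Map a case id to the name of that case's dedicated schema."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(case_id, bool) or not isinstance(case_id, int) or case_id <= 0:
        raise ValueError(f"invalid case id: {case_id!r}")
    return f"{_CASE_SCHEMA_PREFIX}{case_id}"


def qualified_table(case_id: int, table: str) -> str:
    """Return a backtick-quoted `schema`.`table` reference for one case."""
    # Identifiers cannot be bound as query parameters, so validate them strictly.
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"invalid table name: {table!r}")
    return f"`{schema_for_case(case_id)}`.`{table}`"
```

Deriving the schema name from a validated integer keeps tenant routing in one place and avoids interpolating untrusted text into SQL identifiers.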

Current Focus Areas

  • Semantic search -- Phase 1 prototype, 14 use cases, voyage-law-2 + OpenSearch hybrid search
  • Legacy modernization -- ADRs 005-011 queued for extraction from Rails monolith
  • AI tooling -- pr-review multi-agent service live, skill eval testing, harness pattern research

For AI Tools

This repo is designed to be consumed by AI coding tools. The master index for Claude Code and Kiro is CLAUDE.md, which references all rules, patterns, and reference implementations. Each Nextpoint module's own CLAUDE.md points back here for architectural context.
