
Legacy ↔ NGE Integration Map

Overview

This document maps every integration point between the Legacy platform (Rails monolith, shared_libs, workers) and the NGE modules (documentloader, documentextractor, documentuploader, documentexporter, documentexchanger, documentpageservice, unzipservice).

The Legacy Rails app serves as the orchestration layer: it initiates NGE processing, tracks events, and presents results to users. NGE modules are headless processing engines that read from and write to shared state (MySQL, S3, Athena).

Integration Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      LEGACY RAILS MONOLITH                      │
│                                                                 │
│  ┌──────────┐  ┌─────────────────┐  ┌────────────────────────┐  │
│  │ Web UI   │  │ Sidekiq Jobs    │  │ ApplicationController  │  │
│  │ (React/  │  │                 │  │ (HMAC-SHA1 API)        │  │
│  │  ERB)    │  │ NgeCaseTracker  │  │                        │  │
│  └────┬─────┘  │ NgeExport       │  └────────────┬───────────┘  │
│       │        │ BatchCompletion │               │              │
│       │        └────────┬────────┘               │              │
│       │                 │                        │              │
│  ┌────┴─────────────────┴────────────────────────┴───────────┐  │
│  │                  Per-Case MySQL Database                  │  │
│  │  exhibits, attachments, batches, batch_processing_events  │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│  ┌─────────────────────────────┴─────────────────────────────┐  │
│  │                         S3 Bucket                         │  │
│  │   /case_{id}/attachment/{uid}/{file}                      │  │
│  │   /case_{id}/export/{uid}/{file}                          │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
└────────────────────────────────┼────────────────────────────────┘
          ┌──────────────────────┼──────────────────────┐
          │                      │                      │
          ▼                      ▼                      ▼
┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
│ documentloader    │  │ documentextractor │  │ documentuploader  │
│ (Lambda/SQS)      │  │ (ECS/DynamoDB)    │  │ (ECS/Nutrient)    │
└─────────┬─────────┘  └─────────┬─────────┘  └─────────┬─────────┘
          │                      │                      │
          └──────────────────────┼──────────────────────┘
                                 ▼
                       ┌───────────────────┐
                       │ Athena / Firehose │ ──→ NgeCaseTrackerJob
                       │ (Event Stream)    │     (queries at batch end)
                       └───────────────────┘
```

Integration Points by Type

1. Shared Database (MySQL/Aurora)

Contract: Both Legacy and NGE read/write the same per-case MySQL databases.

| Database | Convention | Used By |
|---|---|---|
| Shared DB | {base_db} (e.g., nextpoint_production) | Rails (users, accounts, cases) |
| Per-case DB | {base_db}_case_{case_id} | Rails + all NGE modules |

Critical shared tables (per-case DB):

| Table | Legacy Writes | NGE Writes | Shared Fields |
|---|---|---|---|
| exhibits | Yes (create, update, search) | Yes (documentloader creates) | nutrient_id, nge_file_hash, nge_enabled |
| attachments | Yes (create, update) | Yes (documentloader creates) | S3 paths, page counts, search text |
| batches | Yes (create, status updates) | Yes (status via events) | loader_status, loader_status_updated_at_gmt |
| batch_processing_events | Yes (insert from Athena) | Indirect (via Athena pipeline) | event_type, status, details |
| labels | Yes | Yes (documentloader creates) | Tag assignments |

Schema alignment: NGE's SQLAlchemy ORM models (db_models.py) must match the Rails schema.rb definitions exactly. Any migration in either system must be coordinated.

2. S3 Storage

Contract: Shared bucket with consistent path conventions.

s3://{bucket}/case_{npcase_id}/{model_type}/{unique_id}/{filename}

| Model Type | Legacy Access | NGE Access |
|---|---|---|
| attachment | Read/write (shared_libs NextPointS3) | Read/write (all NGE modules) |
| export | Read/write | Write (documentexporter) |
| batch | Write (upload) | Read (documentloader) |
| video | Read/write | N/A |
| deposition | Read/write | N/A |
| transcript | Read/write | N/A |

Key S3 paths:

  • Upload source: s3://{case-folder-bucket}/case_{id}/... (user uploads)
  • Processed output: s3://{bucket}/case_{id}/attachment/{uid}/{file}
  • Export output: s3://{bucket}/case_{id}/export/{uid}/{file}
  • Search text: stored as attachment metadata
  • Markup/redaction: _unredacted_original suffix convention
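To make the path contract concrete, here is a minimal Ruby sketch that builds keys and URIs from the convention above. The method names (nge_s3_key, nge_s3_uri) are hypothetical and are not the shared_libs NextPointS3 API. Upload sources in the separate case-folder bucket and the _unredacted_original suffix are not covered, because their exact layout is not specified here.

```ruby
# Model types from the table above.
S3_MODEL_TYPES = %w[attachment export batch video deposition transcript].freeze

# Hypothetical helper: builds the object key for
# s3://{bucket}/case_{npcase_id}/{model_type}/{unique_id}/{filename}
def nge_s3_key(npcase_id:, model_type:, unique_id:, filename:)
  unless S3_MODEL_TYPES.include?(model_type)
    raise ArgumentError, "unknown model type: #{model_type}"
  end

  "case_#{npcase_id}/#{model_type}/#{unique_id}/#{filename}"
end

# Hypothetical helper: prefixes the key with the bucket to form a full S3 URI.
def nge_s3_uri(bucket:, **key_parts)
  "s3://#{bucket}/#{nge_s3_key(**key_parts)}"
end

puts nge_s3_uri(bucket: "example-bucket", npcase_id: 42, model_type: "attachment",
                unique_id: "abc123", filename: "doc.pdf")
# prints s3://example-bucket/case_42/attachment/abc123/doc.pdf
```

Rejecting unknown model types up front means that a typo in a new caller fails loudly instead of silently writing outside the shared layout.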

3. Event Pipeline (Athena)

Flow: All NGE modules publish SNS events → PSM (Firehose) captures them → Athena makes them queryable → Rails polls at batch end to persist into MySQL.

```
documentextractor publishes SNS events
    ├──→ documentloader (SQS subscriber): DB writes
    │         │ publishes SNS events → downstream modules + PSM
    ├──→ documentuploader (SQS subscriber): page images
    │         │ publishes SNS events → downstream modules + PSM
    └──→ PSM Firehose (SNS subscriber): captures events from ALL modules
              │
              ▼
         S3 storage (Parquet format)
              │
              ▼
         Athena table (queryable)
              │
              ▼
         At batch end: NgeCaseTrackerJob (Sidekiq) queries Athena
              │
              ▼
         Events persisted into BatchProcessingEvent table (per-case MySQL)
              │
              ▼
         BatchCompletionJob runs
              │
              ▼
         Rails UI displays processing results
```

Key points:

  • All three modules (extractor, loader, uploader) publish SNS events.
  • Each module's events can trigger work in downstream modules.
  • PSM (Processing Status Monitor) subscribes to events from all modules via Firehose → Parquet → Athena.
  • Events are persisted to MySQL only at batch end, when NgeCaseTrackerJob runs.

Event types tracked:

| Event Type | Source Module | Rails Handling |
|---|---|---|
| DOCUMENT_LOADED | documentloader | Creates/updates exhibit + attachment records |
| FILTERED_DUPLICATE | documentloader | Marks as filtered per dedupe settings |
| DUPLICATE_FOUND | documentloader | Adds duplicate warning |
| ERROR | All modules | Records processing error for display |
| WARNING | All modules | Records warning for display |
| NOTIFICATION | All modules | Informational event |
| WORKER_STARTED | documentextractor | Filtered (not displayed to users) |
| JOB_STARTED | All modules | Filtered (not displayed to users) |
| JOB_FINISHED | All modules | Filtered (not displayed to users) |
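The last three rows of the table amount to a display filter. The following is a hypothetical sketch of that filter, with events modeled as plain hashes rather than BatchProcessingEvent records. The method name is illustrative.

```ruby
# Event types that the table marks as filtered (never shown to users).
HIDDEN_EVENT_TYPES = %w[WORKER_STARTED JOB_STARTED JOB_FINISHED].freeze

# Hypothetical helper: drops lifecycle events before rendering.
def displayable_events(events)
  events.reject { |event| HIDDEN_EVENT_TYPES.include?(event.fetch("event_type")) }
end

events = [
  { "event_type" => "JOB_STARTED" },
  { "event_type" => "DOCUMENT_LOADED" },
  { "event_type" => "ERROR" },
  { "event_type" => "WORKER_STARTED" }
]
p displayable_events(events).map { |event| event["event_type"] }
# prints ["DOCUMENT_LOADED", "ERROR"]
```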

Event processing details (NgeCaseTrackerJob):

  • Queries Athena separately for loader events and for standard events (ERROR/WARNING/NOTIFICATION)
  • Parses the JSON eventdetail from each Athena row into BatchProcessingEvent attributes
  • Bulk-inserts events via BatchProcessingEvent.insert_all in slices of 1000
  • For extractor events, back-fills exhibit_id on existing records (created before exhibit IDs were known)
  • For production imports, updates nge_produced_data_fields on the batch
  • For the uploader source, enqueues BatchCompletionJob to finalize the batch
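The parse-and-bulk-insert steps can be sketched in plain Ruby. This is a minimal sketch under several assumptions. The "eventtype" column and the attribute keys are placeholders, and insert_all is passed in as a callable so that the example runs without Rails. In the app, that role would be played by BatchProcessingEvent.insert_all.

```ruby
require "json"

INSERT_SLICE_SIZE = 1000 # slice size described above

# Hypothetical mapping from Athena result rows (hashes keyed by column name) to
# event attributes. "eventdetail" is the JSON column named above; "eventtype" and
# the attribute keys are assumptions.
def rows_to_event_attributes(rows, batch_id:)
  rows.map do |row|
    detail = begin
      JSON.parse(row.fetch("eventdetail"))
    rescue JSON::ParserError
      {} # keep the event even if its detail payload is malformed
    end
    detail = {} unless detail.is_a?(Hash)
    {
      batch_id: batch_id,
      event_type: row.fetch("eventtype"),
      status: detail["status"],
      details: detail
    }
  end
end

# insert_all is any callable taking an array of attribute hashes (assumption:
# stands in for BatchProcessingEvent.insert_all).
def bulk_insert(attributes, insert_all:)
  attributes.each_slice(INSERT_SLICE_SIZE) { |slice| insert_all.call(slice) }
end
```

Slicing bounds the size of each multi-row INSERT statement, so a batch with tens of thousands of events does not produce one oversized query.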

BatchProcessingEvent model:

  • Phases: preprocessing (0), processing (1), verification (2)
  • Scopes: errored, warnings, pst_warnings, resolved, unresolved
  • The details column is stored as YAML and parsed safely via details_parsed
  • An InternalEventCode is auto-created on first reference for unknown event types
  • NGE batches render an extra "Description" column in the UI
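The following is a hypothetical plain-Ruby stand-in for the phase mapping and for the safe YAML parsing behind details_parsed. The real model presumably relies on ActiveRecord features (for example, enum and serialize). This sketch only illustrates two ideas: refuse arbitrary Ruby objects embedded in the YAML, and degrade to an empty hash on bad input.

```ruby
require "yaml"

# Phase values as listed above (shown as a plain hash rather than a Rails enum).
PHASES = { preprocessing: 0, processing: 1, verification: 2 }.freeze

def phase_name(value)
  PHASES.key(value) || raise(ArgumentError, "unknown phase #{value.inspect}")
end

# Hypothetical stand-in for details_parsed: YAML.safe_load refuses tags that would
# instantiate arbitrary Ruby classes; unparseable or disallowed input yields {}.
def details_parsed(raw)
  return {} if raw.nil? || raw.strip.empty?

  parsed = YAML.safe_load(raw)
  parsed.is_a?(Hash) ? parsed : { "value" => parsed }
rescue Psych::SyntaxError, Psych::DisallowedClass
  {}
end

p details_parsed("---\nfile: a.pdf\npages: 3\n")
# prints {"file"=>"a.pdf", "pages"=>3} (Ruby 3.4 prints {"file" => "a.pdf", "pages" => 3})
```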

4. Lambda Invocation (documentexporter)

Direction: Rails → NGE

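```ruby
# app/sidekiq/nge_export_job.rb
require 'aws-sdk-lambda'
require 'json'

# region, env, bucket, export_id, manifest_s3_path and volume_configs are set earlier in the job
lambda_client = Aws::Lambda::Client.new(region: region)
lambda_client.invoke(
  function_name: "#{region}-#{env}-nge-export-lambda",
  invocation_type: 'Event',  # async: Lambda queues the event and returns immediately
  payload: {
    bucket: bucket,
    export_id: export_id,
    manifest_zip: manifest_s3_path,
    volumes: volume_configs
  }.to_json
)
```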

Lambda naming convention: {region}-{environment}-nge-export-lambda (e.g., us-east-1-production-nge-export-lambda)

5. HTTP API (documentpageservice)

Direction: Bidirectional

Rails → documentpageservice:

```ruby
# app/services/nge_page_service.rb
# URL from SSM: /nge/dps/{env}/api/apiUrl
NgePageService.process_nge_page_job(
  operation: :reorder,  # or :rotate, :add, :remove, :split
  case_id: case_id,
  exhibit_id: exhibit_id,
  pages: page_config
)
```

documentpageservice → Rails:

```
# HMAC-SHA1 authenticated callback to Rails API
POST /api/page_service_callback
Header: API-Authorization: {HMAC-SHA1 signature}
Body: { exhibit_id, status, result_data }
```
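The following is a minimal sketch of signing and verifying the callback with HMAC-SHA1 using Ruby's OpenSSL bindings. This document does not specify what gets signed. The canonical string (method, path, and body joined by newlines) and the Base64 encoding are therefore assumptions, and the real API-Authorization value may include other fields or a prefix.

```ruby
require "openssl"

# Assumption: the string-to-sign is method, path, and body joined by newlines.
def canonical_string(http_method, path, body)
  [http_method.to_s.upcase, path, body].join("\n")
end

# Sending side (e.g. documentpageservice): HMAC-SHA1 over the canonical string.
def hmac_sha1_signature(secret, http_method, path, body)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA1"), secret,
                                canonical_string(http_method, path, body))
  [digest].pack("m0") # strict Base64, same as Base64.strict_encode64
end

# Constant-time comparison so timing does not reveal how many bytes matched.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Receiving side (Rails): recompute the signature and compare it to the header.
def valid_callback?(secret, header_value, http_method, path, body)
  expected = hmac_sha1_signature(secret, http_method, path, body)
  secure_compare(expected, header_value.to_s)
end
```

Because both sides derive the signature from the same shared key, this is also why the key rotation noted under Risk Areas has to be coordinated.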

6. SSM Parameter Store

Direction: NGE publishes → Rails reads

| Parameter | Publisher | Consumer |
|---|---|---|
| /nge/dps/{env}/api/apiUrl | documentpageservice CDK | Rails NgePageService |
| Other NGE service URLs | NGE CDK stacks | Rails service clients |
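The following sketch shows how a Rails client might resolve the page service URL from SSM and cache it. The resolver class is hypothetical. It only needs an object responding to get_parameter(name:), which Aws::SSM::Client (aws-sdk-ssm gem) provides, so a stub can stand in for it when AWS is unavailable.

```ruby
# Parameter path pattern from the table above.
DPS_URL_PARAM = "/nge/dps/%<env>s/api/apiUrl"

# Hypothetical resolver. ssm_client is injected (e.g. Aws::SSM::Client.new) and the
# URL is cached per environment so each request does not trigger an SSM call.
class DpsUrlResolver
  def initialize(ssm_client)
    @ssm = ssm_client
    @cache = {}
  end

  def url_for(env)
    @cache[env] ||= @ssm.get_parameter(name: format(DPS_URL_PARAM, env: env)).parameter.value
  end
end
```

Caching assumes the URL changes only on redeploys. A process that outlives a stack change would need a restart or a TTL.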

7. AWS Secrets Manager

Shared secrets accessed by both Legacy and NGE:

| Secret | Used By |
|---|---|
| Database credentials | Rails (direct), NGE modules (direct) |
| API signing key | Rails (HMAC verification), workers (HMAC signing) |
| Nutrient API key | Rails (nextpoint_nutrient.rb), documentuploader |
| S3 encryption keys | Both systems |
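A common pattern for database credentials is to read a JSON secret via Aws::SecretsManager::Client#get_secret_value (its secret_string). The sketch below makes two assumptions: the secret uses the key layout of RDS-managed secrets (host, port, username, password), and that layout matches Nextpoint's actual secret, which may not be the case. The sketch combines those credentials with the per-case naming convention from the Shared Database section. The method name case_db_config is hypothetical.

```ruby
require "json"

# Hypothetical helper: builds connection settings for the shared DB or a per-case DB
# from a JSON secret string (assumed RDS-style keys).
def case_db_config(secret_string, base_db:, case_id: nil)
  secret = JSON.parse(secret_string)
  {
    host: secret.fetch("host"),
    port: Integer(secret.fetch("port", 3306)),
    username: secret.fetch("username"),
    password: secret.fetch("password"),
    # Shared DB is {base_db}; per-case DB is {base_db}_case_{case_id}.
    database: case_id ? "#{base_db}_case_#{case_id}" : base_db
  }
end
```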

8. Cognito (User Identity)

Direction: Rails manages → NGE trusts

Rails auto-provisions a Cognito user whenever a user is created (the create_cognito_user callback). NGE modules don't interact with Cognito directly; they operate as internal services authenticated by Lambda execution roles or ECS task roles.

Integration by NGE Module

documentextractor (pipeline entry point)

  • Trigger: User uploads to S3 → Batch::AsUpload created → ProcessorApi.import() HTTP POST to documentextractor API (POST /import). Cancellation via DELETE /import/{case}/{job}/{batch}.
  • Role: Entry point for all NGE imports. Assigns workers from a DynamoDB pool and extracts content (text, metadata, file conversion via Hyland Filters). Publishes SNS events as documents are processed; these fan out to downstream modules.
  • SNS fan-out: documentextractor → SNS → subscribed by documentloader, documentuploader, and PSM (Firehose for event capture).
  • Tracking: processor_job_id stored on Batch record; ImportStatus created
  • Case routing: Npcase#nge_enabled? determines NGE vs Legacy workers
  • Key classes: ProcessorApi (lib/processor_api.rb), ProcessorApiHelper (app/helpers/processor_api_helper.rb), Batch::AsUpload (app/models/batch/as_upload.rb)
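The two HTTP calls named above can be sketched with Net::HTTP request objects. Only the verbs and paths come from this document. The base URL and the JSON body fields are placeholders, not ProcessorApi's real contract. The sketch also assumes that {job} in the cancel path is the Batch's processor_job_id.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical builder for POST /import; the body fields are placeholders.
def build_import_request(base_url, case_id:, batch_id:, source_prefix:)
  req = Net::HTTP::Post.new(URI.join(base_url, "/import"))
  req["Content-Type"] = "application/json"
  req.body = JSON.generate(case_id: case_id, batch_id: batch_id, source_prefix: source_prefix)
  req
end

# Hypothetical builder for DELETE /import/{case}/{job}/{batch}.
def build_cancel_request(base_url, case_id:, processor_job_id:, batch_id:)
  Net::HTTP::Delete.new(URI.join(base_url, "/import/#{case_id}/#{processor_job_id}/#{batch_id}"))
end
```

Sending is left to the caller (for example, Net::HTTP.start(host, port, use_ssl: true) { |http| http.request(req) }), which keeps request construction testable on its own.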

documentloader

  • Trigger: SNS events from documentextractor (subscribed via SQS)
  • Role: Writes exhibits, attachments, and metadata to the per-case MySQL database. 11-step checkpoint pipeline. Handles deduplication, family linking, ES indexing.
  • Shared state: exhibits, attachments, batches tables; S3 files
  • Publishes: SNS events consumed by downstream modules and PSM (Firehose → Athena)

documentuploader

  • Trigger: SNS events from documentextractor (subscribed via SQS)
  • Shared state: attachments table (page images, Nutrient IDs); S3 files
  • Integration: Nutrient (PSPDFKit) for PDF rendering
  • Publishes: SNS events consumed by downstream modules and PSM (Firehose → Athena)

documentexporter

  • Trigger: NgeExportJob (Sidekiq) invokes Lambda async
  • Input: Export manifest ZIP on S3, volume configurations
  • Output: Export ZIP volumes on S3
  • Shared state: exports, export_volumes, export_exhibits tables

documentexchanger

  • Trigger: User initiates document exchange between cases
  • Shared state: Cross-case database operations via AWS Glue ETL
  • Integration: Dynamic Lambda + SQS provisioning per exchange

documentpageservice

  • Trigger: NgePageService HTTP call from Rails
  • Callback: HMAC-SHA1 authenticated POST back to Rails API
  • URL discovery: SSM Parameter Store (/nge/dps/{env}/api/apiUrl)
  • Operations: Page reorder, rotate, add, remove, split

unzipservice

  • Trigger: ECS RunTask from Rails or another NGE module
  • Input/Output: S3 paths via environment variables
  • No database: Standalone file extraction service
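Because unzipservice is driven entirely by environment variables, a RunTask call mostly consists of a container override. The sketch below builds the request parameters in the shape that Aws::ECS::Client#run_task (aws-sdk-ecs) expects. The cluster, task definition, container name, and the INPUT_S3_URI/OUTPUT_S3_URI variable names are hypothetical. The launch type and network configuration that a Fargate task would also need are omitted.

```ruby
# Hypothetical RunTask parameters for unzipservice; only the override structure
# follows the aws-sdk-ecs request shape, all names and values are placeholders.
def unzip_run_task_params(cluster:, task_definition:, container:, input_s3_uri:, output_s3_uri:)
  {
    cluster: cluster,
    task_definition: task_definition,
    count: 1,
    overrides: {
      container_overrides: [
        {
          name: container,
          environment: [
            { name: "INPUT_S3_URI", value: input_s3_uri },
            { name: "OUTPUT_S3_URI", value: output_s3_uri }
          ]
        }
      ]
    }
  }
end

# In the app this hash would be passed as Aws::ECS::Client.new.run_task(**params).
```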

Migration Strategy

NGE Feature Flag

The nge_enabled? flag on Npcase determines which processing path the case uses. A case is permanently either NGE or Legacy — it is not a migration toggle.

```
nge_enabled? = true  → NGE modules (Lambda/ECS)
nge_enabled? = false → Legacy workers (EC2 polling daemon)
```

The Legacy code has separate paths throughout based on this flag — from batch creation, through document processing, to event tracking and UI rendering. Legacy workers remain active for non-document jobs (video, transcripts, treatments) regardless of the flag.
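The routing rule can be summarized in a few lines. This hypothetical helper is not the Legacy code's actual branching. It encodes only the two rules stated above: document processing follows nge_enabled?, and video, transcript, and treatment jobs stay on Legacy workers.

```ruby
# Job kinds that stay on Legacy workers regardless of the case flag.
LEGACY_ONLY_JOB_KINDS = %i[video transcript treatment].freeze

# Hypothetical helper: npcase is anything responding to nge_enabled? (e.g. an Npcase).
def processing_path(npcase, job_kind)
  return :legacy_workers if LEGACY_ONLY_JOB_KINDS.include?(job_kind)

  npcase.nge_enabled? ? :nge_modules : :legacy_workers
end

# Illustrative stand-in for Npcase.
ExampleCase = Struct.new(:nge) do
  def nge_enabled?
    nge
  end
end

processing_path(ExampleCase.new(true), :document_import) # => :nge_modules
processing_path(ExampleCase.new(true), :video)           # => :legacy_workers
```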

Coexistence Requirements

For the migration period, both systems must:

  • Write to the same database schema (synchronized migrations)
  • Use the same S3 path conventions
  • Share HMAC-SHA1 signing keys
  • Be able to read batch_processing_events for status tracking

Risk Areas

Schema Drift

  • Rails migrations and NGE SQLAlchemy models must stay synchronized
  • No automated validation exists today
  • Recommendation: Add a schema comparison CI check
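The following is a minimal sketch of the comparison step such a CI check could run. Extracting the inputs (parsing schema.rb and introspecting the SQLAlchemy models in db_models.py) is out of scope here. Both sides are assumed to be normalized into { table => { column => type } } hashes that use the same type vocabulary (for example, MySQL column types). The method name is hypothetical.

```ruby
# Hypothetical drift report: returns only the tables with at least one difference.
def schema_drift(rails_schema, nge_schema)
  (rails_schema.keys | nge_schema.keys).sort.each_with_object({}) do |table, report|
    rails_cols = rails_schema.fetch(table, {})
    nge_cols = nge_schema.fetch(table, {})
    issues = {
      missing_in_nge: (rails_cols.keys - nge_cols.keys).sort,
      missing_in_rails: (nge_cols.keys - rails_cols.keys).sort,
      type_mismatch: (rails_cols.keys & nge_cols.keys).reject { |c| rails_cols[c] == nge_cols[c] }.sort
    }
    report[table] = issues if issues.values.any?(&:any?)
  end
end
```

In CI, a non-empty report would fail the build, for example with abort(report.inspect) unless report.empty?.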

S3 Path Changes

  • Any change to S3 path conventions breaks both systems
  • The _unredacted_original suffix is particularly fragile
  • Recommendation: Document S3 path contract in shared patterns/

Athena Polling Latency

  • NgeCaseTrackerJob polling introduces latency between NGE processing completion and Rails UI update
  • Future: Direct SNS → Rails webhook or EventBridge → Rails for real-time updates

Shared Secrets

  • Both systems access the same Secrets Manager secrets
  • Key rotation must be coordinated
  • Recommendation: Use IAM role-based access where possible