Legacy ↔ NGE Integration Map
Overview
This document maps every integration point between the Legacy platform (Rails monolith, shared_libs, workers) and the NGE modules (documentloader, documentextractor, documentuploader, documentexporter, documentexchanger, documentpageservice, unzipservice).
The Legacy Rails app serves as the orchestration layer — it initiates NGE processing, tracks events, and presents results to users. NGE modules are headless processing engines that read/write to shared state (MySQL, S3, Athena).
Integration Architecture
```text
┌────────────────────────────────────────────────────────────────────┐
│                       LEGACY RAILS MONOLITH                        │
│                                                                    │
│  ┌──────────┐  ┌────────────────────┐  ┌────────────────────────┐  │
│  │ Web UI   │  │ Sidekiq Jobs       │  │ ApplicationController  │  │
│  │ (React/  │  │                    │  │ (HMAC-SHA1 API)        │  │
│  │  ERB)    │  │ NgeCaseTrackerJob  │  └───────────┬────────────┘  │
│  └────┬─────┘  │ NgeExportJob       │              │               │
│       │        │ BatchCompletionJob │              │               │
│       │        └─────────┬──────────┘              │               │
│       │                  │                         │               │
│  ┌────┴──────────────────┴─────────────────────────┴────────────┐  │
│  │                   Per-Case MySQL Database                    │  │
│  │  exhibits │ attachments │ batches │ batch_processing_events  │  │
│  └──────────────────────────────┬───────────────────────────────┘  │
│                                 │                                  │
│  ┌──────────────────────────────┴───────────────────────────────┐  │
│  │                          S3 Bucket                           │  │
│  │              /case_{id}/attachment/{uid}/{file}              │  │
│  │              /case_{id}/export/{uid}/{file}                  │  │
│  └──────────────────────────────┬───────────────────────────────┘  │
└─────────────────────────────────┼──────────────────────────────────┘
                                  │
           ┌──────────────────────┼──────────────────────┐
           ▼                      ▼                      ▼
 ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
 │ documentloader    │  │ documentextractor │  │ documentuploader  │
 │ (Lambda/SQS)      │  │ (ECS/DynamoDB)    │  │ (ECS/Nutrient)    │
 └─────────┬─────────┘  └─────────┬─────────┘  └─────────┬─────────┘
           │                      │                      │
           └──────────────────────┼──────────────────────┘
                                  ▼
                        ┌───────────────────┐
                        │ Athena / Firehose │ ──→ NgeCaseTrackerJob
                        │ (Event Stream)    │     (polls every N seconds)
                        └───────────────────┘
```
Integration Points by Type
1. Shared Database (MySQL/Aurora)
Contract: Both Legacy and NGE read/write the same per-case MySQL databases.
| Database | Convention | Used By |
|---|---|---|
| Shared DB | `{base_db}` (e.g., `nextpoint_production`) | Rails (users, accounts, cases) |
| Per-case DB | `{base_db}_case_{case_id}` | Rails + all NGE modules |
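Both systems must derive the per-case database name the same way. Below is a minimal Ruby sketch of the `{base_db}_case_{case_id}` convention. The helper name `per_case_database_name` is hypothetical and does not come from either codebase.

```ruby
# Hypothetical helper illustrating the per-case naming convention:
#   Per-case DB: {base_db}_case_{case_id}
def per_case_database_name(base_db, case_id)
  # Integer() raises ArgumentError for non-numeric IDs instead of
  # silently producing a database name that does not exist.
  id = Integer(case_id)
  "#{base_db}_case_#{id}"
end

puts per_case_database_name("nextpoint_production", 42)
# prints nextpoint_production_case_42
```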
Critical shared tables (per-case DB):
| Table | Legacy Writes | NGE Writes | Shared Fields |
|---|---|---|---|
| `exhibits` | Yes (create, update, search) | Yes (documentloader creates) | `nutrient_id`, `nge_file_hash`, `nge_enabled` |
| `attachments` | Yes (create, update) | Yes (documentloader creates) | S3 paths, page counts, search text |
| `batches` | Yes (create, status updates) | Yes (status via events) | `loader_status`, `loader_status_updated_at_gmt` |
| `batch_processing_events` | Yes (insert from Athena) | Indirect (via Athena pipeline) | `event_type`, `status`, `details` |
| `labels` | Yes | Yes (documentloader creates) | Tag assignments |
Schema alignment: NGE's SQLAlchemy ORM models (`db_models.py`) must match the Rails `schema.rb` definitions exactly. Any migration in either system must be coordinated.
2. S3 Storage
Contract: Shared bucket with consistent path conventions.
| Model Type | Legacy Access | NGE Access |
|---|---|---|
| attachment | Read/write (shared_libs `NextPointS3`) | Read/write (all NGE modules) |
| export | Read/write | Write (documentexporter) |
| batch | Write (upload) | Read (documentloader) |
| video | Read/write | N/A |
| deposition | Read/write | N/A |
| transcript | Read/write | N/A |
Key S3 paths:
- Upload source: s3://{case-folder-bucket}/case_{id}/... (user uploads)
- Processed output: s3://{bucket}/case_{id}/attachment/{uid}/{file}
- Export output: s3://{bucket}/case_{id}/export/{uid}/{file}
- Search text: stored as attachment metadata
- Markup/redaction: _unredacted_original suffix convention
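Both systems must build the processed-output and export keys identically, so keeping the layout in one helper makes drift easier to spot. Below is a minimal Ruby sketch of the two documented layouts. The helper names and the bucket value are hypothetical. The `_unredacted_original` suffix is not modeled, because this document does not say where in the key it goes.

```ruby
# Hypothetical helpers mirroring the documented S3 key layout:
#   case_{id}/attachment/{uid}/{file}
#   case_{id}/export/{uid}/{file}
# Only the two model types whose paths are documented here are accepted.
S3_PATH_MODEL_TYPES = %w[attachment export].freeze

def nge_s3_key(case_id:, model_type:, uid:, file:)
  unless S3_PATH_MODEL_TYPES.include?(model_type)
    raise ArgumentError, "undocumented model type: #{model_type}"
  end

  "case_#{case_id}/#{model_type}/#{uid}/#{file}"
end

# Prefixes the key with s3://{bucket}/ to form a full object URI.
def nge_s3_uri(bucket:, **key_parts)
  "s3://#{bucket}/#{nge_s3_key(**key_parts)}"
end

puts nge_s3_uri(bucket: "example-bucket", case_id: 42, model_type: "attachment",
                uid: "abc123", file: "document.pdf")
# prints s3://example-bucket/case_42/attachment/abc123/document.pdf
```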
3. Event Pipeline (Athena)
Flow: All NGE modules publish SNS events → PSM (Firehose) captures them → Athena makes them queryable → Rails polls at batch end to persist into MySQL.
```text
documentextractor publishes SNS events
    │
    ├──→ documentloader (SQS subscriber): DB writes
    │        │ publishes SNS events → downstream modules + PSM
    │
    ├──→ documentuploader (SQS subscriber): page images
    │        │ publishes SNS events → downstream modules + PSM
    │
    └──→ PSM Firehose (SNS subscriber): captures events from ALL modules
             │
             ▼
         S3 storage (Parquet format)
             │
             ▼
         Athena table (queryable)
             │
             ▼
         At batch end: NgeCaseTrackerJob (Sidekiq) queries Athena
             │
             ▼
         Events persisted into BatchProcessingEvent table (per-case MySQL)
             │
             ▼
         BatchCompletionJob runs
             │
             ▼
         Rails UI displays processing results
```
Key points:
- All three modules (extractor, loader, uploader) publish SNS events
- Each module's events can trigger work in downstream modules
- PSM (Processing Status Monitor) subscribes to events from all modules via Firehose → Parquet → Athena
- Events are persisted to MySQL only at batch end when NgeCaseTrackerJob runs
Event types tracked:
| Event Type | Source Module | Rails Handling |
|---|---|---|
| `DOCUMENT_LOADED` | documentloader | Creates/updates exhibit + attachment records |
| `FILTERED_DUPLICATE` | documentloader | Marks as filtered per dedupe settings |
| `DUPLICATE_FOUND` | documentloader | Adds duplicate warning |
| `ERROR` | All modules | Records processing error for display |
| `WARNING` | All modules | Records warning for display |
| `NOTIFICATION` | All modules | Informational event |
| `WORKER_STARTED` | documentextractor | Filtered (not displayed to users) |
| `JOB_STARTED` | All modules | Filtered (not displayed to users) |
| `JOB_FINISHED` | All modules | Filtered (not displayed to users) |
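The "Filtered" rows above amount to a deny-list applied before events reach the UI. Below is a minimal Ruby sketch of that rule. It assumes each event is a Hash with an `"event_type"` key. The helper and constant names are hypothetical and not taken from the Rails code.

```ruby
# Event types the table above marks as "Filtered (not displayed to users)".
HIDDEN_EVENT_TYPES = %w[WORKER_STARTED JOB_STARTED JOB_FINISHED].freeze

# Hypothetical helper: drop lifecycle events and keep everything a user
# should see. Assumes each event is a Hash with an "event_type" key.
def displayable_events(events)
  events.reject { |event| HIDDEN_EVENT_TYPES.include?(event["event_type"]) }
end

events = [
  { "event_type" => "JOB_STARTED" },
  { "event_type" => "ERROR", "details" => "conversion failed" },
  { "event_type" => "DOCUMENT_LOADED" }
]
p displayable_events(events).map { |e| e["event_type"] }
# prints ["ERROR", "DOCUMENT_LOADED"]
```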
Event processing details (NgeCaseTrackerJob):
- Queries Athena for loader events and standard events (ERROR/WARNING/NOTIFICATION) separately
- Parses JSON eventdetail from Athena rows into BatchProcessingEvent attributes
- Bulk inserts events via BatchProcessingEvent.insert_all in slices of 1000
- For extractor events, back-fills `exhibit_id` on existing records (created before exhibit IDs were known)
- For production imports, updates nge_produced_data_fields on the batch
- For uploader source, enqueues BatchCompletionJob to finalize the batch
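The parse-and-bulk-insert step can be sketched in plain Ruby: parse each row's JSON `eventdetail`, map it to attributes, and insert in slices of 1000. This is a sketch under stated assumptions, not the job's code. A callable `inserter` stands in for `BatchProcessingEvent.insert_all` so the sketch runs without Rails. The JSON keys (`eventType`, `status`, `details`) and the attribute mapping are also assumptions.

```ruby
require "json"

BATCH_INSERT_SLICE = 1000 # slice size documented for insert_all

# Hypothetical sketch. `rows` stands in for Athena result rows (Hashes with
# an "eventdetail" JSON string); `inserter` stands in for
# BatchProcessingEvent.insert_all. The JSON keys below are ASSUMED, not
# the real event schema.
def persist_events(rows, inserter:)
  attributes = rows.map do |row|
    detail = JSON.parse(row.fetch("eventdetail"))
    {
      event_type: detail["eventType"],
      status: detail["status"],
      details: detail["details"]
    }
  end

  # One bulk insert per slice keeps each statement bounded in size.
  attributes.each_slice(BATCH_INSERT_SLICE) { |slice| inserter.call(slice) }
  attributes.size
end

rows = Array.new(2500) do |i|
  { "eventdetail" => JSON.generate({ "eventType" => "ERROR", "status" => "failed", "details" => "row #{i}" }) }
end
slice_sizes = []
persist_events(rows, inserter: ->(slice) { slice_sizes << slice.size })
p slice_sizes # prints [1000, 1000, 500]
```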
BatchProcessingEvent model:
- Phases: preprocessing (0), processing (1), verification (2)
- Scopes: errored, warnings, pst_warnings, resolved, unresolved
- details column stored as YAML, parsed safely via details_parsed
- InternalEventCode auto-created on first reference for unknown event types
- NGE batches get extra "Description" column rendering in the UI
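Two of the model behaviours above, integer phases and safe YAML parsing of `details`, can be sketched in plain Ruby. `BatchProcessingEventSketch` is a hypothetical stand-in, not the ActiveRecord model. Returning an empty Hash for blank, malformed, or non-Hash YAML is an assumption about what "parsed safely" means.

```ruby
require "yaml"

# Plain-Ruby stand-in for two documented BatchProcessingEvent behaviours.
# Not the real ActiveRecord class.
class BatchProcessingEventSketch
  # Documented phase values: preprocessing (0), processing (1), verification (2)
  PHASES = { preprocessing: 0, processing: 1, verification: 2 }.freeze

  attr_reader :phase, :details

  def initialize(phase:, details:)
    @phase = PHASES.fetch(phase) # raises KeyError for an unknown phase
    @details = details           # raw YAML string, as stored in the column
  end

  # YAML.safe_load refuses arbitrary Ruby objects. ASSUMED fallback: blank,
  # malformed, or non-Hash YAML yields an empty Hash instead of raising.
  def details_parsed
    return {} if details.nil? || details.strip.empty?

    parsed = YAML.safe_load(details)
    parsed.is_a?(Hash) ? parsed : {}
  rescue Psych::Exception
    {}
  end
end

event = BatchProcessingEventSketch.new(phase: :processing, details: "message: Page count mismatch\nattempt: 2\n")
p event.phase # prints 1
p event.details_parsed["attempt"] # prints 2
```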
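4. Lambda Invocation (documentexporter)
Direction: Rails → NGE
```ruby
# app/sidekiq/nge_export_job.rb (excerpt)
require "aws-sdk-lambda"
require "json"

# region, env, bucket, export_id, manifest_s3_path and volume_configs
# come from the job's context.
lambda_client = Aws::Lambda::Client.new(region: region)
lambda_client.invoke(
  function_name: "#{region}-#{env}-nge-export-lambda",
  invocation_type: 'Event', # async: Lambda queues the event and returns immediately
  payload: {
    bucket: bucket,
    export_id: export_id,
    manifest_zip: manifest_s3_path, # S3 path of the export manifest ZIP
    volumes: volume_configs         # per-volume export configuration
  }.to_json
)
```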
Lambda naming convention: {region}-{environment}-nge-export-lambda
- e.g., us-east-1-production-nge-export-lambda
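Building the function name in a single place keeps the Sidekiq job aligned with the naming convention above. Below is a minimal Ruby sketch. The helper name `nge_export_lambda_name` is hypothetical. The blank-component check is a defensive choice, not documented behaviour.

```ruby
# Hypothetical helper for the documented convention
#   {region}-{environment}-nge-export-lambda
def nge_export_lambda_name(region:, env:)
  # Refuse blank components so a misconfigured job fails loudly instead of
  # invoking a function name like "-production-nge-export-lambda".
  [region, env].each do |part|
    raise ArgumentError, "blank name component" if part.to_s.strip.empty?
  end

  "#{region}-#{env}-nge-export-lambda"
end

puts nge_export_lambda_name(region: "us-east-1", env: "production")
# prints us-east-1-production-nge-export-lambda
```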
5. HTTP API (documentpageservice)
Direction: Bidirectional
Rails → documentpageservice:
```ruby
# app/services/nge_page_service.rb
# URL from SSM: /nge/dps/{env}/api/apiUrl
NgePageService.process_nge_page_job(
  operation: :reorder, # or :rotate, :add, :remove, :split
  case_id: case_id,
  exhibit_id: exhibit_id,
  pages: page_config
)
```
documentpageservice → Rails:
```text
# HMAC-SHA1 authenticated callback to Rails API
POST /api/page_service_callback
Header: API-Authorization: {HMAC-SHA1 signature}
Body: { exhibit_id, status, result_data }
```
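Below is a minimal Ruby sketch of signing and verifying the callback with HMAC-SHA1 and the shared API signing key. It assumes the signature is the hex HMAC-SHA1 of the raw request body. The real canonical string and encoding used by the `API-Authorization` header are not specified here and may differ. The helper names are hypothetical.

```ruby
require "openssl"

# ASSUMPTION: signature = hex HMAC-SHA1 of the raw body, keyed with the
# shared API signing key. The real canonical string may differ.
def sign_callback(body, secret)
  OpenSSL::HMAC.hexdigest("SHA1", secret, body)
end

# Constant-time comparison: every byte is examined, so response timing
# does not reveal how much of a forged signature was correct.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

def valid_callback?(body, header_signature, secret)
  secure_equal?(sign_callback(body, secret), header_signature.to_s)
end

secret = "example-shared-key"
body = '{"exhibit_id":123,"status":"complete","result_data":{}}'
signature = sign_callback(body, secret)
puts valid_callback?(body, signature, secret)       # prints true
puts valid_callback?(body + " ", signature, secret) # prints false
```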
6. SSM Parameter Store
Direction: NGE publishes → Rails reads
| Parameter | Publisher | Consumer |
|---|---|---|
| `/nge/dps/{env}/api/apiUrl` | documentpageservice CDK | Rails `NgePageService` |
| Other NGE service URLs | NGE CDK stacks | Rails service clients |
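On the Rails side, URL discovery reduces to one `get_parameter` call. Below is a minimal Ruby sketch. The client is injected so the example runs without AWS. When no client is passed, it builds an `Aws::SSM::Client` from the `aws-sdk-ssm` gem, whose `get_parameter(name:)` response exposes `parameter.value`. The helper names and `FakeSSM` stub are hypothetical.

```ruby
# Parameter name published by the documentpageservice CDK stack.
def dps_api_url_parameter(env)
  "/nge/dps/#{env}/api/apiUrl"
end

# Hypothetical lookup helper. The SSM client is injectable for testing;
# by default it is created lazily from the aws-sdk-ssm gem.
def fetch_dps_api_url(env, client: nil)
  client ||= begin
    require "aws-sdk-ssm"
    Aws::SSM::Client.new
  end
  client.get_parameter(name: dps_api_url_parameter(env)).parameter.value
end

# Minimal stub with the same response shape as the SDK (parameter.value).
FakeSSM = Struct.new(:values) do
  def get_parameter(name:)
    parameter = Struct.new(:value).new(values.fetch(name))
    Struct.new(:parameter).new(parameter)
  end
end

client = FakeSSM.new({ "/nge/dps/staging/api/apiUrl" => "https://dps.example.internal" })
puts fetch_dps_api_url("staging", client: client)
# prints https://dps.example.internal
```

Because the URL rarely changes, a real client would likely cache the value per environment rather than call SSM on every page job.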
7. AWS Secrets Manager
Shared secrets accessed by both Legacy and NGE:
| Secret | Used By |
|---|---|
| Database credentials | Rails (direct), NGE modules (direct) |
| API signing key | Rails (HMAC verification), workers (HMAC signing) |
| Nutrient API key | Rails (nextpoint_nutrient.rb), documentuploader |
| S3 encryption keys | Both systems |
8. Cognito (User Identity)
Direction: Rails manages → NGE trusts
Rails auto-provisions a Cognito user whenever a Rails user is created (`create_cognito_user` callback).
NGE modules don't directly interact with Cognito; they operate as internal services authenticated by Lambda execution roles or ECS task roles.
Integration by NGE Module
documentextractor (pipeline entry point)
- Trigger: User uploads to S3 → `Batch::AsUpload` created → `ProcessorApi.import()` HTTP POST to the documentextractor API (`POST /import`). Cancellation via `DELETE /import/{case}/{job}/{batch}`.
- Role: Entry point for all NGE imports. Assigns workers from a DynamoDB pool and extracts content (text, metadata, file conversion via Hyland Filters). Publishes SNS events as documents are processed; these fan out to downstream modules.
- SNS fan-out: documentextractor → SNS → subscribed by documentloader, documentuploader, and PSM (Firehose for event capture).
- Tracking: `processor_job_id` stored on the `Batch` record; `ImportStatus` created.
- Case routing: `Npcase#nge_enabled?` determines NGE vs Legacy workers.
- Key classes: `ProcessorApi` (`lib/processor_api.rb`), `ProcessorApiHelper` (`app/helpers/processor_api_helper.rb`), `Batch::AsUpload` (`app/models/batch/as_upload.rb`).
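Below is a minimal Ruby sketch of the two documentextractor calls, built with the standard library's `Net::HTTP` request classes. Only the verbs and paths come from this document. The base URL, the helper names, and the JSON payload fields are illustrative assumptions. `ProcessorApi` (`lib/processor_api.rb`) defines the real contract.

```ruby
require "json"
require "net/http"
require "uri"

# ASSUMED: base URL and payload fields are illustrative only.
def build_import_request(base_url, payload)
  request = Net::HTTP::Post.new(URI.join(base_url, "/import"))
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(payload)
  request
end

# Cancellation path documented as DELETE /import/{case}/{job}/{batch}.
def build_cancel_request(base_url, case_id:, job_id:, batch_id:)
  Net::HTTP::Delete.new(URI.join(base_url, "/import/#{case_id}/#{job_id}/#{batch_id}"))
end

base = "https://extractor.example.internal"
import = build_import_request(base, { case_id: 42, batch_id: 7 })
cancel = build_cancel_request(base, case_id: 42, job_id: "job-1", batch_id: 7)
puts "#{import.method} #{import.path}" # prints POST /import
puts "#{cancel.method} #{cancel.path}" # prints DELETE /import/42/job-1/7
```

Sending either request would go through `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }`. Keeping construction separate from sending makes the paths easy to unit-test.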
documentloader
- Trigger: SNS events from documentextractor (subscribed via SQS)
- Role: Writes exhibits, attachments, and metadata to the per-case MySQL database through an 11-step checkpoint pipeline. Handles deduplication, family linking, and ES indexing.
- Shared state: `exhibits`, `attachments`, `batches` tables; S3 files
- Publishes: SNS events consumed by downstream modules and PSM (Firehose → Athena)
documentuploader
- Trigger: SNS events from documentextractor (subscribed via SQS)
- Shared state: `attachments` table (page images, Nutrient IDs); S3 files
- Integration: Nutrient (PSPDFKit) for PDF rendering
- Publishes: SNS events consumed by downstream modules and PSM (Firehose → Athena)
documentexporter
- Trigger: `NgeExportJob` (Sidekiq) invokes the Lambda asynchronously
- Input: Export manifest ZIP on S3, volume configurations
- Output: Export ZIP volumes on S3
- Shared state: `exports`, `export_volumes`, `export_exhibits` tables
documentexchanger
- Trigger: User initiates document exchange between cases
- Shared state: Cross-case database operations via AWS Glue ETL
- Integration: Dynamic Lambda + SQS provisioning per exchange
documentpageservice
- Trigger: `NgePageService` HTTP call from Rails
- Callback: HMAC-SHA1 authenticated POST back to the Rails API
- URL discovery: SSM Parameter Store (`/nge/dps/{env}/api/apiUrl`)
- Operations: Page reorder, rotate, add, remove, split
unzipservice
- Trigger: ECS RunTask from Rails or other NGE module
- Input/Output: S3 paths via environment variables
- No database: Standalone file extraction service
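Because unzipservice takes its S3 paths from environment variables, launching it amounts to an ECS RunTask call with container overrides. Below is a minimal Ruby sketch that builds a parameter Hash in the shape accepted by `Aws::ECS::Client#run_task` (`aws-sdk-ecs` gem). The cluster, the task definition, the container name, and the variable names `SOURCE_S3_PATH` / `DESTINATION_S3_PATH` are all hypothetical. Networking settings, which awsvpc/Fargate tasks require, are left out.

```ruby
# Hypothetical builder for an ECS RunTask request that launches unzipservice.
# ASSUMED: container name and env var names; the real ones come from the
# unzipservice task definition. Network configuration is omitted.
def unzip_run_task_params(cluster:, task_definition:, source_s3_path:,
                          destination_s3_path:, container_name: "unzipservice")
  {
    cluster: cluster,
    task_definition: task_definition,
    count: 1,
    overrides: {
      container_overrides: [
        {
          name: container_name,
          environment: [
            { name: "SOURCE_S3_PATH", value: source_s3_path },
            { name: "DESTINATION_S3_PATH", value: destination_s3_path }
          ]
        }
      ]
    }
  }
end

params = unzip_run_task_params(
  cluster: "example-cluster",
  task_definition: "example-unzipservice",
  source_s3_path: "s3://example-bucket/case_42/upload.zip",
  destination_s3_path: "s3://example-bucket/case_42/unzipped/"
)
p params.dig(:overrides, :container_overrides, 0, :environment).map { |e| e[:name] }
# prints ["SOURCE_S3_PATH", "DESTINATION_S3_PATH"]

# With aws-sdk-ecs installed and credentials configured:
#   Aws::ECS::Client.new.run_task(**params)
```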
Migration Strategy
NGE Feature Flag
The `nge_enabled?` flag on `Npcase` determines which processing path the case uses.
A case is permanently either NGE or Legacy; it is not a migration toggle.
```text
nge_enabled? = true  → NGE modules (Lambda/ECS)
nge_enabled? = false → Legacy workers (EC2 polling daemon)
```
The Legacy code has separate paths throughout based on this flag — from batch creation, through document processing, to event tracking and UI rendering. Legacy workers remain active for non-document jobs (video, transcripts, treatments) regardless of the flag.
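The routing rule above can be summed up in a few lines. Document work follows the case's flag, and non-document jobs stay on Legacy workers either way. Below is a minimal Ruby sketch. The job-kind symbols and the method name are hypothetical labels, not real class names.

```ruby
# Hypothetical routing sketch. Non-document jobs always stay on Legacy
# workers; document processing follows the case's nge_enabled? flag.
LEGACY_ONLY_JOB_KINDS = %i[video transcript treatment].freeze

def processing_path(nge_enabled:, job_kind:)
  return :legacy_worker if LEGACY_ONLY_JOB_KINDS.include?(job_kind)

  nge_enabled ? :nge : :legacy_worker
end

p processing_path(nge_enabled: true, job_kind: :document_import)  # prints :nge
p processing_path(nge_enabled: true, job_kind: :video)            # prints :legacy_worker
p processing_path(nge_enabled: false, job_kind: :document_import) # prints :legacy_worker
```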
Coexistence Requirements
For the migration period, both systems must:
- Write to the same database schema (synchronized migrations)
- Use the same S3 path conventions
- Share HMAC-SHA1 signing keys
- Be able to read `batch_processing_events` for status tracking
Risk Areas
Schema Drift
- Rails migrations and NGE SQLAlchemy models must stay synchronized
- No automated validation exists today
- Recommendation: Add schema comparison CI check
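Below is a minimal Ruby sketch of the core of such a CI check. It compares two schemas given as Hashes of `table => { column => type }` and reports every difference. How those Hashes are produced (from the Rails schema and from the SQLAlchemy models) is assumed. So is normalising type names to a common vocabulary, and passing in only the shared per-case tables. The column types in the example are illustrative, not real schema data.

```ruby
# Hypothetical core of a schema comparison check. Each argument is a Hash
# of table name => { column name => normalised type string }.
def schema_drift(rails_schema, nge_schema)
  problems = []

  (rails_schema.keys | nge_schema.keys).sort.each do |table|
    rails_cols = rails_schema.fetch(table, {})
    nge_cols = nge_schema.fetch(table, {})

    (rails_cols.keys - nge_cols.keys).each { |c| problems << "#{table}.#{c}: missing in NGE models" }
    (nge_cols.keys - rails_cols.keys).each { |c| problems << "#{table}.#{c}: missing in Rails schema" }
    (rails_cols.keys & nge_cols.keys).each do |c|
      next if rails_cols[c] == nge_cols[c]

      problems << "#{table}.#{c}: type #{rails_cols[c]} (Rails) vs #{nge_cols[c]} (NGE)"
    end
  end

  problems
end

# Illustrative input only; the types are made up for the example.
rails = { "exhibits" => { "id" => "integer", "nutrient_id" => "string", "nge_file_hash" => "string" } }
nge   = { "exhibits" => { "id" => "integer", "nutrient_id" => "text" } }
puts schema_drift(rails, nge)
# prints:
# exhibits.nge_file_hash: missing in NGE models
# exhibits.nutrient_id: type string (Rails) vs text (NGE)
```

In CI, the job would print the list and exit non-zero whenever it is not empty.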
S3 Path Changes
- Any change to S3 path conventions breaks both systems
- The `_unredacted_original` suffix is particularly fragile
- Recommendation: Document the S3 path contract in shared `patterns/`
Athena Polling Latency
- `NgeCaseTrackerJob` polling introduces latency between NGE processing completion and the Rails UI update
- Future: direct SNS → Rails webhook, or EventBridge → Rails, for real-time updates
Shared Secrets
- Both systems access the same Secrets Manager secrets
- Key rotation must be coordinated
- Recommendation: Use IAM role-based access where possible