NGE vs Legacy Code Divergence Map

Overview

This document maps every location in the Rails codebase where execution diverges based on whether a case is NGE or Legacy. A case is permanently one or the other — nge_enabled? is set at case creation and never changes.

Three conditional flags control divergence:

1. Npcase#nge_enabled? (is the case NGE?). Used broadly across models, controllers, views, and jobs.
2. Batch#nge_batch? returns processor_job_id.present?. A batch-level check (NGE batches have a processor job ID assigned by ProcessorApi.import()).
3. Exhibit#processing_in_nge is a Boolean column, set to true while a document is actively being processed by Nutrient/DPS. Gates toolbar actions in the UI.
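As a rough mental model, the three flags can be sketched as follows. This is a Python stand-in for the Ruby predicates, not the Rails code: the class shapes and the job ID value are assumptions, and only the flag names and the present?-style check come from the descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Npcase:
    # Case-level flag. frozen=True mirrors "set at case creation and never changes".
    nge_enabled: bool


@dataclass
class Batch:
    processor_job_id: Optional[str] = None

    def nge_batch(self) -> bool:
        # Approximates Rails' present?: None, "" and whitespace-only all count as absent.
        return bool(self.processor_job_id and self.processor_job_id.strip())


@dataclass
class Exhibit:
    # Boolean column: true only while Nutrient/DPS is working on the document.
    processing_in_nge: bool = False


legacy_batch = Batch()
nge_batch = Batch(processor_job_id="job-123")  # hypothetical job ID
print(legacy_batch.nge_batch(), nge_batch.nge_batch())
```

The point of the sketch is the scope of each flag: one per case, one derived per batch, one toggled per document.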

Total divergence points: 85+ (43 in backend, 45+ in views/frontend)


1. Document Ingestion & Import

How documents enter the system — completely different pipelines.

Backend

# File Condition NGE Path Legacy Path
1 batches_controller.rb:297 nge_batch? Call nge_batch_process: check event dupes, queue NgeCaseTrackerJob (2 min delay) Call legacy_batch_process directly
2 batches_controller.rb:377 unless nge_batch? Skip setting next_check_for_complete_time_gmt Set next check time to 60s (polling-based completion)
3 batch.rb:845 Definition: nge_batch? = processor_job_id.present?
4 batch.rb:466 nge_batch? Include batch_process_details in meta update Exclude it
5 batch.rb:644 nge_batch? Update ImportStatus records with batch status Return immediately (no ImportStatus)
6 batch_pst.rb:7 nge_enabled? Call restructure_pst_info (JSON-ready format) ExtractedPstFolderRollupCollection

Views

File NGE Legacy
imports/_navigation.html.erb:1 Different import navigation Standard navigation
imports/select.html.erb:11 Mailbox: "OST, PST, MBOX" Mailbox: "PST, MBOX"
imports/select.html.erb:13 Production: "DAT, CSV, OPT, LOG" Production: "DAT, CSV"
imports/select.html.erb:41 Hidden additional import options Shows additional options
imports/select.html.erb:56 Button: "Start Import" Button: "Next"
batches/show.html.erb:12 Different batch detail rendering Standard rendering
batches/list.html.erb:12 Different batch list rendering Standard rendering
batches/_settings.html.erb:3 "Import Details: {name}" "Import Data Settings"
batches/dedupe.html.erb:10 Different dedup settings UI Standard dedup UI
general_settings/import.html.erb:11 Hidden Shows import settings

2. Batch Completion & Lifecycle

How batches are finalized — NGE skips all Legacy polling/retry machinery.

# File Condition NGE Path Legacy Path
7 batch_completion_job.rb:23 !nge_batch? Skip retrying failed batch split jobs Retry failed batch parts if max not reached
8 batch_completion_job.rb:26 unless nge_batch? Skip in-progress check and BatchCleanup retry Check for in-progress jobs; run BatchCleanup.process
9 batch_completion_job.rb:62 !nge_batch? Never reschedule next check; proceed to complete_batch Reschedule via update_batch_next_check_time
10 batch_completion_job.rb:69 unless nge_batch? Skip Exhibit.request_indexing Trigger ES reindex
11 batch_completion_job.rb:130 unless nge_batch? Skip backfill_email_family_id Backfill family_id on exhibits
12 batch_completion_job.rb:134 nge_enabled? Add non-imaged placeholders for containers (pst, ost, mbox, zip, 7z, tar, rar) Skip placeholder creation
13 batch_completion_job.rb:164 unless nge_batch? Skip processing_errors_array (errors come from Athena) Scan processing jobs, create BatchProcessingEvent records
14 batch_completion_job.rb:387 nge_batch? pdfs_to_create? → false (Nutrient handles PDFs) Check search_hilite for PDF generation

Why so different: NGE handles retries via SQS, indexing via the loader pipeline, error tracking via Athena, and PDF rendering via Nutrient. The Legacy polling/retry loop in batch_completion_job is entirely replaced.
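The net effect of rows 7 through 14 can be summarized with a small sketch. It is written in Python for illustration only; the step names are invented labels for the table rows, and the ordering simply follows the line numbers in batch_completion_job.rb rather than a verified control flow.

```python
from typing import List


def completion_steps(nge_batch: bool, nge_enabled: bool) -> List[str]:
    """Return the steps a completion run would perform (illustrative labels only)."""
    steps: List[str] = []
    if not nge_batch:
        steps.append("retry_failed_batch_parts")       # row 7
        steps.append("check_in_progress_and_cleanup")  # row 8
        steps.append("maybe_reschedule_next_check")    # row 9
    steps.append("complete_batch")
    if not nge_batch:
        steps.append("request_es_indexing")            # row 10
        steps.append("backfill_email_family_id")       # row 11
    if nge_enabled:
        steps.append("add_container_placeholders")     # row 12
    if not nge_batch:
        steps.append("collect_processing_errors")      # row 13
        steps.append("check_pdfs_to_create")           # row 14
    return steps


print(completion_steps(nge_batch=True, nge_enabled=True))
print(completion_steps(nge_batch=False, nge_enabled=False))
```

For an NGE batch the list collapses to completing the batch plus placeholder creation, which is the "entirely replaced" behavior described above.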


3. Batch Cancellation

# File Condition NGE Path Legacy Path
15 batch.rb:624 nge_batch? cancel_nge_import → ProcessorApi.cancel_import() (external API call, raises on failure) cancel_non_nge_import → local DB update + BatchStatusUpdateJob
16 batch_completion_job.rb:43 !nge_batch? On cancel: skip "still processing" check, cleanup immediately Reschedule check if jobs still in progress
17 batches_controller.rb:276 can_cancel_batch(nge_batch?) Permission check with nge=true Permission check with nge=false
18 batches/_batch_sidebar.html.erb:48 can_cancel_batch(nge_batch?) NGE permission check for cancel button visibility Legacy permission check

Tag / Custom Field Creation During Import

How tag values are deduplicated during load file imports — fundamentally different approaches.

Aspect NGE (documentloader) Legacy (Rails workers)
Dedup mechanism MySQL TagDedupe table with SHA256 hash + PK constraint Elasticsearch query per tag value
Race condition handling SAVEPOINT + IntegrityError catch + 3 retries ES acts as read-after-write cache
Tagging (exhibit↔tag) INSERT IGNORE bulk insert Per-row ActiveRecord create
ES in write path? No — ES indexed separately for search Yes — queried per value to check existence
DB access Direct via RDS Proxy (no Rails) Through ActiveRecord connection pool
Performance at scale Scales linearly with MySQL write throughput Degrades with ES query volume (millions of queries on large imports)

Key files:

- NGE: documentloader/shell/tags_ops.py (insert_or_get_tag_id), shell/taggings_ops.py (INSERT IGNORE)
- Legacy: Rails Tag model + Elasticsearch check before creation

Why Legacy uses ES: Parallel workers (fork-based) create race conditions on tag creation. ES check prevents duplicates across workers. This is intentional and correct for Legacy's architecture.

Why NGE doesn't need ES: MySQL atomic constraints (TagDedupe PK + SAVEPOINT rollback) handle concurrent Lambda invocations natively. SHA256 hash works around MySQL's varchar(255) key length limit for tag names up to varchar(2000).

For new modules (ADR-005, ADR-006): Follow documentloader's TagDedupe pattern. Never replicate Legacy's ES-check-before-write pattern in NGE modules.
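A minimal sketch of the TagDedupe idea, assuming a simplified schema and using SQLite in place of MySQL so it runs standalone. The table layout, column names, and the function name get_or_create_tag_id are hypothetical; this is not the code in tags_ops.py. What it illustrates is the combination named above: a fixed-length SHA256 key, a primary-key constraint as the arbiter, and SAVEPOINT rollback with a bounded retry.

```python
import hashlib
import sqlite3

# Hypothetical schema; the real tables and columns may differ.
SCHEMA = """
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tag_dedupe (
    name_hash TEXT PRIMARY KEY,   -- SHA256 hex digest, always 64 characters
    tag_id    INTEGER NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    # isolation_level=None leaves transaction control to the explicit SQL below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def get_or_create_tag_id(conn: sqlite3.Connection, name: str, max_retries: int = 3) -> int:
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    for _ in range(max_retries):
        conn.execute("SAVEPOINT tag_insert")
        try:
            tag_id = conn.execute(
                "INSERT INTO tags (name) VALUES (?)", (name,)
            ).lastrowid
            # Raises IntegrityError if another writer already claimed this hash.
            conn.execute(
                "INSERT INTO tag_dedupe (name_hash, tag_id) VALUES (?, ?)",
                (digest, tag_id),
            )
            conn.execute("RELEASE SAVEPOINT tag_insert")
            return tag_id
        except sqlite3.IntegrityError:
            # Undo only this attempt (including the tags row), then read the winner.
            conn.execute("ROLLBACK TO SAVEPOINT tag_insert")
            conn.execute("RELEASE SAVEPOINT tag_insert")
            row = conn.execute(
                "SELECT tag_id FROM tag_dedupe WHERE name_hash = ?", (digest,)
            ).fetchone()
            if row is not None:
                return row[0]
            # Winner not visible yet (e.g. it rolled back): loop and try again.
    raise RuntimeError(f"could not insert or find tag after {max_retries} attempts")


conn = open_db()
first = get_or_create_tag_id(conn, "Privileged")
second = get_or_create_tag_id(conn, "Privileged")
print(first == second)
```

Because the key is the digest rather than the tag name, a 2000-character name costs the same 64-character key as a short one, and no Elasticsearch lookup appears anywhere in the write path.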


Document Deduplication During Import

How duplicate documents are detected — insert-based (NGE) vs query-based (Legacy).

Aspect NGE (documentloader) Legacy (Rails)
Storage Separate doc_dedupe table (insert-based detection) No separate table — queries Exhibits table directly
Dedup key Composite PK: (npcase_id, message_id, bcc, md5, doc_type) Query by expansive_hash + email_message_id on Exhibits
Hash algorithm MD5 (content_hash or attachments_hash) MD5 → then SHA256 of MD5 hashes (expansive_hash)
When checked Before exhibit creation (PROCESS_STARTED checkpoint) After exhibit creation (post-creation merge)
Duplicate handling Skip creation, return existing exhibit IDs Create exhibit, then link as duplicate via ExhibitDedupeMerger
Race condition handling SAVEPOINT + IntegrityError + retry with backoff + force-create fallback Optimistic locking (ActiveRecord)
Email-specific logic message_id = email_message_id; bcc part of key (metadata-aware) email_message_id query + my_dupe?() metadata filtering (author, date, reply_id)
Attachment logic message_id = family_id (parent email); md5 = content_hash Grouped via expansive_hash (hash of all attachment hashes in family)
Scope Per-case (npcase_id in PK) Per-case (implicit in case DB schema)
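To make the Legacy "SHA256 of MD5 hashes" row concrete, here is a hedged sketch of how a family-level hash of that shape could be computed. The sorting and plain concatenation of digests are assumptions made so the result is order-independent; the actual expansive_hash construction in exhibit.rb is not shown in this document and may differ.

```python
import hashlib
from typing import Iterable


def md5_hex(content: bytes) -> str:
    # Per-document content hash (MD5), as in both pipelines.
    return hashlib.md5(content).hexdigest()


def expansive_hash_sketch(member_md5s: Iterable[str]) -> str:
    # Assumption: sort the member digests so family order does not matter,
    # then hash the concatenation with SHA256.
    joined = "".join(sorted(member_md5s))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()


family = [md5_hex(b"email body"), md5_hex(b"attachment one"), md5_hex(b"attachment two")]
print(expansive_hash_sketch(family))
```

Two families with the same set of member hashes produce the same value under this scheme, which is the property a query-based family dedup needs.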

Key files:

- NGE: documentloader/shell/exhibit_ops.py (add_exhibit(), resolve_dedupe_fields(), find_dupes())
- NGE model: core/models/db_models.py (DocDedupe class)
- Legacy: rails/app/models/exhibit.rb (dupe?(), merge_as_dupe(), ExhibitDedupeMerger)

Why NGE uses insert-based (DocDedupe table): Detecting duplicates via failed INSERT is atomic — no gap between "check" and "create" where a race condition can occur. The composite PK encodes document identity (content hash + email metadata + doc type). SAVEPOINT allows graceful rollback without aborting the entire transaction.
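The insert-based approach can be sketched as follows, again with SQLite standing in for MySQL. The schema, the exhibit_id back-reference column, and the function name add_exhibit_sketch are assumptions for illustration; the retry-with-backoff and force-create fallback mentioned in the table are deliberately omitted. The sketch only shows the core move: claim the composite key first, and treat a constraint failure as "duplicate found".

```python
import sqlite3

# Hypothetical schema modeled on the composite PK described above.
SCHEMA = """
CREATE TABLE exhibits (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE doc_dedupe (
    npcase_id  INTEGER NOT NULL,
    message_id TEXT NOT NULL,
    bcc        TEXT NOT NULL,
    md5        TEXT NOT NULL,
    doc_type   TEXT NOT NULL,
    exhibit_id INTEGER,           -- assumed column linking back to the exhibit
    PRIMARY KEY (npcase_id, message_id, bcc, md5, doc_type)
);
"""
KEY_WHERE = "npcase_id = ? AND message_id = ? AND bcc = ? AND md5 = ? AND doc_type = ?"


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def add_exhibit_sketch(conn, key, title, allow_dupes=False):
    """Return (exhibit_id, created). key = (npcase_id, message_id, bcc, md5, doc_type)."""
    conn.execute("SAVEPOINT add_exhibit")
    try:
        if not allow_dupes:
            # Claim the identity before creating anything; a duplicate fails here.
            conn.execute(
                "INSERT INTO doc_dedupe (npcase_id, message_id, bcc, md5, doc_type) "
                "VALUES (?, ?, ?, ?, ?)",
                key,
            )
        exhibit_id = conn.execute(
            "INSERT INTO exhibits (title) VALUES (?)", (title,)
        ).lastrowid
        if not allow_dupes:
            conn.execute(
                f"UPDATE doc_dedupe SET exhibit_id = ? WHERE {KEY_WHERE}",
                (exhibit_id, *key),
            )
        conn.execute("RELEASE SAVEPOINT add_exhibit")
        return exhibit_id, True
    except sqlite3.IntegrityError:
        # Roll back this attempt only; the surrounding transaction survives.
        conn.execute("ROLLBACK TO SAVEPOINT add_exhibit")
        conn.execute("RELEASE SAVEPOINT add_exhibit")
        row = conn.execute(
            f"SELECT exhibit_id FROM doc_dedupe WHERE {KEY_WHERE}", key
        ).fetchone()
        return row[0], False


conn = open_db()
key = (1, "<msg-1@example.com>", "", "d41d8cd98f00b204e9800998ecf8427e", "email")
print(add_exhibit_sketch(conn, key, "First copy"))
print(add_exhibit_sketch(conn, key, "Second copy"))
```

The second call never writes an exhibits row, which is the "skip creation, return existing exhibit IDs" behavior from the table.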

Why Legacy uses query-based: Legacy creates the exhibit first, then checks if it's a duplicate. This allows richer metadata-based dedup (my_dupe?() checks author, date, reply_id) but means duplicate exhibits are temporarily created and then merged.

Key behavioral difference: NGE prevents duplicate exhibits from being created. Legacy creates duplicates and then merges them. The NGE approach is more efficient for large imports with high duplicate ratios, because no DB writes are wasted on documents that will be merged anyway.

Batch settings that control dedup:

- allow_dupes on Batch: if true, skip DocDedupe insert entirely (both NGE and Legacy)
- dedupe_using_message_id?: use email message_id for dedup
- dedupe_using_meta?: use metadata fields for dedup
- bcc_merge_on: include/exclude BCC in dedup key

For new modules: Follow documentloader's insert-based DocDedupe pattern. Pre-check dedup is cheaper than post-creation merge.


4. Document Viewer & Page Rendering

How documents are displayed — Nutrient (PSPDFKit) vs S3 page images.

Backend

# File Condition NGE Path Legacy Path
19 theater_processor.rb:316 nge_enabled Get image via NextpointNutrient.get_cached_filename_for_theater Get image via NextPointS3.get_cached_filename
20 theater_document_decorator.rb:51 nge_enabled preload = false (don't preload theater images) preload = true
21 exhibit.rb:2050 nge_doc? Spreadsheet URL: rewrite to /xlsx/{id}/content.xlsx (NGE S3 layout) Replace extension with .xlsx
22 documents_controller.rb:245 NGE-only regenerate_pdf: creates native_pdf_ocr_job, sets processing_in_nge = true No legacy equivalent
88 document_editor.rb:837 nge_enabled? process_generate_pdf_options: download PDF via Nutrient API Render generate_pdf_options modal for user
89 document_editor.rb:1174 nge_enabled select_page_heights: get heights from Nutrient API (nutrient_document_info) Query attachments table for page heights

Views

File NGE Legacy
documents/show.html.erb:57 Render _nge_document_toolbar partial Skip
documents/_nge_document_toolbar.html.erb Entire partial (NGE-only) N/A
documents/_scrollable_page_set.html.erb:2,33,83 Hidden (3 blocks) Page thumbnails with S3 images
documents/_action_bar.html.erb:52 "Wire" button hidden Shows "Wire" button
documents/_exhibit_page_content_preview.html.erb:179 Different preview rendering Standard preview
documents/show_document_files.html.erb:65 Hidden Shows document files section
documents/_js_templates.html.erb:227 Wire button hidden Shows wire button
page_notes/_sidebar_document_notes.html.erb:1 Hidden Shows sidebar notes

React Components

File NGE Legacy
NgeDocumentViewPdf.jsx:12 isProcessingInNge gates PDF viewer N/A
DocumentToolbar.jsx:87 isProcessingInNge disables toolbar Standard toolbar
DownloadControl.jsx:13 "Original files are not available" "Original File Removed"
documentViewPdfUtils.jsx:67,92 Different PDF rendering setup Standard rendering

5. Bates Stamping & Numbering

Page numbering — NGE validates against Nutrient page counts.

Backend

# File Condition NGE Path Legacy Path
23 bates_stamp_job.rb:33 @is_nge Send admin emails on Nutrient API failures or page count mismatches Skip ensure block
24 bates_stamp_job.rb:81 @is_nge Send bates_stamp_processing_finished_email, return Queue BatesStampCompletionJob, send notification
25 bates_stamp_job.rb:106 @is_nge Call NextpointNutrient.nutrient_document_info for page count; limit bates to Nutrient pages Use DB-based verified page count only
26 bates_stamp_job.rb:135 @is_nge Stop stamping at Nutrient page count; reset processing_in_nge Stamp all verified pages
27 bates_stamp_job.rb:146 nge_enabled? Reset processing_in_nge = false after stamping Do not touch flag
28 bates_management_controller.rb:12 has_attribute?(:processing_in_nge) Set processing_in_nge = true before processing Do not set flag
29 exhibit.rb:508 nge_enabled? After removing bates, reset processing_in_nge = false Only remove bates

Views

File NGE Legacy
label_bates/new.html.erb:158 Hidden additional options Shows options
general_settings/_exhibit_stamp_template.html.erb:13 Different stamp template Standard template
general_settings/update_stamp_format.html.erb:1,6 Different stamp format Standard format
production_endorsement_schemes/_form.html.erb:165,199 Different endorsement options Standard options
production_endorsement_schemes/placeholder_stamp_image.html.erb:25 Different placeholder stamp Standard stamp

Stamp Configuration (NGE-only fields)

# File Condition NGE Path Legacy Path
86 general_settings_controller.rb:12 nge_enabled Set stamp_placement (vertical/horizontal) + stamp_names (array of name/position) Skip — uses stamp_format + use_background_color_for_stamp only
87 general_settings_controller.rb:220 nge_enabled? Set confidentiality_stamps_position (left/right) on ConfidentialityCode Skip — not applicable for Legacy

6. Image Markups & Redactions

How annotations are applied — NGE uses Nutrient API, Legacy uses processing jobs.

# File Condition NGE Path Legacy Path
30 image_markups_controller.rb:50 nge_enabled? Auto-redaction: set processing_in_nge = true, queue AutoRedactionJob. Other markups: skip_processing_jobs: true Original markup logic with processing jobs
31 auto_redaction_job.rb:138 has_attribute?(:processing_in_nge) Reset processing_in_nge = false after completion N/A — entire job is NGE-only
32 sync_annotation_ids_job.rb:59 Always Reset processing_in_nge = false after sync N/A — entire job is NGE-only

Views

File NGE Legacy
image_markups/edit.html.erb:20 Different markup editor Standard editor

7. Toolbar & UI Locking

NGE locks toolbar actions while Nutrient is processing a document.

# File Condition When processing_in_nge = true When idle / Legacy
33 toolbar_permission_helper.rb:74 !processing_in_nge? Disable "add new page" Allow
34 toolbar_permission_helper.rb:86 !processing_in_nge? Disable "rotate/duplicate page" Allow
35 toolbar_permission_helper.rb:95 !processing_in_nge? Disable "split/delete page" Allow
36 toolbar_permission_helper.rb:109 !processing_in_nge? Disable "add/replace native" Allow
37 documents/_js_templates.html.erb:205 nge_enabled? Disable export during annotation jobs Allow
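The gating in rows 33 through 36 amounts to a single predicate. The sketch below is a Python illustration, not the helper in toolbar_permission_helper.rb; the action identifiers are invented labels for the actions named in the table.

```python
# Invented identifiers for the actions listed in rows 33-36.
LOCKED_WHILE_PROCESSING = frozenset({
    "add_new_page",
    "rotate_page",
    "duplicate_page",
    "split_page",
    "delete_page",
    "add_native",
    "replace_native",
})


def can_perform(action: str, processing_in_nge: bool) -> bool:
    # Legacy exhibits never set the flag, so they always take the "allow" path.
    return not (processing_in_nge and action in LOCKED_WHILE_PROCESSING)


print(can_perform("rotate_page", processing_in_nge=True))
print(can_perform("rotate_page", processing_in_nge=False))
```

Row 37 is keyed on nge_enabled? rather than the per-document flag, so it is left out of this sketch.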

8. Non-Imaged Placeholders

How placeholder pages are created for container files and non-imaged documents.

# File Condition NGE Path Legacy Path
38 non_imaged_placeholder.rb:31 nge_enabled? create_instant_layer_for_nge, set S3 path directly setup_placeholder_file (create local file, upload to S3)

9. Family Linking & Batch Details

How email thread/family relationships are displayed.

# File Condition NGE Path Legacy Path
39 batch_family_linking.rb:7 nge_enabled? Return linked batches with merged username (hash) Raw AR relations
40 batch_family_linking.rb:20 nge_enabled? Merge username and docs count into hashes Raw AR query result
41 batch_family_linking.rb:61 nge_enabled? Return [{key:, value:}] array (JSON-ready) HTML string with <br/> separators
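Row 41 is a difference in output shape rather than in data. A small Python illustration of the two shapes, assuming a "key: value" layout for the Legacy HTML string (the exact Legacy formatting is not specified here):

```python
import html
from typing import List, Tuple, Union


def family_details(pairs: List[Tuple[str, object]], nge_enabled: bool) -> Union[list, str]:
    if nge_enabled:
        # JSON-ready structure for the NGE frontend.
        return [{"key": k, "value": v} for k, v in pairs]
    # Legacy: one pre-rendered HTML string (the "key: value" layout is assumed).
    return "<br/>".join(f"{html.escape(k)}: {html.escape(str(v))}" for k, v in pairs)


pairs = [("Batch A", 3), ("Batch B", 5)]
print(family_details(pairs, nge_enabled=True))
print(family_details(pairs, nge_enabled=False))
```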

10. Export & Production

How document exports are rendered and delivered.

Backend

# File Condition NGE Path Legacy Path
42 batch_notification_decorator.rb:20 nge_batch? Show description for unknown events Show external text only
43 share_job_mixins.rb:73 Pass nge_enabled: flag to share job payload

Views

File NGE Legacy
exports/_export.html.erb:14 Different export rendering Standard
exports/_export.html.erb:40 size_of_nge_zips export_volumes.first.file_size
exports/show.html.erb:28,61 Different export detail rendering Standard
notification/shared_export_*.erb (4 files) Different export notification emails Standard emails
general_settings/edit_confidentiality_code.html.erb:24 Different confidentiality code editing Standard
application/_download_all.html.erb:5 Different download behavior Standard

11. Global UI & Layout

Platform-wide UI differences.

File NGE Legacy
layouts/application.html.erb:32 Body class: nge (enables global CSS) Body class: legacy
general/_js_support.html.erb:7 Sets NP['is_nge_enabled'] = 'true' 'false'
general/_current_case.html.erb:3,11 Container: nge-case_name_for_display_container + NGE indicator Standard container
general/_case_access_list.html.erb:59 Shows NGE indicator on case list No indicator
general/tab_bars/_import_export_center_tab_bar.html.erb:26 Different tab bar Standard tab bar
general/_banner_editor.html.erb:44 Different banner editor Standard
documents/_labels_editor.html.erb:51 Calls handleExhibitStamp on label save Standard save

12. Review & Coding (Shared — No Divergence in Logic)

Review and coding (applying labels, privilege designations, confidentiality codes, review status) work on both NGE and Legacy cases. The business logic is identical — the only difference is the rendering layer:

Aspect Legacy NGE
Document rendering Page images (TIFF/PNG) loaded from S3 PDF rendered from Nutrient with annotation overlays
Review controller Standard page data Sets Nutrient secrets for PDF viewer (reviews_controller.rb:129)
Label save Standard save Also calls handleExhibitStamp to update Nutrient bates overlay (_labels_editor.html.erb:51)
Bates/confidentiality display Baked into page images Nutrient overlay layers rendered on-demand
Coding overlay import Same logic Same logic (no nge_enabled? checks in coding_overlays_controller.rb)

This is an important distinction: review and coding are not divergent features — they are shared workflows where the underlying document representation differs (page images vs PDF). The business logic (label assignment, privilege tagging, review status, bulk coding) is identical across both systems.


13. Document Exchange (Wire)

"Wire" is the internal codebase name; "Exchange" is the user-facing product name (configured via $global_config[:wire_product_name] = 'Exchange' in nextpoint_global.yml). Same operation — transferring documents between cases.

This is a Legacy-only feature — hidden in NGE UI. The wire system itself has minimal NGE-specific code, but a cross-type validation prevents transfers between NGE and Legacy cases.

# File Condition NGE Path Legacy Path
90 general_settings_controller.rb:294 nge_enabled != target.nge_enabled Block wire transfer with db_type_mismatch error Same — prevents cross-type transfers in both directions

Views

File NGE Legacy
documents/_action_bar.html.erb:52 "Exchange" button hidden Shows "Exchange" button
documents/_js_templates.html.erb:227 Wire button hidden in toolbar Shows wire button

Legacy wire transfer architecture:

Multi-phase approval workflow for transferring documents between cases:

OutgoingWire phases: initial_setup → loadfile → work_order → target_approval → fully_approved
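The phase sequence reads as a linear state machine. A minimal Python sketch under that assumption follows; the enum and function names are hypothetical, and the real OutgoingWire model may allow optional phases to be skipped, which is not modeled here.

```python
from enum import Enum


class WirePhase(Enum):
    INITIAL_SETUP = "initial_setup"
    LOADFILE = "loadfile"
    WORK_ORDER = "work_order"
    TARGET_APPROVAL = "target_approval"
    FULLY_APPROVED = "fully_approved"


# Enum iteration follows definition order, which matches the documented sequence.
PHASE_ORDER = list(WirePhase)


def next_phase(current: WirePhase) -> WirePhase:
    idx = PHASE_ORDER.index(current)
    if idx == len(PHASE_ORDER) - 1:
        raise ValueError("fully_approved is terminal")
    return PHASE_ORDER[idx + 1]


phase = WirePhase.INITIAL_SETUP
while phase is not WirePhase.FULLY_APPROVED:
    phase = next_phase(phase)
    print(phase.value)
```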

Models: OutgoingWire (source), IncomingWire (destination), ExhibitOutgoingWire (join)

Jobs:

- WireSetupJob: links exhibits, deduplicates, advances phases
- DocumentShareGenerationJob: creates SQLite DB + CSV loadfile of selected exhibits
- WireConfirmationJob: cross-case/cross-account approval handshake
- DocumentShareJob: executes transfer; creates IncomingWire + batch (wire_transfer type) in target case, iterates SQLite DB, copies exhibits/attachments/S3 files per document
- DirectDocumentShareJob: shortcut for intra-account transfers
- DepositionShareJob: deposition-specific transfers

Flow: User selects exhibits → WireSetupJob → DocumentShareGenerationJob (loadfile) → optional loadfile review → optional work order approval → cross-account approval via WireConfirmationJob → DocumentShareJob copies documents to target case DB + S3.

NGE interaction: Only nextpoint_share_job_mixins.rb:73 — when wire creates a new destination case, it propagates nge_enabled from source case.

Legacy wire vs NGE documentexchanger:

Aspect Legacy Wire NGE documentexchanger
Trigger User clicks Exchange → WireSetupJob API Gateway (sync) + SQS (async) dual entry point
Approval workflow Multi-phase (OutgoingWire: initial_setup → loadfile → work_order → target_approval → fully_approved) Not yet integrated into Rails
DB transfer DocumentShareJob iterates SQLite DB, copies exhibits one by one via document.transfer! AWS Glue ETL for bulk database copy
S3 file copy Per-document S3 copy inside DocumentShareJob Per-document Lambda processors via dynamic SQS queues
Annotations Copies page images (bates/confidentiality already baked in) Must process via Nutrient (PDF overlays, no page images)
Infrastructure Sidekiq jobs on shared Redis queue Dynamic Lambda + SQS provisioned per exchange, torn down on completion
OCR Copies existing search text Re-OCR via Hyland Filters (source is PDF, not page images)

NGE status: documentexchanger is built but not yet integrated into the Rails app. The Legacy wire buttons are simply hidden via nge_enabled? in views.


NGE-Only Code (No Legacy Equivalent)

These entire components exist only for NGE cases:

Component Type Purpose
NgeCaseTrackerJob Sidekiq job Polls Athena for NGE processing events
NgeExportJob Sidekiq job Invokes documentexporter Lambda
AutoRedactionJob Sidekiq job Nutrient-based auto-redaction
SyncAnnotationIdsJob Sidekiq job Reconciles Nutrient annotation IDs
NgePageService Service HTTP client for documentpageservice
ProcessorApi Lib HTTP client for NGE Processor API
ProcessorApiHelper Helper Import payload builder for NGE
Batch::AsUpload workflow Model before_commit → initiate_import for NGE
mixins/nge_batches.rb Controller NGE batch listing, Athena queries
_nge_document_toolbar.html.erb View NGE document toolbar partial
NgeDocumentViewPdf.jsx React NGE PDF viewer component

Legacy Functionality Not Yet Modularized into NGE

The Nextpoint platform has two suites: Discovery (document processing, search, review) and Litigation (video, depositions, transcripts, treatments). Many features are common across both suites. NGE modularized the common processing workflows (import, export, exchange).

Common Functionality (Both Discovery and Litigation)

These features work in both case types (trial_prep = Litigation, review = Discovery):

Area Key Components NGE Status
Import/Upload ImportsController, BatchesController, S3UploadController, CaseFolderController Modularized (extractor → loader → uploader)
Export/Production ExportsController, ProductionTemplatesController, DocumentExportJob Modularized (documentexporter)
Exchange/Wire OutgoingWiresController, IncomingWiresController, wire jobs Built (documentexchanger, not yet integrated)
Document Viewer DocumentsController, DocumentPagesController, AttachmentsController Shared — page images (Legacy) vs Nutrient PDF (NGE)
Search SearchController, SearchAggregationController, Elasticsearch 7.4 Legacy only
Labels/Coding LabelsController, CodingOverlaysController, BulkLabelsController Shared — same logic, different rendering
Bates/Stamps BatesManagementController, ExhibitStampingController, BatesStampJob Shared — Nutrient overlays (NGE) vs page images (Legacy)
Markups/Redactions ImageMarkupsController, HighlightsController, PageNotesController Shared — Nutrient API (NGE) vs processing jobs (Legacy)
Custodians CustodiansController, CustodianExhibitsController Legacy only
Custom Fields/Grid CustomFieldsController, GridColumnsController, GridTemplatesController Legacy only
Family Linking FamilyLinkingsController, FamilyLinkingJob Legacy only (NGE handles during ingestion)
Reporting CustomReportController, UserActivityReportsController, AnalyticsController Legacy only
Case Management NpcasesController, CasePermissionsController, CaseNotesController Legacy only (core platform)
User/Account AccountsController, UsersManagementController, UserLicensesController Legacy only (core platform)
AI AiAssistantController, ChatbotController (Bedrock) Legacy only
Bulk Operations BulkDeleteJob, BulkRestoreJob, BulkActionJob, BatchLabelJob Legacy only

Additional common features not listed above:

Area Components NGE Status
Document Review ReviewsController, sub-review assignments Shared — same logic, different rendering
Chronology ChronologyController — timeline view Legacy only
File Room FileRoomController — virtual binders Legacy only
Search Hit Reports SearchHitReportController, searcher/post-processing jobs Legacy only

Litigation-Specific Features (Legacy Only)

Feature Components
Evidence Dashboard EvidenceController (verify_trial_prep required)
Theater/Presentation TheaterController, Theater::PagesController

Litigation Suite — Processing (Separate Domain, Entirely Legacy)

The Litigation processing workflows handle video, depositions, and trial presentation — none have NGE equivalents.

EC2 Workers:

Worker Function
TranscodeWorker Video transcoding via FFmpeg
VideoStitchWorker Multi-segment video stitching
VideoSyncWorker Video synchronization
FlvConversionWorker FLV format conversion
UpdateVideoAspectRatioWorker Video metadata update
TranscriptParseWorker Deposition transcript parsing (LEF, PTX, CMS formats)
DepositionZipWorker Deposition package extraction
TreatmentWorker Litigation presentation images (callout/highlight composites)

Sidekiq Jobs:

Job Function
DesignationVideoJob Video designation processing
DepositionPdfJob Deposition PDF generation
DepositionTextJob Deposition text extraction
DepositionShareJob Deposition sharing between cases
DepositionSummaryReportJob Deposition summary reports
TranscriptMetadataReportJob Transcript metadata reports
DepositionDesignationMergeJob Merge deposition designations
DepositionVolumeExhibitsInFolderLinkerJob Link exhibits in deposition folders

Discovery Suite — Not Yet Modularized

These Discovery features run in Legacy Rails/Sidekiq with no NGE module equivalent:

Category Components Description
Search SearchHitReportSearcherJob, SearchHitReportPostProcessingJob, SearchHitReportDeletionJob, Elasticsearch 7.4, custom Parslet query DSL Full-text search, hit reports
Review/Coding Review UI, theater, coding overlays, labels, privilege — shared across NGE/Legacy Core review workflow (same logic, different rendering)
Bulk Operations BulkDeleteJob, BulkRestoreJob, BulkLabelActivationJob, BulkLabelDestroyJob, BulkActionJob, BulkSubreviewAssignmentJob, BatchLabelJob, SubreviewSplitJob Mass document operations
Bates/Stamps BatesStampJob, BatesRemovalJob, BatesStampCompletionJob, BatesRemovalCompletionJob, ConfidentialityStatusJob, RedactAnnotationsJob Runs on both NGE and Legacy (uses Nutrient for NGE)
Document Operations PageDeleteJob, SplitDocumentOnFlagsJob, DocumentPdfCompletionJob, CodingOverlayJob, FamilyLinkingJob, CustodianUpdateJob, CustodianDestroyerJob, NearDupeTrackerJob Individual document manipulation
Wire/Exchange WireSetupJob, WireConfirmationJob, DocumentShareJob, DocumentShareGenerationJob, DirectDocumentShareJob, RemoteWireConfirmationJob Legacy wire system (documentexchanger built but not integrated)
Reports CustomReportJob, UserActivityReportJob, ReviewLogJob, PageCountReportForRelevancyJob, GridDataExportJob Custom reports, user activity
Export Utilities ExportCopyJob, PdfLambdaJob, CaseFolderImportJob Export duplication, legacy import

EC2 Workers (Discovery — Legacy only):

Worker Function
SpreadsheetConversionWorker XLS/XLSB/CSV → XLSX for spreadsheet viewer
DocumentPropertiesUpdateWorker Document metadata extraction
FileEmailWorker Email document files to users
DownloadExhibitPdfWorker PDF download generation

Platform Services — Entirely in Rails

Service Description
Authentication Cognito SRP + session-based + HMAC-SHA1 API auth
Authorization RBAC via Action/Role/RoleNpcase tables
User Management Account/User/NpcaseUser CRUD, Cognito provisioning
Case Management Create/archive/delete cases, per-case DB provisioning
Billing Account billing, ingestion limits, plan management
AI Features Bedrock agent, AI summaries, chatbot
Notifications DelayedEmailJob, BannerAlertJob, email, alerts, audit logging
Admin/Ops Background job management, EC2 monitoring, ES indexing, DatabaseArchiveJob

Summary: What's Modularized vs What's Not

MODULARIZED:                                  NOT MODULARIZED:
────────────                                  ────────────────
Processing (Stage 5):                         LITIGATION SUITE (separate domain):
  ✓ Document ingestion (extractor)              ✗ Video (transcode, stitch, sync)
  ✓ Content extraction (extractor)              ✗ Treatments (presentation images)
  ✓ DB writes + batch lifecycle (loader)
  ✓ Page image generation (uploader)           COMMON / DISCOVERY (still in Legacy):
  ✓ Page manipulation (pageservice)              ✗ Bulk operations (delete, label, restore)
  ✓ Archive extraction (unzipservice)            ✗ Bates/confidentiality (shared, Nutrient for NGE)
                                                 ✗ Wire approval workflow (exchanger not integrated)
Analysis (Stage 7):                              ✗ Reporting (custom, user activity)
  ✓ Search query parser (QLE — production)
  ◐ Search hit reports (SHR — prototype)        PLATFORM (core Rails):
  ✓ AI transcript summaries (nextpoint-ai)       ✗ Auth + RBAC + case mgmt + billing
                                                 ✗ Notifications + admin tooling
Production (Stage 8):
  ✓ Export/production (exporter)               SEPARATE PRODUCT:
                                                 ◆ Data Mining (eda + eda-front-end)
Cross-stage:                                       Own architecture, own AWS accounts
  ◐ Document exchange (exchanger, not live)

✓ = production   ◐ = built/prototype   ◆ = separate product

Summary

Functional Area Backend Views Total Core Difference
Document ingestion 6 10 16 ProcessorApi HTTP vs Legacy workers
Batch completion 8 0 8 External (Athena/Nutrient) vs internal polling/retry
Batch cancellation 3 1 4 External API cancel vs local DB update
Document viewer 4 12 16 Nutrient/PSPDFKit vs S3 page images
Bates stamping 7 5 12 Nutrient page count validation vs DB count
Markups/redactions 3 1 4 Nutrient API + AutoRedactionJob vs processing jobs
Toolbar locking 4 1 5 processing_in_nge flag gates actions
Placeholders 1 0 1 Instant Nutrient layer vs local file upload
Family linking 3 0 3 JSON hashes vs AR objects/HTML
Export/production 2 6 8 Different size calc, rendering, emails
Global UI 0 7 7 Body class, JS global, indicators
Wire transfer 0 2 2 Hidden in NGE (documentexchanger built but not yet integrated)
TOTAL 41 45 86

EDRM Mapping

The EDRM (Electronic Discovery Reference Model) defines 9 stages for how digital data flows through litigation. Here's how the Nextpoint platform maps to each stage, and what's modularized vs Legacy.

EDRM Stage Nextpoint Coverage Suite NGE Status Legacy Components
1. Information Governance Not directly covered N/A N/A N/A (pre-litigation)
2. Identification Custodian management spans Collection (assignment at import) and Review (reassignment via bulk ops) — not a separate stage in Nextpoint Common Legacy only CustodiansController (part of stages 4 + 6)
3. Preservation S3 storage, PendingDelete (deletion prevention) Common Legacy only S3 lifecycle, case folder management
4. Collection Upload files to File Room, S3 case folder, cloud sources; custodian assignment Common Legacy only FileRoomController, S3UploadController, CaseFolderController, DropboxController
5. Processing Import pipeline: file extraction, OCR, dedup, de-NISTing, format conversion, family linking, page image generation Common Modularized ImportsController, BatchesController; NGE: extractor → loader → uploader; Legacy: EC2 workers (Preprocess → Container → Conversion → Page)
6. Review Document viewer, coding, labels, privilege, confidentiality, sub-reviews, bulk operations, custodian reassignment Common Shared logic (Nutrient vs page images) ReviewsController, LabelsController, CodingOverlaysController, CustodiansController, bulk jobs
7. Analysis Search, search hit reports, analytics, chronology, AI summaries, near-dupe detection Common Partially modularized query-language-engine (search parser — production), search-hit-report-backend (hit reports — prototype), nextpoint-ai (transcript summaries — production). Legacy: SearchController, AnalyticsController, ChronologyController
8. Production Export with bates stamps, confidentiality codes, load files, productions Common Modularized ExportsController, ProductionTemplatesController, Legacy ExhibitZipVolumeWorker
9. Presentation Theater mode, treatments, video depositions, designations Litigation Legacy only TheaterController, TreatmentsController, DepositionsController

NGE Modules by EDRM Stage

EDRM Stage          Module(s)                           What They Replace/Add
──────────          ─────────                           ────────────────────
4. Collection       (no module — File Room,             File upload and case folder
                     S3 upload remain in Legacy)         management stay in Rails

5. Processing  ──→  documentextractor (entry point +    PreprocessWorker, ContainerWorker,
                      file conversion)                   ConversionWorker (LibreOffice, Tika)
                    unzipservice (archive extraction)    ContainerWorker for ZIP/RAR/7Z
                    documentloader (DB writes, dedup)    BatchCompletionJob, family linking
                    documentuploader (page images)       PageWorker (image gen, Nutrient)
                    documentpageservice (page ops)       Page manipulation workers

7. Analysis    ──→  query-language-engine (search        Legacy Ruby/Parslet parser in
                      parser, TypeScript ECS)             lib/search/ (production)
                    search-hit-report-backend (hit       SearchHitReportSearcherJob
                      reports, Ruby Lambda)               (prototype)
                    nextpoint-ai (AI transcript          New capability — Bedrock Claude
                      summaries, Python Lambda)           summaries (production)

8. Production  ──→  documentexporter (Lambda + Step Fn)  ExhibitZipVolumeWorker + ExhibitLoadfileWorker

Cross-stage    ──→  documentexchanger (not integrated)   Wire/exchange system (stage 4→8)

What This Tells Us

NGE tackled the compute-heavy stages first: Processing (5) and Production (8) are where the heavy lifting happens — file conversion, OCR, image generation, PDF rendering, ZIP assembly. These benefit most from Lambda/ECS auto-scaling.

Analysis (7) is now being modularized: Three new services are extracting functionality from the Rails monolith:

- query-language-engine: production, replaces Legacy Parslet parser
- search-hit-report-backend: prototype, offloads ES search + Parquet/Athena analytics
- nextpoint-ai: production, adds new AI summarization capability (Bedrock Claude)

The human-intensive stages remain in the monolith: Review (6) and Presentation (9) are interactive, UI-driven workflows where the bottleneck is human decision-making, not compute. These are well-served by the Rails monolith + Sidekiq.
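The reading above can be checked mechanically by grouping stages by their NGE status. This is a small sketch: the status strings are copied from the EDRM table in this section, while the constant name `EDRM_STATUS` is made up for illustration.

```ruby
# Stage number => NGE status, copied from the EDRM table in this section.
# The constant name is illustrative only.
EDRM_STATUS = {
  1 => "N/A",
  2 => "Legacy only",
  3 => "Legacy only",
  4 => "Legacy only",
  5 => "Modularized",
  6 => "Shared logic",
  7 => "Partially modularized",
  8 => "Modularized",
  9 => "Legacy only"
}.freeze

# Group stage numbers under each status.
stages_by_status = EDRM_STATUS
  .group_by { |_stage, status| status }
  .transform_values { |pairs| pairs.map(&:first) }

puts "Modularized: #{stages_by_status['Modularized'].inspect}"   # Modularized: [5, 8]
puts "Legacy only: #{stages_by_status['Legacy only'].inspect}"   # Legacy only: [2, 3, 4, 9]
```

The two fully modularized stages are exactly the compute-heavy ones (Processing and Production).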

Complete Platform Map by EDRM Stage

NEXTPOINT PLATFORM — EDRM Stage Mapping
═══════════════════════════════════════════

EDRM Stage 1: Information Governance
  (Not covered — pre-litigation)

EDRM Stage 2: Identification
  Custodian management spans stages 4 + 6 (not a separate stage in Nextpoint)

EDRM Stage 3: Preservation
  Legacy: S3 storage, PendingDelete (deletion prevention)

EDRM Stage 4: Collection
  Legacy: FileRoomController, S3UploadController, CaseFolderController,
          DropboxController, custodian assignment at upload time

EDRM Stage 5: Processing ─── MODULARIZED
  NGE:    ✓ documentextractor  — pipeline entry point, file conversion (Hyland)
          ✓ documentloader     — DB writes, batch lifecycle, dedup, family linking
          ✓ documentuploader   — Nutrient document provisioning in place of page images (PDF only)
          ✓ documentpageservice — page reorder/rotate/add/remove/split (PDFBox)
          ✓ unzipservice       — archive extraction (ZIP/RAR/7Z/TAR/GZIP/BZIP2)
  Legacy: PreprocessWorker → ContainerWorker → ConversionWorker → PageWorker

EDRM Stage 6: Review ─── SHARED (same logic, different rendering)
  Legacy Rails (both NGE + Legacy cases):
          ReviewsController, LabelsController, CodingOverlaysController,
          CustodiansController, bulk ops (delete/restore/label/subreview)
  Rendering: Legacy = page images (TIFF/PNG) from S3
             NGE = PDF from Nutrient with annotation overlays

EDRM Stage 7: Analysis ─── PARTIALLY MODULARIZED
  NGE:    ✓ query-language-engine     — search query parser (TypeScript ECS)
          ◐ search-hit-report-backend — hit reports (Ruby Lambda, prototype)
          ✓ nextpoint-ai              — AI transcript summaries (Bedrock)
          ◐ neardupe                  — near-dupe detection (PySpark EMR, POC)
  Legacy: SearchController, AnalyticsController, ChronologyController,
          Elasticsearch 7.4, near-dupe (Databricks production), custom reports

EDRM Stage 8: Production ─── MODULARIZED
  NGE:    ✓ documentexporter — Step Functions + ECS Fargate
  Legacy: ExhibitZipVolumeWorker + ExhibitLoadfileWorker
  Key:    Legacy downloads pre-stamped page images from S3
          NGE renders from Nutrient PDF with bates/confidentiality overlays

EDRM Stage 9: Presentation ─── LEGACY ONLY (Litigation suite)
  Legacy: TheaterController, TreatmentsController, DepositionsController,
          video transcoding/stitching/sync, transcript parsing (LEF/PTX/CMS)

Cross-Stage:
  NGE:    ◐ documentexchanger — document exchange (built, not integrated)
  Legacy: OutgoingWire/IncomingWire, WireSetupJob, DocumentShareJob

SEPARATE PRODUCT — Data Mining (own AWS accounts):
  ◆ eda           — Ruby Lambda + Batch + dtSearch (Stages 4-8)
  ◆ eda-front-end — TypeScript SPA + 53 Lambda API + DynamoDB

✓ = production   ◐ = built/prototype   ◆ = separate product

Stages 1-3 are thin: Information Governance, Identification, and Preservation are lightly covered — Nextpoint focuses on stages 4-9 (Collection through Presentation).


Mapping Divergences to NGE Service Modules

Each functional divergence area maps to one or more NGE service modules that replaced the Legacy behavior. Some modules handle multiple functional areas.

By NGE Module

documentextractor (via ProcessorApi) — Pipeline Entry Point

Handles: Ingestion trigger + Cancellation + Pipeline orchestration

| Functional Area | How documentextractor handles it |
|---|---|
| Document ingestion (16 pts) | ProcessorApi.import() calls documentextractor's POST /import endpoint. documentextractor assigns a worker, extracts content (text, metadata, file conversion via Hyland Filters), and publishes SNS events. These fan out to documentloader (DB writes via SQS), documentuploader (page images via SQS), and PSM (event capture via Firehose). Replaces Legacy's PreprocessWorker → ContainerWorker → ConversionWorker → PageWorker chain. |
| Batch cancellation (4 pts) | ProcessorApi.cancel_import() calls DELETE /import/{case}/{job}/{batch}. documentextractor tears down the processing pipeline. Replaces Legacy's local DB status update + BatchStatusUpdateJob. |
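The two Rails-facing calls in the table can be sketched as plain request descriptions. This is a minimal sketch under stated assumptions, not the actual ProcessorApi: the module name `ProcessorApiSketch`, the `Request` struct, and the payload keys are hypothetical. Only the `POST /import` endpoint and the `DELETE /import/{case}/{job}/{batch}` path shape come from the text, and no network call is made.

```ruby
# Hypothetical sketch of the two calls Rails makes to documentextractor.
# Only the HTTP verbs and path shapes come from this document; the module,
# struct, and payload keys are illustrative assumptions.
module ProcessorApiSketch
  Request = Struct.new(:verb, :path, :body)

  # Describes the ingestion trigger: POST /import.
  def self.import_request(case_id:, batch_id:)
    Request.new(:post, "/import", { case_id: case_id, batch_id: batch_id })
  end

  # Describes cancellation: DELETE /import/{case}/{job}/{batch}.
  def self.cancel_request(case_id:, job_id:, batch_id:)
    Request.new(:delete, "/import/#{case_id}/#{job_id}/#{batch_id}", nil)
  end
end

request = ProcessorApiSketch.cancel_request(case_id: 12, job_id: "job-7", batch_id: 345)
puts "#{request.verb.to_s.upcase} #{request.path}"   # DELETE /import/12/job-7/345
```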

Key insight: documentextractor is the NGE entry point from Rails — it's the service that ProcessorApi talks to. It publishes SNS events that fan out to documentloader (DB writes), documentuploader (page images), and PSM (Firehose event capture). Each downstream module also publishes its own events for further subscribers.
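The fan-out itself can be illustrated with an in-memory stand-in for the SNS topic. This is a sketch only: `TopicSketch` and the event payload (including the `document_extracted` type) are hypothetical, and real delivery goes through SQS and Firehose rather than Ruby blocks.

```ruby
# In-memory stand-in for the SNS fan-out; no AWS calls are made.
# Class name and event shape are assumptions for illustration.
class TopicSketch
  def initialize
    @subscribers = {}
  end

  def subscribe(name, &handler)
    @subscribers[name] = handler
  end

  # Delivers one event to every subscriber and returns who received it.
  def publish(event)
    @subscribers.each_value { |handler| handler.call(event) }
    @subscribers.keys
  end
end

topic = TopicSketch.new
received = Hash.new { |hash, key| hash[key] = [] }

# The three downstream consumers named in the text.
%w[documentloader documentuploader psm].each do |name|
  topic.subscribe(name) { |event| received[name] << event }
end

delivered_to = topic.publish({ type: "document_extracted", batch_id: 345 })
puts delivered_to.join(", ")   # documentloader, documentuploader, psm
```

Each consumer gets its own copy of the event, which is why a slow or failing subscriber does not block the others.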

documentloader (downstream from documentextractor)

Handles: Batch lifecycle + DB writes + Family linking

| Functional Area | How documentloader handles it |
|---|---|
| Batch completion (8 pts) | Job processor manages batch lifecycle: creates SQS/Lambda per batch, monitors queue depth, does multi-pass DLQ redrive, atomic teardown. Replaces Legacy's polling loop (next_check_for_complete_time_gmt), BatchCleanup.process, and Exhibit.request_indexing. |
| Family linking (3 pts) | documentloader assigns family_id during ingestion via email thread detection. Replaces Legacy's backfill_email_family_id post-processing. |

Key insight: documentloader's job processor replaces the Legacy batch polling/retry/completion/cleanup machinery. Combined with documentextractor's ingestion trigger, these two modules account for 31 of the 86 divergence points.
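The multi-pass DLQ redrive mentioned above can be sketched with arrays standing in for the queue and the dead-letter queue. Everything here is an assumption for illustration: the method name, the `max_passes` limit, and the retry policy are not taken from documentloader.

```ruby
# In-memory sketch of a multi-pass DLQ redrive; arrays stand in for SQS.
# max_passes and the handler block are assumptions, not documented values.
def drain_with_redrive(messages, max_passes: 3)
  queue = messages.dup
  dead_letters = []
  passes = 0

  loop do
    # Work the main queue; failed messages move to the dead-letter list.
    until queue.empty?
      message = queue.shift
      dead_letters << message unless yield(message)
    end
    break if dead_letters.empty? || passes >= max_passes

    # Redrive: move dead letters back onto the queue for another pass.
    queue = dead_letters
    dead_letters = []
    passes += 1
  end

  { redrive_passes: passes, unrecoverable: dead_letters }
end

attempts = Hash.new(0)
result = drain_with_redrive(%w[doc-a doc-b doc-c]) do |doc|
  attempts[doc] += 1
  # doc-a succeeds at once, doc-b on its second attempt, doc-c never.
  doc == "doc-a" || (doc == "doc-b" && attempts[doc] >= 2)
end

puts "redrive passes: #{result[:redrive_passes]}"
puts "unrecoverable: #{result[:unrecoverable].join(', ')}"
```

Teardown would follow once the queue is drained and the remaining dead letters are reported.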

documentuploader (Nutrient/PSPDFKit infrastructure)

Handles: Document viewing + Placeholders + Infrastructure for bates/markups

| Functional Area | How documentuploader handles it |
|---|---|
| Document viewer (16 pts) | This is the fundamental rendering shift. Legacy stores individual page images (TIFF/PNG) on S3; bates stamps, confidentiality codes, and coding are applied directly to those image files. NGE has only the PDF in Nutrient; no individual page images exist. All annotations (bates, confidentiality, coding) are Nutrient overlays rendered on-demand. documentuploader provisions the Nutrient document; Rails reads via NextpointNutrient.get_cached_filename_for_theater. |
| Placeholders (1 pt) | NGE creates Nutrient layers directly (create_instant_layer_for_nge) instead of generating local placeholder files and uploading to S3. |
| Bates/markups infrastructure | documentuploader provisions the Nutrient document (nutrient_id = document_{case}_{batch}_{nge_doc}_{id}) that bates stamping and markups operate against. |

Key insight: documentuploader doesn't just upload — it establishes the Nutrient document that the entire NGE document viewing, stamping, and annotation stack depends on. Every NextpointNutrient.* call in the divergence map exists because documentuploader set up the Nutrient document.
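The `nutrient_id` layout quoted above (`document_{case}_{batch}_{nge_doc}_{id}`) can be built and parsed as follows. This is a sketch: the module name `NutrientIdSketch` and its methods are hypothetical, and it assumes no component contains an underscore (for example, numeric IDs).

```ruby
# Builds and parses the nutrient_id layout quoted in this section:
#   document_{case}_{batch}_{nge_doc}_{id}
# Assumes every component is free of underscores; names are illustrative.
module NutrientIdSketch
  PATTERN = /\Adocument_([^_]+)_([^_]+)_([^_]+)_([^_]+)\z/

  def self.build(case_id:, batch_id:, nge_doc_id:, id:)
    "document_#{case_id}_#{batch_id}_#{nge_doc_id}_#{id}"
  end

  # Returns the four components as strings, or nil if the layout differs.
  def self.parse(nutrient_id)
    match = PATTERN.match(nutrient_id)
    return nil unless match

    { case_id: match[1], batch_id: match[2], nge_doc_id: match[3], id: match[4] }
  end
end

nutrient_id = NutrientIdSketch.build(case_id: 12, batch_id: 345, nge_doc_id: 6789, id: 1)
puts nutrient_id   # document_12_345_6789_1
```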

documentpageservice (via NgePageService)

Handles: Page manipulation + triggers bates/OCR workflows

| Functional Area | How documentpageservice handles it |
|---|---|
| Document viewer: page operations | NgePageService.process_nge_page_job() for reorder, rotate, add, remove, split. Called from attachment.rb:627 and document_natives_controller.rb:57. Sets processing_in_nge = true while operating. |
| Bates stamping (12 pts) | Rails calls NextpointNutrient.nutrient_document_info() for page counts (set up by documentuploader), then stamps via ExhibitNutrientAction concern. documentpageservice handles the underlying page manipulation when pages need OCR or regeneration (native_pdf_ocr_job). |
| Toolbar locking (5 pts) | The processing_in_nge flag is set whenever documentpageservice is processing a document. This gates all toolbar actions (add/rotate/split/delete/replace) in the UI until the operation completes. |

Key insight: documentpageservice is the reason processing_in_nge exists. Every toolbar lock and unlock in the divergence map is triggered by a documentpageservice operation starting or completing.
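The lock/unlock behaviour can be sketched as a flag held for the duration of a page operation. This is illustrative only: `ExhibitSketch`, `toolbar_action_allowed?`, and `with_page_operation` are hypothetical names, and the text does not say how the flag is cleared. In the real system the work is asynchronous, so the `ensure` block here merely stands in for the completion callback.

```ruby
# Hypothetical sketch of toolbar gating on processing_in_nge.
ExhibitSketch = Struct.new(:id, :processing_in_nge)

# The toolbar actions listed in this section.
TOOLBAR_ACTIONS = %i[add rotate split delete replace].freeze

def toolbar_action_allowed?(exhibit, action)
  TOOLBAR_ACTIONS.include?(action) && !exhibit.processing_in_nge
end

# Sets the flag while the block runs and clears it afterwards, even on error.
# Clearing in an ensure block is an assumption made for this sketch.
def with_page_operation(exhibit)
  exhibit.processing_in_nge = true
  yield
ensure
  exhibit.processing_in_nge = false
end

exhibit = ExhibitSketch.new(42, false)
allowed_during = nil
with_page_operation(exhibit) do
  allowed_during = toolbar_action_allowed?(exhibit, :rotate)
end

puts "during: #{allowed_during}, after: #{toolbar_action_allowed?(exhibit, :rotate)}"
# during: false, after: true
```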

documentexporter (via NgeExportJob)

Handles: Export/production

| Functional Area | How documentexporter handles it |
|---|---|
| Export/production (8 pts) | NgeExportJob invokes {region}-{env}-nge-export-lambda async. Step Functions + ECS Fargate handle image conversion, PDF rendering, ZIP assembly. Replaces Legacy's ExhibitZipVolumeWorker + ExhibitLoadfileWorker. Different export size calculation (size_of_nge_zips vs export_volumes.first.file_size) and notification emails. |

Key difference — page images vs PDF: Legacy stores individual page images (TIFF/PNG) on S3 with bates stamps, confidentiality codes, and coding applied directly to the image files. NGE only has the PDF document in Nutrient — no individual page images exist. So documentexporter must use Nutrient to overlay bates/confidentiality/coding annotations onto the PDF when generating export images. This is why the export rendering and size calculation differ between NGE and Legacy.
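Two of the export divergences are easy to show in isolation: the Lambda naming pattern and the choice of size source. This is a sketch under stated assumptions. Both method names are hypothetical, the region and environment values are examples, and the sizes are passed in precomputed because the text does not say how `size_of_nge_zips` is calculated.

```ruby
# Builds the function name from the pattern quoted in this section:
#   {region}-{env}-nge-export-lambda
def nge_export_lambda_name(region:, env:)
  "#{region}-#{env}-nge-export-lambda"
end

# Picks the size source per case type. The two inputs mirror
# size_of_nge_zips and export_volumes.first.file_size; how each is
# computed is outside this sketch.
def export_size(nge_enabled:, size_of_nge_zips:, first_volume_file_size:)
  nge_enabled ? size_of_nge_zips : first_volume_file_size
end

puts nge_export_lambda_name(region: "us-east-1", env: "staging")
# us-east-1-staging-nge-export-lambda

puts export_size(nge_enabled: true, size_of_nge_zips: 2_048, first_volume_file_size: 1_024)
# 2048
```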

documentexchanger

Handles: Wire transfer replacement

| Functional Area | How documentexchanger handles it |
|---|---|
| Wire transfer (2 pts) | documentexchanger, with dynamic Lambda+SQS provisioning per exchange and AWS Glue ETL, is built to replace the Legacy wire transfer system but is not yet integrated. For now, the Legacy wire buttons are simply hidden in the NGE UI. |

unzipservice

Part of the ingestion pipeline, invoked by documentextractor:

| Module | Role in divergence |
|---|---|
| unzipservice | Archive extraction during ingestion. Replaces Legacy's ContainerWorker for ZIP/RAR/7Z. |

Nutrient (PSPDFKit) — Cross-Cutting Service

Nutrient is not an NGE module but a SaaS dependency that multiple modules and Rails itself call directly. It appears in 37 of the 86 divergence points:

| Caller | Nutrient Usage |
|---|---|
| documentuploader | Provisions documents, generates page images |
| Rails: ExhibitNutrientAction | Bates stamping, confidentiality stamps, page labels |
| Rails: BatesStampJob | Page count validation against Nutrient |
| Rails: AutoRedactionJob | Term and pattern redactions |
| Rails: SyncAnnotationIdsJob | Annotation ID reconciliation |
| Rails: SplitDocumentOnFlagsJob | Document splitting via Nutrient API |
| Rails: theater_processor | Theater view image retrieval |
| Rails: NativePlaceholder | Non-imaged placeholder layer creation |

Summary: Module → Functional Areas

documentextractor ───────┬── Document ingestion (16)   ← entry point via ProcessorApi
(ProcessorApi)           └── Batch cancellation (4)
                                                        Total: 20 points

documentloader ──────────┬── Batch completion (8)       ← downstream from extractor
(job processor)          └── Family linking (3)
                                                        Total: 11 points

documentuploader ────────┬── Document viewer (16)
(Nutrient provisioning)  └── Placeholders (1)
                                                        Total: 17 points

documentpageservice ─────┬── Bates stamping (12)
(NgePageService)         ├── Toolbar locking (5)
                         └── Markups/redactions (4)
                                                        Total: 21 points

documentexporter ────────── Export/production (8)
(NgeExportJob → Lambda)                                 Total: 8 points

documentexchanger ───────── Wire transfer (2)
                                                        Total: 2 points

No NGE module ───────────── Global UI (7)
(CSS/JS only)                                           Total: 7 points

Note: On the ingestion path, Rails talks only to documentextractor (via ProcessorApi); documentloader events come back to Rails via Athena. unzipservice is invoked by documentextractor for archive extraction and is transparent to Rails.
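The per-module totals in the summary can be re-added as a consistency check. The point counts are copied from the summary; the constant name `POINTS_BY_MODULE` is illustrative.

```ruby
# Point counts copied from the module summary in this section.
POINTS_BY_MODULE = {
  "documentextractor"   => { "Document ingestion" => 16, "Batch cancellation" => 4 },
  "documentloader"      => { "Batch completion" => 8, "Family linking" => 3 },
  "documentuploader"    => { "Document viewer" => 16, "Placeholders" => 1 },
  "documentpageservice" => { "Bates stamping" => 12, "Toolbar locking" => 5, "Markups/redactions" => 4 },
  "documentexporter"    => { "Export/production" => 8 },
  "documentexchanger"   => { "Wire transfer" => 2 },
  "No NGE module"       => { "Global UI" => 7 }
}.freeze

totals = POINTS_BY_MODULE.transform_values { |areas| areas.values.sum }
grand_total = totals.values.sum
ingestion_pair = totals["documentextractor"] + totals["documentloader"]

totals.each { |name, points| puts format("%-20s %2d", name, points) }
puts "extractor + loader: #{ingestion_pair} of #{grand_total}"   # extractor + loader: 31 of 86
```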
