NGE vs Legacy Code Divergence Map
Overview
This document maps every location in the Rails codebase where execution diverges
based on whether a case is NGE or Legacy. A case is permanently one or the other —
nge_enabled? is set at case creation and never changes.
Three conditional flags control divergence:
1. Npcase#nge_enabled? — Is the case NGE? Used broadly across models, controllers, views, jobs.
2. Batch#nge_batch? — Returns processor_job_id.present?. A batch-level check (NGE batches have a processor job ID assigned by ProcessorApi.import()).
3. Exhibit#processing_in_nge — Boolean column. Set to true when a document is actively being processed by Nutrient/DPS. Gates toolbar actions in the UI.
Total divergence points: 85+ (41+ in backend, 45+ in views/frontend)
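As a rough illustration of how the three flags relate, the sketch below models them with plain Ruby structs. The struct names mirror the models named above, but the bodies are assumptions: `present?` is approximated without ActiveSupport, and `toolbar_locked?` is a hypothetical helper that is not taken from the codebase.

```ruby
module FlagSketch
  # Case-level flag: set at case creation and never changed afterwards.
  Npcase = Struct.new(:nge_enabled) do
    def nge_enabled?
      nge_enabled == true
    end
  end

  # Batch-level flag: derived from processor_job_id.
  Batch = Struct.new(:processor_job_id) do
    def nge_batch?
      # Approximates ActiveSupport's `present?` (not nil, not blank).
      !processor_job_id.to_s.strip.empty?
    end
  end

  # Document-level flag: boolean column toggled around Nutrient/DPS work.
  Exhibit = Struct.new(:processing_in_nge) do
    def toolbar_locked? # hypothetical name, for illustration only
      processing_in_nge == true
    end
  end
end

puts FlagSketch::Batch.new('job-123').nge_batch? # true
puts FlagSketch::Batch.new(nil).nge_batch?       # false
```

The point of the sketch is the scope of each flag: one per case, one per batch, one per document.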
1. Document Ingestion & Import
How documents enter the system. The two pipelines are completely different.
Backend
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 1 | batches_controller.rb:297 | nge_batch? | Call nge_batch_process: check event dupes, queue NgeCaseTrackerJob (2 min delay) | Call legacy_batch_process directly |
| 2 | batches_controller.rb:377 | unless nge_batch? | Skip setting next_check_for_complete_time_gmt | Set next check time to 60s (polling-based completion) |
| 3 | batch.rb:845 | N/A | Definition: nge_batch? = processor_job_id.present? | N/A |
| 4 | batch.rb:466 | nge_batch? | Include batch_process_details in meta update | Exclude it |
| 5 | batch.rb:644 | nge_batch? | Update ImportStatus records with batch status | Return immediately (no ImportStatus) |
| 6 | batch_pst.rb:7 | nge_enabled? | Call restructure_pst_info (JSON-ready format) | ExtractedPstFolderRollupCollection |
Views
| File | NGE | Legacy |
|---|---|---|
| imports/_navigation.html.erb:1 | Different import navigation | Standard navigation |
| imports/select.html.erb:11 | Mailbox: "OST, PST, MBOX" | Mailbox: "PST, MBOX" |
| imports/select.html.erb:13 | Production: "DAT, CSV, OPT, LOG" | Production: "DAT, CSV" |
| imports/select.html.erb:41 | Hidden additional import options | Shows additional options |
| imports/select.html.erb:56 | Button: "Start Import" | Button: "Next" |
| batches/show.html.erb:12 | Different batch detail rendering | Standard rendering |
| batches/list.html.erb:12 | Different batch list rendering | Standard rendering |
| batches/_settings.html.erb:3 | "Import Details: {name}" | "Import Data Settings" |
| batches/dedupe.html.erb:10 | Different dedup settings UI | Standard dedup UI |
| general_settings/import.html.erb:11 | Hidden | Shows import settings |
2. Batch Completion & Lifecycle
How batches are finalized. NGE skips all Legacy polling/retry machinery.
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 7 | batch_completion_job.rb:23 | !nge_batch? | Skip retrying failed batch split jobs | Retry failed batch parts if max not reached |
| 8 | batch_completion_job.rb:26 | unless nge_batch? | Skip in-progress check and BatchCleanup retry | Check for in-progress jobs; run BatchCleanup.process |
| 9 | batch_completion_job.rb:62 | !nge_batch? | Never reschedule next check; proceed to complete_batch | Reschedule via update_batch_next_check_time |
| 10 | batch_completion_job.rb:69 | unless nge_batch? | Skip Exhibit.request_indexing | Trigger ES reindex |
| 11 | batch_completion_job.rb:130 | unless nge_batch? | Skip backfill_email_family_id | Backfill family_id on exhibits |
| 12 | batch_completion_job.rb:134 | nge_enabled? | Add non-imaged placeholders for containers (pst, ost, mbox, zip, 7z, tar, rar) | Skip placeholder creation |
| 13 | batch_completion_job.rb:164 | unless nge_batch? | Skip processing_errors_array (errors come from Athena) | Scan processing jobs, create BatchProcessingEvent records |
| 14 | batch_completion_job.rb:387 | nge_batch? | pdfs_to_create? → false (Nutrient handles PDFs) | Check search_hilite for PDF generation |
Why so different: NGE handles retries via SQS, indexing via the loader pipeline,
error tracking via Athena, and PDF rendering via Nutrient. The Legacy polling/retry
loop in batch_completion_job is entirely replaced.
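The guard pattern in this section can be pictured as a single step list from which NGE batches drop the Legacy-only entries. This is a hedged sketch: the step symbols are hypothetical labels for rows 7 to 14 (they are not method names from batch_completion_job.rb), the ordering is simplified, and row 12 is really gated on the case-level nge_enabled? rather than nge_batch?.

```ruby
module CompletionSketch
  Batch = Struct.new(:processor_job_id) do
    def nge_batch?
      !processor_job_id.to_s.empty?
    end
  end

  # Steps that run only for Legacy batches (rows 7-11, 13, 14).
  LEGACY_ONLY = %i[
    retry_failed_parts check_in_progress_jobs reschedule_next_check
    request_indexing backfill_email_family_id scan_processing_errors
    check_pdfs_to_create
  ].freeze

  # Step that runs only for NGE (row 12).
  NGE_ONLY = %i[add_container_placeholders].freeze

  # Assumed to be reached by both paths eventually.
  SHARED = %i[complete_batch].freeze

  def self.steps_for(batch)
    batch.nge_batch? ? NGE_ONLY + SHARED : LEGACY_ONLY + SHARED
  end
end

p CompletionSketch.steps_for(CompletionSketch::Batch.new('job-1'))
```

Seen this way, the NGE path through the job is short because retries, indexing, error tracking, and PDF rendering are handled outside Rails.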
3. Batch Cancellation
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 15 | batch.rb:624 | nge_batch? | cancel_nge_import → ProcessorApi.cancel_import() (external API call, raises on failure) | cancel_non_nge_import → local DB update + BatchStatusUpdateJob |
| 16 | batch_completion_job.rb:43 | !nge_batch? | On cancel: skip "still processing" check, cleanup immediately | Reschedule check if jobs still in progress |
| 17 | batches_controller.rb:276 | can_cancel_batch(nge_batch?) | Permission check with nge=true | Permission check with nge=false |
| 18 | batches/_batch_sidebar.html.erb:48 | can_cancel_batch(nge_batch?) | NGE permission check for cancel button visibility | Legacy permission check |
Tag / Custom Field Creation During Import
How tag values are deduplicated during load file imports. The two approaches are fundamentally different.
| Aspect | NGE (documentloader) | Legacy (Rails workers) |
|---|---|---|
| Dedup mechanism | MySQL TagDedupe table with SHA256 hash + PK constraint | Elasticsearch query per tag value |
| Race condition handling | SAVEPOINT + IntegrityError catch + 3 retries | ES acts as read-after-write cache |
| Tagging (exhibit↔tag) | INSERT IGNORE bulk insert | Per-row ActiveRecord create |
| ES in write path? | No (ES indexed separately for search) | Yes (queried per value to check existence) |
| DB access | Direct via RDS Proxy (no Rails) | Through ActiveRecord connection pool |
| Performance at scale | Scales linearly with MySQL write throughput | Degrades with ES query volume (millions of queries on large imports) |
Key files:
- NGE: documentloader/shell/tags_ops.py (insert_or_get_tag_id), shell/taggings_ops.py (INSERT IGNORE)
- Legacy: Rails Tag model + Elasticsearch check before creation
Why Legacy uses ES: Parallel workers (fork-based) create race conditions on tag creation. ES check prevents duplicates across workers. This is intentional and correct for Legacy's architecture.
Why NGE doesn't need ES: MySQL atomic constraints (TagDedupe PK + SAVEPOINT rollback) handle concurrent Lambda invocations natively. SHA256 hash works around MySQL's varchar(255) key length limit for tag names up to varchar(2000).
For new modules (ADR-005, ADR-006): Follow documentloader's TagDedupe pattern. Never replicate Legacy's ES-check-before-write pattern in NGE modules.
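The insert-first idea behind TagDedupe can be sketched in a few lines. The real implementation is Python in documentloader; the Ruby below is only an illustration of the technique, with an in-memory class standing in for the MySQL table and a custom `DuplicateKey` error standing in for the primary-key violation. Hashing only the tag name is an assumption; any additional key columns are not described in this document, and the SAVEPOINT and retry handling is omitted.

```ruby
require 'digest'

module TagDedupeSketch
  # Stands in for the database's primary-key violation (IntegrityError).
  class DuplicateKey < StandardError; end

  # In-memory stand-in for a table whose primary key is the hash column.
  class Table
    def initialize
      @ids = {}
    end

    def insert(key)
      raise DuplicateKey if @ids.key?(key)
      @ids[key] = @ids.size + 1
    end

    def find(key)
      @ids.fetch(key)
    end
  end

  # Fixed-length key: 64 hex characters regardless of the tag name's length.
  def self.dedupe_key(tag_name)
    Digest::SHA256.hexdigest(tag_name)
  end

  # Insert first; fall back to a lookup only when the key already exists.
  def self.insert_or_get_tag_id(table, tag_name)
    key = dedupe_key(tag_name)
    table.insert(key)
  rescue DuplicateKey
    table.find(key)
  end
end

table = TagDedupeSketch::Table.new
first  = TagDedupeSketch.insert_or_get_tag_id(table, 'Privileged')
second = TagDedupeSketch.insert_or_get_tag_id(table, 'Privileged')
puts first == second # true
```

Because the uniqueness check and the write are the same operation, no external existence check (such as an Elasticsearch query) sits in the write path.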
Document Deduplication During Import
How duplicate documents are detected: insert-based (NGE) vs query-based (Legacy).
| Aspect | NGE (documentloader) | Legacy (Rails) |
|---|---|---|
| Storage | Separate doc_dedupe table (insert-based detection) | No separate table; queries Exhibits table directly |
| Dedup key | Composite PK: (npcase_id, message_id, bcc, md5, doc_type) | Query by expansive_hash + email_message_id on Exhibits |
| Hash algorithm | MD5 (content_hash or attachments_hash) | MD5 → then SHA256 of MD5 hashes (expansive_hash) |
| When checked | Before exhibit creation (PROCESS_STARTED checkpoint) | After exhibit creation (post-creation merge) |
| Duplicate handling | Skip creation, return existing exhibit IDs | Create exhibit, then link as duplicate via ExhibitDedupeMerger |
| Race condition handling | SAVEPOINT + IntegrityError + retry with backoff + force-create fallback | Optimistic locking (ActiveRecord) |
| Email-specific logic | message_id = email_message_id; bcc part of key (metadata-aware) | email_message_id query + my_dupe?() metadata filtering (author, date, reply_id) |
| Attachment logic | message_id = family_id (parent email); md5 = content_hash | Grouped via expansive_hash (hash of all attachment hashes in family) |
| Scope | Per-case (npcase_id in PK) | Per-case (implicit in case DB schema) |
Key files:
- NGE: documentloader/shell/exhibit_ops.py (add_exhibit(), resolve_dedupe_fields(), find_dupes())
- NGE model: core/models/db_models.py (DocDedupe class)
- Legacy: rails/app/models/exhibit.rb (dupe?(), merge_as_dupe(), ExhibitDedupeMerger)
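The "Hash algorithm" row describes Legacy's expansive_hash as a SHA256 over MD5 hashes. A minimal sketch of that two-level hashing follows. How the real code orders and joins the MD5 digests is not stated in this document, so the sort and plain concatenation here are assumptions made to keep the result independent of member order.

```ruby
require 'digest'

module ExpansiveHashSketch
  # member_contents: raw bytes of each document in the family.
  def self.expansive_hash(member_contents)
    md5s = member_contents.map { |bytes| Digest::MD5.hexdigest(bytes) }
    # Assumption: sort so that member order does not change the family hash.
    Digest::SHA256.hexdigest(md5s.sort.join)
  end
end

a = ExpansiveHashSketch.expansive_hash(%w[body attachment-1])
b = ExpansiveHashSketch.expansive_hash(%w[attachment-1 body])
puts a == b # true
```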
Why NGE uses insert-based (DocDedupe table): Detecting duplicates via failed INSERT is atomic — no gap between "check" and "create" where a race condition can occur. The composite PK encodes document identity (content hash + email metadata + doc type). SAVEPOINT allows graceful rollback without aborting the entire transaction.
Why Legacy uses query-based: Legacy creates the exhibit first, then checks if it's a duplicate. This allows richer metadata-based dedup (my_dupe?() checks author, date, reply_id) but means duplicate exhibits are temporarily created and then merged.
Key behavioral difference: NGE prevents duplicate exhibits from being created. Legacy creates duplicates and then merges them. The NGE approach is more efficient for large imports with high duplicate ratios because it wastes no DB writes on documents that would be merged anyway.
Batch settings that control dedup:
- allow_dupes on Batch: if true, dedup is skipped entirely in both NGE and Legacy (for NGE, no DocDedupe insert is made)
- dedupe_using_message_id? — use email message_id for dedup
- dedupe_using_meta? — use metadata fields for dedup
- bcc_merge_on — include/exclude BCC in dedup key
For new modules: Follow documentloader's insert-based DocDedupe pattern. Pre-check dedup is cheaper than post-creation merge.
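The insert-based flow described above can be sketched as follows. The real code is Python (add_exhibit() in exhibit_ops.py); this Ruby version only illustrates the control flow. `DocDedupeTable`, `DuplicateKey`, and the block-based exhibit creation are hypothetical stand-ins, and the SAVEPOINT, retry, and force-create fallback are left out.

```ruby
module DocDedupeSketch
  # Stands in for the composite primary-key violation.
  class DuplicateKey < StandardError; end

  # In-memory stand-in for the doc_dedupe table.
  class DocDedupeTable
    def initialize
      @rows = {}
    end

    # Raises DuplicateKey when the composite key is already present.
    def insert!(key)
      raise DuplicateKey if @rows.key?(key)
      @rows[key] = nil
    end

    def attach(key, exhibit_id)
      @rows[key] = exhibit_id
    end

    def exhibit_id_for(key)
      @rows.fetch(key)
    end
  end

  # Field order follows the composite PK listed in the table above.
  Doc = Struct.new(:npcase_id, :message_id, :bcc, :md5, :doc_type)

  # The block creates the exhibit and returns its id; it runs only when
  # the dedupe insert succeeds (or when allow_dupes bypasses the check).
  def self.add_exhibit(table, doc, allow_dupes: false, &create_exhibit)
    return { created: true, exhibit_id: create_exhibit.call } if allow_dupes

    key = doc.to_a.freeze
    table.insert!(key)
    exhibit_id = create_exhibit.call
    table.attach(key, exhibit_id)
    { created: true, exhibit_id: exhibit_id }
  rescue DuplicateKey
    { created: false, exhibit_id: table.exhibit_id_for(key) }
  end
end
```

A duplicate never reaches the creation block, which is the "no wasted DB writes" property: the second call with the same key returns the first exhibit's id without creating anything.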
4. Document Viewer & Page Rendering
How documents are displayed: Nutrient (PSPDFKit) vs S3 page images.
Backend
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 19 | theater_processor.rb:316 | nge_enabled | Get image via NextpointNutrient.get_cached_filename_for_theater | Get image via NextPointS3.get_cached_filename |
| 20 | theater_document_decorator.rb:51 | nge_enabled | preload = false (don't preload theater images) | preload = true |
| 21 | exhibit.rb:2050 | nge_doc? | Spreadsheet URL: rewrite to /xlsx/{id}/content.xlsx (NGE S3 layout) | Replace extension with .xlsx |
| 22 | documents_controller.rb:245 | NGE-only | regenerate_pdf: creates native_pdf_ocr_job, sets processing_in_nge = true | No legacy equivalent |
| 88 | document_editor.rb:837 | nge_enabled? | process_generate_pdf_options: download PDF via Nutrient API | Render generate_pdf_options modal for user |
| 89 | document_editor.rb:1174 | nge_enabled | select_page_heights: get heights from Nutrient API (nutrient_document_info) | Query attachments table for page heights |
Views
| File | NGE | Legacy |
|---|---|---|
| documents/show.html.erb:57 | Render _nge_document_toolbar partial | Skip |
| documents/_nge_document_toolbar.html.erb | Entire partial (NGE-only) | N/A |
| documents/_scrollable_page_set.html.erb:2,33,83 | Hidden (3 blocks) | Page thumbnails with S3 images |
| documents/_action_bar.html.erb:52 | "Wire" button hidden | Shows "Wire" button |
| documents/_exhibit_page_content_preview.html.erb:179 | Different preview rendering | Standard preview |
| documents/show_document_files.html.erb:65 | Hidden | Shows document files section |
| documents/_js_templates.html.erb:227 | Wire button hidden | Shows wire button |
| page_notes/_sidebar_document_notes.html.erb:1 | Hidden | Shows sidebar notes |
React Components
| File | NGE | Legacy |
|---|---|---|
| NgeDocumentViewPdf.jsx:12 | isProcessingInNge gates PDF viewer | N/A |
| DocumentToolbar.jsx:87 | isProcessingInNge disables toolbar | Standard toolbar |
| DownloadControl.jsx:13 | "Original files are not available" | "Original File Removed" |
| documentViewPdfUtils.jsx:67,92 | Different PDF rendering setup | Standard rendering |
5. Bates Stamping & Numbering
Page numbering. NGE validates against Nutrient page counts.
Backend
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 23 | bates_stamp_job.rb:33 | @is_nge | Send admin emails on Nutrient API failures or page count mismatches | Skip ensure block |
| 24 | bates_stamp_job.rb:81 | @is_nge | Send bates_stamp_processing_finished_email, return | Queue BatesStampCompletionJob, send notification |
| 25 | bates_stamp_job.rb:106 | @is_nge | Call NextpointNutrient.nutrient_document_info for page count; limit bates to Nutrient pages | Use DB-based verified page count only |
| 26 | bates_stamp_job.rb:135 | @is_nge | Stop stamping at Nutrient page count; reset processing_in_nge | Stamp all verified pages |
| 27 | bates_stamp_job.rb:146 | nge_enabled? | Reset processing_in_nge = false after stamping | Do not touch flag |
| 28 | bates_management_controller.rb:12 | has_attribute?(:processing_in_nge) | Set processing_in_nge = true before processing | Do not set flag |
| 29 | exhibit.rb:508 | nge_enabled? | After removing bates, reset processing_in_nge = false | Only remove bates |
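Rows 25 to 27 describe two NGE-specific behaviors: capping the stamped pages at the Nutrient page count, and clearing processing_in_nge afterwards. The sketch below is one way to picture that logic under stated assumptions. `stamp_plan` and `stamp` are hypothetical names, the Nutrient page count is passed in rather than fetched from the API, and using `ensure` for the flag reset is a simplification of this sketch rather than a description of bates_stamp_job.rb.

```ruby
module BatesSketch
  Exhibit = Struct.new(:verified_page_count, :processing_in_nge)

  # Returns how many pages to stamp and whether the two counts disagree.
  # nutrient_page_count is nil for Legacy cases (DB count only).
  def self.stamp_plan(exhibit, nutrient_page_count: nil)
    verified = exhibit.verified_page_count
    return { pages: verified, mismatch: false } if nutrient_page_count.nil?

    { pages: [verified, nutrient_page_count].min,
      mismatch: verified != nutrient_page_count }
  end

  def self.stamp(exhibit, nge:, nutrient_page_count: nil)
    plan = stamp_plan(exhibit, nutrient_page_count: nge ? nutrient_page_count : nil)
    yield plan[:pages] if block_given? # caller-supplied stamping work
    plan
  ensure
    # NGE clears the flag after stamping; Legacy leaves it untouched.
    exhibit.processing_in_nge = false if nge
  end
end

exhibit = BatesSketch::Exhibit.new(10, true)
plan = BatesSketch.stamp(exhibit, nge: true, nutrient_page_count: 8)
puts plan[:pages]    # 8
puts plan[:mismatch] # true
```

In the real job a mismatch triggers an admin email (row 23); here it is only reported in the returned hash.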
Views
| File | NGE | Legacy |
|---|---|---|
| label_bates/new.html.erb:158 | Hidden additional options | Shows options |
| general_settings/_exhibit_stamp_template.html.erb:13 | Different stamp template | Standard template |
| general_settings/update_stamp_format.html.erb:1,6 | Different stamp format | Standard format |
| production_endorsement_schemes/_form.html.erb:165,199 | Different endorsement options | Standard options |
| production_endorsement_schemes/placeholder_stamp_image.html.erb:25 | Different placeholder stamp | Standard stamp |
Stamp Configuration (NGE-only fields)
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 86 | general_settings_controller.rb:12 | nge_enabled | Set stamp_placement (vertical/horizontal) + stamp_names (array of name/position) | Skip (uses stamp_format + use_background_color_for_stamp only) |
| 87 | general_settings_controller.rb:220 | nge_enabled? | Set confidentiality_stamps_position (left/right) on ConfidentialityCode | Skip (not applicable for Legacy) |
6. Image Markups & Redactions
How annotations are applied: NGE uses the Nutrient API, Legacy uses processing jobs.
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 30 | image_markups_controller.rb:50 | nge_enabled? | Auto-redaction: set processing_in_nge = true, queue AutoRedactionJob. Other markups: skip_processing_jobs: true | Original markup logic with processing jobs |
| 31 | auto_redaction_job.rb:138 | has_attribute?(:processing_in_nge) | Reset processing_in_nge = false after completion | N/A (entire job is NGE-only) |
| 32 | sync_annotation_ids_job.rb:59 | Always | Reset processing_in_nge = false after sync | N/A (entire job is NGE-only) |

Views
| File | NGE | Legacy |
|---|---|---|
| image_markups/edit.html.erb:20 | Different markup editor | Standard editor |
7. Toolbar & UI Locking
NGE locks toolbar actions while Nutrient is processing a document.
| # | File | Condition | When processing_in_nge = true | When idle / Legacy |
|---|---|---|---|---|
| 33 | toolbar_permission_helper.rb:74 | !processing_in_nge? | Disable "add new page" | Allow |
| 34 | toolbar_permission_helper.rb:86 | !processing_in_nge? | Disable "rotate/duplicate page" | Allow |
| 35 | toolbar_permission_helper.rb:95 | !processing_in_nge? | Disable "split/delete page" | Allow |
| 36 | toolbar_permission_helper.rb:109 | !processing_in_nge? | Disable "add/replace native" | Allow |
| 37 | documents/_js_templates.html.erb:205 | nge_enabled? | Disable export during annotation jobs | Allow |
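A compact way to read rows 33 to 36: each lockable action is allowed only when the document is not currently being processed. The sketch below assumes a single `allowed?` predicate and hypothetical action symbols; the real helper has one check per action and other permission logic that is not shown here.

```ruby
module ToolbarLockSketch
  # Actions from rows 33-36 that are disabled while Nutrient is processing.
  LOCKABLE = %i[
    add_new_page rotate_page duplicate_page
    split_page delete_page add_native replace_native
  ].freeze

  Exhibit = Struct.new(:processing_in_nge)

  def self.allowed?(action, exhibit)
    return true unless LOCKABLE.include?(action)

    # nil (Legacy, flag never set) and false (NGE, idle) both allow the action.
    !exhibit.processing_in_nge
  end
end

busy = ToolbarLockSketch::Exhibit.new(true)
idle = ToolbarLockSketch::Exhibit.new(false)
puts ToolbarLockSketch.allowed?(:rotate_page, busy) # false
puts ToolbarLockSketch.allowed?(:rotate_page, idle) # true
```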
8. Non-Imaged Placeholders
How placeholder pages are created for container files and non-imaged documents.
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 38 | non_imaged_placeholder.rb:31 | nge_enabled? | create_instant_layer_for_nge, set S3 path directly | setup_placeholder_file (create local file, upload to S3) |
9. Family Linking & Batch Details
How email thread/family relationships are displayed.
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 39 | batch_family_linking.rb:7 | nge_enabled? | Return linked batches with merged username (hash) | Raw AR relations |
| 40 | batch_family_linking.rb:20 | nge_enabled? | Merge username and docs count into hashes | Raw AR query result |
| 41 | batch_family_linking.rb:61 | nge_enabled? | Return [{key:, value:}] array (JSON-ready) | HTML string with <br/> separators |
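Row 41 contrasts two output shapes for the same data. A minimal sketch of that difference follows; `format_details` is a hypothetical name and the exact Legacy string layout ("label: value") is an assumption, since the table only states that the values are joined with <br/>.

```ruby
module FamilyLinkSketch
  # details: ordered Hash of label => value.
  def self.format_details(details, nge:)
    if nge
      # JSON-ready array of {key:, value:} hashes for the NGE frontend.
      details.map { |key, value| { key: key, value: value } }
    else
      # Pre-rendered HTML fragment for Legacy views.
      details.map { |key, value| "#{key}: #{value}" }.join('<br/>')
    end
  end
end

details = { 'Linked batch' => 'Batch 12', 'Documents' => 340 }
puts FamilyLinkSketch.format_details(details, nge: false)
```

The NGE shape leaves presentation to the client, which is consistent with the JSON-ready formats noted elsewhere in this map (for example restructure_pst_info).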
10. Export & Production
How document exports are rendered and delivered.
Backend
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 42 | batch_notification_decorator.rb:20 | nge_batch? | Show description for unknown events | Show external text only |
| 43 | share_job_mixins.rb:73 | N/A | Pass nge_enabled: flag to share job payload | N/A |
Views
| File | NGE | Legacy |
|---|---|---|
| exports/_export.html.erb:14 | Different export rendering | Standard |
| exports/_export.html.erb:40 | size_of_nge_zips | export_volumes.first.file_size |
| exports/show.html.erb:28,61 | Different export detail rendering | Standard |
| notification/shared_export_*.erb (4 files) | Different export notification emails | Standard emails |
| general_settings/edit_confidentiality_code.html.erb:24 | Different confidentiality code editing | Standard |
| application/_download_all.html.erb:5 | Different download behavior | Standard |
11. Global UI & Layout
Platform-wide UI differences.
| File | NGE | Legacy |
|---|---|---|
| layouts/application.html.erb:32 | Body class: nge (enables global CSS) | Body class: legacy |
| general/_js_support.html.erb:7 | Sets NP['is_nge_enabled'] = 'true' | 'false' |
| general/_current_case.html.erb:3,11 | Container: nge-case_name_for_display_container + NGE indicator | Standard container |
| general/_case_access_list.html.erb:59 | Shows NGE indicator on case list | No indicator |
| general/tab_bars/_import_export_center_tab_bar.html.erb:26 | Different tab bar | Standard tab bar |
| general/_banner_editor.html.erb:44 | Different banner editor | Standard |
| documents/_labels_editor.html.erb:51 | Calls handleExhibitStamp on label save | Standard save |
12. Review & Coding (Shared, No Divergence in Logic)
Review and coding (applying labels, privilege designations, confidentiality codes, review status) work on both NGE and Legacy cases. The business logic is identical — the only difference is the rendering layer:
| Aspect | Legacy | NGE |
|---|---|---|
| Document rendering | Page images (TIFF/PNG) loaded from S3 | PDF rendered from Nutrient with annotation overlays |
| Review controller | Standard page data | Sets Nutrient secrets for PDF viewer (reviews_controller.rb:129) |
| Label save | Standard save | Also calls handleExhibitStamp to update Nutrient bates overlay (_labels_editor.html.erb:51) |
| Bates/confidentiality display | Baked into page images | Nutrient overlay layers rendered on-demand |
| Coding overlay import | Same logic | Same logic (no nge_enabled? checks in coding_overlays_controller.rb) |
The distinction matters: review and coding are shared workflows, not divergent features. Only the underlying document representation differs (page images vs PDF); label assignment, privilege tagging, review status, and bulk coding behave the same on both systems.
13. Document Exchange (Wire)
"Wire" is the internal codebase name; "Exchange" is the user-facing product name
(configured via $global_config[:wire_product_name] = 'Exchange' in nextpoint_global.yml).
Same operation — transferring documents between cases.
This is a Legacy-only feature — hidden in NGE UI. The wire system itself has minimal NGE-specific code, but a cross-type validation prevents transfers between NGE and Legacy cases.
| # | File | Condition | NGE Path | Legacy Path |
|---|---|---|---|---|
| 90 | general_settings_controller.rb:294 | nge_enabled != target.nge_enabled | Block wire transfer with db_type_mismatch error | Same (prevents cross-type transfers in both directions) |

| File | NGE | Legacy |
|---|---|---|
| documents/_action_bar.html.erb:52 | "Exchange" button hidden | Shows "Exchange" button |
| documents/_js_templates.html.erb:227 | Wire button hidden in toolbar | Shows wire button |
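The cross-type check in row 90 is symmetric: it fires whenever the source and target cases disagree on nge_enabled. A small sketch, assuming a hypothetical `transfer_error` helper that returns the error symbol (the controller's actual return shape is not described in this document):

```ruby
module WireValidationSketch
  Npcase = Struct.new(:name, :nge_enabled)

  # Returns :db_type_mismatch when the cases are of different types, else nil.
  def self.transfer_error(source, target)
    # Double negation treats nil and false alike, so two Legacy cases match.
    !!source.nge_enabled == !!target.nge_enabled ? nil : :db_type_mismatch
  end
end

legacy = WireValidationSketch::Npcase.new('Smith v. Jones', false)
nge    = WireValidationSketch::Npcase.new('Acme Review', true)
p WireValidationSketch.transfer_error(legacy, nge)    # :db_type_mismatch
p WireValidationSketch.transfer_error(legacy, legacy) # nil
```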
Legacy wire transfer architecture:
Multi-phase approval workflow for transferring documents between cases:
Models: OutgoingWire (source), IncomingWire (destination), ExhibitOutgoingWire (join)
Jobs:
- WireSetupJob — Links exhibits, deduplicates, advances phases
- DocumentShareGenerationJob — Creates SQLite DB + CSV loadfile of selected exhibits
- WireConfirmationJob — Cross-case/cross-account approval handshake
- DocumentShareJob — Executes transfer: creates IncomingWire + batch (wire_transfer type)
in target case, iterates SQLite DB, copies exhibits/attachments/S3 files per document
- DirectDocumentShareJob — Shortcut for intra-account transfers
- DepositionShareJob — Deposition-specific transfers
Flow: User selects exhibits → WireSetupJob → DocumentShareGenerationJob (loadfile) →
optional loadfile review → optional work order approval → cross-account approval via
WireConfirmationJob → DocumentShareJob copies documents to target case DB + S3.
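The flow above can be read as a phase progression with two optional stops. The sketch below uses the OutgoingWire phase names given in the Legacy wire vs NGE documentexchanger comparison in this section. Treating loadfile and work_order as the skippable phases is an assumption drawn from the word "optional" in the flow, and `next_phase` is a hypothetical helper, not a method of OutgoingWire.

```ruby
module WirePhaseSketch
  PHASES = %i[initial_setup loadfile work_order target_approval fully_approved].freeze

  # Assumed optional: loadfile review and work order approval.
  OPTIONAL = %i[loadfile work_order].freeze

  # Returns the next phase after `current`, skipping optional phases listed
  # in `skip`. Returns nil once the wire is fully approved.
  def self.next_phase(current, skip: [])
    index = PHASES.index(current)
    raise ArgumentError, "unknown phase: #{current}" if index.nil?

    PHASES.drop(index + 1).find do |phase|
      !(OPTIONAL.include?(phase) && skip.include?(phase))
    end
  end
end

p WirePhaseSketch.next_phase(:initial_setup)                               # :loadfile
p WirePhaseSketch.next_phase(:initial_setup, skip: %i[loadfile work_order]) # :target_approval
```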
NGE interaction: Only nextpoint_share_job_mixins.rb:73 — when wire creates a
new destination case, it propagates nge_enabled from source case.
Legacy wire vs NGE documentexchanger:
| Aspect | Legacy Wire | NGE documentexchanger |
|---|---|---|
| Trigger | User clicks Exchange → WireSetupJob | API Gateway (sync) + SQS (async) dual entry point |
| Approval workflow | Multi-phase (OutgoingWire: initial_setup → loadfile → work_order → target_approval → fully_approved) | Not yet integrated into Rails |
| DB transfer | DocumentShareJob iterates SQLite DB, copies exhibits one by one via document.transfer! | AWS Glue ETL for bulk database copy |
| S3 file copy | Per-document S3 copy inside DocumentShareJob | Per-document Lambda processors via dynamic SQS queues |
| Annotations | Copies page images (bates/confidentiality already baked in) | Must process via Nutrient (PDF overlays, no page images) |
| Infrastructure | Sidekiq jobs on shared Redis queue | Dynamic Lambda + SQS provisioned per exchange, torn down on completion |
| OCR | Copies existing search text | Re-OCR via Hyland Filters (source is PDF, not page images) |
NGE status: documentexchanger is built but not yet integrated into the Rails
app. The Legacy wire buttons are simply hidden via nge_enabled? in views.
NGE-Only Code (No Legacy Equivalent)
These entire components exist only for NGE cases:
| Component | Type | Purpose |
|---|---|---|
| NgeCaseTrackerJob | Sidekiq job | Polls Athena for NGE processing events |
| NgeExportJob | Sidekiq job | Invokes documentexporter Lambda |
| AutoRedactionJob | Sidekiq job | Nutrient-based auto-redaction |
| SyncAnnotationIdsJob | Sidekiq job | Reconciles Nutrient annotation IDs |
| NgePageService | Service | HTTP client for documentpageservice |
| ProcessorApi | Lib | HTTP client for NGE Processor API |
| ProcessorApiHelper | Helper | Import payload builder for NGE |
| Batch::AsUpload workflow | Model | before_commit → initiate_import for NGE |
| mixins/nge_batches.rb | Controller | NGE batch listing, Athena queries |
| _nge_document_toolbar.html.erb | View | NGE document toolbar partial |
| NgeDocumentViewPdf.jsx | React | NGE PDF viewer component |
Legacy Functionality Not Yet Modularized into NGE
The Nextpoint platform has two suites: Discovery (document processing, search, review) and Litigation (video, depositions, transcripts, treatments). Many features are common across both suites. NGE modularized the common processing workflows (import, export, exchange).
Common Functionality (Both Discovery and Litigation)
These features work in both case types (trial_prep = Litigation, review = Discovery):
| Area | Key Components | NGE Status |
|---|---|---|
| Import/Upload | ImportsController, BatchesController, S3UploadController, CaseFolderController | Modularized (extractor → loader → uploader) |
| Export/Production | ExportsController, ProductionTemplatesController, DocumentExportJob | Modularized (documentexporter) |
| Exchange/Wire | OutgoingWiresController, IncomingWiresController, wire jobs | Built (documentexchanger, not yet integrated) |
| Document Viewer | DocumentsController, DocumentPagesController, AttachmentsController | Shared: page images (Legacy) vs Nutrient PDF (NGE) |
| Search | SearchController, SearchAggregationController, Elasticsearch 7.4 | Legacy only |
| Labels/Coding | LabelsController, CodingOverlaysController, BulkLabelsController | Shared: same logic, different rendering |
| Bates/Stamps | BatesManagementController, ExhibitStampingController, BatesStampJob | Shared: Nutrient overlays (NGE) vs page images (Legacy) |
| Markups/Redactions | ImageMarkupsController, HighlightsController, PageNotesController | Shared: Nutrient API (NGE) vs processing jobs (Legacy) |
| Custodians | CustodiansController, CustodianExhibitsController | Legacy only |
| Custom Fields/Grid | CustomFieldsController, GridColumnsController, GridTemplatesController | Legacy only |
| Family Linking | FamilyLinkingsController, FamilyLinkingJob | Legacy only (NGE handles during ingestion) |
| Reporting | CustomReportController, UserActivityReportsController, AnalyticsController | Legacy only |
| Case Management | NpcasesController, CasePermissionsController, CaseNotesController | Legacy only (core platform) |
| User/Account | AccountsController, UsersManagementController, UserLicensesController | Legacy only (core platform) |
| AI | AiAssistantController, ChatbotController (Bedrock) | Legacy only |
| Bulk Operations | BulkDeleteJob, BulkRestoreJob, BulkActionJob, BatchLabelJob | Legacy only |
Additional common features not listed above:
| Area | Components | NGE Status |
|---|---|---|
| Document Review | ReviewsController, sub-review assignments | Shared: same logic, different rendering |
| Chronology | ChronologyController (timeline view) | Legacy only |
| File Room | FileRoomController (virtual binders) | Legacy only |
| Search Hit Reports | SearchHitReportController, searcher/post-processing jobs | Legacy only |
Litigation-Specific Features (Legacy Only)
| Feature | Components |
|---|---|
| Evidence Dashboard | EvidenceController (verify_trial_prep required) |
| Theater/Presentation | TheaterController, Theater::PagesController |
Litigation Suite: Processing (Separate Domain, Entirely Legacy)
The Litigation processing workflows handle video, depositions, and trial presentation; none have NGE equivalents.
EC2 Workers:
| Worker | Function |
|---|---|
| TranscodeWorker | Video transcoding via FFmpeg |
| VideoStitchWorker | Multi-segment video stitching |
| VideoSyncWorker | Video synchronization |
| FlvConversionWorker | FLV format conversion |
| UpdateVideoAspectRatioWorker | Video metadata update |
| TranscriptParseWorker | Deposition transcript parsing (LEF, PTX, CMS formats) |
| DepositionZipWorker | Deposition package extraction |
| TreatmentWorker | Litigation presentation images (callout/highlight composites) |

Sidekiq Jobs:
| Job | Function |
|---|---|
| DesignationVideoJob | Video designation processing |
| DepositionPdfJob | Deposition PDF generation |
| DepositionTextJob | Deposition text extraction |
| DepositionShareJob | Deposition sharing between cases |
| DepositionSummaryReportJob | Deposition summary reports |
| TranscriptMetadataReportJob | Transcript metadata reports |
| DepositionDesignationMergeJob | Merge deposition designations |
| DepositionVolumeExhibitsInFolderLinkerJob | Link exhibits in deposition folders |
Discovery Suite: Not Yet Modularized
These Discovery features run in Legacy Rails/Sidekiq with no NGE module equivalent:
| Category | Components | Description |
|---|---|---|
| Search | SearchHitReportSearcherJob, SearchHitReportPostProcessingJob, SearchHitReportDeletionJob, Elasticsearch 7.4, custom Parslet query DSL | Full-text search, hit reports |
| Review/Coding | Review UI, theater, coding overlays, labels, privilege (shared across NGE/Legacy) | Core review workflow (same logic, different rendering) |
| Bulk Operations | BulkDeleteJob, BulkRestoreJob, BulkLabelActivationJob, BulkLabelDestroyJob, BulkActionJob, BulkSubreviewAssignmentJob, BatchLabelJob, SubreviewSplitJob | Mass document operations |
| Bates/Stamps | BatesStampJob, BatesRemovalJob, BatesStampCompletionJob, BatesRemovalCompletionJob, ConfidentialityStatusJob, RedactAnnotationsJob | Runs on both NGE and Legacy (uses Nutrient for NGE) |
| Document Operations | PageDeleteJob, SplitDocumentOnFlagsJob, DocumentPdfCompletionJob, CodingOverlayJob, FamilyLinkingJob, CustodianUpdateJob, CustodianDestroyerJob, NearDupeTrackerJob | Individual document manipulation |
| Wire/Exchange | WireSetupJob, WireConfirmationJob, DocumentShareJob, DocumentShareGenerationJob, DirectDocumentShareJob, RemoteWireConfirmationJob | Legacy wire system (documentexchanger built but not integrated) |
| Reports | CustomReportJob, UserActivityReportJob, ReviewLogJob, PageCountReportForRelevancyJob, GridDataExportJob | Custom reports, user activity |
| Export Utilities | ExportCopyJob, PdfLambdaJob, CaseFolderImportJob | Export duplication, legacy import |

EC2 Workers (Discovery, Legacy only):
| Worker | Function |
|---|---|
| SpreadsheetConversionWorker | XLS/XLSB/CSV → XLSX for spreadsheet viewer |
| DocumentPropertiesUpdateWorker | Document metadata extraction |
| FileEmailWorker | Email document files to users |
| DownloadExhibitPdfWorker | PDF download generation |
Platform Services: Entirely in Rails
| Service | Description |
|---|---|
| Authentication | Cognito SRP + session-based + HMAC-SHA1 API auth |
| Authorization | RBAC via Action/Role/RoleNpcase tables |
| User Management | Account/User/NpcaseUser CRUD, Cognito provisioning |
| Case Management | Create/archive/delete cases, per-case DB provisioning |
| Billing | Account billing, ingestion limits, plan management |
| AI Features | Bedrock agent, AI summaries, chatbot |
| Notifications | DelayedEmailJob, BannerAlertJob, email, alerts, audit logging |
| Admin/Ops | Background job management, EC2 monitoring, ES indexing, DatabaseArchiveJob |
Summary: What's Modularized vs What's Not
MODULARIZED:
Processing (Stage 5):
  ✓ Document ingestion (extractor)
  ✓ Content extraction (extractor)
  ✓ DB writes + batch lifecycle (loader)
  ✓ Page image generation (uploader)
  ✓ Page manipulation (pageservice)
  ✓ Archive extraction (unzipservice)
Analysis (Stage 7):
  ✓ Search query parser (QLE, production)
  ◐ Search hit reports (SHR, prototype)
  ✓ AI transcript summaries (nextpoint-ai)
Production (Stage 8):
  ✓ Export/production (exporter)
Cross-stage:
  ◐ Document exchange (exchanger, not live)

NOT MODULARIZED:
LITIGATION SUITE (separate domain):
  ✗ Video (transcode, stitch, sync)
  ✗ Treatments (presentation images)
COMMON / DISCOVERY (still in Legacy):
  ✗ Bulk operations (delete, label, restore)
  ✗ Bates/confidentiality (shared, Nutrient for NGE)
  ✗ Wire approval workflow (exchanger not integrated)
  ✗ Reporting (custom, user activity)
PLATFORM (core Rails):
  ✗ Auth + RBAC + case mgmt + billing
  ✗ Notifications + admin tooling
SEPARATE PRODUCT:
  ◆ Data Mining (eda + eda-front-end)
    Own architecture, own AWS accounts

✓ = production   ◐ = built/prototype   ◆ = separate product   ✗ = not modularized
Summary
| Functional Area | Backend | Views | Total | Core Difference |
|---|---|---|---|---|
| Document ingestion | 6 | 10 | 16 | ProcessorApi HTTP vs Legacy workers |
| Batch completion | 8 | 0 | 8 | External (Athena/Nutrient) vs internal polling/retry |
| Batch cancellation | 3 | 1 | 4 | External API cancel vs local DB update |
| Document viewer | 4 | 12 | 16 | Nutrient/PSPDFKit vs S3 page images |
| Bates stamping | 7 | 5 | 12 | Nutrient page count validation vs DB count |
| Markups/redactions | 3 | 1 | 4 | Nutrient API + AutoRedactionJob vs processing jobs |
| Toolbar locking | 4 | 1 | 5 | processing_in_nge flag gates actions |
| Placeholders | 1 | 0 | 1 | Instant Nutrient layer vs local file upload |
| Family linking | 3 | 0 | 3 | JSON hashes vs AR objects/HTML |
| Export/production | 2 | 6 | 8 | Different size calc, rendering, emails |
| Global UI | 0 | 7 | 7 | Body class, JS global, indicators |
| Wire transfer | 0 | 2 | 2 | Hidden in NGE (documentexchanger built but not yet integrated) |
| TOTAL | 41 | 45 | 86 | |
EDRM Mapping
The EDRM (Electronic Discovery Reference Model) defines 9 stages for how digital data flows through litigation. Here's how the Nextpoint platform maps to each stage, and what's modularized vs Legacy.
| EDRM Stage | Nextpoint Coverage | Suite | NGE Status | Legacy Components |
|---|---|---|---|---|
| 1. Information Governance | Not directly covered | N/A | N/A | N/A (pre-litigation) |
| 2. Identification | Custodian management spans Collection (assignment at import) and Review (reassignment via bulk ops) — not a separate stage in Nextpoint | Common | Legacy only | CustodiansController (part of stages 4 + 6) |
| 3. Preservation | S3 storage, PendingDelete (deletion prevention) | Common | Legacy only | S3 lifecycle, case folder management |
| 4. Collection | Upload files to File Room, S3 case folder, cloud sources; custodian assignment | Common | Legacy only | FileRoomController, S3UploadController, CaseFolderController, DropboxController |
| 5. Processing | Import pipeline: file extraction, OCR, dedup, de-NISTing, format conversion, family linking, page image generation | Common | Modularized | ImportsController, BatchesController; NGE: extractor → loader → uploader; Legacy: EC2 workers (Preprocess → Container → Conversion → Page) |
| 6. Review | Document viewer, coding, labels, privilege, confidentiality, sub-reviews, bulk operations, custodian reassignment | Common | Shared logic (Nutrient vs page images) | ReviewsController, LabelsController, CodingOverlaysController, CustodiansController, bulk jobs |
| 7. Analysis | Search, search hit reports, analytics, chronology, AI summaries, near-dupe detection | Common | Partially modularized | query-language-engine (search parser — production), search-hit-report-backend (hit reports — prototype), nextpoint-ai (transcript summaries — production). Legacy: SearchController, AnalyticsController, ChronologyController |
| 8. Production | Export with bates stamps, confidentiality codes, load files, productions | Common | Modularized | ExportsController, ProductionTemplatesController, Legacy ExhibitZipVolumeWorker |
| 9. Presentation | Theater mode, treatments, video depositions, designations | Litigation | Legacy only | TheaterController, TreatmentsController, DepositionsController |
NGE Modules by EDRM Stage¶
| EDRM Stage | Module(s) | What They Replace/Add |
|---|---|---|
| 4. Collection | (no module: File Room, S3 upload remain in Legacy) | File upload and case folder management stay in Rails |
| 5. Processing | documentextractor (entry point + file conversion) | PreprocessWorker, ContainerWorker, ConversionWorker (LibreOffice, Tika) |
| 5. Processing | unzipservice (archive extraction) | ContainerWorker for ZIP/RAR/7Z |
| 5. Processing | documentloader (DB writes, dedup) | BatchCompletionJob, family linking |
| 5. Processing | documentuploader (page images) | PageWorker (image gen, Nutrient) |
| 5. Processing | documentpageservice (page ops) | Page manipulation workers |
| 7. Analysis | query-language-engine (search parser, TypeScript ECS) | Legacy Ruby/Parslet parser in lib/search/ (production) |
| 7. Analysis | search-hit-report-backend (hit reports, Ruby Lambda) | SearchHitReportSearcherJob (prototype) |
| 7. Analysis | nextpoint-ai (AI transcript summaries, Python Lambda) | New capability: Bedrock Claude summaries (production) |
| 8. Production | documentexporter (Lambda + Step Fn) | ExhibitZipVolumeWorker + LoadfileWorker |
| Cross-stage | documentexchanger (not integrated) | Wire/exchange system (stage 4→8) |
What This Tells Us¶
NGE tackled the compute-heavy stages first: Processing (5) and Production (8) are where the heavy lifting happens — file conversion, OCR, image generation, PDF rendering, ZIP assembly. These benefit most from Lambda/ECS auto-scaling.
Analysis (7) is now being modularized: Three new services are extracting functionality from the Rails monolith:
- query-language-engine: production, replaces the Legacy Parslet parser
- search-hit-report-backend: prototype, offloads ES search + Parquet/Athena analytics
- nextpoint-ai: production, adds new AI summarization capability (Bedrock Claude)
The human-intensive stages remain in the monolith: Review (6) and Presentation (9) are interactive, UI-driven workflows where the bottleneck is human decision-making, not compute. These are well-served by the Rails monolith + Sidekiq.
Complete Platform Map by EDRM Stage¶
NEXTPOINT PLATFORM — EDRM Stage Mapping
═══════════════════════════════════════════
EDRM Stage 1: Information Governance
(Not covered — pre-litigation)
EDRM Stage 2: Identification
Custodian management spans stages 4 + 6 (not a separate stage in Nextpoint)
EDRM Stage 3: Preservation
Legacy: S3 storage, PendingDelete (deletion prevention)
EDRM Stage 4: Collection
Legacy: FileRoomController, S3UploadController, CaseFolderController,
DropboxController, custodian assignment at upload time
EDRM Stage 5: Processing ─── MODULARIZED
NGE: ✓ documentextractor — pipeline entry point, file conversion (Hyland)
✓ documentloader — DB writes, batch lifecycle, dedup, family linking
✓ documentuploader — Nutrient document provisioning (PDF only, no stored page images)
✓ documentpageservice — page reorder/rotate/add/remove/split (PDFBox)
✓ unzipservice — archive extraction (ZIP/RAR/7Z/TAR/GZIP/BZIP2)
Legacy: PreprocessWorker → ContainerWorker → ConversionWorker → PageWorker
EDRM Stage 6: Review ─── SHARED (same logic, different rendering)
Legacy Rails (both NGE + Legacy cases):
ReviewsController, LabelsController, CodingOverlaysController,
CustodiansController, bulk ops (delete/restore/label/subreview)
Rendering: Legacy = page images (TIFF/PNG) from S3
NGE = PDF from Nutrient with annotation overlays
EDRM Stage 7: Analysis ─── PARTIALLY MODULARIZED
NGE: ✓ query-language-engine — search query parser (TypeScript ECS)
◐ search-hit-report-backend — hit reports (Ruby Lambda, prototype)
✓ nextpoint-ai — AI transcript summaries (Bedrock)
◐ neardupe — near-dupe detection (PySpark EMR, POC)
Legacy: SearchController, AnalyticsController, ChronologyController,
Elasticsearch 7.4, near-dupe (Databricks production), custom reports
EDRM Stage 8: Production ─── MODULARIZED
NGE: ✓ documentexporter — Step Functions + ECS Fargate
Legacy: ExhibitZipVolumeWorker + ExhibitLoadfileWorker
Key: Legacy downloads pre-stamped page images from S3
NGE renders from Nutrient PDF with bates/confidentiality overlays
EDRM Stage 9: Presentation ─── LEGACY ONLY (Litigation suite)
Legacy: TheaterController, TreatmentsController, DepositionsController,
video transcoding/stitching/sync, transcript parsing (LEF/PTX/CMS)
Cross-Stage:
NGE: ◐ documentexchanger — document exchange (built, not integrated)
Legacy: OutgoingWire/IncomingWire, WireSetupJob, DocumentShareJob
SEPARATE PRODUCT — Data Mining (own AWS accounts):
◆ eda — Ruby Lambda + Batch + dtSearch (Stages 4-8)
◆ eda-front-end — TypeScript SPA + 53 Lambda API + DynamoDB
✓ = production ◐ = built/prototype ◆ = separate product
Stages 1-3 are thin: Information Governance is not covered, and Identification and Preservation are only lightly covered. Nextpoint focuses on stages 4-9 (Collection through Presentation).
Mapping Divergences to NGE Service Modules¶
Each functional divergence area maps to one or more NGE service modules that replaced the Legacy behavior. Some modules handle multiple functional areas.
By NGE Module¶
documentextractor (via ProcessorApi) — Pipeline Entry Point¶
Handles: Ingestion trigger + Cancellation + Pipeline orchestration
| Functional Area | How documentextractor handles it |
|---|---|
| Document ingestion (16 pts) | ProcessorApi.import() calls documentextractor's POST /import endpoint. documentextractor assigns a worker, extracts content (text, metadata, file conversion via Hyland Filters), and publishes SNS events. These fan out to documentloader (DB writes via SQS), documentuploader (page images via SQS), and PSM (event capture via Firehose). Replaces Legacy's PreprocessWorker → ContainerWorker → ConversionWorker → PageWorker chain. |
| Batch cancellation (4 pts) | ProcessorApi.cancel_import() calls DELETE /import/{case}/{job}/{batch}. documentextractor tears down the processing pipeline. Replaces Legacy's local DB status update + BatchStatusUpdateJob. |
Key insight: documentextractor is the NGE entry point from Rails — it's the
service that ProcessorApi talks to. It publishes SNS events that fan out to
documentloader (DB writes), documentuploader (page images), and PSM (Firehose
event capture). Each downstream module also publishes its own events for
further subscribers.
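The two calls named in the table above can be sketched as plain HTTP requests. This is a minimal illustration, not the actual ProcessorApi code: the module name `ProcessorApiSketch`, the `BASE_URL` host, and the JSON payload keys are all assumptions. Only the verbs and paths (`POST /import`, `DELETE /import/{case}/{job}/{batch}`) come from this document. The requests are built but never sent.

```ruby
require "net/http"
require "uri"
require "json"

# Hypothetical sketch: builds (but does not send) the two requests this
# document attributes to ProcessorApi. BASE_URL and the payload keys are
# assumptions made for illustration.
module ProcessorApiSketch
  BASE_URL = "https://documentextractor.example.internal" # placeholder host

  # POST /import: asks documentextractor to start ingesting one batch.
  def self.import_request(case_id:, job_id:, batch_id:)
    req = Net::HTTP::Post.new(URI.join(BASE_URL, "/import"))
    req["Content-Type"] = "application/json"
    req.body = JSON.generate(case_id: case_id, job_id: job_id, batch_id: batch_id)
    req
  end

  # DELETE /import/{case}/{job}/{batch}: asks documentextractor to cancel.
  def self.cancel_request(case_id:, job_id:, batch_id:)
    Net::HTTP::Delete.new(URI.join(BASE_URL, "/import/#{case_id}/#{job_id}/#{batch_id}"))
  end
end

req = ProcessorApiSketch.cancel_request(case_id: 42, job_id: "job-7", batch_id: 1001)
puts "#{req.method} #{req.path}" # prints "DELETE /import/42/job-7/1001"
```

Keeping request construction separate from sending makes the NGE entry point easy to inspect in a console or unit test without touching the pipeline.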
documentloader (downstream from documentextractor)¶
Handles: Batch lifecycle + DB writes + Family linking
| Functional Area | How documentloader handles it |
|---|---|
| Batch completion (8 pts) | Job processor manages batch lifecycle — creates SQS/Lambda per batch, monitors queue depth, does multi-pass DLQ redrive, atomic teardown. Replaces Legacy's polling loop (next_check_for_complete_time_gmt), BatchCleanup.process, and Exhibit.request_indexing. |
| Family linking (3 pts) | documentloader assigns family_id during ingestion via email thread detection. Replaces Legacy's backfill_email_family_id post-processing. |
Key insight: documentloader's job processor replaces the Legacy batch polling/retry/completion/cleanup machinery. Combined with documentextractor's ingestion trigger, these two modules account for 31 of the 86 divergence points.
documentuploader (Nutrient/PSPDFKit infrastructure)¶
Handles: Document viewing + Placeholders + Infrastructure for bates/markups
| Functional Area | How documentuploader handles it |
|---|---|
| Document viewer (16 pts) | This is the fundamental rendering shift. Legacy stores individual page images (TIFF/PNG) on S3 — bates stamps, confidentiality codes, and coding are applied directly to those image files. NGE has only the PDF in Nutrient — no individual page images exist. All annotations (bates, confidentiality, coding) are Nutrient overlays rendered on-demand. documentuploader provisions the Nutrient document; Rails reads via NextpointNutrient.get_cached_filename_for_theater. |
| Placeholders (1 pt) | NGE creates Nutrient layers directly (create_instant_layer_for_nge) instead of generating local placeholder files and uploading to S3. |
| Bates/markups infrastructure | documentuploader provisions the Nutrient document (nutrient_id = document_{case}_{batch}_{nge_doc}_{id}) that bates stamping and markups operate against. |
Key insight: documentuploader doesn't just upload — it establishes the Nutrient
document that the entire NGE document viewing, stamping, and annotation stack
depends on. Every NextpointNutrient.* call in the divergence map exists because
documentuploader set up the Nutrient document.
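The `nutrient_id` pattern quoted in the table (`document_{case}_{batch}_{nge_doc}_{id}`) can be illustrated with a small builder and parser. This is a sketch under assumptions: the module name `NutrientIdSketch` is hypothetical, and treating every field as an integer is a guess, since the document gives only the placeholder names.

```ruby
# Hypothetical helper for the documented nutrient_id pattern
# document_{case}_{batch}_{nge_doc}_{id}. Integer fields are an assumption.
module NutrientIdSketch
  PATTERN = /\Adocument_(\d+)_(\d+)_(\d+)_(\d+)\z/

  # Builds the identifier from its four parts.
  def self.build(case_id:, batch_id:, nge_doc_id:, id:)
    "document_#{case_id}_#{batch_id}_#{nge_doc_id}_#{id}"
  end

  # Returns a hash of the four parts, or nil if the string does not match.
  def self.parse(nutrient_id)
    match = PATTERN.match(nutrient_id)
    return nil unless match

    case_id, batch_id, nge_doc_id, id = match.captures.map(&:to_i)
    { case_id: case_id, batch_id: batch_id, nge_doc_id: nge_doc_id, id: id }
  end
end

nutrient_id = NutrientIdSketch.build(case_id: 12, batch_id: 340, nge_doc_id: 5, id: 9001)
puts nutrient_id # prints "document_12_340_5_9001"
puts NutrientIdSketch.parse(nutrient_id).inspect
```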
documentpageservice (via NgePageService)¶
Handles: Page manipulation + triggers bates/OCR workflows
| Functional Area | How documentpageservice handles it |
|---|---|
| Document viewer — page operations | NgePageService.process_nge_page_job() for reorder, rotate, add, remove, split. Called from attachment.rb:627 and document_natives_controller.rb:57. Sets processing_in_nge = true while operating. |
| Bates stamping (12 pts) | Rails calls NextpointNutrient.nutrient_document_info() for page counts (set up by documentuploader), then stamps via ExhibitNutrientAction concern. documentpageservice handles the underlying page manipulation when pages need OCR or regeneration (native_pdf_ocr_job). |
| Toolbar locking (5 pts) | The processing_in_nge flag is set whenever documentpageservice is processing a document. This gates all toolbar actions (add/rotate/split/delete/replace) in the UI until the operation completes. |
Key insight: documentpageservice is the reason processing_in_nge exists.
Every toolbar lock and unlock in the divergence map is triggered by a
documentpageservice operation starting or completing.
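The lock and unlock behaviour described above can be sketched with a stand-in object. `ExhibitSketch`, `toolbar_action_allowed?`, and `with_page_operation` are hypothetical names, not the real Exhibit model. In the real system the unlock presumably happens when documentpageservice reports completion, so the synchronous block here is a simplification.

```ruby
# Stand-in for the Exhibit model; only the processing_in_nge flag is modeled.
# All method names here are hypothetical.
class ExhibitSketch
  TOOLBAR_ACTIONS = %i[add rotate split delete replace].freeze

  attr_accessor :processing_in_nge

  def initialize
    @processing_in_nge = false
  end

  # A toolbar action is available only while no page operation is running.
  def toolbar_action_allowed?(action)
    TOOLBAR_ACTIONS.include?(action) && !processing_in_nge
  end

  # Sets the flag for the duration of the block and clears it afterwards,
  # even if the page operation raises.
  def with_page_operation
    self.processing_in_nge = true
    yield
  ensure
    self.processing_in_nge = false
  end
end

exhibit = ExhibitSketch.new
exhibit.with_page_operation do
  puts exhibit.toolbar_action_allowed?(:rotate) # false while processing
end
puts exhibit.toolbar_action_allowed?(:rotate)   # true once the operation ends
```

Clearing the flag in `ensure` reflects one design concern worth checking in the real code: a failed page operation should not leave the toolbar locked indefinitely.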
documentexporter (via NgeExportJob)¶
Handles: Export/production
| Functional Area | How documentexporter handles it |
|---|---|
| Export/production (8 pts) | NgeExportJob invokes {region}-{env}-nge-export-lambda async. Step Functions + ECS Fargate handle image conversion, PDF rendering, ZIP assembly. Replaces Legacy's ExhibitZipVolumeWorker + ExhibitLoadfileWorker. Different export size calculation (size_of_nge_zips vs export_volumes.first.file_size) and notification emails. |
Key difference — page images vs PDF: Legacy stores individual page images (TIFF/PNG) on S3 with bates stamps, confidentiality codes, and coding applied directly to the image files. NGE only has the PDF document in Nutrient — no individual page images exist. So documentexporter must use Nutrient to overlay bates/confidentiality/coding annotations onto the PDF when generating export images. This is why the export rendering and size calculation differ between NGE and Legacy.
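The two export differences named in the table, the Lambda naming pattern and the size calculation, can be sketched as follows. The structs and the `nge_export_lambda_name` helper are hypothetical. Only the `{region}-{env}-nge-export-lambda` pattern and the `size_of_nge_zips` versus `export_volumes.first.file_size` split come from this document, and the region, environment, and sizes in the usage lines are placeholder values.

```ruby
# Hypothetical stand-ins for the export models; only the fields mentioned in
# this document are modeled.
ExportVolumeSketch = Struct.new(:file_size, keyword_init: true)

ExportSketch = Struct.new(:nge_enabled, :size_of_nge_zips, :export_volumes, keyword_init: true) do
  # NGE reads the precomputed ZIP size; Legacy reads the first volume's size.
  def export_size
    nge_enabled ? size_of_nge_zips : export_volumes.first.file_size
  end
end

# Builds the function name from the documented {region}-{env} pattern.
def nge_export_lambda_name(region:, env:)
  "#{region}-#{env}-nge-export-lambda"
end

legacy = ExportSketch.new(nge_enabled: false,
                          export_volumes: [ExportVolumeSketch.new(file_size: 2_048)])
nge = ExportSketch.new(nge_enabled: true, size_of_nge_zips: 4_096, export_volumes: [])

puts nge_export_lambda_name(region: "us-east-1", env: "staging")
puts legacy.export_size
puts nge.export_size
```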
documentexchanger¶
Handles: Wire transfer replacement
| Functional Area | How documentexchanger handles it |
|---|---|
| Wire transfer (2 pts) | documentexchanger, with dynamic Lambda+SQS provisioning per exchange and AWS Glue ETL, is built to replace the Legacy wire transfer system but is not yet integrated. Until it is, the Legacy wire buttons are simply hidden in the NGE UI. |
unzipservice¶
Part of the ingestion pipeline, invoked by documentextractor:
| Module | Role in divergence |
|---|---|
| unzipservice | Archive extraction during ingestion. Replaces Legacy's ContainerWorker for ZIP/RAR/7Z. |
Nutrient (PSPDFKit) — Cross-Cutting Service¶
Nutrient is not an NGE module but a SaaS dependency that multiple modules and Rails itself call directly. It appears in 37 of the 86 divergence points:
| Caller | Nutrient Usage |
|---|---|
| documentuploader | Provisions documents, generates page images |
| Rails — ExhibitNutrientAction | Bates stamping, confidentiality stamps, page labels |
| Rails — BatesStampJob | Page count validation against Nutrient |
| Rails — AutoRedactionJob | Term and pattern redactions |
| Rails — SyncAnnotationIdsJob | Annotation ID reconciliation |
| Rails — SplitDocumentOnFlagsJob | Document splitting via Nutrient API |
| Rails — theater_processor | Theater view image retrieval |
| Rails — NativePlaceholder | Non-imaged placeholder layer creation |
Summary: Module → Functional Areas¶
documentextractor ───────┬── Document ingestion (16)  ← entry point via ProcessorApi
(ProcessorApi)           └── Batch cancellation (4)
                             Total: 20 points
documentloader ──────────┬── Batch completion (8)     ← downstream from extractor
(job processor)          └── Family linking (3)
                             Total: 11 points
documentuploader ────────┬── Document viewer (16)
(Nutrient provisioning)  └── Placeholders (1)
                             Total: 17 points
documentpageservice ─────┬── Bates stamping (12)
(NgePageService)         ├── Toolbar locking (5)
                         └── Markups/redactions (4)
                             Total: 21 points
documentexporter ────────── Export/production (8)
(NgeExportJob → Lambda)     Total: 8 points
documentexchanger ───────── Wire transfer (2)
                            Total: 2 points
No NGE module ───────────── Global UI (7)
(CSS/JS only)               Total: 7 points
Note: For ingestion, Rails talks only to documentextractor (via ProcessorApi);
documentloader events come back via Athena. unzipservice is invoked by
documentextractor for archive extraction and is transparent to Rails.
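The per-module totals in the summary can be re-derived from the per-area counts. The short tally below copies the point counts from the summary and recomputes each module total and the grand total. The constant name `MODULE_POINTS` is illustrative only.

```ruby
# Point counts copied from the module summary; the hash layout is illustrative.
MODULE_POINTS = {
  "documentextractor"   => { "Document ingestion" => 16, "Batch cancellation" => 4 },
  "documentloader"      => { "Batch completion" => 8, "Family linking" => 3 },
  "documentuploader"    => { "Document viewer" => 16, "Placeholders" => 1 },
  "documentpageservice" => { "Bates stamping" => 12, "Toolbar locking" => 5,
                             "Markups/redactions" => 4 },
  "documentexporter"    => { "Export/production" => 8 },
  "documentexchanger"   => { "Wire transfer" => 2 },
  "No NGE module"       => { "Global UI" => 7 }
}.freeze

# Sum the functional areas under each module, then sum across modules.
totals = MODULE_POINTS.transform_values { |areas| areas.values.sum }
totals.each { |name, points| puts format("%-20s %2d", name, points) }
puts format("%-20s %2d", "TOTAL", totals.values.sum) # grand total is 86
```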