Reference Implementation: Rails Monolith (Legacy)
Overview
The Rails monolith is the core Nextpoint eDiscovery and Litigation platform. It serves as the web application, API backend, authentication provider, case manager, search engine, document viewer, and job orchestrator. All NGE modules ultimately integrate back to this application.
Stack: Rails 7.0.8, Ruby 3.1.4, MySQL (per-case databases), Elasticsearch 7.4, Sidekiq 7.3.9, Redis, AWS (SNS, SQS, Lambda, ECS, Cognito, Athena, S3, SSM, Secrets Manager, EventBridge, Bedrock).
Architecture
rails/
├── app/
│ ├── controllers/ # 130+ controllers
│ │ ├── application_controller.rb # 43KB — auth, session, HMAC, case switching
│ │ ├── admin/ # Admin panel controllers
│ │ └── api/ # API namespace (Nutrient JWT, etc.)
│ ├── models/ # 140+ models
│ │ ├── per_case_model.rb # Multi-tenancy foundation — per-case DB switching
│ │ ├── npcase.rb # Case entity (trial_prep/review)
│ │ ├── user.rb # User with Cognito integration
│ │ ├── exhibit.rb # Document record (39KB, 39 associations)
│ │ ├── attachment.rb # Document pages/files
│ │ ├── batch.rb # Import batch (41KB)
│ │ ├── batch_processing_event.rb # NGE event tracking
│ │ ├── account.rb # Organization/tenant
│ │ └── concerns/ # Searchable, Exportable, NutrientAction, etc.
│ ├── sidekiq/ # 70+ background job classes
│ │ ├── background_processing.rb # Base class — case connection, progress, logging
│ │ ├── nge_case_tracker_job.rb # Polls Athena for NGE events
│ │ ├── nge_export_job.rb # Invokes documentexporter Lambda
│ │ └── ... # Bulk ops, reports, exports, AI jobs
│ ├── services/ # 40+ service objects
│ │ ├── nge_page_service.rb # HTTP client for documentpageservice
│ │ ├── event_bridge_publisher.rb # AWS EventBridge integration
│ │ ├── chatbot_service.rb # AI chatbot integration
│ │ └── ...
│ ├── decorators/ # Draper pattern for view decoration
│ └── views/ # ERB templates + React components
├── config/
│ ├── routes.rb # URL routing
│ ├── database.yml # DB config with writer + reader_host
│ ├── nextpoint_global.yml # Platform configuration
│ └── initializers/ # 32+ initializers (sidekiq, ES, session, etc.)
├── db/
│ └── schema.rb # 2082 lines — shared + per-case tables
├── lib/
│ ├── search/ # Elasticsearch integration (query DSL, parsing)
│ ├── shared/ # Symlink to shared_libs repo
│ └── authorization_helper.rb # Role-based access control
└── deploy/ # Deployment configuration (Capistrano)
Pattern Mapping
| Pattern | Rails Implementation | NGE Equivalent |
|---|---|---|
| Multi-tenancy | PerCaseModel: thread-safe connection switching via Thread.current.object_id | Per-case DB schema: {RDS_DBNAME}_case_{case_id} |
| Authentication | Session-based (web) + HMAC-SHA1 (API) + Cognito (identity) | Lambda execution role; HMAC-SHA1 for DPS API |
| Authorization | Database-driven RBAC: Action → Role → RoleNpcase per case | N/A: NGE modules are internal services |
| Background jobs | Sidekiq 7.3.9 with Redis; 70+ job classes; BackgroundProcessing base | Lambda functions (auto-invoked by SQS/SNS) |
| Search | Elasticsearch 7.4 with custom query DSL (Parslet grammar) | N/A: search is Legacy-only |
| NGE event tracking | NgeCaseTrackerJob polls Athena → BatchProcessingEvent table | SNS events published by each NGE module |
| NGE export | NgeExportJob invokes Lambda async ({prefix}-nge-export-lambda) | documentexporter Step Functions + ECS |
| NGE page service | NgePageService calls HTTP API (URL from SSM parameter) | documentpageservice Java ECS Fargate |
| S3 operations | NextPointS3 from shared_libs (via lib/shared/ symlink) | shell/utils/s3_ops.py |
| Database sessions | PerCaseModel.set_case(id) → establish_connection(db_name) | writer_session() / reader_session() context managers |
| Read replicas | reader_host in database.yml; set_case(id, use_reader: true) | Separate RDS reader proxy endpoint |
| Event publishing | EventBridgePublisher service (new) + Sidekiq jobs | SNS EventType enum |
| Configuration | YAML files + $global_config hash + env vars | Env vars + Secrets Manager + SSM Parameter Store |
| Logging | Lograge (JSON) + Logstash-logger | CloudWatch JSON structured logging |
| PDF generation | Prawn gem + Nutrient (PSPDFKit) | Apache PDFBox (documentpageservice) |
| AI integration | Bedrock Agent SDK; ai_jobs, ai_job_chunks tables | N/A: AI features are in Legacy |
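The Authorization row describes a chain from Action to Role to a per-case role assignment. A minimal sketch of that lookup follows, using in-memory hashes in place of the database tables. The role names, action names, and the one-role-per-user-per-case assumption are all hypothetical and are not taken from the Rails codebase.

```ruby
# Hypothetical stand-in for the roles / role_actions tables.
ROLE_ACTIONS = {
  "reviewer" => %w[view_document code_document],
  "admin"    => %w[view_document code_document export_documents manage_users]
}.freeze

# Hypothetical stand-in for per-case role assignments:
# [user_id, case_id] => role name (assumes one role per user per case).
CASE_ROLES = {
  [7, 42] => "reviewer",
  [7, 43] => "admin"
}.freeze

# True only when the user has a role on this case and that role grants the action.
def allowed?(user_id, case_id, action)
  role = CASE_ROLES[[user_id, case_id]]
  return false if role.nil?

  ROLE_ACTIONS.fetch(role, []).include?(action)
end
```

The same user can hold different permissions on different cases, which is the point of resolving the role per case rather than per account.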
Key Design Decisions
Multi-Tenancy: PerCaseModel
The multi-tenancy approach is the architectural foundation of the entire platform:
    # app/models/per_case_model.rb
    class PerCaseModel < ApplicationRecord
      self.abstract_class = true

      def self.set_case(case_id, use_reader: false)
        db_name = "#{base_database}_case_#{case_id}"
        establish_connection(
          # merges base config with case-specific database name
          # uses reader_host if use_reader: true
        )
      end
    end
Thread safety: Uses Thread.current.object_id to track which thread holds
which database connection. This is critical for Sidekiq workers processing
multiple cases concurrently. Class variables @@case_id, @@db_name, @@use_reader
are all keyed by thread object ID.
Database naming: {base_db}_case_{case_id} (e.g., nextpoint_production_case_42)
Reader/writer switching: set_case(id, use_reader: true) connects to reader_host
from database.yml. weaponize(weaponized: true) toggles mid-operation from reader
to writer. temporarily_set_case provides block-scoped case switching with automatic
restore.
Schema caching: In production/staging/QA, copies the schema cache from the core database to avoid per-case schema introspection overhead.
NGE alignment: NGE modules use the exact same naming convention. The Python
database.py constructs {RDS_DBNAME}_case_{case_id} for SQLAlchemy sessions.
This is a shared contract between Legacy and NGE.
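The thread-keyed state, the naming convention, and the block-scoped switch described above can be illustrated without Rails. The sketch below is not the PerCaseModel implementation: `CaseRegistry` is a hypothetical class, it stores only a case ID (no real connection), and it uses a mutex-guarded hash where the text mentions class variables.

```ruby
# Sketch only: per-thread case tracking keyed by Thread.current.object_id.
class CaseRegistry
  @cases = {}
  @lock = Mutex.new

  class << self
    # Database name following the {base_db}_case_{case_id} convention.
    def db_name(base_db, case_id)
      "#{base_db}_case_#{case_id}"
    end

    # Records the case for the calling thread only.
    def set_case(case_id)
      @lock.synchronize { @cases[Thread.current.object_id] = case_id }
    end

    def current_case
      @lock.synchronize { @cases[Thread.current.object_id] }
    end

    # Switches case for the duration of the block, then restores the
    # previous case even if the block raises.
    def temporarily_set_case(case_id)
      previous = current_case
      set_case(case_id)
      yield
    ensure
      set_case(previous)
    end
  end
end
```

One caveat of this approach in general: entries are never removed here, and Ruby may reuse an object ID after a thread is garbage collected, so a production version would need to clean up when a thread finishes.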
Authentication: Three-Layer System
- Web sessions: `activerecord-session_store` in `users_sessions` table. 30-minute idle timeout. Session cookie: `lt-{deployment_id}-{env}`.
- API authentication: HMAC-SHA1 for service-to-service calls:

      compare_string = "#{request.method}#{request.path}#{@args[:user_id]}"
      compare_hash = NextPointAPI.sign(compare_string)
      valid if compare_hash == auth_hash

  Used by: Legacy workers (via `NextPointAPI`), NGE documentpageservice.
- AWS Cognito: Identity provider for user management. New users auto-provisioned via `create_cognito_user` callback. Supports SAML SSO via identity providers.
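The HMAC check in the API authentication layer can be sketched with Ruby's standard OpenSSL bindings. The internals of `NextPointAPI.sign` are not shown in this document, so the sketch assumes a hex-encoded HMAC-SHA1 over the compare string; the secret and the helper names are hypothetical. It also swaps the plain `==` comparison for a constant-time one, which is a common hardening step rather than something the source describes.

```ruby
require "openssl"

# Hypothetical shared secret; the real key source is not given in this document.
SECRET = "example-shared-secret"

# Assumed shape of NextPointAPI.sign: hex-encoded HMAC-SHA1 of the input.
def sign(compare_string, secret = SECRET)
  OpenSSL::HMAC.hexdigest("SHA1", secret, compare_string)
end

# Same concatenation as the snippet above: method, then path, then user ID.
def compare_string(http_method, path, user_id)
  "#{http_method}#{path}#{user_id}"
end

# Constant-time comparison avoids leaking how much of the hash matched.
def valid_signature?(http_method, path, user_id, auth_hash)
  expected = sign(compare_string(http_method, path, user_id))
  OpenSSL.secure_compare(expected, auth_hash)
end
```

Because the request method and path are part of the signed string, a signature issued for one endpoint does not validate against another.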
Sidekiq Job Framework
70+ job classes inheriting from BackgroundProcessing base:
    # app/sidekiq/background_processing.rb
    class BackgroundProcessing
      include Sidekiq::Worker

      def set_connection(case_id)
        PerCaseModel.set_case(case_id)
      end

      # Progress tracking: job_id, percent, current, total, status
      # S3 logging: upload job logs after completion
      # Error handling: admin notification on failure
    end
Queue structure: Redis-backed. Queues: normal, long_running.
Default retry: 5 with exponential backoff.
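To give a sense of what five retries with exponential backoff means in wall-clock time, the sketch below computes delay bounds from the formula given in the Sidekiq wiki, `(count**4) + 15 + rand(10) * (count + 1)` seconds. It assumes the jobs rely on Sidekiq's built-in backoff and do not override it; the helper names are hypothetical.

```ruby
# Lower and upper bound, in seconds, of the delay before retry `count` (0-based).
# The random jitter term rand(10) * (count + 1) ranges from 0 to 9 * (count + 1).
def retry_delay_bounds(count)
  base = (count**4) + 15
  [base, base + 9 * (count + 1)]
end

# Bounds for each retry up to max_retries.
def retry_schedule(max_retries)
  (0...max_retries).map { |count| retry_delay_bounds(count) }
end

retry_schedule(5)
# => [[15, 24], [16, 34], [31, 58], [96, 132], [271, 316]]
```

Under that assumption, a job that fails every attempt is abandoned after roughly seven to ten minutes in total.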
Key job categories:
- NGE integration: NgeCaseTrackerJob, NgeExportJob
- Document processing: DocumentExportJob, DocumentShareJob
- Bulk operations: BulkLabelJob, BulkDeleteJob, CodingOverlayJob
- Reporting: CustomReportJob, UserActivityReportJob
- Database: DatabaseArchiveJob (archives old case databases)
- AI: AI summary and chatbot jobs (new)
- Deposition: PDF generation, transcript parsing, video processing
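The progress fields named in the base class comment (job_id, percent, current, total, status) can be modeled as a small value object. This is a hypothetical illustration of how percent might be derived from current and total; the status strings are assumptions, not values from the codebase.

```ruby
# Hypothetical progress record; percent is derived rather than stored.
JobProgress = Struct.new(:job_id, :current, :total, :status, keyword_init: true) do
  # Whole-number percentage, guarded against an empty or unknown total.
  def percent
    return 0 if total.to_i.zero?

    ((current.to_f / total) * 100).floor.clamp(0, 100)
  end

  # Advances the counter and updates the status accordingly.
  def advance(by = 1)
    self.current += by
    self.status = current >= total ? "complete" : "processing"
    self
  end
end

progress = JobProgress.new(job_id: "abc", current: 0, total: 8, status: "queued")
progress.advance(3)
progress.percent # => 37
```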
Elasticsearch Integration
Per-case Elasticsearch indexes with a custom query DSL:
Search pipeline:
1. User enters search query in UI
2. NextpointControllerSearchFactory orchestrates search
3. Query parsed by Parslet grammar into AST
4. DocumentParsedSearchHashTransforms converts AST to Elasticsearch DSL
5. ES returns results with highlighting
6. Results decorated and paginated
Index structure: Per-case indexes for document isolation.
Search fields: Full text, author, email, dates, custom fields, privilege status, review status.
Reindexing: ElasticsearchIndexer model tracks reindex jobs; Reindexable concern.
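Steps 3 and 4 of the pipeline (query text to AST, AST to Elasticsearch DSL) can be sketched in plain Ruby. Simple string splitting stands in for the Parslet grammar here, the query syntax (`field:value` terms joined by `AND`) is a simplified assumption, and the `full_text` field name is hypothetical. Only the `bool`, `match`, and `highlight` structures are standard Elasticsearch query DSL.

```ruby
# Parse "field:value AND field:value" into a list of AST nodes.
def parse_query(query)
  query.split(/\s+AND\s+/).map do |term|
    field, value = term.split(":", 2)
    # A bare word with no field prefix targets the (hypothetical) full-text field.
    value.nil? ? { field: "full_text", value: field } : { field: field, value: value }
  end
end

# Transform the AST into an Elasticsearch bool query with highlighting.
def to_es_dsl(ast)
  {
    query: { bool: { must: ast.map { |node| { match: { node[:field] => node[:value] } } } } },
    highlight: { fields: { "full_text" => {} } }
  }
end

to_es_dsl(parse_query("author:smith AND privileged"))
```

Keeping the parse step separate from the transform step mirrors the split between the grammar and `DocumentParsedSearchHashTransforms`: the AST can be validated or rewritten before any Elasticsearch-specific structure exists.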
NGE Integration Architecture
The Rails app is the orchestration layer for NGE modules:
User uploads documents to S3 case folder
│
▼
Batch::AsUpload created (before_commit triggers NGE)
│
▼
ProcessorApi.import() — HTTP POST to documentextractor API
│ (sends case_id, batch_id, import_type, files, settings)
│ (stores processor_job_id on Batch record)
▼
NGE documentextractor (ECS Fargate + DynamoDB worker pool)
│ Assigns worker, extracts content (text, metadata, file conversion)
│ Publishes SNS events as documents are processed
│
├──→ SNS ──→ documentloader (Lambda/SQS) — DB writes
│ └──→ publishes SNS events → downstream + PSM
├──→ SNS ──→ documentuploader (ECS/Nutrient) — page images
│ └──→ publishes SNS events → downstream + PSM
└──→ SNS ──→ PSM Firehose — captures events from ALL modules → S3 (Parquet) → Athena
│
▼
NgeCaseTrackerJob (Sidekiq) polls Athena for events
│
▼
Events persisted into BatchProcessingEvent table
│
▼
BatchCompletionJob (batch_end event) marks batch complete
│
▼
User can search/review documents in Rails UI
Key integration classes:
- `ProcessorApi` (`lib/processor_api.rb`): HTTP client for the documentextractor API. Endpoints: `POST /import` (create job), `DELETE /import/{case_id}/{job_id}/{batch_id}` (cancel). Token-based auth.
- `ProcessorApiHelper` (`app/helpers/processor_api_helper.rb`): Mixed into `Batch`. Builds import payloads, calls `ProcessorApi.import()`, stores `processor_job_id`, creates PSM S3 prefixes for Athena tracking.
- `Batch::AsUpload` (`app/models/batch/as_upload.rb`): The upload entry point. `before_commit :initiate_workflow_check` calls `initiate_import` for NGE cases. `after_create :create_import_status` creates `ImportStatus` for tracking.
Important: Rails does not publish directly to SNS. The flow is: Rails → ProcessorApi HTTP → documentextractor API → SNS events fan out to documentloader, documentuploader, and PSM (Firehose). Rails reads results back via Athena (PSM) polling. Cancellation also goes through ProcessorApi to stop work.
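The two ProcessorApi endpoints can be sketched with Ruby's standard `net/http`. This is not the contents of `lib/processor_api.rb`: the function names, the base URL, and the `Authorization: Bearer` header format are assumptions, since the document says only that auth is token-based. The sketch builds the requests without sending them.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the POST /import request with the payload fields listed in this document.
def build_import_request(base_url:, token:, case_id:, batch_id:, import_type:, files:, settings: {})
  request = Net::HTTP::Post.new(URI.join(base_url, "/import"))
  request["Content-Type"] = "application/json"
  request["Authorization"] = "Bearer #{token}" # assumed header format
  request.body = JSON.generate(
    case_id: case_id, batch_id: batch_id,
    import_type: import_type, files: files, settings: settings
  )
  request
end

# Builds the DELETE /import/{case_id}/{job_id}/{batch_id} cancellation request.
def build_cancel_request(base_url:, token:, case_id:, job_id:, batch_id:)
  request = Net::HTTP::Delete.new(URI.join(base_url, "/import/#{case_id}/#{job_id}/#{batch_id}"))
  request["Authorization"] = "Bearer #{token}" # assumed header format
  request
end
```

Sending either request would go through `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }`, with the returned job ID then stored as `processor_job_id` on the batch.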
Four NGE integration patterns:
- HTTP import trigger (documentextractor):
  - `ProcessorApi.import()`: HTTP POST to documentextractor API (`POST /import`)
  - Triggered by `Batch::AsUpload` `before_commit` callback
  - Payload: case_id, batch_id, import_type, files, settings
  - Returns `processor_job_id` stored on Batch record
  - documentextractor publishes SNS events → subscribed by documentloader, documentuploader, PSM (Firehose)
- Athena event polling (documentloader/extractor/uploader):
  - `NgeCaseTrackerJob` queries Athena (Firehose → Parquet → Athena)
  - Event types: `DOCUMENT_LOADED`, `FILTERED_DUPLICATE`, `DUPLICATE_FOUND`, `ERROR`, `WARNING`, `NOTIFICATION`
  - Events normalized into `BatchProcessingEvent` records
  - `ProcessorApiHelper#load_status_partition` loads Athena partitions
- Lambda invocation (documentexporter):
  - `NgeExportJob` calls `Aws::Lambda::Client.invoke()` with async `Event` mode
  - Lambda name: `{region}-{env}-nge-export-lambda`
  - Payload: bucket, export_id, manifest_zip, volumes
- HTTP API (documentpageservice):
  - `NgePageService` service object
  - URL from SSM Parameter Store: `/nge/dps/{env}/api/apiUrl`
  - Operations: reorder, rotate, add, remove, split pages
Database Schema (2082 lines)
Shared database tables (all cases share):
| Table | Purpose |
|---|---|
| `users` | User accounts (UUID username, email_hash, Cognito link) |
| `accounts` | Organizations with billing, plans, ingestion limits |
| `account_users` | User → Account many-to-many |
| `npcases` | Case directory (active/archived/disabled) |
| `npcase_users` | User → Case access with role assignments |
| `roles`, `role_actions` | RBAC definitions |
| `processing_jobs` | Legacy job tracking |
| `ai_jobs`, `ai_job_chunks` | AI processing tracking |
| `elasticsearch_indexers` | ES reindex job tracking |
Per-case database tables (isolated per case):
| Table | Purpose |
|---|---|
| `exhibits` | Documents: metadata, status, coding, privileges |
| `attachments` | Document pages/files: S3 paths, verified page counts |
| `batches` | Import batches with status tracking |
| `batch_sources`, `batch_parts` | Batch structure |
| `batch_processing_events` | NGE event tracking |
| `labels`, `exh_designations` | Document coding/tagging |
| `exports`, `export_volumes`, `export_exhibits` | Export production |
| `depositions`, `deposition_volumes` | Depositions |
| `custom_fields` | User-defined metadata fields |
| `saved_searches` | Persisted search queries |
| `privilege_logs`, `confidentiality_logs` | Audit trails |
| `model_audits` | Change history |
Key Models
Exhibit (app/models/exhibit.rb — 39KB):
- 39 associations (labels, attachments, designations, exports, etc.)
- Denormalized search fields for Elasticsearch
- NGE support: nutrient_id, nge_file_hash, nge_enabled?
- Billing: billing_size, verified_page_count
- Concerns: ExhibitSearchable, ExhibitExportable, ExhibitNutrientAction
Batch (app/models/batch.rb — 41KB):
- Import types: native, produced, wire_transfer, split_result
- Statuses: preprocessing, processing, complete, error, cancelled
- NGE tracking: loader_status, loader_status_updated_at_gmt
- batch_processing_events association for NGE event tracking
Npcase (app/models/npcase.rb):
- Case types: trial_prep (1), review (2)
- Statuses: active, active_no_charge, archived, disabled, deleted, pending_archiving
- nge_enabled? flag controls Legacy vs NGE processing path
- Per-case Lambda function detection
Service Architecture
Service Objects (app/services/)
40+ service objects extract complex business logic from controllers/models:
| Service | Purpose |
|---|---|
| `NgePageService` | HTTP interface to documentpageservice |
| `EventBridgePublisher` | AWS EventBridge event publishing |
| `ChatbotService` | AI chatbot (Bedrock) integration |
| `DataSyncRunner` | AWS DataSync for data transfer |
| `CloneNpcaseService` | Case duplication |
| `GenerateInvoiceService` | Billing invoice generation |
| `ImageMarkupService` | Document markup operations |
| `BatchContentSummary` | Batch content analysis |
| `SendSupportRequestService` | Zendesk ticket creation |
Concerns (Model Mixins)
| Concern | Purpose |
|---|---|
| `ExhibitSearchable` | Elasticsearch queries for documents |
| `AttachmentSearchable` | Elasticsearch queries for attachments |
| `DepositionSearchable` | Deposition text search |
| `ExhibitExportable` | Export logic |
| `ExhibitNutrientAction` | Nutrient API integration |
| `GmtTimestampTouchable` | GMT timestamp maintenance |
| `CachedItem` | Redis caching |
| `Reindexable` | Elasticsearch reindexing |
Integration Points with NGE
Direct Integration
| Integration | Direction | Mechanism |
|---|---|---|
| Document ingestion | Rails → NGE | ProcessorApi.import() HTTP POST to documentextractor API |
| Event tracking | NGE → Rails | Athena polling via NgeCaseTrackerJob |
| Export | Rails → NGE | Lambda async invoke via NgeExportJob |
| Page manipulation | Rails → NGE | HTTP API via NgePageService |
| Database schema | Shared | Same {base}_case_{case_id} convention |
| S3 paths | Shared | Same /case_{id}/{type}/{uid}/{file} convention |
| HMAC-SHA1 auth | NGE → Rails | documentpageservice calls Rails API |
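Since the S3 path convention is a contract shared by Rails and NGE, it helps to build keys in one place instead of by ad hoc string interpolation. The helper below is a hypothetical sketch of that idea, following the `/case_{id}/{type}/{uid}/{file}` shape from the table; the example segment values are invented.

```ruby
# Builds a key following /case_{id}/{type}/{uid}/{file}.
# Rejects empty segments and embedded slashes so a bad value cannot
# silently shift the path structure that other modules depend on.
def case_s3_key(case_id:, type:, uid:, file:)
  [type, uid, file].each do |part|
    text = part.to_s
    raise ArgumentError, "invalid path segment: #{text.inspect}" if text.empty? || text.include?("/")
  end

  "/case_#{case_id}/#{type}/#{uid}/#{file}"
end

case_s3_key(case_id: 42, type: "native", uid: "abc123", file: "memo.pdf")
# => "/case_42/native/abc123/memo.pdf"
```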
Shared State
| State | Location | Accessed By |
|---|---|---|
| Per-case database | MySQL/Aurora | Rails, NGE modules (same schema) |
| S3 files | S3 bucket | Rails, workers, all NGE modules |
| Batch status | `batches` table | Rails (writes), NGE (reads via events) |
| Processing events | `batch_processing_events` table | Rails (reads), NGE (writes via Athena) |
| NGE service URLs | SSM Parameter Store | Rails reads, NGE publishes |
| Secrets | AWS Secrets Manager | Both Legacy and NGE |
NGE Feature Flag

    # app/models/npcase.rb
    def nge_enabled?
      # Returns true if case uses NGE processing pipeline
      # A case is permanently either NGE or Legacy, not switchable
    end

This flag is the primary routing decision for document processing. A case is created as either NGE or Legacy and remains that way — it is not a migration toggle. The Legacy code has separate paths throughout based on this flag.
Patterns to Preserve vs Deprecate
Preserve
- PerCaseModel multi-tenancy — the per-case database pattern is shared with NGE and is fundamental to data isolation
- HMAC-SHA1 API auth — still used by NGE services calling back to Rails
- Sidekiq for non-document jobs — bulk operations, reports, and admin tasks don't need NGE-level infrastructure
- Elasticsearch search pipeline — custom query DSL is a competitive feature
- RBAC authorization — database-driven roles scale well
- Service object pattern — clean separation of business logic
- Draper decorators — effective view-model separation
- S3 path conventions — shared contract with NGE (do not change)
- BatchProcessingEvent tracking — bridges Legacy UI with NGE processing
Deprecate
- Legacy document workers — replaced by NGE modules for all new cases
- XML-based worker API — replaced by direct DB access in NGE
- ApplicationController 43KB god class — should be decomposed
- Session-based auth for API — modern services use JWT/OAuth2
- YAML configuration files — moving to env vars + Secrets Manager
- Polling-based NGE events — future: direct SNS → Rails webhook or EventBridge
- Global admin flag on User model — should migrate to proper admin roles
Key File Locations
| File | Purpose |
|---|---|
| `app/models/per_case_model.rb` | Multi-tenancy foundation |
| `app/controllers/application_controller.rb` | Auth, session, HMAC (43KB) |
| `app/models/exhibit.rb` | Document model (39KB, 39 associations) |
| `app/models/batch.rb` | Import batch model (41KB) |
| `app/models/npcase.rb` | Case entity with nge_enabled? flag |
| `app/models/user.rb` | User with Cognito integration |
| `app/sidekiq/background_processing.rb` | Sidekiq base class |
| `app/sidekiq/nge_case_tracker_job.rb` | NGE event polling (Athena) |
| `app/sidekiq/nge_export_job.rb` | documentexporter Lambda invocation |
| `app/services/nge_page_service.rb` | documentpageservice HTTP client |
| `lib/processor_api.rb` | HTTP client for NGE Processor API (import trigger) |
| `app/helpers/processor_api_helper.rb` | NGE import payload builder (mixed into Batch) |
| `app/models/batch/as_upload.rb` | Upload entry point with NGE workflow trigger |
| `lib/nextpoint_cognito.rb` | Cognito SRP authentication |
| `lib/shared/nextpoint_api.rb` | HMAC-SHA1 API client (shared_libs) |
| `lib/shared/nextpoint_s3.rb` | S3 operations (shared_libs) |
| `lib/search/` | Elasticsearch query DSL and transforms |
| `db/schema.rb` | Complete schema (2082 lines) |
| `config/database.yml` | DB config with writer + reader_host |