Reference Implementation: Rails Monolith (Legacy)

Overview

The Rails monolith is the core Nextpoint eDiscovery and Litigation platform. It serves as the web application, API backend, authentication provider, case manager, search engine, document viewer, and job orchestrator. All NGE modules ultimately integrate back to this application.

Stack: Rails 7.0.8, Ruby 3.1.4, MySQL (per-case databases), Elasticsearch 7.4, Sidekiq 7.3.9, Redis, AWS (SNS, SQS, Lambda, ECS, Cognito, Athena, S3, SSM, Secrets Manager, EventBridge, Bedrock).

Architecture

rails/
├── app/
│   ├── controllers/                 # 130+ controllers
│   │   ├── application_controller.rb  # 43KB — auth, session, HMAC, case switching
│   │   ├── admin/                   # Admin panel controllers
│   │   └── api/                     # API namespace (Nutrient JWT, etc.)
│   ├── models/                      # 140+ models
│   │   ├── per_case_model.rb        # Multi-tenancy foundation — per-case DB switching
│   │   ├── npcase.rb                # Case entity (trial_prep/review)
│   │   ├── user.rb                  # User with Cognito integration
│   │   ├── exhibit.rb               # Document record (39KB, 39 associations)
│   │   ├── attachment.rb            # Document pages/files
│   │   ├── batch.rb                 # Import batch (41KB)
│   │   ├── batch_processing_event.rb # NGE event tracking
│   │   ├── account.rb               # Organization/tenant
│   │   └── concerns/                # Searchable, Exportable, NutrientAction, etc.
│   ├── sidekiq/                     # 70+ background job classes
│   │   ├── background_processing.rb # Base class — case connection, progress, logging
│   │   ├── nge_case_tracker_job.rb  # Polls Athena for NGE events
│   │   ├── nge_export_job.rb        # Invokes documentexporter Lambda
│   │   └── ...                      # Bulk ops, reports, exports, AI jobs
│   ├── services/                    # 40+ service objects
│   │   ├── nge_page_service.rb      # HTTP client for documentpageservice
│   │   ├── event_bridge_publisher.rb # AWS EventBridge integration
│   │   ├── chatbot_service.rb       # AI chatbot integration
│   │   └── ...
│   ├── decorators/                  # Draper pattern for view decoration
│   └── views/                       # ERB templates + React components
├── config/
│   ├── routes.rb                    # URL routing
│   ├── database.yml                 # DB config with writer + reader_host
│   ├── nextpoint_global.yml         # Platform configuration
│   └── initializers/                # 32+ initializers (sidekiq, ES, session, etc.)
├── db/
│   └── schema.rb                    # 2082 lines — shared + per-case tables
├── lib/
│   ├── search/                      # Elasticsearch integration (query DSL, parsing)
│   ├── shared/                      # Symlink to shared_libs repo
│   └── authorization_helper.rb      # Role-based access control
└── deploy/                          # Deployment configuration (Capistrano)

Pattern Mapping

Pattern	Rails Implementation	NGE Equivalent
Multi-tenancy	`PerCaseModel` — thread-safe connection switching via `Thread.current.object_id`	Per-case DB schema: `{RDS_DBNAME}_case_{case_id}`
Authentication	Session-based (web) + HMAC-SHA1 (API) + Cognito (identity)	Lambda execution role; HMAC-SHA1 for DPS API
Authorization	Database-driven RBAC: `Action` → `Role` → `RoleNpcase` per case	N/A — NGE modules are internal services
Background jobs	Sidekiq 7.3.9 with Redis; 70+ job classes; `BackgroundProcessing` base	Lambda functions (auto-invoked by SQS/SNS)
Search	Elasticsearch 7.4 with custom query DSL (Parslet grammar)	N/A — search is Legacy-only
NGE event tracking	`NgeCaseTrackerJob` polls Athena → `BatchProcessingEvent` table	SNS events published by each NGE module
NGE export	`NgeExportJob` invokes Lambda async (`{prefix}-nge-export-lambda`)	documentexporter Step Functions + ECS
NGE page service	`NgePageService` calls HTTP API (URL from SSM parameter)	documentpageservice Java ECS Fargate
S3 operations	`NextPointS3` from shared_libs (via `lib/shared/` symlink)	`shell/utils/s3_ops.py`
Database sessions	`PerCaseModel.set_case(id)` → `establish_connection(db_name)`	`writer_session()` / `reader_session()` context managers
Read replicas	`reader_host` in database.yml; `set_case(id, use_reader: true)`	Separate RDS reader proxy endpoint
Event publishing	`EventBridgePublisher` service (new) + Sidekiq jobs	SNS `EventType` enum
Configuration	YAML files + `$global_config` hash + env vars	Env vars + Secrets Manager + SSM Parameter Store
Logging	Lograge (JSON) + Logstash-logger	CloudWatch JSON structured logging
PDF generation	Prawn gem + Nutrient (PSPDFKit)	Apache PDFBox (documentpageservice)
AI integration	Bedrock Agent SDK; `ai_jobs`, `ai_job_chunks` tables	N/A — AI features are in Legacy
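
The Authorization row above (`Action` → `Role` → `RoleNpcase` per case) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the structs, the `allowed?` helper, and the action names are hypothetical and stand in for the real ActiveRecord models and `authorization_helper.rb`.

```ruby
# Hypothetical in-memory model of the Action -> Role -> RoleNpcase chain.
# The real implementation is database-driven; these structs are illustrative.
Role = Struct.new(:name, :actions)
RoleNpcase = Struct.new(:user_id, :case_id, :role)

# A user may perform an action in a case only if one of their role
# assignments for that specific case grants the action.
def allowed?(assignments, user_id:, case_id:, action:)
  assignments.any? do |a|
    a.user_id == user_id && a.case_id == case_id && a.role.actions.include?(action)
  end
end

reviewer = Role.new("reviewer", %i[view_document apply_label])
ASSIGNMENTS = [RoleNpcase.new(1, 42, reviewer)].freeze

puts allowed?(ASSIGNMENTS, user_id: 1, case_id: 42, action: :apply_label)
puts allowed?(ASSIGNMENTS, user_id: 1, case_id: 43, action: :apply_label)
```

Because the assignment carries the case ID, the same user can hold different roles in different cases.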

Key Design Decisions

Multi-Tenancy: PerCaseModel

The multi-tenancy approach is the architectural foundation of the entire platform:

# app/models/per_case_model.rb
class PerCaseModel < ApplicationRecord
  self.abstract_class = true

  def self.set_case(case_id, use_reader: false)
    db_name = "#{base_database}_case_#{case_id}"
    establish_connection(
      # merges base config with case-specific database name
      # uses reader_host if use_reader: true
    )
  end
end

Thread safety: Uses Thread.current.object_id to track which thread holds which database connection. This is critical for Sidekiq workers processing multiple cases concurrently. Class variables @@case_id, @@db_name, @@use_reader are all keyed by thread object ID.
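
The thread-keyed bookkeeping described above can be sketched as follows. This is a minimal illustration, not the actual `PerCaseModel` code: the `CaseRegistry` class and its method names are hypothetical, and the `Mutex` is an assumption added here so the shared hash can be mutated safely.

```ruby
# Hypothetical sketch: per-thread case state keyed by Thread.current.object_id.
class CaseRegistry
  @@case_id = {}       # thread object ID => case ID
  @@lock = Mutex.new   # assumption: guards the shared hash

  def self.set_case(case_id)
    @@lock.synchronize { @@case_id[Thread.current.object_id] = case_id }
  end

  def self.current_case
    @@lock.synchronize { @@case_id[Thread.current.object_id] }
  end

  # Remove the entry so a later thread that reuses the object ID
  # does not inherit a stale case.
  def self.clear_case
    @@lock.synchronize { @@case_id.delete(Thread.current.object_id) }
  end
end

# Two concurrent workers each see only their own case.
threads = [42, 43].map do |id|
  Thread.new do
    CaseRegistry.set_case(id)
    sleep 0.01
    CaseRegistry.current_case
  end
end
p threads.map(&:value)
```

Each worker thread reads back only the case it set, which is the property Sidekiq workers rely on when they process several cases at once.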

Database naming: {base_db}_case_{case_id} (e.g., nextpoint_production_case_42)

Reader/writer switching: set_case(id, use_reader: true) connects to reader_host from database.yml. weaponize(weaponized: true) toggles mid-operation from reader to writer. temporarily_set_case provides block-scoped case switching with automatic restore.
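
Block-scoped switching with automatic restore, as `temporarily_set_case` is described above, is typically built on `ensure`. The sketch below is an assumption about the general shape, not the real method: `CaseContext` is hypothetical, it tracks only the case ID and reader flag rather than a real connection, and it uses `Thread.current[]` storage for brevity instead of the object-ID keyed class variables.

```ruby
# Hypothetical sketch of block-scoped case switching with automatic restore.
class CaseContext
  Current = Struct.new(:case_id, :use_reader)

  # Per-thread state; no real database connection is involved here.
  def self.current
    Thread.current[:case_context] ||= Current.new(nil, false)
  end

  def self.set_case(case_id, use_reader: false)
    current.case_id = case_id
    current.use_reader = use_reader
  end

  # Switch to another case for the duration of the block, then restore
  # the previous case even if the block raises.
  def self.temporarily_set_case(case_id, use_reader: false)
    previous = current.dup
    set_case(case_id, use_reader: use_reader)
    yield
  ensure
    set_case(previous.case_id, use_reader: previous.use_reader) if previous
  end
end

CaseContext.set_case(42)
CaseContext.temporarily_set_case(7, use_reader: true) do
  p CaseContext.current.to_a
end
p CaseContext.current.to_a
```

The `ensure` clause is what makes the restore reliable when the block fails partway through.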

Schema caching: In production/staging/QA, copies the schema cache from the core database to avoid per-case schema introspection overhead.

NGE alignment: NGE modules use the exact same naming convention. The Python database.py constructs {RDS_DBNAME}_case_{case_id} for SQLAlchemy sessions. This is a shared contract between Legacy and NGE.

Authentication: Three-Layer System

Web sessions — activerecord-session_store in users_sessions table. 30-minute idle timeout. Session cookie: lt-{deployment_id}-{env}.

API authentication — HMAC-SHA1 for service-to-service calls:

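# Rebuild the signed string from the request, sign it, and compare with the supplied hash.
compare_string = "#{request.method}#{request.path}#{@args[:user_id]}"
compare_hash = NextPointAPI.sign(compare_string)
valid = (compare_hash == auth_hash) # the request is accepted only when the two hashes match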

Used by: Legacy workers (via NextPointAPI), NGE documentpageservice.
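
A self-contained sketch of the same check, using Ruby's standard `openssl` library, may help when reading the snippet above. The `sign` helper is a hypothetical stand-in for `NextPointAPI.sign`, whose key handling and output encoding are not shown in this document; hex encoding and the constant-time comparison are assumptions made for the example.

```ruby
require "openssl"

# Hypothetical stand-in for NextPointAPI.sign (hex-encoded HMAC-SHA1).
def sign(message, secret)
  OpenSSL::HMAC.hexdigest("SHA1", secret, message)
end

# Constant-time string comparison, so timing does not reveal
# how many leading characters of the signature matched.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Mirrors the compare_string construction: method + path + user ID.
def valid_request?(method:, path:, user_id:, auth_hash:, secret:)
  compare_string = "#{method}#{path}#{user_id}"
  secure_equal?(sign(compare_string, secret), auth_hash)
end

secret = "example-secret" # illustrative value only
auth_hash = sign("GET/api/exhibits17", secret)
puts valid_request?(method: "GET", path: "/api/exhibits", user_id: 17,
                    auth_hash: auth_hash, secret: secret)
```

Because the signed string covers only the method, path, and user ID, changing any of those three invalidates the signature.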

AWS Cognito — Identity provider for user management. New users auto-provisioned via create_cognito_user callback. Supports SAML SSO via identity providers.

Sidekiq Job Framework

More than 70 job classes inherit from the BackgroundProcessing base class:

# app/sidekiq/background_processing.rb
class BackgroundProcessing
  include Sidekiq::Worker

  def set_connection(case_id)
    PerCaseModel.set_case(case_id)
  end

  # Progress tracking: job_id, percent, current, total, status
  # S3 logging: upload job logs after completion
  # Error handling: admin notification on failure
end

Queue structure: Redis-backed. Queues: normal, long_running. Default retry: 5 with exponential backoff.
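
To give a feel for what five retries with exponential backoff means in wall-clock time, the sketch below computes the deterministic part of the delay. It assumes the application keeps Sidekiq's documented default delay of roughly `(retry_count ** 4) + 15` seconds; Sidekiq also adds random jitter, which is omitted here, and the monolith may override the formula.

```ruby
# Deterministic part of Sidekiq's documented default retry delay, in seconds.
# Random jitter is intentionally left out of this sketch.
def base_retry_delay(retry_count)
  (retry_count**4) + 15
end

MAX_RETRIES = 5 # the retry count stated above

delays = (0...MAX_RETRIES).map { |n| base_retry_delay(n) }
puts "per-retry delays (s): #{delays.inspect}"
puts "total before giving up (s): #{delays.sum}"
```

Under these assumptions a failing job exhausts its retries within a few minutes, so persistent failures surface quickly.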

Key job categories:

- NGE integration: NgeCaseTrackerJob, NgeExportJob
- Document processing: DocumentExportJob, DocumentShareJob
- Bulk operations: BulkLabelJob, BulkDeleteJob, CodingOverlayJob
- Reporting: CustomReportJob, UserActivityReportJob
- Database: DatabaseArchiveJob (archives old case databases)
- AI: AI summary and chatbot jobs (new)
- Deposition: PDF generation, transcript parsing, video processing

Elasticsearch Integration

Per-case Elasticsearch indexes with a custom query DSL:

Search pipeline:

1. User enters search query in UI
2. NextpointControllerSearchFactory orchestrates search
3. Query parsed by Parslet grammar into AST
4. DocumentParsedSearchHashTransforms converts AST to Elasticsearch DSL
5. ES returns results with highlighting
6. Results decorated and paginated
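
Steps 3 and 4 of the pipeline (parse the query into an AST, then transform the AST into Elasticsearch DSL) can be sketched in plain Ruby. This is a deliberately tiny stand-in, not the Parslet grammar or `DocumentParsedSearchHashTransforms`: it understands only `field:value` clauses joined by `AND`, and the `full_text` field name and `Term` struct are assumptions.

```ruby
# Hypothetical, much-simplified stand-in for the parse and transform steps.
Term = Struct.new(:field, :value)

# "Parse" step: split on AND and turn each clause into a Term node.
# Bare words are treated as full-text terms (field name is an assumption).
def parse_query(query)
  query.split(/\s+AND\s+/).map do |clause|
    field, value = clause.split(":", 2)
    value.nil? ? Term.new("full_text", field) : Term.new(field, value)
  end
end

# "Transform" step: convert the Term list into an Elasticsearch bool query
# with highlighting requested on the full-text field.
def to_es_query(terms)
  {
    query: { bool: { must: terms.map { |t| { match: { t.field => t.value } } } } },
    highlight: { fields: { "full_text" => {} } }
  }
end

p to_es_query(parse_query("author:smith AND contract"))
```

Keeping parsing and transformation separate is what lets a grammar evolve without touching the code that emits Elasticsearch DSL.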

- Index structure: per-case indexes for document isolation.
- Search fields: full text, author, email, dates, custom fields, privilege status, review status.
- Reindexing: the ElasticsearchIndexer model tracks reindex jobs; the Reindexable concern adds Elasticsearch reindexing to models.

NGE Integration Architecture

The Rails app is the orchestration layer for NGE modules:

User uploads documents to S3 case folder
    │
    ▼
Batch::AsUpload created (before_commit triggers NGE)
    │
    ▼
ProcessorApi.import() — HTTP POST to documentextractor API
    │  (sends case_id, batch_id, import_type, files, settings)
    │  (stores processor_job_id on Batch record)
    ▼
NGE documentextractor (ECS Fargate + DynamoDB worker pool)
    │  Assigns worker, extracts content (text, metadata, file conversion)
    │  Publishes SNS events as documents are processed
    │
    ├──→ SNS ──→ documentloader (Lambda/SQS) — DB writes
    │              └──→ publishes SNS events → downstream + PSM
    ├──→ SNS ──→ documentuploader (ECS/Nutrient) — page images
    │              └──→ publishes SNS events → downstream + PSM
    └──→ SNS ──→ PSM Firehose — captures events from ALL modules → S3 (Parquet) → Athena
                      │
                      ▼
              NgeCaseTrackerJob (Sidekiq) polls Athena for events
                      │
                      ▼
              Events persisted into BatchProcessingEvent table
                      │
                      ▼
              BatchCompletionJob (batch_end event) marks batch complete
                      │
                      ▼
              User can search/review documents in Rails UI

Key integration classes:

ProcessorApi (lib/processor_api.rb) — HTTP client for the documentextractor API. Endpoints: POST /import (create job), DELETE /import/{case_id}/{job_id}/{batch_id} (cancel). Token-based auth.
ProcessorApiHelper (app/helpers/processor_api_helper.rb) — Mixed into Batch. Builds import payloads, calls ProcessorApi.import(), stores processor_job_id, creates PSM S3 prefixes for Athena tracking.
Batch::AsUpload (app/models/batch/as_upload.rb) — The upload entry point. before_commit :initiate_workflow_check calls initiate_import for NGE cases. after_create :create_import_status creates ImportStatus for tracking.

Important: Rails does not publish directly to SNS. The flow is: Rails → ProcessorApi HTTP → documentextractor API → SNS events fan out to documentloader, documentuploader, and PSM (Firehose). Rails reads results back via Athena (PSM) polling. Cancellation also goes through ProcessorApi to stop work.
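
The import trigger can be pictured as an ordinary JSON POST. The sketch below only builds the request with Ruby's standard `net/http` and does not send it. It is not the real `ProcessorApi` client: the base URL, the `Authorization: Bearer` header used for the token, and the example file key are all assumptions, while the payload field names come from the description above.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) a POST /import request.
# Header name and token format are assumptions for this sketch.
def build_import_request(base_url:, token:, case_id:, batch_id:, import_type:, files:, settings: {})
  uri = URI.join(base_url, "/import")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["Authorization"] = "Bearer #{token}"
  request.body = JSON.generate(
    case_id: case_id,
    batch_id: batch_id,
    import_type: import_type,
    files: files,
    settings: settings
  )
  request
end

request = build_import_request(
  base_url: "https://extractor.example.internal", # hypothetical host
  token: "example-token",
  case_id: 42,
  batch_id: 7,
  import_type: "native",
  files: ["case_42/uploads/memo.pdf"]             # hypothetical key
)
puts "#{request.method} #{request.path}"
puts request.body
```

In the real flow the response would carry the `processor_job_id` that is stored on the Batch record.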

Four NGE integration patterns:

1. HTTP import trigger (documentextractor):
   - ProcessorApi.import(): HTTP POST to documentextractor API (POST /import)
   - Triggered by Batch::AsUpload before_commit callback
   - Payload: case_id, batch_id, import_type, files, settings
   - Returns processor_job_id stored on Batch record
   - documentextractor publishes SNS events → subscribed by documentloader, documentuploader, PSM (Firehose)
2. Athena event polling (documentloader/extractor/uploader):
   - NgeCaseTrackerJob queries Athena (Firehose → Parquet → Athena)
   - Event types: DOCUMENT_LOADED, FILTERED_DUPLICATE, DUPLICATE_FOUND, ERROR, WARNING, NOTIFICATION
   - Events normalized into BatchProcessingEvent records
   - ProcessorApiHelper#load_status_partition loads Athena partitions
3. Lambda invocation (documentexporter):
   - NgeExportJob calls Aws::Lambda::Client.invoke() with async Event mode
   - Lambda name: {region}-{env}-nge-export-lambda
   - Payload: bucket, export_id, manifest_zip, volumes
4. HTTP API (documentpageservice):
   - NgePageService service object
   - URL from SSM Parameter Store: /nge/dps/{env}/api/apiUrl
   - Operations: reorder, rotate, add, remove, split pages
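
The normalization step in the Athena polling pattern above (raw event rows become `BatchProcessingEvent` records) might look roughly like this. It is a sketch only: the `ProcessingEvent` struct stands in for the ActiveRecord model, the row column names are assumptions, and dropping unknown event types is one possible policy rather than documented behavior.

```ruby
# Event types listed above for the Athena polling pattern.
EVENT_TYPES = %w[
  DOCUMENT_LOADED FILTERED_DUPLICATE DUPLICATE_FOUND ERROR WARNING NOTIFICATION
].freeze

# Hypothetical stand-in for a BatchProcessingEvent record.
ProcessingEvent = Struct.new(:batch_id, :event_type, :document_id, :message, keyword_init: true)

# Convert raw query rows (hashes with string keys, an assumption) into
# normalized events, skipping rows whose type is not recognized.
def normalize_events(rows)
  rows.filter_map do |row|
    type = row["event_type"].to_s.upcase
    next unless EVENT_TYPES.include?(type)

    ProcessingEvent.new(
      batch_id: Integer(row["batch_id"]),
      event_type: type,
      document_id: row["document_id"],
      message: row["message"]
    )
  end
end

rows = [
  { "batch_id" => "7", "event_type" => "document_loaded", "document_id" => "d1" },
  { "batch_id" => "7", "event_type" => "ERROR", "document_id" => "d2", "message" => "corrupt file" },
  { "batch_id" => "7", "event_type" => "HEARTBEAT" }
]
p normalize_events(rows).map(&:event_type).tally
```

Normalizing at this boundary keeps the UI code independent of how each NGE module spells or shapes its events.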

Database Schema (2082 lines)

Shared database tables (all cases share):

Table	Purpose
`users`	User accounts (UUID username, email_hash, Cognito link)
`accounts`	Organizations with billing, plans, ingestion limits
`account_users`	User → Account many-to-many
`npcases`	Case directory (active/archived/disabled)
`npcase_users`	User → Case access with role assignments
`roles`, `role_actions`	RBAC definitions
`processing_jobs`	Legacy job tracking
`ai_jobs`, `ai_job_chunks`	AI processing tracking
`elasticsearch_indexers`	ES reindex job tracking

Per-case database tables (isolated per case):

Table	Purpose
`exhibits`	Documents — metadata, status, coding, privileges
`attachments`	Document pages/files — S3 paths, verified page counts
`batches`	Import batches with status tracking
`batch_sources`, `batch_parts`	Batch structure
`batch_processing_events`	NGE event tracking
`labels`, `exh_designations`	Document coding/tagging
`exports`, `export_volumes`, `export_exhibits`	Export production
`depositions`, `deposition_volumes`	Depositions
`custom_fields`	User-defined metadata fields
`saved_searches`	Persisted search queries
`privilege_logs`, `confidentiality_logs`	Audit trails
`model_audits`	Change history

Key Models

Exhibit (app/models/exhibit.rb, 39KB):

- 39 associations (labels, attachments, designations, exports, etc.)
- Denormalized search fields for Elasticsearch
- NGE support: nutrient_id, nge_file_hash, nge_enabled?
- Billing: billing_size, verified_page_count
- Concerns: ExhibitSearchable, ExhibitExportable, ExhibitNutrientAction

Batch (app/models/batch.rb, 41KB):

- Import types: native, produced, wire_transfer, split_result
- Statuses: preprocessing, processing, complete, error, cancelled
- NGE tracking: loader_status, loader_status_updated_at_gmt
- batch_processing_events association for NGE event tracking

Npcase (app/models/npcase.rb):

- Case types: trial_prep (1), review (2)
- Statuses: active, active_no_charge, archived, disabled, deleted, pending_archiving
- nge_enabled? flag controls Legacy vs NGE processing path
- Per-case Lambda function detection

Service Architecture

Service Objects (`app/services/`)

40+ service objects extract complex business logic from controllers/models:

Service	Purpose
`NgePageService`	HTTP interface to documentpageservice
`EventBridgePublisher`	AWS EventBridge event publishing
`ChatbotService`	AI chatbot (Bedrock) integration
`DataSyncRunner`	AWS DataSync for data transfer
`CloneNpcaseService`	Case duplication
`GenerateInvoiceService`	Billing invoice generation
`ImageMarkupService`	Document markup operations
`BatchContentSummary`	Batch content analysis
`SendSupportRequestService`	Zendesk ticket creation

Concerns (Model Mixins)

Concern	Purpose
`ExhibitSearchable`	Elasticsearch queries for documents
`AttachmentSearchable`	Elasticsearch queries for attachments
`DepositionSearchable`	Deposition text search
`ExhibitExportable`	Export logic
`ExhibitNutrientAction`	Nutrient API integration
`GmtTimestampTouchable`	GMT timestamp maintenance
`CachedItem`	Redis caching
`Reindexable`	Elasticsearch reindexing

Integration Points with NGE

Direct Integration

Integration	Direction	Mechanism
Document ingestion	Rails → NGE	`ProcessorApi.import()` HTTP POST to documentextractor API
Event tracking	NGE → Rails	Athena polling via `NgeCaseTrackerJob`
Export	Rails → NGE	Lambda async invoke via `NgeExportJob`
Page manipulation	Rails → NGE	HTTP API via `NgePageService`
Database schema	Shared	Same `{base}_case_{case_id}` convention
S3 paths	Shared	Same `/case_{id}/{type}/{uid}/{file}` convention
HMAC-SHA1 auth	NGE → Rails	documentpageservice calls Rails API
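
The two shared naming conventions in the table above (`{base}_case_{case_id}` for databases and `/case_{id}/{type}/{uid}/{file}` for S3 paths) are simple enough to express as helpers. The function names below are hypothetical and the `native` type value is only an example; the real code lives in `PerCaseModel` and `NextPointS3`.

```ruby
# Database name for a case, e.g. nextpoint_production_case_42.
def case_database_name(base_db, case_id)
  "#{base_db}_case_#{case_id}"
end

# S3 path following the /case_{id}/{type}/{uid}/{file} convention.
def case_s3_path(case_id:, type:, uid:, file:)
  "/case_#{case_id}/#{type}/#{uid}/#{file}"
end

CASE_PATH = %r{\A/case_(?<case_id>\d+)/(?<type>[^/]+)/(?<uid>[^/]+)/(?<file>.+)\z}

# Inverse of case_s3_path; returns nil when the path does not
# follow the convention.
def parse_case_s3_path(path)
  match = CASE_PATH.match(path)
  match && match.named_captures.transform_keys(&:to_sym)
end

puts case_database_name("nextpoint_production", 42)
path = case_s3_path(case_id: 42, type: "native", uid: "abc123", file: "memo.pdf")
puts path
p parse_case_s3_path(path)
```

Centralizing the format in one place on each side makes it harder for Legacy and NGE to drift apart on a contract that both depend on.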

Shared State

State	Location	Accessed By
Per-case database	MySQL/Aurora	Rails, NGE modules (same schema)
S3 files	S3 bucket	Rails, workers, all NGE modules
Batch status	`batches` table	Rails (writes), NGE (reads via events)
Processing events	`batch_processing_events` table	Rails (written by `NgeCaseTrackerJob` from Athena results, read by the UI); NGE events arrive indirectly via PSM
NGE service URLs	SSM Parameter Store	Rails reads, NGE publishes
Secrets	AWS Secrets Manager	Both Legacy and NGE

NGE Feature Flag

# app/models/npcase.rb
def nge_enabled?
  # Returns true if case uses NGE processing pipeline
  # A case is permanently either NGE or Legacy — not switchable
end

This flag is the primary routing decision for document processing. A case is created as either NGE or Legacy and remains that way — it is not a migration toggle. The Legacy code has separate paths throughout based on this flag.
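
As an illustration of how such a flag routes work, the sketch below picks a processing path per case. Everything here is hypothetical: the `Npcase` struct stands in for the real model, the `nge` attribute is an assumed storage detail, and the returned symbols are labels for the two paths rather than real class names.

```ruby
# Hypothetical stand-in for the Npcase model.
Npcase = Struct.new(:id, :nge) do
  # Fixed at creation in the real system; modeled here as a plain attribute.
  def nge_enabled?
    nge == true
  end
end

# Choose the document processing path for a case.
def import_path_for(npcase)
  npcase.nge_enabled? ? :nge_processor_api : :legacy_workers
end

puts import_path_for(Npcase.new(42, true))
puts import_path_for(Npcase.new(43, false))
```

Routing through a single predicate keeps the branch points easy to find when the Legacy path is eventually retired.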

Patterns to Preserve vs Deprecate

Preserve

PerCaseModel multi-tenancy — the per-case database pattern is shared with NGE and is fundamental to data isolation
HMAC-SHA1 API auth — still used by NGE services calling back to Rails
Sidekiq for non-document jobs — bulk operations, reports, and admin tasks don't need NGE-level infrastructure
Elasticsearch search pipeline — custom query DSL is a competitive feature
RBAC authorization — database-driven roles scale well
Service object pattern — clean separation of business logic
Draper decorators — effective view-model separation
S3 path conventions — shared contract with NGE (do not change)
BatchProcessingEvent tracking — bridges Legacy UI with NGE processing

Deprecate

Legacy document workers — replaced by NGE modules for all new cases
XML-based worker API — replaced by direct DB access in NGE
ApplicationController 43KB god class — should be decomposed
Session-based auth for API — modern services use JWT/OAuth2
YAML configuration files — moving to env vars + Secrets Manager
Polling-based NGE events — future: direct SNS → Rails webhook or EventBridge
Global admin flag on User model — should migrate to proper admin roles

Key File Locations

File	Purpose
`app/models/per_case_model.rb`	Multi-tenancy foundation
`app/controllers/application_controller.rb`	Auth, session, HMAC (43KB)
`app/models/exhibit.rb`	Document model (39KB, 39 associations)
`app/models/batch.rb`	Import batch model (41KB)
`app/models/npcase.rb`	Case entity with `nge_enabled?` flag
`app/models/user.rb`	User with Cognito integration
`app/sidekiq/background_processing.rb`	Sidekiq base class
`app/sidekiq/nge_case_tracker_job.rb`	NGE event polling (Athena)
`app/sidekiq/nge_export_job.rb`	documentexporter Lambda invocation
`app/services/nge_page_service.rb`	documentpageservice HTTP client
`lib/processor_api.rb`	HTTP client for NGE Processor API (import trigger)
`app/helpers/processor_api_helper.rb`	NGE import payload builder (mixed into Batch)
`app/models/batch/as_upload.rb`	Upload entry point with NGE workflow trigger
`lib/nextpoint_cognito.rb`	Cognito SRP authentication
`lib/shared/nextpoint_api.rb`	HMAC-SHA1 API client (shared_libs)
`lib/shared/nextpoint_s3.rb`	S3 operations (shared_libs)
`lib/search/`	Elasticsearch query DSL and transforms
`db/schema.rb`	Complete schema (2082 lines)
`config/database.yml`	DB config with writer + reader_host
