
Reference Implementation: Rails Monolith (Legacy)

Overview

The Rails monolith is the core Nextpoint eDiscovery and Litigation platform. It serves as the web application, API backend, authentication provider, case manager, search engine, document viewer, and job orchestrator. All NGE modules ultimately integrate back to this application.

Stack: Rails 7.0.8, Ruby 3.1.4, MySQL (per-case databases), Elasticsearch 7.4, Sidekiq 7.3.9, Redis, AWS (SNS, SQS, Lambda, ECS, Cognito, Athena, S3, SSM, Secrets Manager, EventBridge, Bedrock).

Architecture

rails/
├── app/
│   ├── controllers/                 # 130+ controllers
│   │   ├── application_controller.rb  # 43KB — auth, session, HMAC, case switching
│   │   ├── admin/                   # Admin panel controllers
│   │   └── api/                     # API namespace (Nutrient JWT, etc.)
│   ├── models/                      # 140+ models
│   │   ├── per_case_model.rb        # Multi-tenancy foundation — per-case DB switching
│   │   ├── npcase.rb                # Case entity (trial_prep/review)
│   │   ├── user.rb                  # User with Cognito integration
│   │   ├── exhibit.rb               # Document record (39KB, 39 associations)
│   │   ├── attachment.rb            # Document pages/files
│   │   ├── batch.rb                 # Import batch (41KB)
│   │   ├── batch_processing_event.rb # NGE event tracking
│   │   ├── account.rb               # Organization/tenant
│   │   └── concerns/                # Searchable, Exportable, NutrientAction, etc.
│   ├── sidekiq/                     # 70+ background job classes
│   │   ├── background_processing.rb # Base class — case connection, progress, logging
│   │   ├── nge_case_tracker_job.rb  # Polls Athena for NGE events
│   │   ├── nge_export_job.rb        # Invokes documentexporter Lambda
│   │   └── ...                      # Bulk ops, reports, exports, AI jobs
│   ├── services/                    # 40+ service objects
│   │   ├── nge_page_service.rb      # HTTP client for documentpageservice
│   │   ├── event_bridge_publisher.rb # AWS EventBridge integration
│   │   ├── chatbot_service.rb       # AI chatbot integration
│   │   └── ...
│   ├── decorators/                  # Draper pattern for view decoration
│   └── views/                       # ERB templates + React components
├── config/
│   ├── routes.rb                    # URL routing
│   ├── database.yml                 # DB config with writer + reader_host
│   ├── nextpoint_global.yml         # Platform configuration
│   └── initializers/                # 32+ initializers (sidekiq, ES, session, etc.)
├── db/
│   └── schema.rb                    # 2082 lines — shared + per-case tables
├── lib/
│   ├── search/                      # Elasticsearch integration (query DSL, parsing)
│   ├── shared/                      # Symlink to shared_libs repo
│   └── authorization_helper.rb      # Role-based access control
└── deploy/                          # Deployment configuration (Capistrano)

Pattern Mapping

Pattern | Rails Implementation | NGE Equivalent
Multi-tenancy | PerCaseModel: thread-safe connection switching via Thread.current.object_id | Per-case DB schema: {RDS_DBNAME}_case_{case_id}
Authentication | Session-based (web) + HMAC-SHA1 (API) + Cognito (identity) | Lambda execution role; HMAC-SHA1 for DPS API
Authorization | Database-driven RBAC: ActionRoleRoleNpcase per case | N/A (NGE modules are internal services)
Background jobs | Sidekiq 7.3.9 with Redis; 70+ job classes; BackgroundProcessing base | Lambda functions (auto-invoked by SQS/SNS)
Search | Elasticsearch 7.4 with custom query DSL (Parslet grammar) | N/A (search is Legacy-only)
NGE event tracking | NgeCaseTrackerJob polls Athena → BatchProcessingEvent table | SNS events published by each NGE module
NGE export | NgeExportJob invokes Lambda async ({prefix}-nge-export-lambda) | documentexporter Step Functions + ECS
NGE page service | NgePageService calls HTTP API (URL from SSM parameter) | documentpageservice Java ECS Fargate
S3 operations | NextPointS3 from shared_libs (via lib/shared/ symlink) | shell/utils/s3_ops.py
Database sessions | PerCaseModel.set_case(id) → establish_connection(db_name) | writer_session() / reader_session() context managers
Read replicas | reader_host in database.yml; set_case(id, use_reader: true) | Separate RDS reader proxy endpoint
Event publishing | EventBridgePublisher service (new) + Sidekiq jobs | SNS EventType enum
Configuration | YAML files + $global_config hash + env vars | Env vars + Secrets Manager + SSM Parameter Store
Logging | Lograge (JSON) + Logstash-logger | CloudWatch JSON structured logging
PDF generation | Prawn gem + Nutrient (PSPDFKit) | Apache PDFBox (documentpageservice)
AI integration | Bedrock Agent SDK; ai_jobs, ai_job_chunks tables | N/A (AI features are in Legacy)

Key Design Decisions

Multi-Tenancy: PerCaseModel

The multi-tenancy approach is the architectural foundation of the entire platform:

# app/models/per_case_model.rb
class PerCaseModel < ApplicationRecord
  self.abstract_class = true

  def self.set_case(case_id, use_reader: false)
    db_name = "#{base_database}_case_#{case_id}"
    establish_connection(
      # merges base config with case-specific database name
      # uses reader_host if use_reader: true
    )
  end
end

Thread safety: Uses Thread.current.object_id to track which thread holds which database connection. This is critical for Sidekiq workers processing multiple cases concurrently. Class variables @@case_id, @@db_name, @@use_reader are all keyed by thread object ID.
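The thread-keyed bookkeeping described above can be illustrated with a small, database-free sketch. This is not the PerCaseModel source: the CaseContext class, its method names, the Mutex, and the default base_db value are assumptions made for illustration. Only the idea of keying state by Thread.current.object_id and the {base_db}_case_{case_id} naming come from this document.

```ruby
# Hypothetical stand-in for PerCaseModel's per-thread state. It tracks which
# case each thread has selected, keyed by Thread.current.object_id.
class CaseContext
  @state = {}        # thread object_id => { case_id:, db_name:, use_reader: }
  @lock  = Mutex.new # assumption: guards the shared hash across threads

  class << self
    def set_case(case_id, base_db: "nextpoint_production", use_reader: false)
      entry = {
        case_id: case_id,
        db_name: "#{base_db}_case_#{case_id}",
        use_reader: use_reader
      }
      @lock.synchronize { @state[Thread.current.object_id] = entry }
      entry
    end

    # State selected by the calling thread, or nil if it never called set_case.
    def current
      @lock.synchronize { @state[Thread.current.object_id] }
    end
  end
end

# Two worker threads select different cases concurrently; each sees its own.
threads = [42, 43].map do |case_id|
  Thread.new do
    CaseContext.set_case(case_id)
    sleep 0.01 # give the other thread time to run
    CaseContext.current[:db_name]
  end
end
results = threads.map(&:value)
```

Because every lookup goes through the calling thread's own key, a Sidekiq worker that switches to case 43 does not disturb another worker that is still reading from case 42.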

Database naming: {base_db}_case_{case_id} (e.g., nextpoint_production_case_42)

Reader/writer switching: set_case(id, use_reader: true) connects to reader_host from database.yml. weaponize(weaponized: true) toggles mid-operation from reader to writer. temporarily_set_case provides block-scoped case switching with automatic restore.
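Block-scoped switching with automatic restore is usually built on ensure. The sketch below shows that shape with an in-memory object in place of a database connection. The CaseSwitcher class and its internals are hypothetical; only the temporarily_set_case name and its restore-on-exit behavior are taken from the description above.

```ruby
# Hypothetical in-memory stand-in; the real method switches DB connections.
class CaseSwitcher
  attr_reader :current_case

  def initialize(case_id)
    @current_case = case_id
  end

  def set_case(case_id)
    @current_case = case_id
  end

  # Switch to case_id for the duration of the block, then restore the
  # previously selected case, even if the block raises.
  def temporarily_set_case(case_id)
    previous = @current_case
    set_case(case_id)
    yield
  ensure
    set_case(previous)
  end
end

switcher = CaseSwitcher.new(42)
inside = switcher.temporarily_set_case(7) { switcher.current_case }
```

Here inside holds the case selected within the block, and switcher.current_case is back to the original case once the block returns or raises.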

Schema caching: In production/staging/QA, copies the schema cache from the core database to avoid per-case schema introspection overhead.

NGE alignment: NGE modules use the exact same naming convention. The Python database.py constructs {RDS_DBNAME}_case_{case_id} for SQLAlchemy sessions. This is a shared contract between Legacy and NGE.

Authentication: Three-Layer System

  1. Web sessions: activerecord-session_store, backed by the users_sessions table. 30-minute idle timeout. Session cookie: lt-{deployment_id}-{env}.

  2. API authentication — HMAC-SHA1 for service-to-service calls:

    compare_string = "#{request.method}#{request.path}#{@args[:user_id]}"
    compare_hash = NextPointAPI.sign(compare_string)
    valid if compare_hash == auth_hash
    
    Used by: Legacy workers (via NextPointAPI), NGE documentpageservice.

  3. AWS Cognito — Identity provider for user management. New users auto-provisioned via create_cognito_user callback. Supports SAML SSO via identity providers.
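The HMAC check in layer 2 can be sketched with Ruby's standard OpenSSL bindings. This is an illustration, not the NextPointAPI implementation: the HmacSketch module, the hex encoding of the signature, the way the secret is passed in, and the constant-time comparison are all assumptions. Only the signed string (method + path + user_id) and the use of HMAC-SHA1 come from the text above.

```ruby
require "openssl"

# Hypothetical helper; the real signing lives in NextPointAPI.sign.
module HmacSketch
  module_function

  # Assumption: the signature is a hex-encoded HMAC-SHA1 of the string.
  def sign(string, secret)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), secret, string)
  end

  # Compare two strings without short-circuiting on the first mismatch.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize

    diff = 0
    a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
    diff.zero?
  end

  # Rebuild the signed string the same way the caller did and compare.
  def valid?(method:, path:, user_id:, auth_hash:, secret:)
    expected = sign("#{method}#{path}#{user_id}", secret)
    secure_compare(expected, auth_hash.to_s)
  end
end

secret    = "example-shared-secret" # placeholder value
signature = HmacSketch.sign("GET/api/exhibits5", secret)
ok = HmacSketch.valid?(method: "GET", path: "/api/exhibits", user_id: 5,
                       auth_hash: signature, secret: secret)
```

A byte-wise comparison is used here instead of == so that the time taken does not depend on how many leading characters match; whether the production code does the same is not stated in this document.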

Sidekiq Job Framework

70+ job classes inherit from the BackgroundProcessing base class:

# app/sidekiq/background_processing.rb
class BackgroundProcessing
  include Sidekiq::Worker

  def set_connection(case_id)
    PerCaseModel.set_case(case_id)
  end

  # Progress tracking: job_id, percent, current, total, status
  # S3 logging: upload job logs after completion
  # Error handling: admin notification on failure
end

Queue structure: Redis-backed. Queues: normal, long_running. Default retry: 5 with exponential backoff.
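To give a feel for what five retries with exponential backoff means in wall-clock terms, the sketch below evaluates the delay formula that Sidekiq documents for its default retry schedule: (count ** 4) + 15 plus a random jitter. Whether this application overrides that schedule is not stated here, so treat the numbers as an approximation; the retry_delay helper is hypothetical.

```ruby
# Approximate delay in seconds before retry number `count` (0-based),
# following the formula documented for Sidekiq's default retry schedule.
# Jitter is injectable so the deterministic part can be inspected.
def retry_delay(count, jitter: rand(10) * (count + 1))
  (count**4) + 15 + jitter
end

# Deterministic part of the delay for each of the five retries.
base_delays = (0...5).map { |n| retry_delay(n, jitter: 0) }
total_wait  = base_delays.sum
```

With jitter excluded, the five retries are spread over roughly seven minutes, so a job that still fails after that window is reported as dead fairly quickly.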

Key job categories:

  • NGE integration: NgeCaseTrackerJob, NgeExportJob
  • Document processing: DocumentExportJob, DocumentShareJob
  • Bulk operations: BulkLabelJob, BulkDeleteJob, CodingOverlayJob
  • Reporting: CustomReportJob, UserActivityReportJob
  • Database: DatabaseArchiveJob (archives old case databases)
  • AI: AI summary and chatbot jobs (new)
  • Deposition: PDF generation, transcript parsing, video processing

Elasticsearch Integration

Per-case Elasticsearch indexes with a custom query DSL:

Search pipeline:

  1. User enters search query in UI
  2. NextpointControllerSearchFactory orchestrates search
  3. Query parsed by Parslet grammar into AST
  4. DocumentParsedSearchHashTransforms converts AST to Elasticsearch DSL
  5. ES returns results with highlighting
  6. Results decorated and paginated
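The AST-to-DSL step of the pipeline can be pictured with a small recursive transform. The AST node shapes, the to_es function, and the default full_text field name below are hypothetical and much simpler than a real Parslet grammar output; the bool, must, should, must_not, and match keys are standard Elasticsearch query DSL.

```ruby
# Hypothetical AST node shapes (not the application's real AST):
#   { term: "fraud", field: "author" }   leaf; field defaults to "full_text"
#   { op: :and, children: [node, ...] }
#   { op: :or,  children: [node, ...] }
#   { op: :not, child: node }
def to_es(node)
  if node.key?(:term)
    return { match: { node.fetch(:field, "full_text") => node[:term] } }
  end

  case node[:op]
  when :and
    { bool: { must: node[:children].map { |child| to_es(child) } } }
  when :or
    { bool: { should: node[:children].map { |child| to_es(child) },
              minimum_should_match: 1 } }
  when :not
    { bool: { must_not: [to_es(node[:child])] } }
  else
    raise ArgumentError, "unknown operator: #{node[:op].inspect}"
  end
end

# Roughly: fraud AND NOT author:draft
ast = {
  op: :and,
  children: [
    { term: "fraud" },
    { op: :not, child: { term: "draft", field: "author" } }
  ]
}
query = { query: to_es(ast) }
```

Keeping parsing and DSL generation as separate stages, as the pipeline above does, means the grammar can evolve without touching the Elasticsearch-specific output.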

  • Index structure: Per-case indexes for document isolation.
  • Search fields: Full text, author, email, dates, custom fields, privilege status, review status.
  • Reindexing: ElasticsearchIndexer model tracks reindex jobs; Reindexable concern.

NGE Integration Architecture

The Rails app is the orchestration layer for NGE modules:

User uploads documents to S3 case folder
Batch::AsUpload created (before_commit triggers NGE)
ProcessorApi.import() — HTTP POST to documentextractor API
    │  (sends case_id, batch_id, import_type, files, settings)
    │  (stores processor_job_id on Batch record)
NGE documentextractor (ECS Fargate + DynamoDB worker pool)
    │  Assigns worker, extracts content (text, metadata, file conversion)
    │  Publishes SNS events as documents are processed
    ├──→ SNS ──→ documentloader (Lambda/SQS) — DB writes
    │              └──→ publishes SNS events → downstream + PSM
    ├──→ SNS ──→ documentuploader (ECS/Nutrient) — page images
    │              └──→ publishes SNS events → downstream + PSM
    └──→ SNS ──→ PSM Firehose — captures events from ALL modules → S3 (Parquet) → Athena
              NgeCaseTrackerJob (Sidekiq) polls Athena for events
              Events persisted into BatchProcessingEvent table
              BatchCompletionJob (batch_end event) marks batch complete
              User can search/review documents in Rails UI

Key integration classes:

  • ProcessorApi (lib/processor_api.rb) — HTTP client for the documentextractor API. Endpoints: POST /import (create job), DELETE /import/{case_id}/{job_id}/{batch_id} (cancel). Token-based auth.
  • ProcessorApiHelper (app/helpers/processor_api_helper.rb) — Mixed into Batch. Builds import payloads, calls ProcessorApi.import(), stores processor_job_id, creates PSM S3 prefixes for Athena tracking.
  • Batch::AsUpload (app/models/batch/as_upload.rb) — The upload entry point. before_commit :initiate_workflow_check calls initiate_import for NGE cases. after_create :create_import_status creates ImportStatus for tracking.

Important: Rails does not publish directly to SNS. The flow is: Rails → ProcessorApi HTTP → documentextractor API → SNS events fan out to documentloader, documentuploader, and PSM (Firehose). Rails reads results back via Athena (PSM) polling. Cancellation also goes through ProcessorApi to stop work.
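As a rough picture of the import trigger, the sketch below builds (but does not send) the POST /import request using Ruby's standard library. It is not the ProcessorApi source: the build_import_request helper, the Bearer header used for the token, the JSON body encoding, and the example host are assumptions. The endpoint path and the payload field names come from the description above.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: assembles the request without sending it.
def build_import_request(base_url:, token:, case_id:, batch_id:,
                         import_type:, files:, settings: {})
  uri = URI.join(base_url, "/import")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  # Assumption: the token travels as a Bearer token; the document only
  # says the API uses token-based auth.
  request["Authorization"] = "Bearer #{token}"
  request.body = JSON.generate(
    case_id: case_id,
    batch_id: batch_id,
    import_type: import_type,
    files: files,
    settings: settings
  )
  request
end

request = build_import_request(
  base_url: "https://extractor.example.invalid", # placeholder host
  token: "example-token",
  case_id: 42,
  batch_id: 1001,
  import_type: "native",
  files: ["case_42/uploads/a.pdf"]
)
```

In the real flow the response would carry the processor_job_id that gets stored on the Batch record; sending the request and parsing that response are left out here.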

Four NGE integration patterns:

  1. HTTP import trigger (documentextractor):
       • ProcessorApi.import() sends an HTTP POST to the documentextractor API (POST /import)
       • Triggered by the Batch::AsUpload before_commit callback
       • Payload: case_id, batch_id, import_type, files, settings
       • Returns processor_job_id, which is stored on the Batch record
       • documentextractor publishes SNS events, which documentloader, documentuploader, and PSM (Firehose) subscribe to

  2. Athena event polling (documentloader/extractor/uploader):
       • NgeCaseTrackerJob queries Athena (Firehose → Parquet → Athena)
       • Event types: DOCUMENT_LOADED, FILTERED_DUPLICATE, DUPLICATE_FOUND, ERROR, WARNING, NOTIFICATION
       • Events normalized into BatchProcessingEvent records
       • ProcessorApiHelper#load_status_partition loads Athena partitions

  3. Lambda invocation (documentexporter):
       • NgeExportJob calls Aws::Lambda::Client.invoke() with async Event mode
       • Lambda name: {region}-{env}-nge-export-lambda
       • Payload: bucket, export_id, manifest_zip, volumes

  4. HTTP API (documentpageservice):
       • NgePageService service object
       • URL from SSM Parameter Store: /nge/dps/{env}/api/apiUrl
       • Operations: reorder, rotate, add, remove, split pages
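For the Lambda invocation pattern, the parameters handed to the AWS SDK can be sketched without calling AWS. The export_invoke_params helper and the example values are hypothetical; function_name, invocation_type, and payload are the parameter names that Aws::Lambda::Client#invoke accepts, and the function naming scheme and payload keys are the ones listed above.

```ruby
require "json"

# Hypothetical helper: builds the argument hash for an asynchronous
# Aws::Lambda::Client#invoke call. It does not contact AWS.
def export_invoke_params(region:, env:, bucket:, export_id:, manifest_zip:, volumes:)
  {
    function_name: "#{region}-#{env}-nge-export-lambda",
    invocation_type: "Event", # asynchronous: the call does not wait for a result
    payload: JSON.generate(
      bucket: bucket,
      export_id: export_id,
      manifest_zip: manifest_zip,
      volumes: volumes
    )
  }
end

params = export_invoke_params(
  region: "us-east-1",              # example values only
  env: "staging",
  bucket: "example-export-bucket",
  export_id: 77,
  manifest_zip: "exports/77/manifest.zip",
  volumes: ["VOL001"]
)
```

With the aws-sdk-lambda gem loaded, a hash like this could be passed as Aws::Lambda::Client.new.invoke(**params). Because the invocation type is Event, the job gets only an acknowledgement back, so completion has to be observed some other way.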

Database Schema (2082 lines)

Shared database tables (all cases share):

Table | Purpose
users | User accounts (UUID username, email_hash, Cognito link)
accounts | Organizations with billing, plans, ingestion limits
account_users | User → Account many-to-many
npcases | Case directory (active/archived/disabled)
npcase_users | User → Case access with role assignments
roles, role_actions | RBAC definitions
processing_jobs | Legacy job tracking
ai_jobs, ai_job_chunks | AI processing tracking
elasticsearch_indexers | ES reindex job tracking

Per-case database tables (isolated per case):

Table | Purpose
exhibits | Documents: metadata, status, coding, privileges
attachments | Document pages/files: S3 paths, verified page counts
batches | Import batches with status tracking
batch_sources, batch_parts | Batch structure
batch_processing_events | NGE event tracking
labels, exh_designations | Document coding/tagging
exports, export_volumes, export_exhibits | Export production
depositions, deposition_volumes | Depositions
custom_fields | User-defined metadata fields
saved_searches | Persisted search queries
privilege_logs, confidentiality_logs | Audit trails
model_audits | Change history

Key Models

Exhibit (app/models/exhibit.rb, 39KB):

  • 39 associations (labels, attachments, designations, exports, etc.)
  • Denormalized search fields for Elasticsearch
  • NGE support: nutrient_id, nge_file_hash, nge_enabled?
  • Billing: billing_size, verified_page_count
  • Concerns: ExhibitSearchable, ExhibitExportable, ExhibitNutrientAction

Batch (app/models/batch.rb, 41KB):

  • Import types: native, produced, wire_transfer, split_result
  • Statuses: preprocessing, processing, complete, error, cancelled
  • NGE tracking: loader_status, loader_status_updated_at_gmt
  • batch_processing_events association for NGE event tracking

Npcase (app/models/npcase.rb):

  • Case types: trial_prep (1), review (2)
  • Statuses: active, active_no_charge, archived, disabled, deleted, pending_archiving
  • nge_enabled? flag controls Legacy vs NGE processing path
  • Per-case Lambda function detection

Service Architecture

Service Objects (app/services/)

40+ service objects extract complex business logic from controllers/models:

Service | Purpose
NgePageService | HTTP interface to documentpageservice
EventBridgePublisher | AWS EventBridge event publishing
ChatbotService | AI chatbot (Bedrock) integration
DataSyncRunner | AWS DataSync for data transfer
CloneNpcaseService | Case duplication
GenerateInvoiceService | Billing invoice generation
ImageMarkupService | Document markup operations
BatchContentSummary | Batch content analysis
SendSupportRequestService | Zendesk ticket creation

Concerns (Model Mixins)

Concern | Purpose
ExhibitSearchable | Elasticsearch queries for documents
AttachmentSearchable | Elasticsearch queries for attachments
DepositionSearchable | Deposition text search
ExhibitExportable | Export logic
ExhibitNutrientAction | Nutrient API integration
GmtTimestampTouchable | GMT timestamp maintenance
CachedItem | Redis caching
Reindexable | Elasticsearch reindexing

Integration Points with NGE

Direct Integration

Integration | Direction | Mechanism
Document ingestion | Rails → NGE | ProcessorApi.import() HTTP POST to documentextractor API
Event tracking | NGE → Rails | Athena polling via NgeCaseTrackerJob
Export | Rails → NGE | Lambda async invoke via NgeExportJob
Page manipulation | Rails → NGE | HTTP API via NgePageService
Database schema | Shared | Same {base}_case_{case_id} convention
S3 paths | Shared | Same /case_{id}/{type}/{uid}/{file} convention
HMAC-SHA1 auth | NGE → Rails | documentpageservice calls Rails API
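The two shared naming conventions in the table are simple enough to express as pure functions. The helper names below are hypothetical, and the integer check on case_id is an added safeguard rather than documented behavior. The formats themselves are the ones shown above.

```ruby
# Hypothetical helpers capturing the naming contracts shared by Legacy and NGE.

# {base}_case_{case_id}
def case_database_name(base_db, case_id)
  "#{base_db}_case_#{Integer(case_id)}" # Integer() rejects non-numeric ids
end

# /case_{id}/{type}/{uid}/{file}
def case_s3_path(case_id, type, uid, file)
  "/case_#{Integer(case_id)}/#{type}/#{uid}/#{file}"
end

db_name = case_database_name("nextpoint_production", 42)
s3_path = case_s3_path(42, "native", "abc123", "contract.pdf") # example values
```

Because both sides derive these strings independently, a change to either format has to be made in the Rails code and in every NGE module at the same time.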

Shared State

State | Location | Accessed By
Per-case database | MySQL/Aurora | Rails, NGE modules (same schema)
S3 files | S3 bucket | Rails, workers, all NGE modules
Batch status | batches table | Rails (writes), NGE (reads via events)
Processing events | batch_processing_events table | Rails (reads), NGE (writes via Athena)
NGE service URLs | SSM Parameter Store | Rails reads, NGE publishes
Secrets | AWS Secrets Manager | Both Legacy and NGE

NGE Feature Flag

# app/models/npcase.rb
def nge_enabled?
  # Returns true if case uses NGE processing pipeline
  # A case is permanently either NGE or Legacy — not switchable
end

This flag is the primary routing decision for document processing. A case is created as either NGE or Legacy and remains that way — it is not a migration toggle. The Legacy code has separate paths throughout based on this flag.

Patterns to Preserve vs Deprecate

Preserve

  • PerCaseModel multi-tenancy — the per-case database pattern is shared with NGE and is fundamental to data isolation
  • HMAC-SHA1 API auth — still used by NGE services calling back to Rails
  • Sidekiq for non-document jobs — bulk operations, reports, and admin tasks don't need NGE-level infrastructure
  • Elasticsearch search pipeline — custom query DSL is a competitive feature
  • RBAC authorization — database-driven roles scale well
  • Service object pattern — clean separation of business logic
  • Draper decorators — effective view-model separation
  • S3 path conventions — shared contract with NGE (do not change)
  • BatchProcessingEvent tracking — bridges Legacy UI with NGE processing

Deprecate

  • Legacy document workers — replaced by NGE modules for all new cases
  • XML-based worker API — replaced by direct DB access in NGE
  • ApplicationController 43KB god class — should be decomposed
  • Session-based auth for API — modern services use JWT/OAuth2
  • YAML configuration files — moving to env vars + Secrets Manager
  • Polling-based NGE events — future: direct SNS → Rails webhook or EventBridge
  • Global admin flag on User model — should migrate to proper admin roles

Key File Locations

File | Purpose
app/models/per_case_model.rb | Multi-tenancy foundation
app/controllers/application_controller.rb | Auth, session, HMAC (43KB)
app/models/exhibit.rb | Document model (39KB, 39 associations)
app/models/batch.rb | Import batch model (41KB)
app/models/npcase.rb | Case entity with nge_enabled? flag
app/models/user.rb | User with Cognito integration
app/sidekiq/background_processing.rb | Sidekiq base class
app/sidekiq/nge_case_tracker_job.rb | NGE event polling (Athena)
app/sidekiq/nge_export_job.rb | documentexporter Lambda invocation
app/services/nge_page_service.rb | documentpageservice HTTP client
lib/processor_api.rb | HTTP client for NGE Processor API (import trigger)
app/helpers/processor_api_helper.rb | NGE import payload builder (mixed into Batch)
app/models/batch/as_upload.rb | Upload entry point with NGE workflow trigger
lib/nextpoint_cognito.rb | Cognito SRP authentication
lib/shared/nextpoint_api.rb | HMAC-SHA1 API client (shared_libs)
lib/shared/nextpoint_s3.rb | S3 operations (shared_libs)
lib/search/ | Elasticsearch query DSL and transforms
db/schema.rb | Complete schema (2082 lines)
config/database.yml | DB config with writer + reader_host