ADR-007: Extract Custom Reports to Lambda/Step Functions Service

Status

Proposed

Date

2026-03-19

Context

Reports are generated by 8+ Sidekiq jobs in the Rails monolith. Each job follows a similar pattern: query data, generate CSV, upload to S3, notify user via email.

Current Report Inventory

Job                            | Data Source                                          | Output        | Delivery           | Complexity
-------------------------------|------------------------------------------------------|---------------|--------------------|-----------
CustomReportJob                | Core DB (Npcase, Account, UserActivity)              | CSV           | Email + S3 link    | Low
UserActivityReportJob          | Core DB + per-case DBs (cross-case)                  | CSV           | Email attachment   | Medium
TranscriptMetadataReportJob    | Per-case DB (DepositionVolume)                       | CSV           | Email attachment   | Low
DepositionSummaryReportJob     | Per-case DB (Deposition, Labels)                     | CSV           | S3 + email link    | Low
PageCountReportForRelevancyJob | Per-case DB (Exhibit)                                | Email body    | Email inline       | Low
ReviewLogJob                   | Per-case DB (Relevancy/Privilege/ConfidentialityLog) | CSV           | S3 + email link    | Low
CodingOverlayJob               | Per-case DB + CSV load file                          | CSV           | Email + S3         | Medium
DepositionSearchPdfJob         | Elasticsearch + per-case DB                          | PDF           | S3 + email + alert | Medium
SearchHitReport pipeline       | ES (PIT) + Glue + Athena                             | Parquet + CSV | In-app download    | High
GridDataExportJob              | Per-case DB (Exhibit grid data)                      | CSV           | Email + S3         | Low

Key Patterns

  • All extend BackgroundProcessing (Sidekiq base)
  • All use PerCaseModel.set_case(case_id) for multi-tenant DB
  • All track progress via TrackedBackgroundJob
  • Output is almost always CSV (exceptions: PDF for deposition search, an inline email body for the page count report, and Parquet + CSV for the search hit pipeline)
  • No XLSX output exists anywhere
  • 4 delivery mechanisms: email attachment, S3 + email link, email body, in-app download

Why Extract?

  1. Stateless and bounded — each report is a self-contained query → format → deliver pipeline
  2. No shared state — reports don't interact with each other
  3. Long-running reports block Sidekiq - UserActivityReportJob crosses multiple case DBs
  4. Natural fit for Lambda — query data, generate file, upload S3, send notification

Decision

Extract report generation from the Rails monolith into a dedicated report service: Step Functions for reports that need orchestration, and single Lambda functions for single-query reports.

Architecture

Rails App
  ├── SNS: ReportRequested
  │     ├── type: user_activity | transcript_metadata | deposition_summary | ...
  │     ├── case_id, user_id, parameters
  │     │
  │     ▼
  │   SQS Queue → Lambda: ReportRouter
  │     │
  │     ├── Simple reports (single query → CSV → S3 → email)
  │     │   ├── TranscriptMetadataReport
  │     │   ├── DepositionSummaryReport
  │     │   ├── PageCountReport
  │     │   ├── ReviewLogReport
  │     │   └── GridDataExport
  │     │
  │     └── Complex reports (Step Functions)
  │         ├── UserActivityReport (cross-case DB queries)
  │         ├── CodingOverlayReport (load file parse + match + apply)
  │         └── SearchHitReport (ES + Glue + Athena — already partially extracted)
  │   Output: S3 + SES email notification
  │     │
  │     ▼
  │   SNS: ReportCompleted → PSM
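
The routing step in the diagram above could look roughly like the following. This is a minimal sketch under stated assumptions, not the service's implementation. The fields `type`, `case_id`, `user_id`, and `parameters` come from the diagram, and `search_hit` appears later in this ADR. The other type strings, the `run_simple` and `start_state_machine` callables, and the assumption that each SQS body wraps the payload in an SNS envelope (raw message delivery off) are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

# Type strings: user_activity, transcript_metadata, deposition_summary and
# search_hit appear in the ADR; the remaining names are hypothetical.
SIMPLE_REPORTS = {
    "transcript_metadata",
    "deposition_summary",
    "page_count",
    "review_log",
    "grid_data_export",
}
COMPLEX_REPORTS = {"user_activity", "coding_overlay", "search_hit"}


def classify(report_type: str) -> str:
    """Return 'simple' or 'complex'; raise on an unknown report type."""
    if report_type in SIMPLE_REPORTS:
        return "simple"
    if report_type in COMPLEX_REPORTS:
        return "complex"
    raise ValueError(f"unknown report type: {report_type!r}")


def parse_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Unwrap one SQS record whose body is an SNS envelope (assumption)."""
    envelope = json.loads(record["body"])
    return json.loads(envelope["Message"])


def route(
    event: Dict[str, Any],
    run_simple: Callable[[Dict[str, Any]], None],
    start_state_machine: Callable[[Dict[str, Any]], None],
) -> List[str]:
    """Dispatch every ReportRequested message in an SQS batch."""
    routes = []
    for record in event.get("Records", []):
        request = parse_record(record)
        kind = classify(request["type"])
        if kind == "simple":
            run_simple(request)
        else:
            start_state_machine(request)
        routes.append(kind)
    return routes
```

Passing the two dispatch callables in as arguments keeps the router testable without AWS credentials. In a deployed handler they would presumably wrap a Lambda invocation and a Step Functions `start_execution` call.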

Simple Reports (Lambda only)

Each simple report is a single Lambda function:

  1. Receive event from SQS (case_id, user_id, parameters)
  2. Connect to per-case MySQL (writer_session for core DB, reader for per-case)
  3. Execute query, stream results to CSV
  4. Upload CSV to S3
  5. Send email notification via SES
  6. Emit completion event to SNS
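
The six steps above could be sketched as follows, assuming the MySQL query, S3 upload, SES email, and SNS publish are each passed in as callables. The names `fetch_rows`, `upload`, `notify`, and `emit_completed`, and the S3 key layout, are hypothetical and used only for illustration.

```python
import csv
import io
from typing import Any, Callable, Dict, Iterable, Sequence, Tuple


def rows_to_csv(header: Sequence[str], rows: Iterable[Sequence[Any]]) -> bytes:
    """Consume rows lazily and write them as CSV.

    The rows are never collected into a list, but the CSV itself is buffered
    in memory, which is assumed to be acceptable for the simple reports.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue().encode("utf-8")


def run_simple_report(
    request: Dict[str, Any],
    fetch_rows: Callable[[Any, Dict[str, Any]], Tuple[Sequence[str], Iterable[Sequence[Any]]]],
    upload: Callable[[str, bytes], None],
    notify: Callable[[Any, str], None],
    emit_completed: Callable[[Dict[str, Any]], None],
) -> str:
    """Run query -> CSV -> S3 -> email -> completion event for one request."""
    header, rows = fetch_rows(request["case_id"], request["parameters"])
    body = rows_to_csv(header, rows)
    # Hypothetical key layout; the ADR does not specify one.
    key = f"reports/{request['case_id']}/{request['type']}.csv"
    upload(key, body)
    notify(request["user_id"], key)
    emit_completed(
        {"type": request["type"], "case_id": request["case_id"], "s3_key": key}
    )
    return key
```

Because every side effect is injected, the same function body could serve all five simple reports, with only `fetch_rows` differing per report type.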

Complex Reports (Step Functions)

For reports that need orchestration:

UserActivityReport Step Function:
  1. Query core DB for account/user list
  2. Fan out: parallel Lambda per case (query per-case activity data)
  3. Aggregate: combine results into single CSV
  4. Upload to S3 + notify
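
One way to express these four steps is an Amazon States Language definition with a `Map` state for the fan-out, built here as a Python dict so it can be serialized with `json.dumps`. This is a sketch only: the Lambda ARNs, function names, state names, JSON paths, and the `MaxConcurrency` value are all hypothetical, and it assumes the first task returns a list of cases to iterate over.

```python
import json
from typing import Any, Dict, List

# Hypothetical ARN prefix; replace with the real account, region and functions.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:"

USER_ACTIVITY_STATE_MACHINE: Dict[str, Any] = {
    "Comment": "UserActivityReport: fan out per case, then aggregate",
    "StartAt": "QueryCoreDb",
    "States": {
        # Step 1: query the core DB; assumed to return a list of cases.
        "QueryCoreDb": {
            "Type": "Task",
            "Resource": ARN + "report-query-core-db",
            "ResultPath": "$.cases",
            "Next": "FanOutPerCase",
        },
        # Step 2: one Lambda invocation per case, run in parallel.
        "FanOutPerCase": {
            "Type": "Map",
            "ItemsPath": "$.cases",
            "MaxConcurrency": 10,  # assumed cap to protect per-case DBs
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "QueryCaseActivity",
                "States": {
                    "QueryCaseActivity": {
                        "Type": "Task",
                        "Resource": ARN + "report-query-case-activity",
                        "End": True,
                    }
                },
            },
            "ResultPath": "$.case_results",
            "Next": "Aggregate",
        },
        # Step 3: combine per-case results into a single CSV.
        "Aggregate": {
            "Type": "Task",
            "Resource": ARN + "report-aggregate-csv",
            "Next": "UploadAndNotify",
        },
        # Step 4: upload to S3 and notify the user.
        "UploadAndNotify": {
            "Type": "Task",
            "Resource": ARN + "report-upload-and-notify",
            "End": True,
        },
    },
}


def state_order(definition: Dict[str, Any]) -> List[str]:
    """Follow the Next pointers from StartAt and return top-level state names."""
    order, name = [], definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


definition_json = json.dumps(USER_ACTIVITY_STATE_MACHINE, indent=2)
```

A bounded `MaxConcurrency` is one possible answer to the connection-management concern with cross-case queries, since it limits how many per-case DB connections are open at once.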

SearchHitReport Integration

The existing search-hit-report-backend Lambda already handles the heaviest report. This ADR proposes wrapping it into the report service's event model:

ReportRequested (type: search_hit)
  → Lambda: invoke search-hit-report-backend Lambda
  → search-hit-report-backend does its existing ES + Glue + Athena pipeline
  → ReportCompleted event
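
The wrapper Lambda in this flow could be as small as the sketch below. It assumes the deployed function is actually named `search-hit-report-backend` and that it accepts a payload with the same fields as `ReportRequested`; neither is confirmed by this ADR. `lambda_client` is expected to be a boto3 Lambda client, passed in so the function can be exercised with a stub.

```python
import json
from typing import Any, Dict


def invoke_search_hit_backend(
    request: Dict[str, Any],
    lambda_client: Any,
    function_name: str = "search-hit-report-backend",  # assumed deployed name
) -> int:
    """Asynchronously invoke the existing backend and return the status code."""
    payload = {
        # Hypothetical payload shape mirroring the ReportRequested fields.
        "case_id": request["case_id"],
        "user_id": request["user_id"],
        "parameters": request.get("parameters", {}),
    }
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous; Lambda returns 202 when queued
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"]
```

With an asynchronous invocation the wrapper returns before the ES + Glue + Athena pipeline finishes, so the `ReportCompleted` event would have to be emitted by the backend itself or by a later step rather than by the wrapper.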

What Stays in Rails

Component                              | Why
---------------------------------------|-----------------------------------------------------------------
CustomReport / CustomReportType models | Report metadata, scheduling, CRUD UI
Report controllers                     | UI for configuring and triggering reports
Export / ExportVolume records          | Download management in the existing UI
DepositionSearchPdfJob                 | PDF generation is more complex (ES + Prawn PDF); extract later
Client-specific reports (MaronMarvel)  | Too custom to generalize; stays in Rails

Consequences

Positive

  • Independent scaling - heavy reports no longer block the normal Sidekiq queue
  • Consistent delivery — all reports follow the same S3 + SES pattern
  • Step Functions visibility — complex report orchestration is visible in the AWS console
  • Natural parallelism — cross-case reports fan out to parallel Lambdas per case
  • SearchHitReport convergence — existing prototype gets a standard trigger mechanism

Negative

  • SES integration — current email delivery uses NextPointEmailer with pirate-speak staging subjects; Lambda needs equivalent
  • Cross-case queries - UserActivityReportJob calls PerCaseModel.temporarily_set_case across multiple cases; Lambda equivalent needs careful connection management
  • No XLSX — users may want XLSX; adding it later requires a new dependency (openpyxl in Python or similar)

Risks

  • Email formatting parity — Rails email templates have evolved over years with edge cases. Lambda SES emails must match formatting expectations.
  • Report scheduling - CustomReportJob supports scheduled reports via create_background_job with delays. Lambda equivalent needs EventBridge Scheduler or similar.
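
If EventBridge Scheduler is chosen for delayed reports, a one-time schedule targeting the `ReportRequested` SNS topic might be built as in this sketch. The helper name, schedule name, topic ARN, and role ARN are hypothetical. The returned dict is meant to be passed as keyword arguments to the boto3 `scheduler` client's `create_schedule` call.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def build_delayed_schedule(
    name: str,
    request: Dict[str, Any],
    delay: timedelta,
    topic_arn: str,
    role_arn: str,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build create_schedule kwargs for a one-time delayed report request."""
    run_at = (now or datetime.now(timezone.utc)) + delay
    return {
        "Name": name,
        # One-time schedules use the at(yyyy-mm-ddThh:mm:ss) expression form.
        "ScheduleExpression": f"at({run_at.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": topic_arn,  # hypothetical ReportRequested topic
            "RoleArn": role_arn,  # role that allows Scheduler to publish to it
            "Input": json.dumps(request),
        },
    }
```

Usage would be along the lines of `boto3.client("scheduler").create_schedule(**build_delayed_schedule(...))`. Recurring scheduled reports would need a `rate(...)` or `cron(...)` expression instead, which this sketch does not cover.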