ADR-007: Extract Custom Reports to Lambda/Step Functions Service

Status

Proposed

Date

2026-03-19

Context

Reports are generated by 8+ Sidekiq jobs in the Rails monolith. Each job follows a similar pattern: query data, generate CSV, upload to S3, notify user via email.

Current Report Inventory

Job	Data Source	Output	Delivery	Complexity
`CustomReportJob`	Core DB (Npcase, Account, UserActivity)	CSV	Email + S3 link	Low
`UserActivityReportJob`	Core DB + per-case DBs (cross-case)	CSV	Email attachment	Medium
`TranscriptMetadataReportJob`	Per-case DB (DepositionVolume)	CSV	Email attachment	Low
`DepositionSummaryReportJob`	Per-case DB (Deposition, Labels)	CSV	S3 + email link	Low
`PageCountReportForRelevancyJob`	Per-case DB (Exhibit)	Email body	Email inline	Low
`ReviewLogJob`	Per-case DB (Relevancy/Privilege/ConfidentialityLog)	CSV	S3 + email link	Low
`CodingOverlayJob`	Per-case DB + CSV load file	CSV	Email + S3	Medium
`DepositionSearchPdfJob`	Elasticsearch + per-case DB	PDF	S3 + email + alert	Medium
`SearchHitReport pipeline`	ES (PIT) + Glue + Athena	Parquet + CSV	In-app download	High
`GridDataExportJob`	Per-case DB (Exhibit grid data)	CSV	Email + S3	Low

Key Patterns

All extend BackgroundProcessing (Sidekiq base)
All use PerCaseModel.set_case(case_id) for multi-tenant DB
All track progress via TrackedBackgroundJob
Output is almost always CSV (exceptions: PDF for deposition search, inline email body for the page count report, Parquet + CSV for the search hit pipeline)
No XLSX output exists anywhere
4 delivery mechanisms: email attachment, S3 + email link, email body, in-app download

Why Extract?

Stateless and bounded — each report is a self-contained query → format → deliver pipeline
No shared state — reports don't interact with each other
Long-running reports block Sidekiq — UserActivityReportJob crosses multiple case DBs
Natural fit for Lambda — query data, generate file, upload S3, send notification

Decision

Extract reports from the Rails monolith into a dedicated report service: plain Lambda functions for single-query reports, and Lambda + Step Functions for reports that need orchestration.

Architecture

Rails App
  │
  ├── SNS: ReportRequested
  │     ├── type: user_activity | transcript_metadata | deposition_summary | ...
  │     ├── case_id, user_id, parameters
  │     │
  │     ▼
  │   SQS Queue → Lambda: ReportRouter
  │     │
  │     ├── Simple reports (single query → CSV → S3 → email)
  │     │   ├── TranscriptMetadataReport
  │     │   ├── DepositionSummaryReport
  │     │   ├── PageCountReport
  │     │   ├── ReviewLogReport
  │     │   └── GridDataExport
  │     │
  │     └── Complex reports (Step Functions)
  │         ├── UserActivityReport (cross-case DB queries)
  │         ├── CodingOverlayReport (load file parse + match + apply)
  │         └── SearchHitReport (ES + Glue + Athena — already partially extracted)
  │
  │   Output: S3 + SES email notification
  │     │
  │     ▼
  │   SNS: ReportCompleted → PSM
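The routing decision made by ReportRouter can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the function names are hypothetical, the report type strings beyond the three shown in the diagram are assumed spellings, and it assumes the SQS queue receives SNS notifications without raw message delivery.

```python
import json

# Report types grouped as in the diagram above. Only user_activity,
# transcript_metadata and deposition_summary appear verbatim in the ADR;
# the remaining keys are assumed spellings for illustration.
SIMPLE_REPORTS = {
    "transcript_metadata",
    "deposition_summary",
    "page_count",
    "review_log",
    "grid_data_export",
}
COMPLEX_REPORTS = {"user_activity", "coding_overlay", "search_hit"}


def parse_report_request(sqs_record: dict) -> dict:
    """Unwrap a ReportRequested message delivered from SNS through SQS.

    Without raw message delivery, the SQS body is a JSON envelope whose
    "Message" field holds the published payload as a string.
    """
    envelope = json.loads(sqs_record["body"])
    return json.loads(envelope["Message"])


def route(request: dict) -> str:
    """Return "lambda" for single-query reports, "step_functions" otherwise."""
    report_type = request["type"]
    if report_type in SIMPLE_REPORTS:
        return "lambda"
    if report_type in COMPLEX_REPORTS:
        return "step_functions"
    raise ValueError(f"unknown report type: {report_type!r}")
```

Raising on an unknown type lets the SQS redrive policy move malformed requests to a dead-letter queue instead of silently dropping them, assuming one is configured.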

Simple Reports (Lambda only)

Each simple report is a single Lambda function:
  1. Receive event from SQS (case_id, user_id, parameters)
  2. Connect to per-case MySQL (writer_session for core DB, reader for per-case)
  3. Execute query, stream results to CSV
  4. Upload CSV to S3
  5. Send email notification via SES
  6. Emit completion event to SNS
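A minimal sketch of that sequence, with the database, S3, SES and SNS calls passed in as plain callables so the flow can be read (and tested) without AWS access. All function names, the S3 key layout and the completion event shape are assumptions made for illustration. The CSV is built in memory here, so a report with very large output would need a temporary file or multipart upload instead.

```python
import csv
import io
from typing import Callable, Iterable, Sequence


def rows_to_csv(header: Sequence[str], rows: Iterable[Sequence]) -> bytes:
    """Serialize a header and an iterable of rows into CSV bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    for row in rows:  # rows may be a lazy cursor; it is consumed one row at a time
        writer.writerow(row)
    return buffer.getvalue().encode("utf-8")


def run_simple_report(
    request: dict,
    fetch_rows: Callable,   # (case_id, parameters) -> (header, rows); wraps the MySQL query
    upload: Callable,       # (key, body) -> None; wraps the S3 upload
    send_email: Callable,   # (user_id, key) -> None; wraps the SES notification
    publish: Callable,      # (event) -> None; wraps the SNS publish
) -> str:
    """Run steps 2 to 6 for one already-parsed ReportRequested message."""
    header, rows = fetch_rows(request["case_id"], request["parameters"])
    body = rows_to_csv(header, rows)
    # Hypothetical key layout; the ADR does not specify one.
    key = f"reports/{request['case_id']}/{request['type']}.csv"
    upload(key, body)
    send_email(request["user_id"], key)
    publish({"event": "ReportCompleted", "type": request["type"], "s3_key": key})
    return key
```

Injecting the side effects keeps each report's Lambda down to a query function plus this shared pipeline, which is one way to get the consistent delivery pattern this ADR aims for.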

Complex Reports (Step Functions)

For reports that need orchestration:

UserActivityReport Step Function:
  1. Query core DB for account/user list
  2. Fan out: parallel Lambda per case (query per-case activity data)
  3. Aggregate: combine results into single CSV
  4. Upload to S3 + notify
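One way to express these four steps is an Amazon States Language definition with a Map state for the fan-out. The sketch below builds that definition as a Python dictionary. The state names, the concurrency limit and the ARN keys are hypothetical, and it assumes the first Lambda returns a JSON array of cases.

```python
import json


def user_activity_state_machine(arns: dict) -> dict:
    """Build an illustrative ASL definition for the UserActivityReport flow.

    `arns` maps hypothetical step names to Lambda function ARNs.
    """
    return {
        "Comment": "UserActivityReport (illustrative sketch)",
        "StartAt": "ListCases",
        "States": {
            # Step 1: query the core DB; assumed to return a JSON array of cases.
            "ListCases": {
                "Type": "Task",
                "Resource": arns["list_cases"],
                "ResultPath": "$.cases",
                "Next": "QueryEachCase",
            },
            # Step 2: fan out, one Lambda invocation per case.
            "QueryEachCase": {
                "Type": "Map",
                "ItemsPath": "$.cases",
                "MaxConcurrency": 10,  # assumed cap to protect the per-case DBs
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "QueryCase",
                    "States": {
                        "QueryCase": {
                            "Type": "Task",
                            "Resource": arns["query_case"],
                            "End": True,
                        }
                    },
                },
                "ResultPath": "$.parts",
                "Next": "Aggregate",
            },
            # Step 3: combine the per-case results into a single CSV.
            "Aggregate": {
                "Type": "Task",
                "Resource": arns["aggregate"],
                "ResultPath": "$.report",
                "Next": "UploadAndNotify",
            },
            # Step 4: upload to S3 and notify the user.
            "UploadAndNotify": {
                "Type": "Task",
                "Resource": arns["upload_notify"],
                "End": True,
            },
        },
    }
```

Calling `json.dumps(user_activity_state_machine(arns))` yields the definition string that would be supplied when creating the state machine. Step Functions limits the size of data passed between states, so in practice the per-case Lambdas would more likely write partial files to S3 and return only their keys.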

SearchHitReport Integration

The existing search-hit-report-backend Lambda already handles the heaviest report. This ADR proposes wrapping it into the report service's event model:

ReportRequested (type: search_hit)
  → Lambda: invoke search-hit-report-backend Lambda
  → search-hit-report-backend does its existing ES + Glue + Athena pipeline
  → ReportCompleted event
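The wrapping Lambda could be as thin as the sketch below. It assumes a boto3 Lambda client is passed in, and the payload shape forwarded to search-hit-report-backend is a guess, since the ADR does not describe that function's input contract. The adapter function name is hypothetical.

```python
import json


def forward_search_hit_request(request: dict, lambda_client) -> dict:
    """Forward a ReportRequested (type: search_hit) message to the existing Lambda.

    `lambda_client` is expected to behave like boto3.client("lambda").
    """
    if request.get("type") != "search_hit":
        raise ValueError("this adapter only handles search_hit requests")

    # Assumed payload shape; the real backend may expect different field names.
    payload = {
        "case_id": request["case_id"],
        "user_id": request["user_id"],
        "parameters": request.get("parameters", {}),
    }
    response = lambda_client.invoke(
        FunctionName="search-hit-report-backend",
        InvocationType="Event",  # asynchronous: do not wait for the pipeline
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Asynchronous invocations return HTTP 202 when the event is accepted.
    return {"accepted": response["StatusCode"] == 202}
```

Asynchronous invocation is assumed here so the adapter does not sit idle while the ES + Glue + Athena pipeline runs. With that choice, the ReportCompleted event would have to be published when the pipeline finishes, which is outside this sketch.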

What Stays in Rails

Component	Why
`CustomReport` / `CustomReportType` models	Report metadata, scheduling, CRUD UI
Report controllers	UI for configuring and triggering reports
`Export` / `ExportVolume` records	Download management in the existing UI
`DepositionSearchPdfJob`	PDF generation is more complex (ES + Prawn PDF) — extract later
Client-specific reports (`MaronMarvel`)	Too custom to generalize; stays in Rails

Consequences

Positive

Independent scaling — heavy reports don't block Sidekiq normal queue
Consistent delivery — all reports follow the same S3 + SES pattern
Step Functions visibility — complex report orchestration is visible in the AWS console
Natural parallelism — cross-case reports fan out to parallel Lambdas per case
SearchHitReport convergence — existing prototype gets a standard trigger mechanism

Negative

SES integration — current email delivery uses NextPointEmailer with pirate-speak staging subjects; Lambda needs equivalent
Cross-case queries — UserActivityReportJob calls PerCaseModel.temporarily_set_case across multiple cases; Lambda equivalent needs careful connection management
No XLSX — users may want XLSX; adding it later requires a new dependency (openpyxl in Python or similar)

Risks

Email formatting parity — Rails email templates have evolved over years with edge cases. Lambda SES emails must match formatting expectations.
Report scheduling — CustomReportJob supports scheduled reports via create_background_job with delays. Lambda equivalent needs EventBridge Scheduler or similar.
