ADR-007: Extract Custom Reports to Lambda/Step Functions Service
Status
Proposed
Date
2026-03-19
Context
Reports are generated by 8+ Sidekiq jobs in the Rails monolith. Each job follows a similar pattern: query data, generate CSV, upload to S3, notify user via email.
Current Report Inventory
| Job | Data Source | Output | Delivery | Complexity |
|---|---|---|---|---|
| `CustomReportJob` | Core DB (Npcase, Account, UserActivity) | CSV | Email + S3 link | Low |
| `UserActivityReportJob` | Core DB + per-case DBs (cross-case) | CSV | Email attachment | Medium |
| `TranscriptMetadataReportJob` | Per-case DB (DepositionVolume) | CSV | Email attachment | Low |
| `DepositionSummaryReportJob` | Per-case DB (Deposition, Labels) | CSV | S3 + email link | Low |
| `PageCountReportForRelevancyJob` | Per-case DB (Exhibit) | Email body | Email inline | Low |
| `ReviewLogJob` | Per-case DB (Relevancy/Privilege/ConfidentialityLog) | CSV | S3 + email link | Low |
| `CodingOverlayJob` | Per-case DB + CSV load file | CSV | Email + S3 | Medium |
| `DepositionSearchPdfJob` | Elasticsearch + per-case DB | PDF | S3 + email + alert | Medium |
| `SearchHitReport` pipeline | ES (PIT) + Glue + Athena | Parquet + CSV | In-app download | High |
| `GridDataExportJob` | Per-case DB (Exhibit grid data) | CSV | Email + S3 | Low |
Key Patterns
- All extend `BackgroundProcessing` (Sidekiq base)
- All use `PerCaseModel.set_case(case_id)` for multi-tenant DB
- All track progress via `TrackedBackgroundJob`
- Output is always CSV (one exception: PDF for deposition search)
- No XLSX output exists anywhere
- 4 delivery mechanisms: email attachment, S3 + email link, email body, in-app download
Why Extract?
- Stateless and bounded: each report is a self-contained query → format → deliver pipeline
- No shared state: reports don't interact with each other
- Long-running reports block Sidekiq: `UserActivityReportJob` crosses multiple case DBs
- Natural fit for Lambda: query data, generate file, upload to S3, send notification
Decision
Extract report generation from the Rails monolith into a serverless report service: Step Functions state machines for reports that need orchestration, and single Lambda functions for single-query reports.
Architecture
Rails App
│
├── SNS: ReportRequested
│ ├── type: user_activity | transcript_metadata | deposition_summary | ...
│ ├── case_id, user_id, parameters
│ │
│ ▼
│ SQS Queue → Lambda: ReportRouter
│ │
│ ├── Simple reports (single query → CSV → S3 → email)
│ │ ├── TranscriptMetadataReport
│ │ ├── DepositionSummaryReport
│ │ ├── PageCountReport
│ │ ├── ReviewLogReport
│ │ └── GridDataExport
│ │
│ └── Complex reports (Step Functions)
│ ├── UserActivityReport (cross-case DB queries)
│ ├── CodingOverlayReport (load file parse + match + apply)
│ └── SearchHitReport (ES + Glue + Athena — already partially extracted)
│
│ Output: S3 + SES email notification
│ │
│ ▼
│ SNS: ReportCompleted → PSM
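The routing step in the diagram above could look roughly like the following. This is a minimal sketch, not the service's actual code: it assumes the SQS queue is subscribed to the SNS topic without raw message delivery, so each SQS body is an SNS envelope with a `Message` field. The type strings other than the three spelled out in the diagram, and the function names, are hypothetical.

```python
import json

# Report types from the diagram. Only the first three type strings appear
# verbatim in the ADR; the other spellings are assumptions.
SIMPLE_REPORTS = {
    "transcript_metadata",
    "deposition_summary",
    "page_count",
    "review_log",
    "grid_data_export",
}
COMPLEX_REPORTS = {"user_activity", "coding_overlay", "search_hit"}


def parse_report_request(sqs_record):
    """Unwrap one SQS record that carries an SNS notification."""
    envelope = json.loads(sqs_record["body"])
    request = json.loads(envelope["Message"])
    missing = {"type", "case_id", "user_id"} - request.keys()
    if missing:
        raise ValueError(f"ReportRequested is missing fields: {sorted(missing)}")
    request.setdefault("parameters", {})
    return request


def route(request):
    """Return "simple" or "complex" for a parsed ReportRequested payload."""
    report_type = request["type"]
    if report_type in SIMPLE_REPORTS:
        return "simple"
    if report_type in COMPLEX_REPORTS:
        return "complex"
    raise ValueError(f"Unknown report type: {report_type}")


def handler(event, context=None):
    """Lambda entry point: classify every record in the SQS batch."""
    decisions = []
    for record in event["Records"]:
        request = parse_report_request(record)
        decisions.append((request["type"], route(request)))
    return decisions
```

In a real router the "simple" branch would invoke the matching report Lambda and the "complex" branch would start a Step Functions execution; both are left out here so the sketch stays free of AWS calls.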
Simple Reports (Lambda only)
Each simple report is a single Lambda function:
1. Receive event from SQS (case_id, user_id, parameters)
2. Connect to per-case MySQL (writer_session for core DB, reader for per-case)
3. Execute query, stream results to CSV
4. Upload CSV to S3
5. Send email notification via SES
6. Emit completion event to SNS
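As an illustration of steps 2 through 6, the sketch below keeps all I/O behind injected callables so the flow can be read and tested without AWS or MySQL. The names `run_simple_report`, `query`, `upload`, `notify`, and `publish`, the S3 key layout, and the shape of the completion event are all hypothetical; the ADR does not define them.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Write rows to an in-memory CSV and return it as UTF-8 bytes.

    `rows` can be any iterable (for example a database cursor), so rows are
    consumed one at a time instead of being collected into a list first.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue().encode("utf-8")


def run_simple_report(request, query, upload, notify, publish):
    """Run one simple report with every side effect passed in as a callable.

    query(case_id, parameters) -> (header, iterable of rows)
    upload(key, body_bytes)    -> None  (would wrap an S3 upload)
    notify(user_id, key)       -> None  (would wrap an SES send)
    publish(event_dict)        -> None  (would wrap an SNS publish)
    """
    header, rows = query(request["case_id"], request["parameters"])
    body = rows_to_csv(header, rows)
    # Hypothetical key layout; the ADR does not specify one.
    key = f"reports/{request['case_id']}/{request['type']}.csv"
    upload(key, body)
    notify(request["user_id"], key)
    completed = {"event": "ReportCompleted", "type": request["type"], "key": key}
    publish(completed)
    return completed
```

The CSV is built fully in memory here, which is fine for small reports. A large result set would more likely be written to a temporary file or sent as a multipart upload.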
Complex Reports (Step Functions)
For reports that need orchestration:
UserActivityReport Step Function:
1. Query core DB for account/user list
2. Fan out: parallel Lambda per case (query per-case activity data)
3. Aggregate: combine results into single CSV
4. Upload to S3 + notify
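The four steps above map naturally onto an Amazon States Language definition with a Map state for the fan-out. The sketch below builds such a definition as a plain Python dict. The state names, the four Lambda ARNs passed in, the retry settings, and the default concurrency are assumptions for illustration, not part of this ADR.

```python
import json


def user_activity_state_machine(list_cases_arn, per_case_arn, aggregate_arn,
                                deliver_arn, max_concurrency=10):
    """Build an Amazon States Language definition as a plain dict."""
    return {
        "Comment": "UserActivityReport: list cases, fan out, aggregate, deliver",
        "StartAt": "ListCases",
        "States": {
            # Step 1: query the core DB for the account/user/case list.
            "ListCases": {
                "Type": "Task",
                "Resource": list_cases_arn,
                "ResultPath": "$.cases",
                "Next": "QueryEachCase",
            },
            # Step 2: one Lambda invocation per case, run in parallel.
            "QueryEachCase": {
                "Type": "Map",
                "ItemsPath": "$.cases",
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "QueryCaseActivity",
                    "States": {
                        "QueryCaseActivity": {
                            "Type": "Task",
                            "Resource": per_case_arn,
                            "Retry": [{
                                "ErrorEquals": ["States.TaskFailed"],
                                "IntervalSeconds": 2,
                                "MaxAttempts": 3,
                                "BackoffRate": 2.0,
                            }],
                            "End": True,
                        }
                    },
                },
                "ResultPath": "$.parts",
                "Next": "Aggregate",
            },
            # Step 3: combine the per-case results into a single CSV.
            "Aggregate": {
                "Type": "Task",
                "Resource": aggregate_arn,
                "ResultPath": "$.report",
                "Next": "UploadAndNotify",
            },
            # Step 4: upload to S3 and notify the user.
            "UploadAndNotify": {
                "Type": "Task",
                "Resource": deliver_arn,
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs, only to show the JSON that would be deployed.
    print(json.dumps(user_activity_state_machine("A", "B", "C", "D"), indent=2))
```

Because Step Functions limits the payload passed between states to 256 KiB, each per-case Lambda would probably write its rows to S3 and return only the object key, leaving the aggregate step to read the parts back.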
SearchHitReport Integration
The existing search-hit-report-backend Lambda already handles the heaviest report.
This ADR proposes wrapping it into the report service's event model:
ReportRequested (type: search_hit)
→ Lambda: invoke search-hit-report-backend Lambda
→ search-hit-report-backend does its existing ES + Glue + Athena pipeline
→ ReportCompleted event
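A thin wrapper for this flow might look like the sketch below. It assumes the wrapper invokes the existing function synchronously and forwards the `ReportRequested` payload unchanged; neither is stated in this ADR. `lambda_client` is expected to behave like `boto3.client("lambda")`, and `publish` is a hypothetical callable wrapping the SNS publish.

```python
import json


def invoke_search_hit_report(request, lambda_client, publish,
                             function_name="search-hit-report-backend"):
    """Forward a ReportRequested payload to the existing Lambda, then publish."""
    if request["type"] != "search_hit":
        raise ValueError(f"Not a search_hit request: {request['type']}")

    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous call; an assumption
        Payload=json.dumps(request).encode("utf-8"),
    )
    # Lambda includes FunctionError in the response when the function raised.
    status = "failed" if response.get("FunctionError") else "succeeded"
    completed = {
        "event": "ReportCompleted",
        "type": "search_hit",
        "case_id": request["case_id"],
        "status": status,
    }
    publish(completed)
    return completed
```

If the ES + Glue + Athena pipeline outlives the wrapper's invocation, a synchronous call only confirms that the run started. In that case the completion event would have to be emitted from the end of the pipeline itself.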
What Stays in Rails
| Component | Why |
|---|---|
| `CustomReport` / `CustomReportType` models | Report metadata, scheduling, CRUD UI |
| Report controllers | UI for configuring and triggering reports |
| `Export` / `ExportVolume` records | Download management in the existing UI |
| `DepositionSearchPdfJob` | PDF generation is more complex (ES + Prawn PDF); extract later |
| Client-specific reports (`MaronMarvel`) | Too custom to generalize; stays in Rails |
Consequences
Positive
- Independent scaling: heavy reports no longer block the normal Sidekiq queue
- Consistent delivery: all reports follow the same S3 + SES pattern
- Step Functions visibility: complex report orchestration is visible in the AWS console
- Natural parallelism: cross-case reports fan out to parallel Lambdas, one per case
- SearchHitReport convergence: the existing prototype gets a standard trigger mechanism
Negative
- SES integration: current email delivery uses `NextPointEmailer` with pirate-speak staging subjects; the Lambda service needs an equivalent
- Cross-case queries: `UserActivityReportJob` calls `PerCaseModel.temporarily_set_case` across multiple cases; the Lambda equivalent needs careful connection management
- No XLSX: users may want XLSX; adding it later requires a new dependency (openpyxl in Python or similar)
Risks
- Email formatting parity: Rails email templates have accumulated edge cases over years. Lambda SES emails must match the formatting users expect.
- Report scheduling: `CustomReportJob` supports scheduled reports via `create_background_job` with delays. The Lambda equivalent needs EventBridge Scheduler or similar.
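To make the scheduling risk concrete, a delayed report could be expressed as a one-time EventBridge Scheduler schedule whose target publishes the `ReportRequested` payload. The sketch below only builds the arguments for the Scheduler `create_schedule` call. The helper name, the schedule naming scheme, and the target and role ARNs are hypothetical, and the choice to delete the schedule after it fires is an assumption.

```python
import json
from datetime import datetime, timedelta, timezone


def one_time_schedule(request, delay, target_arn, role_arn, now=None):
    """Build keyword arguments for the Scheduler create_schedule call."""
    now = now or datetime.now(timezone.utc)
    run_at = (now + delay).astimezone(timezone.utc)
    return {
        # Hypothetical naming scheme: report type, case, and fire time.
        "Name": f"report-{request['type']}-{request['case_id']}-{int(run_at.timestamp())}",
        # One-time schedules use at(yyyy-mm-ddThh:mm:ss); the time is read
        # as UTC unless ScheduleExpressionTimezone is also set.
        "ScheduleExpression": f"at({run_at.strftime('%Y-%m-%dT%H:%M:%S')})",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        # Remove the schedule once it has fired.
        "ActionAfterCompletion": "DELETE",
        "Target": {
            "Arn": target_arn,      # e.g. the ReportRequested SNS topic
            "RoleArn": role_arn,    # role Scheduler assumes to call the target
            "Input": json.dumps(request),
        },
    }
```

With boto3 available, the result would be passed as `boto3.client("scheduler").create_schedule(**kwargs)`. Recurring reports would need a `rate(...)` or `cron(...)` expression instead of `at(...)`.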