
ADR-005: Extract Bulk Operations to Lambda Service

Status

Proposed

Date

2026-03-19

Context

Bulk document operations are currently handled by Sidekiq jobs in the Rails monolith, dominated by a single 784-line BulkActionJob that handles 15+ distinct operations.

Current State

Job                         Lines  What It Does
BulkActionJob               784    God object: labels, tags, fields, review status, privilege, confidentiality, custodians, placeholders, bates removal, redaction foldering
BulkDeleteJob               47     Trash/permanently delete exhibits (batches of 100)
BulkRestoreJob              39     Restore trashed exhibits + labels
BulkLabelActivationJob      16     Activate/deactivate labels
BulkLabelDestroyJob         28     Delete labels + archive search reports
BulkSubreviewAssignmentJob  23     Assign/unassign users to subreview folders

Key patterns in current code:

  • All jobs extend BackgroundProcessing (Sidekiq base class)
  • PerCaseModel.set_case(case_id) for multi-tenant DB connection
  • Exhibits processed in slices of 100-1000
  • TrackedBackgroundJob for progress tracking (polled by frontend)
  • kick_off_indexing_for_exhibit_ids(slice) after each batch for ES reindexing
  • Manual retry logic for MySQL lock wait timeouts (3 retries, 1s sleep)
  • Redis for respawn attempt counting
  • BulkActionJob checks FieldLock, safe_to_modify_confidentiality?, Nutrient integration

Trigger point: DocumentsControllerMixins::DocumentBulkManipulation → BulkActionJob.perform_async

Why Extract?

  1. BulkActionJob is a god object — 15+ operations in one 784-line file with complex branching
  2. Sidekiq contention — bulk ops compete with all other Sidekiq jobs on shared Redis queues
  3. No horizontal scaling — Sidekiq workers scale with the Rails deployment, not independently
  4. Progress tracking is poll-based — frontend polls TrackedBackgroundJob table
  5. NGE integration already exists — confidentiality updates already call update_nutrient_confidentiality

Decision

Extract bulk operations into a Lambda-based service following the NGE service module pattern.

Architecture

Rails App
  ├── POST /documents/bulk_update  (existing)
  │     │
  │     ▼
  │   SNS: BulkOperationRequested
  │     │
  │     ▼
  │   SQS Queue → Lambda Handler
  │     │
  │     ├── BulkLabelProcessor      (labels, designations, reorder)
  │     ├── BulkFieldProcessor      (shortcut, author, doc_type, notes, date)
  │     ├── BulkReviewProcessor     (review status, privilege, confidentiality)
  │     ├── BulkCustodianProcessor  (add/remove/update custodians)
  │     ├── BulkDeleteProcessor     (trash, permanent delete, restore)
  │     └── BulkTagProcessor        (add/remove tags)
  │           │
  │           ▼
  │       Per-case MySQL (writer_session)
  │           │
  │           ▼
  │       SNS: BulkOperationProgress / BulkOperationCompleted
  │           │
  │           ▼
  │       PSM (Athena) ← Rails polls for progress (existing pattern)
  └── Nutrient API (for NGE confidentiality/bates)
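
The routing step in the diagram (SQS Queue → Lambda Handler → processor) could be sketched roughly as below. This is a minimal illustration, not the actual handler: the payload fields (operation, case_id, exhibit_ids) and the operation names are assumptions made for the example, and it assumes the SNS subscription does not use raw message delivery, so each SQS body is an SNS envelope whose Message field holds the request as a JSON string.

```python
import json

# Hypothetical mapping from an "operation" field to the processors named in
# the diagram. The operation names are illustrative, not taken from the codebase.
PROCESSOR_FOR_OPERATION = {
    "add_label": "BulkLabelProcessor",
    "update_author": "BulkFieldProcessor",
    "set_review_status": "BulkReviewProcessor",
    "add_custodian": "BulkCustodianProcessor",
    "trash": "BulkDeleteProcessor",
    "add_tag": "BulkTagProcessor",
}


def parse_request(sqs_record):
    """Unwrap the SNS envelope that SQS carries when raw delivery is off."""
    envelope = json.loads(sqs_record["body"])
    return json.loads(envelope["Message"])


def route(request):
    """Return the processor name for a request; reject unknown operations."""
    operation = request["operation"]
    try:
        return PROCESSOR_FOR_OPERATION[operation]
    except KeyError:
        raise ValueError(f"unsupported bulk operation: {operation}") from None


def handler(event, context=None):
    """Lambda entry point: one (processor, case_id, exhibit count) per record."""
    routed = []
    for record in event["Records"]:
        request = parse_request(record)
        routed.append(
            (route(request), request["case_id"], len(request["exhibit_ids"]))
        )
    return routed
```

A real handler would invoke the processor rather than return its name. Keeping the lookup in a table means an unknown operation fails loudly instead of falling through a branch.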

Phase 1: Decompose BulkActionJob (in Rails first)

Before extracting to Lambda, decompose the god object into focused service classes within Rails. This is zero-risk refactoring:

# app/services/bulk_operations/
├── label_processor.rb        # Labels, designations, reorder
├── field_processor.rb        # Shortcut, author, doc_type, notes, date
├── review_processor.rb       # Review status, privilege, confidentiality
├── custodian_processor.rb    # Add/remove/update custodians
├── tag_processor.rb          # Add/remove tags
├── delete_processor.rb       # Trash, delete, restore
├── placeholder_processor.rb  # Non-imaged placeholders, bates removal
└── base_processor.rb         # Shared: exhibit loading, slicing, progress, ES reindex

BulkActionJob becomes a thin dispatcher that routes to the correct processor.

Phase 2: Extract to Lambda

Move each processor to a Lambda function following the NGE hexagonal pattern:

  • core/: pure business logic (processor classes)
  • shell/: MySQL session, Nutrient client, ES reindex trigger
  • handlers/: SQS event parsing, routing, error handling
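
One way to picture the core/ and shell/ split is a core function that slices the exhibit IDs and talks to the outside world only through injected ports. The sketch below assumes hypothetical port names (ExhibitStore, Reindexer) and a tag operation as the example. None of these names come from the NGE codebase.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Protocol, Sequence


class ExhibitStore(Protocol):
    """Shell port (hypothetical): persists a change for one slice."""

    def apply_tag(self, exhibit_ids: Sequence[int], tag: str) -> None: ...


class Reindexer(Protocol):
    """Shell port (hypothetical): requests ES reindexing for one slice."""

    def request_reindex(self, exhibit_ids: Sequence[int]) -> None: ...


def slices(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive chunks of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def add_tag(ids: Sequence[int], tag: str, store: ExhibitStore,
            reindexer: Reindexer, slice_size: int = 100) -> int:
    """Core logic: tag exhibits slice by slice, reindexing after each slice."""
    processed = 0
    for chunk in slices(ids, slice_size):
        store.apply_tag(chunk, tag)
        reindexer.request_reindex(chunk)
        processed += len(chunk)
    return processed


@dataclass
class InMemoryShell:
    """Test double that satisfies both ports without MySQL or ES."""

    tagged: List[int] = field(default_factory=list)
    reindex_calls: int = 0

    def apply_tag(self, exhibit_ids: Sequence[int], tag: str) -> None:
        self.tagged.extend(exhibit_ids)

    def request_reindex(self, exhibit_ids: Sequence[int]) -> None:
        self.reindex_calls += 1
```

Because the core depends only on the port protocols, it can be unit-tested with the in-memory double. The MySQL and ES adapters would live in shell/.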

Phase 3: Progress via PSM

Replace TrackedBackgroundJob polling with PSM events (same pattern as batch processing):

  • Lambda emits progress events to SNS
  • PSM captures all events via Firehose → Parquet → Athena
  • Rails polls Athena for progress (existing NgeCaseTrackerJob pattern)
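
As a rough sketch of what a progress event might carry, the helper below builds a payload and hands it to an injected publish callable. The field names (job_id, processed, total, percent) are assumptions for illustration. Only the event type names BulkOperationProgress and BulkOperationCompleted come from the architecture diagram. In a Lambda, publish would presumably wrap an SNS client call.

```python
import json
from datetime import datetime, timezone


def build_progress_event(job_id, case_id, processed, total):
    """Build a progress payload; field names are illustrative assumptions."""
    done = processed >= total
    percent = 100 if total == 0 else min(100, processed * 100 // total)
    return {
        "event_type": "BulkOperationCompleted" if done else "BulkOperationProgress",
        "job_id": job_id,
        "case_id": case_id,
        "processed": processed,
        "total": total,
        "percent": percent,
        "emitted_at": datetime.now(timezone.utc).isoformat(),
    }


def emit_progress(publish, job_id, case_id, processed, total):
    """Serialize the event and pass it to `publish` (an injected callable)."""
    event = build_progress_event(job_id, case_id, processed, total)
    publish(json.dumps(event))
    return event
```

Emitting one event per slice would give the frontend roughly the same granularity it gets from TrackedBackgroundJob today, subject to the delay before events become queryable in Athena.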

What Stays in Rails

  • UI controllers — bulk edit modal, parameter validation, exhibit ID expansion
  • ES indexing trigger — the Lambda emits IndexingRequested events; existing indexer picks them up
  • Permission checks: can_bulk_update_folder? etc. stay in the Rails authorization layer

Consequences

Positive

  • God object eliminated — 784-line BulkActionJob decomposed into focused processors
  • Independent scaling — Lambda scales with bulk operation volume, not Rails deployment
  • Consistent architecture — follows same SNS/SQS/Lambda/PSM pattern as document processing
  • Phase 1 is zero-risk — decomposition happens within Rails first, no infrastructure changes
  • Retry handling improves — SQS visibility timeout + DLQ replaces manual 3-retry MySQL lock logic
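
The retry point above could look roughly like this on the Lambda side. The sketch assumes the SQS event source mapping has ReportBatchItemFailures enabled, so that only the messages listed in batchItemFailures return to the queue after the visibility timeout (and eventually reach the DLQ once the queue's maxReceiveCount is exceeded). process_record is a hypothetical callable standing in for the real processor call.

```python
import logging

logger = logging.getLogger(__name__)


def handle_batch(event, process_record):
    """Process each SQS record; report only the failed ones for redelivery."""
    failures = []
    for record in event["Records"]:
        try:
            process_record(record)
        except Exception:
            # Any error (for example a MySQL lock wait timeout) marks just this
            # message as failed; SQS redelivers it after the visibility timeout.
            logger.exception("bulk operation failed for %s", record["messageId"])
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Since a redelivered message reruns the whole operation, processors would need to be idempotent or to track which slices have already been applied.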

Negative

  • Two execution paths — during migration, some operations run in Rails, others in Lambda
  • Latency increase — SNS→SQS→Lambda adds ~1-2s vs direct Sidekiq enqueue
  • Multi-tenant complexity — Lambda needs same PerCaseModel.set_case equivalent in Python
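
A Python counterpart to PerCaseModel.set_case might be a context manager that scopes the selected case to a block of code. The sketch below is one possible shape, not the NGE implementation: the case_<id> database naming scheme and the names case_scope and current_case are hypothetical. Scoping matters because warm Lambda containers reuse process state between invocations, so a case selection that is set but never reset could leak into the next request.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Holds the case selected for the current unit of work; None means "not set".
_current_case = ContextVar("current_case", default=None)


def database_name(case_id):
    """Hypothetical naming scheme for per-case databases."""
    return f"case_{case_id}"


@contextmanager
def case_scope(case_id):
    """Select a case for the enclosed block and always restore the previous one."""
    token = _current_case.set(case_id)
    try:
        yield database_name(case_id)
    finally:
        _current_case.reset(token)


def current_case():
    """Return the selected case ID, failing loudly if none is selected."""
    case_id = _current_case.get()
    if case_id is None:
        raise RuntimeError("no case selected; wrap the call in case_scope()")
    return case_id
```

A shell-level session factory could call current_case() to pick the writer connection, so that core code never handles connection details.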

Risks

  • BulkActionJob has hidden coupling — 784 lines likely contain edge cases not visible from structure analysis. Phase 1 (in-Rails decomposition) mitigates this by surfacing all coupling before extraction.
  • ES reindexing coordination — bulk ops trigger reindexing after each slice. Must ensure Lambda→ES reindex trigger works reliably.
  • Nutrient API calls from Lambda — confidentiality updates require Nutrient API access. Must validate network path from Lambda VPC to Nutrient service.