Article Review: Group 3 — EventStorming & Event-Driven Architecture

Articles Reviewed

  1. Introducing EventStorming — Alberto Brandolini (book, 28M) — THE methodology for discovering domain events and designing event flows
  2. Event Storming, Black Magic or Real? — Alex Dorand / Medium — Practical EventStorming workshop guide
  3. Event Types — Event-Driven Architecture — Alex Dorand / Medium — Event type taxonomy and characteristics
  4. Mastering EDA Part 10: Choreography vs Orchestration — Rahul Krishnan / Medium — Coordination pattern comparison

All four articles read in full (text extracted via pdftotext for large files).

Key Concepts

EventStorming Methodology

EventStorming is a collaborative workshop technique for discovering domain events, commands, aggregates, and bounded contexts. The core workflow:

  1. Domain Events (orange stickies) — past-tense facts: "Document Uploaded", "Exhibit Created"
  2. Commands (blue stickies) — triggers: "Upload Document", "Create Exhibit"
  3. Aggregates (yellow stickies) — clusters of domain logic that handle commands
  4. Policies (lilac stickies) — reactive logic: "When X happens, do Y"
  5. Read Models (green stickies) — data views that inform decisions
  6. External Systems (pink stickies) — systems outside your boundary
  7. Hot Spots (red stickies) — areas of confusion, conflict, or risk

The output is a shared understanding of the domain and a natural map to bounded contexts (which become modules/services).
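As a rough illustration of how the sticky types relate once they reach code, the sketch below models one command, the aggregate that handles it, the resulting domain event, and a policy reacting to that event. All names here (`UploadDocument`, `DocumentAggregate`, `on_document_uploaded`, and the "uploaded document becomes an exhibit" rule) are hypothetical, chosen only to echo the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadDocument:
    """Command (blue): an intent, phrased in the imperative."""
    document_id: str


@dataclass(frozen=True)
class CreateExhibit:
    """Command issued by the policy below."""
    document_id: str


@dataclass(frozen=True)
class DocumentUploaded:
    """Domain Event (orange): a past-tense fact."""
    document_id: str


class DocumentAggregate:
    """Aggregate (yellow): decides whether a command may succeed."""

    def __init__(self) -> None:
        self.uploaded: set[str] = set()

    def handle(self, command: UploadDocument) -> DocumentUploaded:
        if command.document_id in self.uploaded:
            raise ValueError(f"{command.document_id} already uploaded")
        self.uploaded.add(command.document_id)
        return DocumentUploaded(command.document_id)


def on_document_uploaded(event: DocumentUploaded) -> CreateExhibit:
    """Policy (lilac): 'when a document is uploaded, create an exhibit'."""
    return CreateExhibit(event.document_id)


aggregate = DocumentAggregate()
event = aggregate.handle(UploadDocument("doc-1"))
follow_up = on_document_uploaded(event)
```

The chain command → aggregate → event → policy → next command is the same left-to-right reading order as a Design-Level board.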

Three flavors (from Brandolini):

  - Big Picture: full business process, cross-silo, 2-3 hours; discover hot spots and bounded contexts
  - Design-Level: zoom into one bounded context; discover aggregates and commands→events flows
  - Value Stream: explore where value is created and destroyed in the process

Key patterns from the book:

  - "Conquer First, Divide Later": don't prematurely partition into services; get the full picture first
  - "Domain Events are triggers for consequences": events cause downstream reactions (our SNS→SQS model)
  - "Domain Events as state transitions": events capture before/after state (our lifecycle events)
  - Composite Domain Event (Ch. 25): aggregation of multiple events into a higher-level event (our BATCH_END_FINISHED)
  - Hot spots mark confusion, risk, and discussion points; use them to identify where to focus next

Practical guide highlights (Dorand):

  - Full walkthrough: Events → Commands → Systems → Actors → Input Data → Read Models → Policies → Hot Spots → Aggregates
  - Policy stickies (dark purple) = reactive logic ("when X happens, do Y"); this maps directly to our SNS filter subscriptions
  - The progression naturally goes from business-readable stickies to DDD aggregates to code

Event Type Taxonomy (Dorand)

The article defines a hierarchy of event types:

| Type | Description | NGE Example |
| --- | --- | --- |
| Simple Events | State change with previous + current state | DOCUMENT_LOADED |
| Composite Events | Multiple correlated events aggregated | BATCH_END_FINISHED (aggregates all doc events) |
| Temporal Events | Time-triggered | PSM polling for batch completion |
| System Events | Infrastructure state changes | Lambda cold start, ECS task started |
| Business Events | Business logic state changes | IMPORT_CANCELLED |
| Error Events | Errors from producing systems | SNS publish with status=FAILED |
| Lifecycle Events | Entity stage transitions | JOB_STARTED → JOB_FINISHED |
| Transactional Events | Part of a transaction, can be compensated | Requeue with retry_count (compensation pattern) |

The two fundamental categories that matter most:

  - Notification Events: "Something happened" (no payload beyond the fact)
  - Event-Carried State Transfer (ECST): "Something happened, here's the detail"

Event Characteristics

| Characteristic | Description | Our Pattern |
| --- | --- | --- |
| Stateful | Event carries all data consumer needs | Our SNS messages carry full context (caseId, batchId, documentId, eventDetail) |
| Stateless | Event carries pointer to data | Not used; we embed state |
| Delta | Event carries only the change | Not used |
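To make the three characteristics concrete, here is a sketch of what each payload shape might look like for the same DOCUMENT_LOADED occurrence. The field values, the `eventType`, `detailRef`, and `changed` keys, and the `apply_delta` helper are illustrative assumptions, not the actual NGE message format.

```python
# Stateful (event-carried state transfer): everything the consumer needs is embedded.
stateful_event = {
    "eventType": "DOCUMENT_LOADED",
    "caseId": "case-1",
    "batchId": "batch-7",
    "documentId": "doc-42",
    "eventDetail": {"pageCount": 12},
}

# Stateless: identifiers plus a pointer; the consumer must fetch the detail itself.
stateless_event = {
    "eventType": "DOCUMENT_LOADED",
    "documentId": "doc-42",
    "detailRef": "documents/doc-42",  # hypothetical lookup key
}

# Delta: only the fields that changed.
delta_event = {
    "eventType": "DOCUMENT_LOADED",
    "documentId": "doc-42",
    "changed": {"pageCount": 12},
}


def apply_delta(current: dict, event: dict) -> dict:
    """A delta consumer must already hold prior state to merge the change into."""
    return {**current, **event["changed"]}


merged = apply_delta({"pageCount": 10, "status": "LOADED"}, delta_event)
```

The stateful shape is the only one a consumer can act on with no prior state and no callback, which is the trade-off the table summarizes.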

Choreography vs Orchestration

| Aspect | Choreography | Orchestration |
| --- | --- | --- |
| Coordination | Each service reacts to events independently | Central coordinator directs the flow |
| Coupling | Services know only about events, not each other | Coordinator knows all steps |
| Failure handling | Each service handles its own failures | Coordinator handles compensation/rollback |
| Visibility | Hard to see the full flow | Easy to see in the orchestrator |
| Scalability | No bottleneck | Coordinator can become a bottleneck |
| Complexity | Grows with number of services | Contained in the orchestrator |
| AWS implementation | SNS/SQS fan-out | Step Functions |
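The contrast in the table can be sketched in a few lines. The in-memory `publish`/`subscribe` functions below are a synchronous stand-in for SNS/SQS, and `orchestrate` is a stand-in for a coordinator such as a Step Functions state machine; the function names and the handlers are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable

# Choreography: services subscribe to event types and react on their own.
subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
log: list[str] = []


def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    subscribers[event_type].append(handler)


def publish(event_type: str, payload: dict) -> None:
    # The publisher does not know who (if anyone) is listening.
    for handler in list(subscribers[event_type]):
        handler(payload)


def loader(payload: dict) -> None:
    log.append(f"load {payload['documentId']}")
    publish("DOCUMENT_LOADED", payload)


def uploader(payload: dict) -> None:
    log.append(f"upload {payload['documentId']}")


subscribe("DOCUMENT_PROCESSED", loader)
subscribe("DOCUMENT_LOADED", uploader)
publish("DOCUMENT_PROCESSED", {"documentId": "doc-1"})


# Orchestration: one coordinator owns the sequence and invokes each step.
def orchestrate(document_id: str) -> list[str]:
    steps = ["extract", "load", "upload"]
    return [f"{step} {document_id}" for step in steps]
```

In the choreographed half the overall sequence exists only implicitly in the subscriptions, which is why visibility is listed as a weakness; in the orchestrated half it is written down in one place.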

Third pattern: Event Processing (from Krishnan) — real-time analysis of event streams for pattern recognition, anomaly detection, and derived insights. This is complementary to both choreography and orchestration — it's an analytics layer, not a coordination pattern. Implemented via stream processing engines (Kafka, Flink) or in our case, Athena queries over the PSM event stream.

Hybrid pattern: Use choreography for the happy path (event-driven, loosely coupled), orchestration for complex error handling/sagas, and event processing for observability and completion detection.

Mapping to NGE Architecture

What We Do Right

1. VALIDATED: Pure Choreography via SNS (ADR-001)

Our architecture uses choreography exclusively: modules publish events to SNS, and subscribers react independently through SQS queues subscribed with SNS filter policies. This is the right choice for our document processing pipeline because:

  - Each step is independent (extract → load → upload are separate concerns)
  - Fan-out is natural (PSM observes all events without being in the critical path)
  - There is no single point of coordination failure
  - Adding a new consumer requires zero changes to producers
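The "zero changes to producers" property comes from filtering at the subscription. A minimal sketch, assuming events carry `eventType` as a message attribute (an assumption; the real attribute names are not shown in this review): the dictionaries follow the shape SNS uses for attribute-based filter policies, while `matches` is a deliberately simplified exact-match evaluator, not the full SNS matching rules (which also support prefix, numeric, and anything-but operators).

```python
# Each key names a message attribute; each value lists the accepted values.
LOADER_FILTER_POLICY = {"eventType": ["DOCUMENT_PROCESSED"]}

# No filter policy means the subscription receives every message;
# modeled here as an empty dict.
PSM_FILTER_POLICY: dict[str, list[str]] = {}


def matches(policy: dict[str, list[str]], attributes: dict[str, str]) -> bool:
    """Simplified evaluation: every policy key must be present in the
    message attributes with one of the listed values."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())


processed = {"eventType": "DOCUMENT_PROCESSED"}
started = {"eventType": "JOB_STARTED"}

loader_sees = [matches(LOADER_FILTER_POLICY, a) for a in (processed, started)]
psm_sees = [matches(PSM_FILTER_POLICY, a) for a in (processed, started)]
```

A new consumer is added by attaching another queue with its own policy; nothing on the publishing side changes.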

2. VALIDATED: Events Are Past-Tense Facts

Our naming convention (DOCUMENT_LOADED, JOB_FINISHED, BATCH_END_FINISHED) correctly uses past tense. Events describe what happened, not what should happen. This aligns with both EventStorming and the Event Types taxonomy.

3. VALIDATED: Event-Carried State Transfer Pattern

Our SNS messages carry full context (caseId, batchId, documentId, eventDetail, status, timestamp). Consumers don't need to call back to the producer to get state. This reduces coupling and makes events self-contained.

4. VALIDATED: Lifecycle Events for Batch Processing

Our event flow maps cleanly to the Lifecycle Events pattern:

JOB_STARTED → DOCUMENT_PROCESSED → DOCUMENT_LOADED → LOADER_FINISHED → BATCH_END_START → BATCH_END_FINISHED
Each event represents a stage transition in the batch/document lifecycle, with PSM capturing the full event stream for observability.
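One way to make the stage ordering checkable is to encode it as data. The sketch below assumes a single document moving through the pipeline, so stages do not interleave; with many documents per batch, the per-document events would interleave and the check would need to run per document. `is_forward_only` is a hypothetical helper, not part of PSM.

```python
LIFECYCLE = [
    "JOB_STARTED",
    "DOCUMENT_PROCESSED",
    "DOCUMENT_LOADED",
    "LOADER_FINISHED",
    "BATCH_END_START",
    "BATCH_END_FINISHED",
]
ORDER = {name: index for index, name in enumerate(LIFECYCLE)}


def is_forward_only(events: list[str]) -> bool:
    """True when no event moves the lifecycle backwards.

    Repeats of the same stage are tolerated, since delivery may be duplicated.
    """
    positions = [ORDER[name] for name in events]
    return all(a <= b for a, b in zip(positions, positions[1:]))
```

A stream such as `["DOCUMENT_LOADED", "JOB_STARTED"]` fails the check, which is the kind of anomaly an observer of the event stream could flag.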

5. VALIDATED: PSM as Event Processing Layer

Krishnan's third pattern — Event Processing — describes a complementary layer that analyzes event streams for patterns and completion detection. Our PSM (ProcessingStatusManager) is exactly this: it observes all SNS events via Athena queries to detect batch completion, track progress, and trigger lifecycle transitions. PSM doesn't coordinate services (that's choreography); it processes the event stream for derived insights.
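The completion-detection idea can be sketched without any AWS dependency. The function below is an in-memory stand-in for the kind of aggregation an Athena query over the event stream would perform; the `eventType` key, the `expected` mapping of batch ID to document count, and the function name are assumptions for illustration, not PSM's actual logic.

```python
from collections import defaultdict


def completed_batches(events: list[dict], expected: dict[str, int]) -> set[str]:
    """Return batch IDs whose distinct loaded documents reach the expected total.

    Counting distinct documentIds keeps duplicate deliveries from inflating
    the count.
    """
    loaded: dict[str, set[str]] = defaultdict(set)
    for event in events:
        if event["eventType"] == "DOCUMENT_LOADED":
            loaded[event["batchId"]].add(event["documentId"])
    return {batch for batch, total in expected.items() if len(loaded[batch]) >= total}


stream = [
    {"eventType": "DOCUMENT_LOADED", "batchId": "b1", "documentId": "d1"},
    {"eventType": "DOCUMENT_LOADED", "batchId": "b1", "documentId": "d1"},  # duplicate
    {"eventType": "DOCUMENT_LOADED", "batchId": "b1", "documentId": "d2"},
    {"eventType": "DOCUMENT_LOADED", "batchId": "b2", "documentId": "d3"},
]
done = completed_batches(stream, {"b1": 2, "b2": 2})
```

Nothing in this function sits in the path between producers and consumers; it only reads what has already been published, which is what distinguishes event processing from coordination.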

Potential Issues Found

1. MEDIUM: No Formal Event Schema / Contract

Our events follow a convention (documented in the sns-event-publishing pattern), but no schema is enforced. Producers and consumers are coupled through implicit knowledge of the message structure. The Event Types article describes schema-coupled systems as "loosely coupled" and schema-free systems as "somewhat decoupled" but fragile: changes in event structure can silently break consumers.

Current state: The message_json structure in sns_ops.py is the de facto schema, but nothing prevents a producer from omitting fields or changing the structure.

Recommendation: For ADRs 005-010 (new modules), consider adding lightweight event schema validation, either as a Pydantic model for the SNS message or as schema validation at the SNS layer. This becomes more important as the number of modules and event types grows.
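A minimal sketch of what such validation could check, written with the standard library only as a stand-in for the Pydantic model mentioned above. The field list is an assumption assembled from the fields named in this review (`eventType` in particular is a guess); the real message_json structure in sns_ops.py is not reproduced here.

```python
# Assumed required fields and their expected Python types.
REQUIRED_FIELDS = {
    "caseId": str,
    "batchId": str,
    "documentId": str,
    "eventType": str,
    "status": str,
    "timestamp": str,
    "eventDetail": dict,
}


def validate_message(message: dict) -> dict:
    """Raise ValueError if a required field is missing or has the wrong type."""
    missing = sorted(REQUIRED_FIELDS.keys() - message.keys())
    if missing:
        raise ValueError(f"missing fields: {missing}")
    wrong = sorted(
        name
        for name, expected in REQUIRED_FIELDS.items()
        if not isinstance(message[name], expected)
    )
    if wrong:
        raise ValueError(f"wrong type for fields: {wrong}")
    return message
```

Calling the same function on the producer side before publishing and on the consumer side after receiving would turn a silent structural drift into an immediate, attributable error.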

2. MEDIUM: Missing Composite Events for Cross-Module Coordination

The Event Types article describes composite events — aggregations of multiple simple events. Our BATCH_END_FINISHED is close to this (it fires after all documents in a batch are processed), but the aggregation logic lives in PSM polling Athena, not in the event system itself.

For ADR-005 (Bulk Operations), composite events would be valuable: "All documents in case X have been bates-stamped" requires aggregating completion events from many individual stamps.

Recommendation: Document the composite event pattern as a first-class concept in our event patterns. PSM already implements this via Athena queries; formalize it.
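For comparison with polling, a composite event can also be produced by a small stateful aggregator that watches constituent events as they arrive. This is a sketch under assumptions: the event names `DOCUMENT_STAMPED` and `CASE_STAMPING_FINISHED` and the class itself are hypothetical, and the expected document set is assumed to be known up front.

```python
from typing import Optional


class CompositeAggregator:
    """Collects per-document completion events and emits one composite event
    once every expected document has reported."""

    def __init__(self, case_id: str, expected_ids: set[str]) -> None:
        self.case_id = case_id
        self.pending = set(expected_ids)

    def observe(self, event: dict) -> Optional[dict]:
        if event["eventType"] != "DOCUMENT_STAMPED" or event["caseId"] != self.case_id:
            return None
        was_pending = event["documentId"] in self.pending
        self.pending.discard(event["documentId"])
        # Emit only on the transition to "nothing pending", so duplicate
        # deliveries never produce a second composite event.
        if was_pending and not self.pending:
            return {"eventType": "CASE_STAMPING_FINISHED", "caseId": self.case_id}
        return None


def stamped(document_id: str) -> dict:
    return {"eventType": "DOCUMENT_STAMPED", "caseId": "case-x", "documentId": document_id}


aggregator = CompositeAggregator("case-x", {"d1", "d2"})
results = [aggregator.observe(stamped(d)) for d in ("d1", "d1", "d2", "d2")]
```

Whether the aggregation lives in a query or in an aggregator like this, documenting it as a named pattern makes the composite event's inputs and emission rule explicit.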

3. LOW: No Event Catalog / Registry

As the number of modules grows (ADRs 005-010), tracking which events exist, who produces them, and who consumes them becomes increasingly difficult. EventStorming workshops produce this naturally, but it's not persisted.

Recommendation: Create an event-catalog.md in the architecture repo listing all event types, their producers, consumers, and payload schemas. This becomes the living equivalent of an EventStorming board.
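One possible way to keep such a catalog honest is to hold it as data and render the markdown from it. The sketch below uses placeholder producer and consumer names (`module-a`, `module-b`); they do not describe the real NGE wiring, and the `render` and `unconsumed` helpers are hypothetical.

```python
CATALOG = [
    {"event": "DOCUMENT_LOADED", "producer": "module-a", "consumers": ["module-b", "PSM"]},
    {"event": "IMPORT_CANCELLED", "producer": "module-b", "consumers": []},
]


def render(catalog: list[dict]) -> str:
    """Render catalog entries as a markdown table for event-catalog.md."""
    lines = ["| Event | Producer | Consumers |", "| --- | --- | --- |"]
    for entry in catalog:
        consumers = ", ".join(entry["consumers"]) or "(none)"
        lines.append(f"| {entry['event']} | {entry['producer']} | {consumers} |")
    return "\n".join(lines)


def unconsumed(catalog: list[dict]) -> list[str]:
    """Events that nobody subscribes to: candidates for review."""
    return [entry["event"] for entry in catalog if not entry["consumers"]]
```

Keeping the source of truth as structured data also leaves room to add payload schema references per event later.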

4. INFO: Choreography Limits for ADR-005 and ADR-010

Pure choreography works well for our linear pipeline (extract → load → upload). But ADRs 005 (bulk operations) and 010 (deposition processing) may need orchestration for multi-step workflows with compensation logic:

  - Bulk operations: Apply action → verify → rollback on failure (saga pattern)
  - Deposition processing: Parse → OCR → align → theater (sequential dependencies)

Recommendation: Consider Step Functions for these specific ADRs while keeping choreography for the standard document pipeline. This is the hybrid pattern: choreography by default, orchestration when saga/compensation logic is needed.
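The saga/compensation logic referred to here can be sketched in plain Python. This is not Step Functions code; it is a stand-in showing the control flow a coordinator would own: run steps in order and, on failure, undo the completed ones in reverse. `run_saga`, the `Step` alias, and the step names are assumptions for illustration.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    journal: list[str] = []
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            journal.append(f"{name}:failed")
            for undone_name, _, compensate in reversed(done):
                compensate()
                journal.append(f"{undone_name}:compensated")
            return journal
        journal.append(f"{name}:ok")
        done.append(step)
    return journal


def fail_verification() -> None:
    raise RuntimeError("verification failed")


result = run_saga([
    ("apply", lambda: None, lambda: None),
    ("verify", fail_verification, lambda: None),
])
```

Expressing this in pure choreography would require every service to know which earlier steps to undo, which is the coupling the hybrid recommendation is trying to avoid.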

5. INFO: EventStorming as a Design Tool for ADRs 005-010

The EventStorming methodology is directly applicable to designing the event flows for each new module. Before implementing ADRs 005-010, running an EventStorming exercise (even solo with sticky notes or a digital board) would:

  - Identify all domain events for the new module
  - Discover commands that trigger them
  - Find hot spots (areas of uncertainty)
  - Map bounded contexts (module boundaries)
  - Identify policies (reactive logic that becomes SNS subscriptions)

New Backlog Items

| Item | Priority | Related |
| --- | --- | --- |
| Add event schema validation (Pydantic model) for new modules | MEDIUM | ADRs 005-010 |
| Create event-catalog.md listing all event types, producers, consumers | LOW | sns-event-publishing pattern |
| Evaluate Step Functions for ADR-005 (bulk ops saga) and ADR-010 (deposition orchestration) | LOW | ADR-005, ADR-010 |

Summary

Our NGE architecture aligns well with EDA best practices: pure choreography via SNS (ADR-001), past-tense event naming, event-carried state transfer, and lifecycle events for batch processing. The Event Types taxonomy validates our approach and provides vocabulary for categorizing our events. The primary gaps are operational: no formal event schema enforcement and no event catalog. For the upcoming modernization (ADRs 005-010), EventStorming is the recommended design method, and some modules (bulk operations, depositions) may benefit from hybrid choreography+orchestration using Step Functions for saga/compensation flows.
