Article Review: Group 3 — EventStorming & Event-Driven Architecture

Articles Reviewed

Introducing EventStorming — Alberto Brandolini (book, 28M) — THE methodology for discovering domain events and designing event flows
Event Storming, Black Magic or Real? — Alex Dorand / Medium — Practical EventStorming workshop guide
Event Types — Event-Driven Architecture — Alex Dorand / Medium — Event type taxonomy and characteristics
Mastering EDA Part 10: Choreography vs Orchestration — Rahul Krishnan / Medium — Coordination pattern comparison

All four articles read in full (text extracted via pdftotext for large files).

Key Concepts

EventStorming Methodology

EventStorming is a collaborative workshop technique for discovering domain events, commands, aggregates, and bounded contexts. The core workflow:

Domain Events (orange stickies) — past-tense facts: "Document Uploaded", "Exhibit Created"
Commands (blue stickies) — triggers: "Upload Document", "Create Exhibit"
Aggregates (yellow stickies) — clusters of domain logic that handle commands
Policies (lilac stickies) — reactive logic: "When X happens, do Y"
Read Models (green stickies) — data views that inform decisions
External Systems (pink stickies) — systems outside your boundary
Hot Spots (red stickies) — areas of confusion, conflict, or risk

The output is a shared understanding of the domain and a natural map to bounded contexts (which become modules/services).

Three flavors (from Brandolini):
- Big Picture — full business process, cross-silo, 2-3 hours, discover hot spots and bounded contexts
- Design-Level — zoom into one bounded context, discover aggregates, commands→events flows
- Value Stream — explore where value is created and destroyed in the process

Key patterns from the book:
- "Conquer First, Divide Later" — don't prematurely partition into services; get the full picture first
- "Domain Events are triggers for consequences" — events cause downstream reactions (our SNS→SQS model)
- "Domain Events as state transitions" — events capture before/after state (our lifecycle events)
- Composite Domain Event (Ch. 25) — aggregation of multiple events into a higher-level event (our BATCH_END_FINISHED)
- Hot spots mark confusion, risk, and discussion points — use them to identify where to focus next

Practical guide highlights (Dorand):
- Full walkthrough: Events → Commands → Systems → Actors → Input Data → Read Models → Policies → Hot Spots → Aggregates
- Policy stickies (dark purple) = reactive logic ("when X happens, do Y") — maps directly to our SNS filter subscriptions
- The progression naturally goes from business-readable stickies to DDD aggregates to code
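As a rough illustration of that progression from stickies to code, the sketch below maps one command, one aggregate, one domain event, and one policy onto plain Python. Every class and function name here is hypothetical and chosen only to mirror the sticky-note roles; none of it is taken from the NGE codebase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UploadDocument:
    """Command (blue sticky): an intent, phrased in the imperative."""
    document_id: str


@dataclass(frozen=True)
class DocumentUploaded:
    """Domain event (orange sticky): a past-tense fact."""
    document_id: str


@dataclass
class DocumentAggregate:
    """Aggregate (yellow sticky): decides whether a command produces an event."""
    uploaded: set = field(default_factory=set)

    def handle(self, command: UploadDocument) -> list:
        if command.document_id in self.uploaded:
            return []  # duplicate upload: nothing new happened
        self.uploaded.add(command.document_id)
        return [DocumentUploaded(command.document_id)]


def create_exhibit_policy(event: DocumentUploaded) -> str:
    """Policy (lilac sticky): when Document Uploaded, then Create Exhibit."""
    return f"Create Exhibit for {event.document_id}"


aggregate = DocumentAggregate()
events = aggregate.handle(UploadDocument("doc-1"))
follow_ups = [create_exhibit_policy(e) for e in events]
```

The aggregate is the only place that decides whether a fact occurred; the policy only reacts to facts, which is the same split as producers and filtered subscribers.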

Event Type Taxonomy (Dorand)

The article defines a hierarchy of event types:

| Type | Description | NGE Example |
| --- | --- | --- |
| Simple Events | State change with previous + current state | DOCUMENT_LOADED |
| Composite Events | Multiple correlated events aggregated | BATCH_END_FINISHED (aggregates all doc events) |
| Temporal Events | Time-triggered | PSM polling for batch completion |
| System Events | Infrastructure state changes | Lambda cold start, ECS task started |
| Business Events | Business logic state changes | IMPORT_CANCELLED |
| Error Events | Errors from producing systems | SNS publish with status=FAILED |
| Lifecycle Events | Entity stage transitions | JOB_STARTED → JOB_FINISHED |
| Transactional Events | Part of a transaction, can be compensated | Requeue with retry_count (compensation pattern) |

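The two fundamental categories that matter most:
- Notification Events: "Something happened" (minimal payload, typically just an identifier, so consumers must look up the details themselves)
- Event-Carried State Transfer (ECST): "Something happened, here's the detail" (the payload carries the state consumers need)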

Event Characteristics

| Characteristic | Description | Our Pattern |
| --- | --- | --- |
| Stateful | Event carries all data consumer needs | Our SNS messages carry full context (caseId, batchId, documentId, eventDetail) |
| Stateless | Event carries pointer to data | Not used — we embed state |
| Delta | Event carries only the change | Not used |
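To make the three characteristics concrete, here is a minimal sketch of what each payload shape might look like for the same occurrence. The field values, the `eventType`, `documentRef`, and `changed` keys, and the `apply_delta` helper are assumptions made for illustration; only caseId, batchId, documentId, eventDetail, and status come from the message fields named in this review.

```python
# Stateful: carries everything the consumer needs (the shape we use).
stateful = {
    "eventType": "DOCUMENT_LOADED",
    "caseId": "case-1",
    "batchId": "batch-7",
    "documentId": "doc-42",
    "status": "SUCCESS",
    "eventDetail": {"pageCount": 12},
}

# Stateless: carries only a pointer, so the consumer must fetch the state.
stateless = {
    "eventType": "DOCUMENT_LOADED",
    "documentRef": "documents/doc-42",  # hypothetical lookup key
}

# Delta: carries only what changed.
delta = {
    "eventType": "DOCUMENT_LOADED",
    "documentId": "doc-42",
    "changed": {"status": {"from": "PROCESSING", "to": "SUCCESS"}},
}


def apply_delta(state: dict, event: dict) -> dict:
    """Return a copy of the consumer's state with the delta's new values applied."""
    updated = dict(state)
    for field_name, change in event["changed"].items():
        updated[field_name] = change["to"]
    return updated


consumer_state = {"status": "PROCESSING", "pageCount": 12}
new_state = apply_delta(consumer_state, delta)
```

A delta consumer has to hold prior state to interpret the event, and a stateless consumer has to call back to the source; the stateful shape avoids both at the cost of larger messages.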

Choreography vs Orchestration

| Aspect | Choreography | Orchestration |
| --- | --- | --- |
| Coordination | Each service reacts to events independently | Central coordinator directs the flow |
| Coupling | Services know only about events, not each other | Coordinator knows all steps |
| Failure handling | Each service handles its own failures | Coordinator handles compensation/rollback |
| Visibility | Hard to see the full flow | Easy to see in the orchestrator |
| Scalability | No bottleneck | Coordinator can become a bottleneck |
| Complexity | Grows with number of services | Contained in the orchestrator |
| AWS implementation | SNS/SQS fan-out | Step Functions |
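The contrast in the table can be sketched in a few lines. The in-memory bus below is only a stand-in for SNS/SQS fan-out, and `orchestrate` is only a stand-in for a Step Functions state machine; the handler names and the use of DOCUMENT_PROCESSED and DOCUMENT_LOADED as routing keys are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable

log: list = []

# Choreography: handlers know event names, not each other.
subscribers: dict = defaultdict(list)


def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    subscribers[event_type].append(handler)


def publish(event_type: str, payload: dict) -> None:
    for handler in subscribers[event_type]:
        handler(payload)


def loader(payload: dict) -> None:
    log.append(f"load {payload['documentId']}")
    publish("DOCUMENT_LOADED", payload)  # announces a fact, names no consumer


def uploader(payload: dict) -> None:
    log.append(f"upload {payload['documentId']}")


subscribe("DOCUMENT_PROCESSED", loader)
subscribe("DOCUMENT_LOADED", uploader)
publish("DOCUMENT_PROCESSED", {"documentId": "doc-1"})


# Orchestration: one coordinator knows every step and their order.
def orchestrate(document_id: str) -> list:
    steps = ["extract", "load", "upload"]
    return [f"{step} {document_id}" for step in steps]
```

In the choreographed half the full flow exists only implicitly in the subscriptions, which is the visibility cost the table mentions; in the orchestrated half the flow is one readable list, but every change goes through the coordinator.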

Third pattern: Event Processing (from Krishnan) — real-time analysis of event streams for pattern recognition, anomaly detection, and derived insights. It complements both choreography and orchestration: it is an analytics layer, not a coordination pattern. It is typically implemented with streaming platforms and stream processors (Kafka, Flink); in our case, Athena queries over the PSM event stream fill this role, on a polling basis rather than in real time.

Hybrid pattern: Use choreography for the happy path (event-driven, loosely coupled), orchestration for complex error handling/sagas, and event processing for observability and completion detection.

Mapping to NGE Architecture

What We Do Right

1. VALIDATED: Choreography via SNS/SQS

Our architecture uses choreography exclusively — modules publish events to SNS, and subscribers react independently through SQS queues whose SNS subscriptions carry filter policies. This is the right choice for our document processing pipeline because:
- Each step is independent (extract → load → upload are separate concerns)
- Fan-out is natural (PSM observes all events without being in the critical path)
- No single point of coordination failure
- Adding a new consumer requires zero changes to producers
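The last point, adding a consumer without touching producers, can be illustrated with a simplified local model of subscription filtering. This sketch covers only exact string matching on message attributes, which is a small subset of what SNS filter policies support; the `eventType` attribute name and the subscriber names are assumptions, not our actual configuration.

```python
def matches(filter_policy: dict, attributes: dict) -> bool:
    """Simplified model: every policy key must be present with an allowed value.

    An empty policy matches every message, like a subscription with no filter.
    """
    return all(
        attributes.get(key) in allowed for key, allowed in filter_policy.items()
    )


# Hypothetical subscriptions: subscriber name -> filter policy.
subscriptions = {
    "psm": {},  # observes everything
    "uploader": {"eventType": ["DOCUMENT_LOADED"]},
}


def route(attributes: dict) -> list:
    """Return the subscribers that would receive a message with these attributes."""
    return sorted(
        name for name, policy in subscriptions.items() if matches(policy, attributes)
    )


before = route({"eventType": "JOB_FINISHED"})

# A new consumer is added purely by registering a subscription.
subscriptions["auditor"] = {"eventType": ["JOB_FINISHED", "IMPORT_CANCELLED"]}
after = route({"eventType": "JOB_FINISHED"})
```

The producer side, represented here by the attributes passed to `route`, is identical before and after the new subscription is registered.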

2. VALIDATED: Events Are Past-Tense Facts

Our naming convention (DOCUMENT_LOADED, JOB_FINISHED, BATCH_END_FINISHED) correctly uses past tense. Events describe what happened, not what should happen. This aligns with both EventStorming and the Event Types taxonomy.

3. VALIDATED: Event-Carried State Transfer Pattern

Our SNS messages carry full context (caseId, batchId, documentId, eventDetail, status, timestamp). Consumers don't need to call back to the producer to get state. This reduces coupling and makes events self-contained.

4. VALIDATED: Lifecycle Events for Batch Processing

Our event flow maps cleanly to the Lifecycle Events pattern:

JOB_STARTED → DOCUMENT_PROCESSED → DOCUMENT_LOADED → LOADER_FINISHED → BATCH_END_START → BATCH_END_FINISHED

Each event represents a stage transition in the batch/document lifecycle, with PSM capturing the full event stream for observability.
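One way to use this ordering is as a sanity check over a captured trace. The sketch below assumes a trace that has already been filtered to a single batch containing a single document, so each stage appears at most once; a real batch interleaves document events across many documents and would need per-document tracking. The function name is hypothetical.

```python
from typing import Optional

LIFECYCLE = [
    "JOB_STARTED",
    "DOCUMENT_PROCESSED",
    "DOCUMENT_LOADED",
    "LOADER_FINISHED",
    "BATCH_END_START",
    "BATCH_END_FINISHED",
]
STAGE = {name: index for index, name in enumerate(LIFECYCLE)}


def first_out_of_order(events: list) -> Optional[str]:
    """Return the first event that moves backwards in the lifecycle, else None."""
    highest = -1
    for name in events:
        stage = STAGE[name]
        if stage < highest:
            return name
        highest = stage
    return None


ok_trace = list(LIFECYCLE)
bad_trace = ["JOB_STARTED", "DOCUMENT_LOADED", "DOCUMENT_PROCESSED"]
```

Skipped stages are tolerated by this check; it flags only regressions, which is the weaker property and the safer one to assert when events can be dropped or delayed.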

5. VALIDATED: PSM as Event Processing Layer

Krishnan's third pattern — Event Processing — describes a complementary layer that analyzes event streams for patterns and completion detection. Our PSM (ProcessingStatusManager) is exactly this: it observes all SNS events via Athena queries to detect batch completion, track progress, and trigger lifecycle transitions. PSM doesn't coordinate services (that's choreography); it processes the event stream for derived insights.

Potential Issues Found

1. MEDIUM: No Formal Event Schema / Contract

Our events follow a convention (documented in sns-event-publishing pattern) but there's no enforced schema. Producers and consumers are coupled through implicit knowledge of the message structure. The Event Types article highlights that schema-coupled systems are "loosely coupled" while schema-free systems are "somewhat decoupled" but fragile — changes in event structure can silently break consumers.

Current state: The message_json structure in sns_ops.py is the de facto schema, but nothing prevents a producer from omitting fields or changing the structure.

Recommendation: For ADRs 005-010 (new modules), consider adding lightweight event schema validation, either a Pydantic model for the SNS message or JSON Schema validation of the message body at publish time. This becomes more important as the number of modules and event types grows.
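A minimal sketch of what such validation could check is shown below. It uses only the standard library as a stand-in for the Pydantic model so that it runs without extra dependencies; the field names come from the message fields listed in this review, while the field types, the `EventMessage` class, and `parse_event` are assumptions rather than the structure in sns_ops.py.

```python
from dataclasses import dataclass

# Assumed types for the documented message fields.
FIELD_TYPES = {
    "caseId": str,
    "batchId": str,
    "documentId": str,
    "eventDetail": dict,
    "status": str,
    "timestamp": str,
}


@dataclass(frozen=True)
class EventMessage:
    caseId: str
    batchId: str
    documentId: str
    eventDetail: dict
    status: str
    timestamp: str


def parse_event(raw: dict) -> EventMessage:
    """Reject messages with missing, unexpected, or wrongly typed fields."""
    missing = FIELD_TYPES.keys() - raw.keys()
    unexpected = raw.keys() - FIELD_TYPES.keys()
    if missing or unexpected:
        raise ValueError(
            f"missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )
    for name, expected_type in FIELD_TYPES.items():
        if not isinstance(raw[name], expected_type):
            raise ValueError(f"{name} must be {expected_type.__name__}")
    return EventMessage(**raw)


good = {
    "caseId": "case-1",
    "batchId": "batch-7",
    "documentId": "doc-42",
    "eventDetail": {"pageCount": 12},
    "status": "SUCCESS",
    "timestamp": "2024-01-01T00:00:00Z",
}
message = parse_event(good)
```

Running the same check on both the publish and the consume side would turn a silently omitted field into an immediate, attributable failure. A Pydantic model would express the same contract more compactly and add coercion and nested validation.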

2. MEDIUM: Missing Composite Events for Cross-Module Coordination

The Event Types article describes composite events — aggregations of multiple simple events. Our BATCH_END_FINISHED is close to this (it fires after all documents in a batch are processed), but the aggregation logic lives in PSM polling Athena, not in the event system itself.

For ADR-005 (Bulk Operations), composite events would be valuable: "All documents in case X have been bates-stamped" requires aggregating completion events from many individual stamps.

Recommendation: Document the composite event pattern as a first-class concept in our event patterns. PSM already implements this via Athena queries; formalize it.
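To show what formalizing the pattern might mean, the sketch below expresses the aggregation rule as an explicit object: it consumes simple events and emits one composite event when the set is complete. It is an in-memory illustration only; PSM derives the same answer by polling Athena, and the `BatchAggregator` class, the `eventType` key, and the `documentCount` field are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BatchAggregator:
    batch_id: str
    expected_documents: set
    loaded: set = field(default_factory=set)
    emitted: bool = False

    def observe(self, event: dict) -> Optional[dict]:
        """Record one simple event; return the composite event once, when complete."""
        if event.get("eventType") != "DOCUMENT_LOADED":
            return None
        if event.get("batchId") != self.batch_id:
            return None
        self.loaded.add(event["documentId"])
        if self.emitted or not self.expected_documents <= self.loaded:
            return None
        self.emitted = True
        return {
            "eventType": "BATCH_END_FINISHED",
            "batchId": self.batch_id,
            "documentCount": len(self.expected_documents),
        }


aggregator = BatchAggregator("batch-7", {"doc-1", "doc-2"})
first = aggregator.observe(
    {"eventType": "DOCUMENT_LOADED", "batchId": "batch-7", "documentId": "doc-1"}
)
second = aggregator.observe(
    {"eventType": "DOCUMENT_LOADED", "batchId": "batch-7", "documentId": "doc-2"}
)
repeat = aggregator.observe(
    {"eventType": "DOCUMENT_LOADED", "batchId": "batch-7", "documentId": "doc-2"}
)
```

The `emitted` flag makes the composite fire at most once even if a simple event is redelivered, which matters under at-least-once delivery.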

3. LOW: No Event Catalog / Registry

As the number of modules grows (ADRs 005-010), tracking which events exist, who produces them, and who consumes them becomes increasingly difficult. EventStorming workshops produce this naturally, but it's not persisted.

Recommendation: Create an event-catalog.md in the architecture repo listing all event types, their producers, consumers, and payload schemas. This becomes the living equivalent of an EventStorming board.
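One option is to keep the catalog as data and render the Markdown from it, so the file can be regenerated or checked in CI. The sketch below assumes that approach; the producer and consumer names are placeholders rather than a record of who actually publishes or subscribes to these events, and `render_catalog` is a hypothetical helper.

```python
# Placeholder entries: producers and consumers here are illustrative only.
CATALOG = [
    {
        "event": "DOCUMENT_LOADED",
        "producer": "loader-module",
        "consumers": ["psm", "uploader-module"],
    },
    {
        "event": "BATCH_END_FINISHED",
        "producer": "psm",
        "consumers": ["uploader-module"],
    },
]


def render_catalog(entries: list) -> str:
    """Render catalog entries as a Markdown table, sorted by event name."""
    lines = ["| Event | Producer | Consumers |", "| --- | --- | --- |"]
    for entry in sorted(entries, key=lambda item: item["event"]):
        consumers = ", ".join(entry["consumers"])
        lines.append(f"| {entry['event']} | {entry['producer']} | {consumers} |")
    return "\n".join(lines)


catalog_markdown = render_catalog(CATALOG)
```

A payload-schema column could be added by linking each entry to its schema definition once one exists.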

4. INFO: Choreography Limits for ADR-005 and ADR-010

Pure choreography works well for our linear pipeline (extract → load → upload). But ADRs 005 (bulk operations) and 010 (deposition processing) may need orchestration for multi-step workflows with compensation logic:
- Bulk operations: Apply action → verify → rollback on failure (saga pattern)
- Deposition processing: Parse → OCR → align → theater (sequential dependencies)

Recommendation: Consider Step Functions for these specific ADRs while keeping choreography for the standard document pipeline. This is the hybrid pattern: choreography by default, orchestration when saga/compensation logic is needed.
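The saga shape referred to here can be sketched independently of Step Functions: run the steps in order and, if one fails, undo the completed steps in reverse. The code below is a plain-Python illustration of that control flow under assumed step names; it is not a Step Functions definition, and `run_saga` is a hypothetical helper.

```python
from typing import Callable


def run_saga(steps: list) -> list:
    """Run (name, action, compensate) steps in order.

    On the first failure, run the compensations of the steps that already
    completed, in reverse order, and stop.
    """
    history = []
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            history.append(f"{name}:failed")
            for done_name, undo in reversed(completed):
                undo()
                history.append(f"{done_name}:compensated")
            return history
        history.append(f"{name}:ok")
        completed.append((name, compensate))
    return history


def do_nothing() -> None:
    pass


def failing_verify() -> None:
    raise RuntimeError("verification failed")


# Bulk operation where verification fails after the action was applied.
result = run_saga(
    [
        ("apply", do_nothing, do_nothing),
        ("verify", failing_verify, do_nothing),
    ]
)
```

Here `result` records that the apply step succeeded, verify failed, and apply was then compensated. In pure choreography this rollback knowledge would be spread across several subscribers, which is the main argument for a coordinator in these two ADRs.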

5. INFO: EventStorming as a Design Tool for ADRs 005-010

The EventStorming methodology is directly applicable to designing the event flows for each new module. Before implementing ADRs 005-010, running an EventStorming exercise (even solo with sticky notes or a digital board) would:
- Identify all domain events for the new module
- Discover commands that trigger them
- Find hot spots (areas of uncertainty)
- Map bounded contexts (module boundaries)
- Identify policies (reactive logic that becomes SNS subscriptions)

New Backlog Items

| Item | Priority | Related |
| --- | --- | --- |
| Add event schema validation (Pydantic model) for new modules | MEDIUM | ADRs 005-010 |
| Create event-catalog.md listing all event types, producers, consumers | LOW | sns-event-publishing pattern |
| Evaluate Step Functions for ADR-005 (bulk ops saga) and ADR-010 (deposition orchestration) | LOW | ADR-005, ADR-010 |

Summary

Our NGE architecture aligns well with EDA best practices: pure choreography via SNS (ADR-001), past-tense event naming, event-carried state transfer, and lifecycle events for batch processing. The Event Types taxonomy validates our approach and provides vocabulary for categorizing our events. The primary gaps are operational: no formal event schema enforcement and no event catalog. For the upcoming modernization (ADRs 005-010), EventStorming is the recommended design method, and some modules (bulk operations, depositions) may benefit from hybrid choreography+orchestration using Step Functions for saga/compensation flows.
