# Guardrails Pattern

## Purpose
Every processing pipeline — whether deterministic document processing or probabilistic AI-powered analysis — needs guardrails that prevent bad data from entering, prevent bad output from leaving, and ensure every action is auditable. This pattern unifies four gates that already exist across our architecture into a single framework.
## The Four Gates

### Gate 1: Input Validation
Reject bad data before processing begins. Every handler validates incoming messages at the boundary before passing data to core logic.
Where we implement this:
- SQS handlers parse and validate required fields (caseId, batchId, jobId)
before processing — see patterns/sqs-handler.md
- API Gateway request validation via JSON schemas — see rules/api-design.rules.md
- Event envelope validation: eventType must match handler's SNS filter policy
- Unknown event types → PermanentFailureException (DLQ, no retry)
```python
# shell/handlers/index.py — input validation gate
import json

def handler(event, context):
    for record in event["Records"]:
        body = json.loads(record["body"])
        message = json.loads(body["Message"])

        # Gate 1: Validate required fields
        case_id = message.get("caseId")
        batch_id = message.get("batchId")
        job_id = message.get("jobId")
        if not all([case_id, batch_id, job_id]):
            raise PermanentFailureException("Missing required fields")

        event_type = message.get("eventType")
        if event_type not in SUPPORTED_EVENT_TYPES:
            raise PermanentFailureException(f"Unknown event type: {event_type}")
```
### Gate 2: Output Verification
Verify that processing produced correct results before advancing the pipeline. Check outputs against expectations before emitting downstream events.
Where we implement this:
- Checkpoint validation: verify step completion before advancing the state
machine — see patterns/state-validation-and-reconciliation.md
- Document count reconciliation: expected vs. actual documents processed
- Elasticsearch index verification after bulk operations
- S3 object existence checks after file operations
```python
# core/process.py — output verification gate
def advance_checkpoint(job, next_checkpoint, expected_count, actual_count, session):
    # Gate 2: Verify output before advancing
    if actual_count != expected_count:
        raise RecoverableException(
            f"Count mismatch: expected={expected_count}, actual={actual_count}"
        )
    job.checkpoint = next_checkpoint
    session.commit()
```
### Gate 3: Human Approval (When Required)
Most Nextpoint pipelines are fully automated — deterministic, rule-based, high-volume workflows where human approval would be a bottleneck. Human review enters at the product level (lawyers reviewing processed documents), not the infrastructure level.
When human approval IS needed:

- AI-generated content presented to end users (nextpoint-ai transcript summaries)
- Destructive operations on production data (case deletion, bulk purge)
- Architecture decisions affecting multiple modules (captured as ADRs)
- Deployment to production (manual approval gate in CI/CD)
When human approval is NOT needed:

- Document ingestion, extraction, indexing (deterministic, resumable)
- Event routing and message processing (rule-based, idempotent)
- Retry and error handling (automated via exception hierarchy)
- Alarm-triggered responses (reflex pattern — DLQ alerts, scaling)
Decision rule: If the workflow produces probabilistic output that influences human decisions (AI summaries, search relevance tuning), add a human approval gate. If the workflow is deterministic and resumable (document processing, data migration), skip it.
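The decision rule can be illustrated with a minimal approval-gate sketch: probabilistic output is held behind a review status until a human releases it, while deterministic output passes straight through. The names here (`ReviewStatus`, `Summary`, `publish`) are hypothetical illustrations, not part of any Nextpoint module:

```python
# Hypothetical sketch: hold probabilistic output for human review before publishing.
from dataclasses import dataclass
from enum import Enum

class ReviewStatus(Enum):
    PENDING_REVIEW = "pending_review"   # Gate 3 not yet passed
    APPROVED = "approved"               # a reviewer released the output
    REJECTED = "rejected"               # a reviewer blocked the output

@dataclass
class Summary:
    text: str
    deterministic: bool                 # rule-based pipelines skip the gate
    status: ReviewStatus = ReviewStatus.PENDING_REVIEW

def publish(summary: Summary) -> bool:
    """Return True only if the output may be surfaced to end users."""
    if summary.deterministic:
        return True                     # decision rule: deterministic → skip Gate 3
    return summary.status is ReviewStatus.APPROVED
```

A deterministic indexing result publishes immediately; an AI-generated summary stays unpublished until its status flips to `APPROVED`.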
### Gate 4: Audit Trails
Every action must be traceable. When something goes wrong, the audit trail answers: what happened, when, to which case, triggered by what event.
Where we implement this:
- Structured logging with required fields: timestamp, level, message,
moduleName, caseId, batchId, jobId — see patterns/structured-logging.md
- X-Ray tracing with subsegments for DB and external calls
- SNS event history (events are facts — immutable records of what happened)
- CloudWatch metrics: MessagesProcessed, MessagesFailed, ProcessingDuration
- DLQ as audit trail for failures — messages preserved with original context
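The required log fields above can be sketched as a one-function structured-logging helper. The helper name `log_event` is illustrative; the field names come from the list above (see patterns/structured-logging.md for the real implementation):

```python
# Minimal sketch of a structured log line carrying the required audit fields.
import json
from datetime import datetime, timezone

def log_event(level, message, module_name, case_id, batch_id, job_id):
    """Emit one JSON log line with every field the audit trail requires."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "moduleName": module_name,
        "caseId": case_id,
        "batchId": batch_id,
        "jobId": job_id,
    }
    print(json.dumps(entry))  # one line per event, machine-parseable
    return entry
```

Because every entry carries `caseId`, `batchId`, and `jobId`, a single grep answers "what happened to this case, and when."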
## Applying the Four Gates

### For a New NGE Service Module
Every new module should implement all four gates:
| Gate | Implementation | Pattern Reference |
|---|---|---|
| Input validation | Handler validates SQS message fields | patterns/sqs-handler.md |
| Output verification | Checkpoint validates step completion | patterns/state-validation-and-reconciliation.md |
| Human approval | Skip for deterministic pipelines | See decision rule above |
| Audit trails | Structured logging + X-Ray + metrics | patterns/structured-logging.md, rules/observability.rules.md |
### For an AI-Powered Feature
AI features (nextpoint-ai, pr-review, future capabilities) need stronger gates:
| Gate | Implementation | Why Stronger |
|---|---|---|
| Input validation | Same as NGE modules + prompt injection checks | AI models can be manipulated by crafted inputs |
| Output verification | Validate structure + factual grounding + confidence scoring | AI output is probabilistic, not deterministic |
| Human approval | Required before surfacing AI output to end users | 95-98% accuracy means 2-5% of outputs need correction |
| Audit trails | Same as NGE modules + model version + prompt template + token usage | Reproducibility requires knowing which model and prompt produced the output |
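A stronger Gate 2 for AI output might combine a structural check with a confidence threshold before the result becomes eligible for human review. Everything in this sketch — the required fields, the 0.8 cutoff, the function name — is an illustrative assumption, not an existing Nextpoint API:

```python
# Hypothetical sketch: verify AI output structure and confidence before review.
REQUIRED_FIELDS = {"summary", "confidence", "model_version"}
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; would be tuned per feature

def verify_ai_output(output: dict) -> bool:
    """Gate 2 for probabilistic output: check structure first, then confidence."""
    if REQUIRED_FIELDS - output.keys():
        return False  # malformed output never advances the pipeline
    if not isinstance(output["confidence"], (int, float)):
        return False
    return output["confidence"] >= CONFIDENCE_THRESHOLD
```

Recording `model_version` alongside the verdict is what makes the Gate 4 audit trail reproducible for AI features.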
## Key Rules
- Gates are ordered: input validation runs first; the audit trail captures everything, including gate failures
- Gate failures route to the exception hierarchy: Recoverable (retry), Permanent (DLQ), Silent (skip)
- Never skip Gate 1 (input validation) or Gate 4 (audit trails) — these are mandatory for all workflows
- Gate 3 (human approval) is the only optional gate — see decision rule above
- Gate 2 (output verification) can be lightweight for simple handlers but must exist for checkpoint pipelines
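The routing in the second rule can be sketched as a small exception hierarchy. `RecoverableException` and `PermanentFailureException` appear in the code samples above; the silent-failure class name and the dispatcher are illustrative assumptions:

```python
# Sketch of the three failure routes: retry, DLQ, or silent skip.
class RecoverableException(Exception):
    """Transient failure — re-raise so the queue retries the message."""

class PermanentFailureException(Exception):
    """Unrecoverable failure — message goes to the DLQ, no retry."""

class SilentFailureException(Exception):
    """Expected no-op condition — skip the message without alerting."""

def route_failure(exc: Exception) -> str:
    """Map an exception to its disposition (illustrative dispatcher)."""
    if isinstance(exc, RecoverableException):
        return "retry"
    if isinstance(exc, PermanentFailureException):
        return "dlq"
    if isinstance(exc, SilentFailureException):
        return "skip"
    raise exc  # unknown errors surface loudly rather than being swallowed
```

Keeping the routing in the type system means handlers raise the right exception and never decide retry policy inline.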