
Guardrails Pattern

Purpose

Every processing pipeline — whether deterministic document processing or probabilistic AI-powered analysis — needs guardrails that prevent bad data from entering, prevent bad output from leaving, and make every action auditable. This pattern unifies four gates that already exist across our architecture into a single framework.

The Four Gates

Gate 1: Input Validation

Reject bad data before processing begins. Every handler validates incoming messages at the boundary before passing data to core logic.

Where we implement this:

- SQS handlers parse and validate required fields (caseId, batchId, jobId) before processing — see patterns/sqs-handler.md
- API Gateway request validation via JSON schemas — see rules/api-design.rules.md
- Event envelope validation: eventType must match the handler's SNS filter policy
- Unknown event types → PermanentFailureException (DLQ, no retry)

# shell/handlers/index.py — input validation gate
import json

def handler(event, context):
    for record in event["Records"]:
        body = json.loads(record["body"])
        message = json.loads(body["Message"])

        # Gate 1: Validate required fields
        case_id = message.get("caseId")
        batch_id = message.get("batchId")
        job_id = message.get("jobId")
        if not all([case_id, batch_id, job_id]):
            raise PermanentFailureException("Missing required fields")

        # Gate 1: Validate the event envelope
        event_type = message.get("eventType")
        if event_type not in SUPPORTED_EVENT_TYPES:
            raise PermanentFailureException(f"Unknown event type: {event_type}")

Gate 2: Output Verification

Verify that processing produced correct results before advancing the pipeline. Check outputs against expectations before emitting downstream events.

Where we implement this:

- Checkpoint validation: verify step completion before advancing the state machine — see patterns/state-validation-and-reconciliation.md
- Document count reconciliation: expected vs. actual documents processed
- Elasticsearch index verification after bulk operations
- S3 object existence checks after file operations

# core/process.py — output verification gate
def advance_checkpoint(job, next_checkpoint, expected_count, actual_count, session):
    # Gate 2: Verify output before advancing
    if actual_count != expected_count:
        raise RecoverableException(
            f"Count mismatch: expected={expected_count}, actual={actual_count}"
        )
    job.checkpoint = next_checkpoint
    session.commit()

Gate 3: Human Approval (When Required)

Most Nextpoint pipelines are fully automated — deterministic, rule-based, high-volume workflows where human approval would be a bottleneck. Human review enters at the product level (lawyers reviewing processed documents), not the infrastructure level.

When human approval IS needed:

- AI-generated content presented to end users (nextpoint-ai transcript summaries)
- Destructive operations on production data (case deletion, bulk purge)
- Architecture decisions affecting multiple modules (captured as ADRs)
- Deployment to production (manual approval gate in CI/CD)

When human approval is NOT needed:

- Document ingestion, extraction, indexing (deterministic, resumable)
- Event routing and message processing (rule-based, idempotent)
- Retry and error handling (automated via exception hierarchy)
- Alarm-triggered responses (reflex pattern — DLQ alerts, scaling)

Decision rule: If the workflow produces probabilistic output that influences human decisions (AI summaries, search relevance tuning), add a human approval gate. If the workflow is deterministic and resumable (document processing, data migration), skip it.
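The decision rule can be sketched as a small predicate. This is illustrative only: the function name and the two boolean flags are assumptions, not an existing API.

```python
def needs_human_approval(workflow: dict) -> bool:
    """Hypothetical encoding of the decision rule: probabilistic output
    that influences human decisions gets an approval gate; deterministic,
    resumable workflows skip it."""
    return workflow["probabilistic"] and workflow["influences_human_decisions"]

# Example classifications mirroring the lists above:
ai_summary = {"probabilistic": True, "influences_human_decisions": True}
doc_ingestion = {"probabilistic": False, "influences_human_decisions": False}
```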

Gate 4: Audit Trails

Every action must be traceable. When something goes wrong, the audit trail answers: what happened, when, to which case, triggered by what event.

Where we implement this:

- Structured logging with required fields: timestamp, level, message, moduleName, caseId, batchId, jobId — see patterns/structured-logging.md
- X-Ray tracing with subsegments for DB and external calls
- SNS event history (events are facts — immutable records of what happened)
- CloudWatch metrics: MessagesProcessed, MessagesFailed, ProcessingDuration
- DLQ as audit trail for failures — messages preserved with original context
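As a sketch, one structured log line carrying the required fields might be built like this. The field names come from the list above; the function name and its signature are illustrative, not the actual logging helper.

```python
import json
import time

REQUIRED_FIELDS = ("timestamp", "level", "message", "moduleName", "caseId", "batchId", "jobId")

def log_event(level, message, *, module_name, case_id, batch_id=None, job_id=None):
    """Emit one structured JSON log line with the required audit fields."""
    record = {
        "timestamp": time.time(),
        "level": level,
        "message": message,
        "moduleName": module_name,
        "caseId": case_id,
        "batchId": batch_id,
        "jobId": job_id,
    }
    print(json.dumps(record))
    return record
```

Keeping the fields flat and consistently named is what makes the trail queryable: a single caseId filter in CloudWatch Logs Insights reconstructs everything that happened to that case.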

Applying the Four Gates

For a New NGE Service Module

Every new module should implement all four gates:

| Gate | Implementation | Pattern Reference |
|------|----------------|-------------------|
| Input validation | Handler validates SQS message fields | patterns/sqs-handler.md |
| Output verification | Checkpoint validates step completion | patterns/state-validation-and-reconciliation.md |
| Human approval | Skip for deterministic pipelines | See decision rule above |
| Audit trails | Structured logging + X-Ray + metrics | patterns/structured-logging.md, rules/observability.rules.md |

For an AI-Powered Feature

AI features (nextpoint-ai, pr-review, future capabilities) need stronger gates:

| Gate | Implementation | Why Stronger |
|------|----------------|--------------|
| Input validation | Same as NGE modules + prompt injection checks | AI models can be manipulated by crafted inputs |
| Output verification | Validate structure + factual grounding + confidence scoring | AI output is probabilistic, not deterministic |
| Human approval | Required before surfacing AI output to end users | 95-98% accuracy means 2-5% of outputs need correction |
| Audit trails | Same as NGE modules + model version + prompt template + token usage | Reproducibility requires knowing which model and prompt produced the output |
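A minimal sketch of the stronger output-verification gate for AI results, assuming the result is a dict. The field names (summary, confidence, modelVersion), the 0.9 threshold, and the stand-in exception class are all assumptions for illustration.

```python
class PermanentFailureException(Exception):
    """Stand-in for the real exception hierarchy."""

def verify_ai_output(result: dict, min_confidence: float = 0.9) -> dict:
    """Check structure and confidence before surfacing AI output."""
    required = ("summary", "confidence", "modelVersion")
    missing = [f for f in required if f not in result]
    if missing:
        raise PermanentFailureException(f"Malformed AI output, missing: {missing}")
    # Low-confidence output is not rejected; it is routed to Gate 3 (human approval).
    status = "approved" if result["confidence"] >= min_confidence else "needs_review"
    return {"status": status, **result}
```

Note the design choice: structural failures are permanent (retrying the same prompt on malformed output rarely helps), while low confidence escalates to a human rather than failing.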

Key Rules

  1. Gates are ordered: input validation runs first, audit trail captures everything
  2. Gate failures route to the exception hierarchy: Recoverable (retry), Permanent (DLQ), Silent (skip)
  3. Never skip Gate 1 (input validation) or Gate 4 (audit trails) — these are mandatory for all workflows
  4. Gate 3 (human approval) is the only optional gate — see decision rule above
  5. Gate 2 (output verification) can be lightweight for simple handlers but must exist for checkpoint pipelines
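Rule 2's routing can be sketched as a three-branch hierarchy. RecoverableException and PermanentFailureException appear in the examples above; SilentFailureException is an assumed name for the "skip" branch, and route_failure is illustrative.

```python
class RecoverableException(Exception):
    """Transient failure; the message is retried via SQS redelivery."""

class PermanentFailureException(Exception):
    """Bad input or unknown event; the message goes to the DLQ, no retry."""

class SilentFailureException(Exception):
    """Expected no-op; the record is skipped and processing continues."""

def route_failure(exc: Exception) -> str:
    """Map a gate failure to its disposition."""
    if isinstance(exc, RecoverableException):
        return "retry"
    if isinstance(exc, PermanentFailureException):
        return "dlq"
    if isinstance(exc, SilentFailureException):
        return "skip"
    raise exc  # unknown failures surface loudly
```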