
AI-Assisted Development Rules

Purpose

These rules ensure AI-assisted code generation produces consistent, maintainable code that follows Nextpoint's established architectural patterns. They prevent the four common anti-patterns of AI-assisted development: Black Box Blindness, Prompt Thrashing, Spaghetti Architecture, and Magic Wand Fallacy.

Rule 1: Architecture First, AI Second

Never skip the design phase because AI can generate code quickly.

Before writing any code for a new module or significant feature:

  1. Check if an ADR exists for this decision — if not, create one
  2. Review relevant patterns in the patterns/ directory
  3. Start from templates/service-module/ for new modules
  4. Define data models, event types, and interfaces BEFORE prompting AI

BAD:  "Create a Lambda that processes documents and writes to MySQL"
GOOD: "Following the SQS handler pattern in patterns/sqs-handler.md and the
       database session pattern in patterns/database-session.md, create a
       handler for DocumentLoaded events that writes to the per-case database"
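Defining the contracts before prompting can be as light as a few type declarations. A minimal Python sketch, with the caveat that `DocumentLoadedEvent`, `EventType`, and `DocumentWriter` are illustrative names, not the real module's types:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class EventType(Enum):
    # Past-tense event names, matching the conventions in Rule 5
    DOCUMENT_LOADED = "DOCUMENT_LOADED"


@dataclass(frozen=True)
class DocumentLoadedEvent:
    case_id: str
    document_id: str
    correlation_id: str


class DocumentWriter(Protocol):
    """Port that a shell/ adapter implements; core/ depends only on this."""

    def write(self, event: DocumentLoadedEvent) -> None: ...
```

With these in hand, the AI prompt references concrete types instead of leaving the model to invent its own data shapes.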

Rule 2: Reference Before Generate

Always search for existing patterns before generating new code.

Before asking AI to generate any code:

  1. Search patterns/ for the relevant pattern (e.g., retry, database, events)
  2. Search reference-implementations/ for how other modules solve the same problem
  3. Check rules/ for constraints that apply (boundary rules, event rules, etc.)
  4. Feed the relevant pattern file into the AI context

BAD:  "Write retry logic for this database operation"
GOOD: "Using the @retry_on_db_conflict decorator from patterns/retry-and-resilience.md,
       add retry handling to this write operation. Here's the pattern: [paste pattern]"
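For illustration only, here is the general shape a conflict-retry decorator like `@retry_on_db_conflict` typically takes. The real implementation and its parameters live in patterns/retry-and-resilience.md and should be used instead of this sketch; `RecoverableError` stands in for the actual exception hierarchy:

```python
import functools
import time


class RecoverableError(Exception):
    """Stand-in for the Recoverable class in the real exception hierarchy."""


def retry_on_db_conflict(max_attempts=3, backoff_seconds=0.1):
    """Illustrative sketch only; use the real decorator from patterns/."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except RecoverableError:
                    if attempt == max_attempts:
                        raise  # exhausted: let the caller's error handling take over
                    time.sleep(backoff_seconds * attempt)  # linear backoff
        return wrapper
    return decorator
```

The point of pasting the real pattern into the prompt is that the AI fills in the decorated function, not the retry machinery.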

Rule 3: Understand Before Committing

Never commit AI-generated code you cannot explain line-by-line.

For every AI-generated code block:

  1. Read every line — if you can't explain it, ask the AI to explain
  2. Verify it follows the hexagonal boundary rules (core/ never imports from shell/)
  3. Check that exception types match our hierarchy (Recoverable/Permanent/Silent)
  4. Ensure idempotency — every handler must handle duplicate messages
  5. Run the existing test patterns to verify behavior

Checklist before committing AI-generated code:
[ ] I can explain every line to a teammate
[ ] It follows core/ and shell/ boundary rules
[ ] Exception types are correct (Recoverable vs Permanent vs Silent)
[ ] Handler is idempotent (checked for duplicate processing)
[ ] Tests follow the pattern in the module's existing test suite
[ ] No AWS SDK imports in core/ directory
[ ] No secrets hardcoded — uses AWS Secrets Manager
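The idempotency item on this checklist can be sketched as follows. This is an illustrative in-memory version; a real handler would consult a durable store (for example, a processed-messages table checked inside the same transaction as the write), and the function name is hypothetical:

```python
def handle_message(message_id: str, payload: dict, processed_ids: set) -> bool:
    """Idempotency sketch: skip messages we have already processed.

    Returns True if the message was processed, False if it was a duplicate.
    """
    if message_id in processed_ids:
        return False  # duplicate delivery: skip silently, do not re-process
    # ... perform the actual work with `payload` here ...
    processed_ids.add(message_id)
    return True
```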

Rule 4: Three Strikes, Read the Docs

If AI fails to solve an error twice, stop prompting and read the source.

The escalation path:

  1. First attempt: Give AI the full error message with context (file, function, input)
  2. Second attempt: Add your hypothesis about the cause and the relevant pattern file
  3. Third attempt: STOP. Read the official documentation, the reference implementation, or the actual source code of the dependency

Common areas where AI struggles and docs are better:

  • AWS SDK behavior (read the AWS docs, not AI guesses)
  • SQLAlchemy session management (read our database-session.md pattern)
  • SQS visibility timeout and retry behavior (read lambda-sqs-integration.md)
  • CDK construct configuration (read the AWS CDK API reference)

Rule 5: Consistency Over Cleverness

AI-generated code must match existing patterns, even if the AI suggests a "better" alternative.

Our modules follow specific conventions:

  • Event names are past-tense enums: DOCUMENT_LOADED, not loadDocument
  • Database sessions use writer_session() / reader_session() context managers
  • Config comes from environment variables via a centralized config.py
  • Logging uses a structured format with correlation IDs
  • Tests mock at the shell boundary and never make real AWS calls
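The structured-logging convention can be sketched like this. The helper name and field set are hypothetical; the point is that every log line is machine-parseable and carries the correlation ID:

```python
import json
import logging

logger = logging.getLogger("module")


def log_event(message: str, correlation_id: str, **fields) -> str:
    """Emit one structured JSON log line carrying the correlation ID."""
    record = {"message": message, "correlation_id": correlation_id, **fields}
    line = json.dumps(record)
    logger.info(line)
    return line
```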

If AI generates code that works but uses a different pattern than what exists in the codebase, refactor it to match the existing pattern. Consistency across modules is more valuable than local optimization.

BAD:  AI generates a custom retry loop with sleep()
GOOD: Use @retry_on_db_conflict decorator (our established pattern)

BAD:  AI creates a new database engine in the handler
GOOD: Use writer_session() context manager (our established pattern)

BAD:  AI puts AWS SDK calls in core/process.py
GOOD: Move AWS calls to shell/, pass data to core/ (our boundary rule)
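The boundary rule in the last pair can be sketched like this. File names and the `summarize` logic are hypothetical; only the shape matters, namely that AWS calls stay in shell/ and core/ receives plain data:

```python
# core/process.py -- pure logic, no AWS SDK imports
def summarize(document_text: str) -> dict:
    return {"length": len(document_text), "empty": not document_text.strip()}


# shell/handler.py -- all I/O lives here
def handle(event, s3_client):
    # The shell fetches bytes from S3 and hands plain data to core/
    body = s3_client.get_object(Bucket=event["bucket"], Key=event["key"])
    text = body["Body"].read().decode("utf-8")
    return summarize(text)
```

Because core/ sees only strings and dicts, its tests need no AWS mocks at all; the shell is tested separately with a fake client.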

Rule 6: Context-Aware Prompting

Always provide architectural context when prompting AI for Nextpoint code.

Minimum context for any code generation prompt:

  1. Which module you're working in
  2. The relevant pattern file(s) from patterns/
  3. The module's CLAUDE.md (if it exists)
  4. The specific rule constraints that apply

For Claude Code specifically, ensure your module's CLAUDE.md references this architecture repo:

# Module CLAUDE.md
## Architecture Reference
This module follows patterns defined in nextpoint-architecture/:
- Handler pattern: patterns/sqs-handler.md
- Database pattern: patterns/database-session.md
- Event pattern: patterns/sns-event-publishing.md
See rules/ for enforcement rules.

Rule 7: Review AI Output Against the Divergence Map

When working on code that touches both NGE and Legacy paths, always check the divergence map.

The reference-implementations/nge-legacy-divergence-map.md documents 85+ code points where Rails branches on nge_enabled?. Before AI generates code that touches any of these areas:

  1. Check the divergence map for the relevant functionality
  2. Verify the AI isn't conflating NGE and Legacy patterns
  3. Ensure the correct rendering path is used (page images vs PDF/Nutrient)

Rule 8: Least Agency

Use the simplest AI approach that solves the problem.

Rule 9: Correctness Before Cleverness

Get it working simply first. Optimize only after tests pass.

AI models sometimes produce overly clever solutions — custom retry loops instead of @retry_on_db_conflict, hand-rolled connection pooling instead of writer_session(), premature async patterns for sequential workflows. The correct sequence:

  1. First pass: Simple, correct, readable code that follows existing patterns
  2. Verify: Tests pass, idempotency holds, boundary rules satisfied
  3. Then optimize: Only if profiling or load testing reveals an actual bottleneck

BAD:  Generate an optimized batch-processing pipeline with async generators
      before the basic sequential handler works
GOOD: Generate a straightforward handler using established patterns,
      verify it works, then optimize the hot path if needed

Signs you're optimizing too early:

  • Adding caching before measuring latency
  • Using async/concurrent patterns before confirming the sequential version is too slow
  • Replacing ORM queries with raw SQL without profiling evidence
  • Adding connection pooling beyond what writer_session() already provides

This applies to AI-generated code especially — models default to showcasing capability rather than matching the simplicity level of the surrounding codebase.

| Prefer | Over | Why |
|---|---|---|
| Single prompt | Multi-step workflow | Less context waste, fewer failure points |
| Specific file read | Broad codebase search | Targeted context stays focused |
| Pattern reference | Generating from scratch | Proven code, consistent style |
| Slash command | Freeform instruction | Repeatable, team-shared workflow |
| Grep/Glob | Agent sub-agent | Faster, cheaper for known targets |
| One session | Multiple parallel agents | Unless tasks are genuinely independent |

The minimum agentic level that solves the problem is the correct level:

  1. Single LLM call (most tasks)
  2. Augmented LLM with tools (exploration)
  3. Orchestrated workflow (multi-module changes)
  4. Bounded agent with constraints (unfamiliar codebase)
  5. Autonomous agent (only when genuinely needed)

80%+ of production value comes from levels 1-3. Default there.

Rule 11: Every Harness Component Is an Assumption

Harness scaffolding should simplify as models improve.

Every architectural component around an LLM encodes an assumption about what the model can't do alone (Anthropic harness design paper, March 2026):

  • 5 parallel pr-review agents → assumes one agent can't catch all categories
  • Domain-specific chunkers → assumes generic chunking misses structure
  • Sprint decomposition → assumes the model loses coherence over long tasks
  • Generator-Evaluator separation → assumes models can't self-evaluate (still true)

When models update, re-test assumptions:

  1. Run the same task with fewer scaffolding components
  2. If quality holds, remove the component (it's now overhead)
  3. If quality drops, keep it (the assumption still holds)
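The re-test loop amounts to a small ablation check. Everything in this sketch (function names, the quality score, the threshold) is hypothetical scaffolding for illustration:

```python
def retest_component(task, run_with, run_without, quality_fn, threshold=0.95):
    """Ablation sketch: compare task quality with and without one
    scaffolding component. Returns True if the component can be removed."""
    with_score = quality_fn(run_with(task))
    without_score = quality_fn(run_without(task))
    # If quality holds without the component, it has become overhead
    return without_score >= with_score * threshold
```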

The goal is the minimum harness that produces reliable results, not the most complex architecture. Over-engineering scaffolding that will become overhead is as wasteful as under-engineering and letting the model fail.

The Architect's Role in AI-Assisted Development

AI generates code. The architect provides what AI cannot:

The architect provides:

  • WHY a pattern was chosen (captured in ADRs)
  • WHAT trade-offs were accepted (ADR Consequences section)
  • WHEN to deviate from patterns (context-specific judgment)
  • HOW modules interact (event catalog, divergence map, integration points)

AI provides:

  • Code generation within established patterns
  • Exploration and analysis of unfamiliar code
  • Consistency checking against rules
  • Boilerplate, scaffolding, and refactoring

LLMs produce generic, assumption-filled first drafts. The architect adds trade-off reasoning, phased implementation thinking, and domain-specific constraints. Rules, patterns, and ADRs encode this architectural knowledge so AI can follow it consistently.

For complex tasks, follow this progression (inspired by the SPARC methodology, extended with a self-review step):

  1. Explore — "Find all files related to X and summarize them"
  2. Plan — "Create a step-by-step plan for implementing Y"
  3. Code — "Implement according to the plan, following [pattern]"
  4. Review — Switch roles and critique the generated code before committing
  5. Commit — "Commit with message describing the change"

Never skip straight to Code. The planning phase is where the architect's judgment matters most.

The Review Step

After code generation and before commit, ask Claude to review its own output. This catches issues that generation misses — security gaps, pattern drift, edge cases.

Effective review prompts:

  • "Review this code for security issues — check for injection, unsafe deserialization, hardcoded secrets"
  • "Does this follow our hexagonal boundary rules? Check all imports in core/"
  • "What happens if this handler receives a duplicate message? Walk through the idempotency path"
  • "Identify potential performance bottlenecks — check for N+1 queries, unbounded loops, missing indexes"

For full architectural reviews, use the /review-architecture command which checks against all patterns and rules systematically. The self-review step here is lighter — a quick sanity check before commit, not a formal review.

Rule 10: Test Your Skills

Skills that work once are not skills that work reliably.

Claude Code's skill-creator supports eval mode (March 2026). Use it to verify skills aren't silently broken by model updates:

  1. Define eval cases — test prompts with expected outputs for each skill
  2. Run benchmarks after model updates — catch regressions before users do
  3. Compare with-skill vs without-skill — if identical, the skill adds no value
  4. Run trigger optimization — a skill that never fires is the same as no skill

Two failure modes to test for:

  • Capability uplift skills expire when the base model catches up (eval reveals this)
  • Encoded preference skills break when your process changes (eval catches drift)
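skill-creator's actual eval-case format isn't reproduced here, but as a shape, an eval run reduces to something like this hypothetical harness: each case pairs a prompt with a predicate on the output, and a model update passes when no case regresses:

```python
def run_skill_evals(cases, run_skill):
    """Tiny eval-harness sketch. `cases` is a list of
    (name, prompt, check) tuples; `run_skill` is whatever invokes the
    skill under test. Returns the names of failing cases."""
    failures = []
    for name, prompt, check in cases:
        output = run_skill(prompt)
        if not check(output):
            failures.append(name)
    return failures
```

Running the same cases with and without the skill loaded is what reveals a skill that adds no value.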

Anti-Pattern Quick Reference

| Anti-Pattern | Symptom | Prevention |
|---|---|---|
| Black Box Blindness | Can't explain the code during review | Rule 3: Understand before committing |
| Prompt Thrashing | 3+ retries on the same error | Rule 4: Three strikes, read the docs |
| Spaghetti Architecture | New pattern where an existing one exists | Rule 2: Reference before generate; Rule 5: Consistency |
| Magic Wand Fallacy | No ADR, no design, straight to code | Rule 1: Architecture first |
| Over-Engineering Agency | Sub-agents for a grep, Plan mode for a one-line fix | Rule 8: Least agency |
| Premature Optimization | Async generators before sequential works, raw SQL without profiling | Rule 9: Correctness before cleverness |
| Vibes-Based Skills | Skill works Tuesday, breaks Thursday after a model update, nobody notices | Rule 10: Test your skills |
| Ossified Harness | Scaffolding from weaker models is never removed, adds cost and latency | Rule 11: Every harness component is an assumption |

Applying These Rules in Practice

Starting a new NGE module:

  1. Create ADR in adr/ (Rule 1)
  2. Copy templates/service-module/ (Rule 2)
  3. Reference relevant patterns in module CLAUDE.md (Rule 6)
  4. Generate code with pattern context (Rule 5)
  5. Review against checklist (Rule 3)

Adding a feature to an existing module:

  1. Check patterns/ for the relevant pattern (Rule 2)
  2. Check divergence map if it touches NGE/Legacy (Rule 7)
  3. Generate with full module context (Rule 6)
  4. Verify consistency with existing module code (Rule 5)
  5. Review against checklist (Rule 3)

Debugging with AI assistance:

  1. First prompt: full error + context + hypothesis (Rule 4)
  2. Second prompt: add pattern file + related code (Rule 4)
  3. Third failure: read docs/source directly (Rule 4)