Article Review: Group 21 — Agentic AI Production Patterns¶
Article Reviewed¶
- "Agentic AI for Investment Management: From Concept to Production" — Farhad Malik / Data Science Collective (Dec 2025) — Comprehensive guide to deploying agentic AI in financial services: firm structure mapping, agent anatomy, orchestration patterns, guardrail framework, and risk taxonomy. First article in a planned 9-part series.
Key Concepts¶
The Semi-Autonomous Principle¶
The article's foundational design constraint: agents are semi-autonomous, not fully autonomous. Humans set objectives, define constraints, approve actions, and retain final accountability. The agent handles execution within those boundaries.
This is the same principle our architecture embeds in handler design — handlers execute idempotently within defined constraints, but the pipeline (ProcessorApi → documentextractor → SNS → loader/uploader) is orchestrated by the system, not by individual handlers making autonomous decisions.
Five-Component Agent Architecture¶
Every production agent needs five components:
- Goal — The specific purpose (e.g., validate a trade, reconcile positions)
- LLM — The planning and reasoning engine
- Tools — External functions the agent calls (APIs, databases, code execution)
- Memory — Short-term (conversation context) + long-term (vector DB, persistent knowledge)
- Orchestration Layer — Manages multi-step processes, coordinates tools, handles action sequences
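The five components can be sketched as a single container. This is a minimal illustration of the decomposition, not the article's implementation: the `llm` and the `lookup` tool are stubs, and the one-step orchestration loop is an assumption for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Agent:
    """Minimal container for the five components named above."""
    goal: str                                        # Goal: the specific purpose
    llm: Callable[[str], str]                        # LLM: planning/reasoning engine
    tools: dict[str, Callable[..., Any]]             # Tools: external functions
    memory: list[str] = field(default_factory=list)  # Memory: short-term context

    def step(self, observation: str) -> str:
        """Orchestration layer: one step that consults the LLM, then a tool."""
        self.memory.append(observation)
        plan = self.llm(f"{self.goal}: {observation}")
        tool_name, _, arg = plan.partition(" ")      # plan format: "<tool> <arg>"
        result = self.tools[tool_name](arg)
        self.memory.append(result)
        return result

# Usage with a stubbed "LLM" that always picks the lookup tool
agent = Agent(
    goal="validate a trade",
    llm=lambda prompt: "lookup T-123",
    tools={"lookup": lambda trade_id: f"trade {trade_id}: OK"},
)
print(agent.step("new trade received"))  # trade T-123: OK
```

In production the lambda stubs would be a real model call and real API clients; the structural point is only that all five components have an explicit home.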
Six Orchestration Patterns¶
| Pattern | Description | When to Use |
|---|---|---|
| ReAct | Reason → Act → Observe loop | General-purpose tasks with dynamic tool selection |
| Plan-and-Execute | Create full plan first, then execute step-by-step | Complex multi-step workflows with dependencies |
| Reflex | Stimulus triggers immediate action, no planning | Simple threshold-based alerts and notifications |
| Hierarchical | Parent agent delegates to child agents | Complex tasks decomposable into independent sub-tasks |
| Multimodal | Process images, audio, video as inputs/outputs | Document processing, media analysis |
| Toolformer | Model learns when to call tools | Tasks where tool selection is non-obvious |
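The first pattern in the table, ReAct, reduces to a short loop. This is a generic sketch with a deterministic stand-in for the reasoning step, not any specific framework's API; the `("DONE", answer)` convention and the cache tool are assumptions for illustration.

```python
def react_loop(reason, tools, observation, max_steps=5):
    """ReAct: alternate Reason -> Act -> Observe until the reasoner finishes.

    `reason` stands in for an LLM call; it returns either
    ("DONE", answer) or (tool_name, tool_arg).
    """
    for _ in range(max_steps):
        action, arg = reason(observation)   # Reason
        if action == "DONE":
            return arg
        observation = tools[action](arg)    # Act, then Observe
    raise RuntimeError("step budget exhausted")

# Stubbed reasoner: fetch the value once, then finish.
def reason(obs):
    if obs.startswith("cached:"):
        return ("DONE", obs.removeprefix("cached:"))
    return ("cache_get", obs)

tools = {"cache_get": lambda key: f"cached:{key.upper()}"}
print(react_loop(reason, tools, "position-42"))  # POSITION-42
```

The `max_steps` budget is the key production detail: without it, a reasoner that never emits `DONE` loops forever.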
Four-Gate Guardrail Framework¶
Every agent action passes through four gates:
- Input validation — Catch bad data before processing
- Output verification — Check that results make sense
- Human approval workflows — For material decisions
- Detailed audit trails — Who did what and why
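The four gates compose naturally into a single wrapper around any agent action. A minimal sketch, with callables standing in for each gate and thresholds chosen purely for illustration:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("audit")

def guarded_action(payload, validate, act, verify, needs_approval, approve):
    """Run one agent action through the four gates, in order."""
    if not validate(payload):                           # Gate 1: input validation
        raise ValueError("input rejected")
    result = act(payload)
    if not verify(result):                              # Gate 2: output verification
        raise ValueError("output rejected")
    if needs_approval(result) and not approve(result):  # Gate 3: human approval
        raise PermissionError("approval denied")
    log.info("action=%s result=%s", payload, result)    # Gate 4: audit trail
    return result

# Hypothetical reconciliation action with simple threshold gates
out = guarded_action(
    payload={"amount": 100},
    validate=lambda p: p["amount"] > 0,
    act=lambda p: p["amount"] * 2,
    verify=lambda r: r < 10_000,
    needs_approval=lambda r: r > 1_000,  # only material results need sign-off
    approve=lambda r: True,              # stand-in for a human workflow
)
print(out)  # 200
```

Note that gate 3 is conditional: routine results skip human review, which is what makes the pattern viable at volume.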
Three-Layer Deployment Strategy¶
The article maps deployment priority to organizational structure:
| Layer | Risk | ROI | Accuracy Tolerance | Start Here? |
|---|---|---|---|---|
| Back office (settlement, reconciliation) | Lowest | Clear | 95-98% sufficient | POC target |
| Middle office (risk, trade validation) | Medium | Highest | 95-98% sufficient | Scale target |
| Front office (research, alpha generation) | Highest | Variable | Requires sophisticated reasoning | Last |
When NOT to Use Agents (Six Boundaries)¶
- Tasks requiring 100% deterministic accuracy (NAV computation, trade reconciliation)
- Emergency scenarios requiring immediate human judgment
- Decisions with regulatory/legal ambiguity
- Workflows requiring subjective judgment or ethics
- Legacy systems with organizational resistance to integration
- Environments without centralized, governed data
Risk Taxonomy (Six Categories)¶
- Model risks — Hallucinations, overconfidence, misinterpretation
- Data risks — Poor quality inputs, stale data, unreconciled sources
- Operational risks — Cascading failures, API outages, hidden dependencies
- Human risks — Over-reliance, weak review controls, poor approval workflows
- Regulatory risks — Unexplainable decisions, lack of auditability
- Financial risks — Mispriced trades, incorrect risk calculations
Relevance to Our Architecture¶
Pattern Mapping: Agents to NGE Service Modules¶
The five-component agent architecture has a striking parallel to our service module anatomy:
| Agent Component | NGE Equivalent |
|---|---|
| Goal | Handler purpose (event type routing via SNS filter policies) |
| LLM | Core business logic (core/process.py) |
| Tools | Shell infrastructure (shell/ — S3, SQS, SNS, RDS, ES) |
| Memory | Database state (per-case MySQL) + checkpoint state (DynamoDB/S3) |
| Orchestration | Checkpoint pipeline state machine + SNS event routing |
Our service modules are not AI agents, but they follow the same structural decomposition: a clear goal, a reasoning/processing layer, external tool integrations, persistent state, and orchestration logic.
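The core/shell decomposition in the table can be sketched in a few lines. All names here are illustrative stand-ins, not the actual NGE module API: `Shell` fakes the S3/SQS/SNS/RDS integrations so the pure `process` function is testable in isolation.

```python
def process(document: dict) -> dict:
    """core/: pure business logic, no I/O (easy to test)."""
    return {"id": document["id"], "pages": len(document["text"]) // 1000 + 1}

class Shell:
    """shell/: stands in for the external integrations (S3, SQS, SNS, RDS)."""
    def __init__(self):
        self.published = []
    def fetch(self, doc_id):
        return {"id": doc_id, "text": "x" * 2500}
    def publish(self, event):
        self.published.append(event)

def handler(doc_id, shell):
    """Orchestration: fetch (shell) -> process (core) -> publish (shell)."""
    result = process(shell.fetch(doc_id))
    shell.publish(result)
    return result

shell = Shell()
print(handler("doc-7", shell))  # {'id': 'doc-7', 'pages': 3}
```

The same separation is what makes the agent analogy useful: swap the pure core for an LLM call and the structure is unchanged.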
Orchestration Pattern Alignment¶
Our existing patterns map to the six orchestration patterns:
- Plan-and-Execute → Our checkpoint pipeline pattern. documentloader's 11-step state machine creates the full plan (checkpoint definitions), then executes step-by-step with resumability.
- Hierarchical → pr-review's multi-agent architecture. Orchestrator delegates to 5 specialized review agents, each working independently, then aggregates results.
- ReAct → Not directly used. Our handlers don't dynamically reason about which tool to call — routing is pre-determined by SNS filter policies.
- Reflex → Our alarm-based responses. DLQ depth > 0 triggers alerting without planning or reasoning.
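The Plan-and-Execute mapping above hinges on resumability: the plan (ordered checkpoint names) is fixed up front, and persisted progress lets a retry resume rather than restart. A minimal sketch of that idea, with illustrative step names; this is not documentloader's actual state machine.

```python
def run_pipeline(steps, state):
    """Plan-and-Execute with checkpoints: `steps` is the full plan
    (ordered (name, fn) pairs); `state` persists the count of completed
    steps so a retry resumes from the failure point, not from scratch."""
    for i, (name, fn) in enumerate(steps):
        if i < state.get("completed", 0):
            continue                  # already done on a prior run: skip
        fn()                          # execute this step
        state["completed"] = i + 1    # checkpoint: persist progress

ran = []
steps = [
    ("extract", lambda: ran.append("extract")),
    ("index",   lambda: ran.append("index")),
    ("notify",  lambda: ran.append("notify")),
]
state = {"completed": 1}  # simulate a resume after "extract" succeeded
run_pipeline(steps, state)
print(ran)  # ['index', 'notify']
```

In the real pipeline `state` would live in DynamoDB or S3 rather than a dict, and each step would need to be idempotent in case the process dies between `fn()` and the checkpoint write.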
Four-Gate Framework vs. Our Guardrails¶
| Gate | Our Implementation |
|---|---|
| Input validation | Handler-level: parse SQS message, validate required fields (caseId, batchId, jobId) |
| Output verification | Checkpoint validation: verify step completion before advancing state machine |
| Human approval | Not currently implemented — our pipelines are fully automated. The article suggests this is appropriate for deterministic, rule-based workflows. |
| Audit trails | CloudWatch structured logging, X-Ray tracing, SNS event history |
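The first row of the table, handler-level input validation, looks roughly like the following. The three field names come from the table; the schema and function names are otherwise illustrative, not the actual handler code.

```python
import json

REQUIRED = ("caseId", "batchId", "jobId")

def parse_message(body: str) -> dict:
    """Gate 1 sketch: parse an SQS message body and require the fields above.
    Rejecting early keeps malformed events out of the processing core."""
    msg = json.loads(body)
    missing = [k for k in REQUIRED if k not in msg]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return msg

msg = parse_message('{"caseId": 1, "batchId": 2, "jobId": 3}')
print(msg["caseId"])  # 1
```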
The absence of human approval in our pipeline is correct per the article's framework — eDiscovery document processing is high-volume, rule-based, and tolerates automated execution. Human review enters at the product level (lawyers reviewing processed documents), not the infrastructure level.
The Accuracy Tolerance Insight¶
The article's 95-98% accuracy threshold for agent suitability maps to our domain:
- Document processing — Tolerates imperfect extraction (OCR errors, metadata parsing edge cases). Humans review the final output. Suitable for agentic patterns.
- Bates stamping — Must be deterministic and sequential. Zero tolerance for gaps or duplicates. Not suitable for probabilistic agents.
- Search indexing — Tolerates some imprecision (relevance ranking is inherently fuzzy). Suitable.
- Billing/metering — Must be exact. Not suitable.
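The Bates stamping point is worth making concrete: the requirement is a gap-free, duplicate-free sequence, which is trivially satisfied by deterministic code and cannot be guaranteed by a probabilistic agent. A sketch, assuming the common prefix-plus-zero-padded-counter convention (not a Nextpoint-specific format):

```python
def bates_numbers(prefix: str, start: int, count: int, width: int = 6) -> list[str]:
    """Deterministic Bates sequence: contiguous counters, fixed-width padding.
    No model in the loop means no chance of a skipped or repeated number."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]

stamps = bates_numbers("NP", start=101, count=3)
print(stamps)  # ['NP000101', 'NP000102', 'NP000103']
assert len(set(stamps)) == len(stamps)  # no duplicates by construction
```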
Potential Application: AI-Assisted Architecture Review¶
The article's multi-agent orchestration pattern (Quarterly Investor Report example: 4 parallel agents → orchestrator → assembler → human review → sender) maps directly to a potential enhancement of our pr-review service:
- Current: 5 parallel review agents → verifier aggregator → PR comment
- Enhanced: Add domain-specific agents for eDiscovery compliance, data integrity, and multi-tenant safety. The hierarchical pattern supports this naturally.
Build vs. Buy Decision Framework¶
The article's build/buy/partner framework suggests:
- Build when workflows are highly proprietary and the org can support 8+ GenAI/ML engineers
- Buy when standardized solutions exist
- Partner for specialized capabilities
For Nextpoint: eDiscovery document processing is highly proprietary (per-case isolation, chain of custody, legal hold requirements). The NGE architecture is a "build" decision. AI-assisted tooling (pr-review, nextpoint-ai) is a "build on top of vendor models" hybrid — we build the orchestration, use vendor LLMs for reasoning.