Pattern: Circuit Breaker
Prevent cascading failures when calling downstream services by counting consecutive failures and temporarily halting requests when a service is overwhelmed or unavailable. This protects both the caller (from wasting resources on retries) and the downstream service (from further overload).
When to Use
- Calling an external or cross-module service (Nutrient API, uploader queue, external API) that may be temporarily unavailable
- The downstream service has its own scaling constraints (connection pool limits, queue depth limits, rate limits)
- Failed requests consume significant resources (Lambda concurrency, SQS visibility timeouts, API rate limit quotas)
- You want to fail fast rather than queue up retries that will also fail
When NOT to Use
- Internal function calls within the same Lambda/container
- Database operations (use `@retry_on_db_conflict` instead)
- SQS message processing (use DLQ + max receive count instead)
- One-off requests where retry is acceptable
Architecture
      Caller (Lambda/Processor)
                  │
                  ▼
┌────────────────────────────────────┐
│          Circuit Breaker           │
│                                    │
│  CLOSED ──→ OPEN ──→ HALF_OPEN     │
│    │         │           │         │
│    ▼         ▼           ▼         │
│ Pass all   Reject    Test one      │
│ requests   all       request       │
│          (fail fast)               │
└─────────────────┬──────────────────┘
                  │
                  ▼
         Downstream Service
 (uploader queue, Nutrient API, etc.)
State Machine
       ┌─────────┐
       │ CLOSED  │ ← Normal operation
       └────┬────┘
            │ failure count >= threshold (5)
            ▼
       ┌─────────┐
       │  OPEN   │ ← All requests rejected immediately
       └────┬────┘
            │ timeout period elapsed (30s)
            ▼
      ┌───────────┐
      │ HALF_OPEN │ ← Test with one request
      └─────┬─────┘
            │
     ┌──────┴──────┐
     │             │
  success       failure
     │             │
     ▼             ▼
┌─────────┐   ┌─────────┐
│ CLOSED  │   │  OPEN   │
└─────────┘   └─────────┘
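The diagram can also be read as a small transition table. The following is a minimal sketch of that reading, not the project's code: the names `BreakerState`, `BreakerEvent`, `TRANSITIONS`, and `nextState` are hypothetical and introduced only for illustration.

```typescript
// Hypothetical names: BreakerState, BreakerEvent, TRANSITIONS and nextState are illustrative only.
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

type BreakerEvent =
  | 'THRESHOLD_REACHED' // failure count >= threshold while CLOSED
  | 'TIMEOUT_ELAPSED'   // timeout period elapsed while OPEN
  | 'TRIAL_SUCCEEDED'   // the single test request succeeded
  | 'TRIAL_FAILED';     // the single test request failed

// Each state lists only the events that move it to a different state.
const TRANSITIONS: Record<BreakerState, Partial<Record<BreakerEvent, BreakerState>>> = {
  CLOSED: { THRESHOLD_REACHED: 'OPEN' },
  OPEN: { TIMEOUT_ELAPSED: 'HALF_OPEN' },
  HALF_OPEN: { TRIAL_SUCCEEDED: 'CLOSED', TRIAL_FAILED: 'OPEN' },
};

// Events that do not apply to the current state leave it unchanged.
function nextState(state: BreakerState, event: BreakerEvent): BreakerState {
  return TRANSITIONS[state][event] ?? state;
}
```

Keeping the transitions in one table makes it easy to check that every arrow in the diagram has a counterpart, and that no state can be left by an event the diagram does not show.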
Implementation
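// Thrown when the breaker rejects a call without invoking the downstream service.
class CircuitOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CircuitOpenError';
  }
}

class CircuitBreaker {
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';
  private failureCount = 0;
  private lastFailureTime: number | null = null;

  constructor(
    private readonly failureThreshold: number = 5,
    private readonly timeoutMs: number = 30_000,
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      // lastFailureTime is always set before the state becomes OPEN.
      if (Date.now() - this.lastFailureTime! > this.timeoutMs) {
        // Timeout elapsed: let this request through as a trial.
        this.state = 'HALF_OPEN';
      } else {
        throw new CircuitOpenError('Circuit breaker is OPEN: request rejected');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  // Any success resets the consecutive-failure count and closes the circuit.
  private onSuccess(): void {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  private onFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // A failed trial in HALF_OPEN re-opens the circuit immediately.
    if (this.state === 'HALF_OPEN' || this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }
}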
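The state machine describes HALF_OPEN as testing with one request, but a breaker that only checks for OPEN lets every concurrent caller through while the trial is still in flight. The sketch below shows one possible way to admit a single trial at a time. It is an illustration under assumptions, not the project's implementation: `SingleTrialBreaker`, `BreakerOpenError`, `getState`, and the injectable `now` clock are hypothetical names added so the example is self-contained and testable.

```typescript
// Hypothetical variant for illustration; all names here are assumptions.
class BreakerOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'BreakerOpenError';
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class SingleTrialBreaker {
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';
  private failureCount = 0;
  private openedAt = 0;
  private trialInFlight = false;

  constructor(
    private readonly failureThreshold: number = 5,
    private readonly timeoutMs: number = 30_000,
    private readonly now: () => number = () => Date.now(), // injectable clock for tests
  ) {}

  getState(): 'CLOSED' | 'OPEN' | 'HALF_OPEN' {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.timeoutMs) {
        throw new BreakerOpenError('circuit is OPEN');
      }
      this.state = 'HALF_OPEN';
    }

    const isTrial = this.state === 'HALF_OPEN';
    if (isTrial) {
      // Only one caller may probe the downstream service at a time.
      if (this.trialInFlight) {
        throw new BreakerOpenError('circuit is HALF_OPEN and a trial is already in flight');
      }
      this.trialInFlight = true;
    }

    try {
      const result = await fn();
      this.failureCount = 0;
      this.state = 'CLOSED';
      return result;
    } catch (error) {
      this.failureCount++;
      // A failed trial re-opens immediately; otherwise open at the threshold.
      if (isTrial || this.failureCount >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw error;
    } finally {
      if (isTrial) this.trialInFlight = false;
    }
  }
}
```

Callers rejected during the trial see the same error type as callers rejected while OPEN, so existing fail-fast handling does not need a separate branch.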
Configuration
| Parameter | Default | Purpose |
|---|---|---|
| `failureThreshold` | 5 | Consecutive failures before opening circuit |
| `timeoutMs` | 30,000 (30s) | Time in OPEN state before testing with HALF_OPEN |
Tuning guidance:
- High-latency services (Nutrient API): Higher timeout (60s+) to give the service time to recover
- Queue-based services (uploader SQS): Lower threshold (3) since queue failures indicate systemic issues
- Rate-limited APIs: Align timeout with rate limit reset window
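As a rough illustration of the tuning guidance, the values could be captured as presets. This is a sketch under assumptions: `BreakerConfig`, `DEFAULT_CONFIG`, `PRESETS`, and `timeoutForRateLimit` are hypothetical names, and the reset time is assumed to be available to the caller as an epoch-millisecond timestamp (how it is obtained depends on the API in question).

```typescript
// Hypothetical configuration shape mirroring the two documented parameters.
interface BreakerConfig {
  failureThreshold: number;
  timeoutMs: number;
}

const DEFAULT_CONFIG: BreakerConfig = { failureThreshold: 5, timeoutMs: 30_000 };

// Presets that follow the tuning guidance; the preset names are illustrative.
const PRESETS: Record<'highLatencyApi' | 'queue', BreakerConfig> = {
  // High-latency services: longer timeout so the service has time to recover.
  highLatencyApi: { ...DEFAULT_CONFIG, timeoutMs: 60_000 },
  // Queue-based services: lower threshold, since queue failures tend to be systemic.
  queue: { ...DEFAULT_CONFIG, failureThreshold: 3 },
};

// Rate-limited APIs: keep the circuit open until the rate limit window resets.
// resetAtMs is assumed to be an epoch-millisecond timestamp supplied by the caller.
function timeoutForRateLimit(resetAtMs: number, nowMs: number, floorMs: number = 1_000): number {
  // The floor guards against a zero or negative timeout when the reset time has passed.
  return Math.max(floorMs, resetAtMs - nowMs);
}
```

For example, a reset six seconds in the future yields a six-second timeout, while a reset time already in the past falls back to the floor.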
Design Considerations
- Circuit breaker is in-memory: state resets on Lambda cold start and is not shared between concurrent execution environments. This is acceptable because a cold start indicates a fresh execution context and the downstream service may have recovered.
- Combine with DLQ: messages that fail due to an open circuit should still be retried via SQS visibility timeout or DLQ redrive after the service recovers.
- Log state transitions: CLOSED→OPEN and OPEN→HALF_OPEN transitions should be logged at WARN level for operational visibility.
- Don't use for retries of partially processed requests: if the downstream service might have partially processed the request, the circuit breaker should not prevent the retry that completes it. Use it only for requests that are safe to skip or defer.
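To illustrate the DLQ consideration, the sketch below shows one way an SQS-triggered Lambda could hand messages back to the queue when the circuit is open, using Lambda's partial batch response (`batchItemFailures`, which requires `ReportBatchItemFailures` to be enabled on the event source mapping). Everything else is an assumption for illustration: `QueueRecord`, `BatchResponse`, `handleBatch`, and the `send` callback are hypothetical, and `CircuitOpenError` is redeclared here only so the snippet stands alone.

```typescript
// Minimal local shapes; a real handler would typically use the types from @types/aws-lambda.
interface QueueRecord {
  messageId: string;
  body: string;
}

interface BatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// Redeclared so the snippet is self-contained.
class CircuitOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CircuitOpenError';
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// send is a hypothetical downstream call that is already wrapped in a circuit breaker.
async function handleBatch(
  records: QueueRecord[],
  send: (body: string) => Promise<void>,
): Promise<BatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  let circuitOpen = false;

  for (const record of records) {
    if (circuitOpen) {
      // Fail fast: do not call downstream again, just return the message to the queue.
      batchItemFailures.push({ itemIdentifier: record.messageId });
      continue;
    }
    try {
      await send(record.body);
    } catch (error) {
      if (error instanceof CircuitOpenError) circuitOpen = true;
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  // Reported messages become visible again after the visibility timeout and
  // move to the DLQ once the queue's max receive count is exceeded.
  return { batchItemFailures };
}
```

Reporting only the failed message IDs means messages that were sent successfully before the circuit opened are not redelivered.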
Real-World Usage
- documentexchanger: UploaderCoordinator wraps Nutrient uploader queue messaging in a circuit breaker (threshold: 5, timeout: 30s). This prevents the exchanger from flooding the uploader when it's overwhelmed. See reference-implementations/documentexchanger.md.