AI Agents Need Infrastructure, Not Prompts
The AI discourse is dominated by prompts. Better system messages. Chain-of-thought tricks. Temperature tuning. Prompt templates shared on Twitter like trading card collections.
None of this matters for production systems.
The hard problems in AI agents are not linguistic. They are infrastructural. Orchestration. State management. Failure recovery. Observability. Retry logic. Exactly the same problems that backend engineers have been solving in distributed systems for decades.
Prompt engineering is the least interesting part of building AI systems that actually work. The interesting part is everything that happens around the model call.
The Model Call Is the Easy Part
An AI agent makes a model call. The model returns a response. This takes a few hundred milliseconds to a few seconds. It is, architecturally, a function call with a slow and occasionally unreliable external dependency.
The rest of the system - deciding when to call the model, what context to provide, how to handle failures, how to manage state across multi-step workflows, how to observe what happened when things go wrong - is the actual engineering challenge.
```python
# This is the part everyone focuses on
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=messages,
    tools=tools,
)

# This is the part that actually matters
# - What if this call times out?
# - What if the response is malformed?
# - What if we need to retry with different context?
# - What if this is step 4 of 12 and step 3 failed?
# - What if we need to resume this workflow tomorrow?
# - What if we need to understand why this agent
#   made a bad decision last Tuesday?
```
Every one of those questions has been answered before in workflow engines, transaction processing systems, and event-driven architectures. The AI industry is rediscovering distributed systems problems and solving them from scratch because most AI engineers have never built a payment system or a fulfillment pipeline.
What AI Agents Actually Need
Orchestration
An AI agent that performs a multi-step task - researching a topic, analyzing data, drafting a report, requesting approval - is a workflow. It has steps. Steps have dependencies. Some steps can run in parallel. Some must run sequentially. Some steps depend on the output of previous steps.
This is exactly what workflow engines solve.
The workflow engine provides exactly what the agent needs: step management, dependency resolution, parallel execution, and durable state that survives process restarts. Building this from scratch for every AI agent is wasted effort.
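As a minimal sketch of what a workflow engine does with step dependencies (the `Step` shape and step names here are hypothetical, not any real engine's API), dependency resolution is essentially a topological sort:

```python
from dataclasses import dataclass, field

# Hypothetical declarative workflow definition: steps with dependencies,
# resolved into an execution order (Kahn's algorithm). Steps that become
# ready in the same round could run in parallel.
@dataclass
class Step:
    name: str
    depends_on: list = field(default_factory=list)

def execution_order(steps):
    """Return step names in dependency order; raise on cycles."""
    remaining = {s.name: set(s.depends_on) for s in steps}
    order = []
    while remaining:
        ready = [name for name, deps in remaining.items() if not deps]
        if not ready:
            raise ValueError("cycle in step dependencies")
        for name in ready:
            order.append(name)
            del remaining[name]
        for deps in remaining.values():
            deps -= set(ready)
    return order

steps = [
    Step("gather_context"),
    Step("analyze", depends_on=["gather_context"]),
    Step("draft_report", depends_on=["analyze"]),
    Step("request_approval", depends_on=["draft_report"]),
]
```

A real engine layers durable state and failure handling on top of this ordering, which is exactly why using one beats reimplementing it per agent.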
State Management
AI agents are stateful. A research agent accumulates context across multiple interactions. A coding agent tracks file changes, test results, and previous attempts. A customer support agent carries conversation history and customer state.
This state must survive failures. If the agent process crashes mid-workflow, the state must be recoverable. If the agent needs to pause and resume hours later (waiting for human approval, for example), the state must persist.
```typescript
interface AgentState {
  workflow_id: string;
  current_step: string;
  context: Record<string, unknown>;
  tool_results: ToolResult[];
  conversation_history: Message[];
  retry_count: number;
  created_at: string;
  updated_at: string;
}

// State must be persisted durably, not held in memory
async function advanceAgent(state: AgentState): Promise<AgentState> {
  const step = getStep(state.current_step);
  const result = await step.execute(state.context);
  const nextState = {
    ...state,
    current_step: step.next(result),
    context: { ...state.context, [step.name]: result },
    updated_at: new Date().toISOString(),
  };
  // Persist before proceeding - crash recovery depends on this
  await stateStore.save(nextState);
  return nextState;
}
```
This is durable execution - the same pattern that Temporal, Inngest, and similar systems implement for distributed workflows. The agent's state machine is persisted after every transition, making the workflow resumable from any point.
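What resumability buys you can be sketched in a few lines, assuming a simple key-value `state_store` and an `advance` callback (hypothetical interfaces, not a real library): on restart, the loop picks up from the last persisted checkpoint rather than the beginning.

```python
# Minimal sketch of crash-safe resumption. Because state is checkpointed
# after every transition, re-running the workflow after a crash or a
# long pause continues from the last completed step.
def run_workflow(workflow_id, state_store, advance, initial_state):
    # Load the last durable checkpoint, or start fresh if none exists.
    state = state_store.load(workflow_id) or dict(initial_state)
    while state["current_step"] != "done":
        state = advance(state)                 # execute one step
        state_store.save(workflow_id, state)   # checkpoint before continuing
    return state
```

Resuming a finished or half-finished workflow is then just calling `run_workflow` again with the same ID.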
Retry Logic
Model calls fail. APIs time out. Rate limits trigger. Context windows overflow. Tool calls return errors. Every external interaction in an agent workflow can fail, and each failure type requires a different response.
```python
class AgentRetryPolicy:
    def should_retry(self, error, attempt, step):
        if isinstance(error, RateLimitError):
            # Back off and retry - this is transient
            return RetryDecision(
                retry=True,
                delay=error.retry_after or (2 ** attempt),
            )
        if isinstance(error, ContextOverflowError):
            # Retry with summarized context - not transient,
            # but recoverable with a different strategy
            return RetryDecision(
                retry=True,
                delay=0,
                modify_context=self.summarize_context,
            )
        if isinstance(error, ToolExecutionError):
            # Retry the tool call, not the model call
            if attempt < step.max_tool_retries:
                return RetryDecision(retry=True, delay=1)
            # Tool is broken - ask model to use alternative
            return RetryDecision(
                retry=True,
                delay=0,
                modify_context=self.disable_tool(error.tool),
            )
        if isinstance(error, MalformedResponseError):
            # Model returned unparseable output - retry with
            # explicit format instructions
            if attempt < 3:
                return RetryDecision(
                    retry=True,
                    delay=0,
                    modify_context=self.add_format_reminder,
                )
        return RetryDecision(retry=False)
```
AI agent retry logic is more complex than typical service retry logic because the recovery strategy often involves modifying the input, not just repeating the same call. This is closer to saga compensation than exponential backoff.
Context overflow recovery requires summarizing previous context. Tool failures might require disabling a tool and asking the model to use an alternative. Malformed responses need format reinforcement. Each of these is a retry with modified input - a pattern uncommon in traditional service retries but natural in workflow engines that support step-level compensation.
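The retry loop that consumes such decisions can be sketched as follows (the `call` and `policy` callables are hypothetical; a real policy would return a structured decision like the `AgentRetryPolicy` above):

```python
# Sketch of a retry loop whose recovery strategy transforms the input
# between attempts - the key difference from plain service retries.
def call_with_recovery(call, context, policy, max_attempts=4):
    for attempt in range(1, max_attempts + 1):
        try:
            return call(context)
        except Exception as error:
            decision = policy(error, attempt)
            if not decision.get("retry") or attempt == max_attempts:
                raise
            # Unlike plain exponential backoff, the retry may run with
            # modified input: summarized context, a disabled tool, or
            # an added format reminder.
            modify = decision.get("modify_context")
            if modify:
                context = modify(context)
```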
Observability
When an AI agent makes a bad decision in production, you need to understand why. This requires tracing every step of the workflow: what context was provided, what the model returned, what tools were called, what the results were, and how the agent decided to proceed.
```
Agent Workflow Trace
─────────────────────────────────────────────
[step_1] gather_context                 200ms
├─ tool: search_database                 45ms
├─ tool: fetch_document                 120ms
└─ result: 3 documents found

[step_2] analyze                         2.4s
├─ model_call                            2.1s
│  ├─ input_tokens: 4,200
│  ├─ output_tokens: 890
│  ├─ model: claude-sonnet
│  └─ cost: $0.018
├─ tool: run_query                      280ms
└─ result: analysis complete

[step_3] generate_report                 3.1s
├─ model_call (attempt 1)         1.8s FAILED
│  └─ error: malformed JSON
├─ model_call (attempt 2)             1.3s OK
│  ├─ input_tokens: 5,100
│  ├─ output_tokens: 2,400
│  └─ context_modified: format_reminder added
└─ result: report generated

[step_4] send_report                     50ms
└─ result: delivered

Total: 5.75s | Cost: $0.042 | Retries: 1
─────────────────────────────────────────────
```
This trace shows something traditional request logs cannot: the model returned malformed JSON on the first attempt, the retry policy added format instructions, and the second attempt succeeded. Without this level of observability, debugging agent failures is guesswork.
The observability infrastructure needs to capture:
- Input and output for every model call, including token counts and cost
- Tool call parameters and results
- State transitions and context modifications
- Retry decisions and their reasoning
- End-to-end latency and cost per workflow
This is distributed tracing applied to AI workflows. The correlation ID that links all events in a transaction processing workflow serves the same purpose here - connecting every model call, tool execution, and state transition into a single debuggable trace.
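As an illustrative sketch (not a real tracing library), the mechanism is just a shared correlation key on every event, plus parent links to rebuild the tree:

```python
import time
import uuid

# Every span carries the workflow ID as its correlation key, so a mixed
# event stream from many concurrent agents can be regrouped into one
# workflow's trace after the fact.
def new_span(workflow_id, name, parent=None):
    return {
        "workflow_id": workflow_id,      # correlation key
        "span_id": uuid.uuid4().hex,
        "parent": parent,                # links spans into a tree
        "name": name,
        "start": time.monotonic(),
    }

def reassemble_trace(events, workflow_id):
    """Filter a mixed event stream down to a single workflow's spans."""
    return [e for e in events if e["workflow_id"] == workflow_id]
```

Production systems would use OpenTelemetry or similar rather than hand-rolled spans, but the correlation principle is the same.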
Event-Driven Architecture Is the Foundation
The most scalable AI agent architectures are event-driven. The agent emits events for each significant action. Downstream systems consume these events for logging, monitoring, billing, and audit.
```python
import json
from datetime import datetime, timezone

class AgentEventEmitter:
    def __init__(self, stream):
        self.stream = stream

    def emit_step_started(self, workflow_id, step, context):
        self.stream.produce("agent.step.started", {
            "workflow_id": workflow_id,
            "step": step.name,
            "context_size": len(json.dumps(context)),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def emit_model_call(self, workflow_id, step, request, response):
        self.stream.produce("agent.model.called", {
            "workflow_id": workflow_id,
            "step": step.name,
            "model": request.model,
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens,
            "latency_ms": response.latency_ms,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def emit_tool_executed(self, workflow_id, step, tool, result):
        self.stream.produce("agent.tool.executed", {
            "workflow_id": workflow_id,
            "step": step.name,
            "tool": tool.name,
            "success": result.success,
            "latency_ms": result.latency_ms,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
```
These events feed into the same infrastructure that powers transaction monitoring: real-time dashboards showing agent throughput, error rates, cost per workflow, and latency distributions. The event stream becomes the audit trail, the debugging tool, and the billing data source.
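A downstream consumer can be sketched in a few lines (the event names mirror the emitter above; the event dict shape here is an assumption for illustration):

```python
from collections import defaultdict

# Sketch of a consumer over the agent event stream: roll model-call
# events up into per-workflow usage totals for dashboards and billing.
def aggregate_usage(events):
    totals = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0, "calls": 0}
    )
    for event in events:
        if event["type"] != "agent.model.called":
            continue
        t = totals[event["workflow_id"]]
        t["input_tokens"] += event["input_tokens"]
        t["output_tokens"] += event["output_tokens"]
        t["calls"] += 1
    return dict(totals)
```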
This is not novel architecture. It is the same event-driven pattern used in payment processing, fulfillment workflows, and data pipelines - applied to a new domain.
What the AI Industry Is Rediscovering
Every infrastructure problem in AI agents has been solved before:
- Orchestration → Temporal, Inngest, Airflow, Step Functions
- State management → Durable execution, event sourcing
- Retry logic → Exponential backoff, circuit breakers, retry budgets
- Observability → Distributed tracing, structured logging, metrics pipelines
- Failure recovery → Saga pattern, compensation logic
- Idempotency → Idempotency keys, exactly-once semantics
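The last of these, idempotency keys, translates directly to agents: a retried step must not re-send an email or re-charge a card. A minimal sketch (names illustrative, cache shown as a plain dict standing in for a durable store):

```python
import hashlib
import json

# The key is derived from everything that identifies the call, so a retry
# of the same step replays the cached result instead of re-running a
# side-effecting tool.
def idempotency_key(workflow_id, step, tool, args):
    payload = json.dumps([workflow_id, step, tool, args], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def execute_once(cache, key, run):
    if key in cache:
        return cache[key]   # replay: the side effect already happened
    result = run()
    cache[key] = result
    return result
```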
The AI ecosystem is rebuilding these primitives from scratch in agent frameworks because most AI engineers come from ML backgrounds, not systems engineering backgrounds. They are solving orchestration problems for the first time and discovering through painful experience why workflow engines exist.
The engineers who will build reliable AI infrastructure are not prompt engineers. They are the same engineers who built payment processing systems, workflow engines, and event-driven architectures. The domain is new. The infrastructure problems are not.
What This Means for Building AI Systems
If you are building AI agents for production:
- Use a workflow engine for orchestration - do not build state machines from scratch
- Persist agent state durably - in-memory state dies with the process
- Design retry policies per failure type - not every error deserves the same response
- Emit events for every significant action - you will need the audit trail
- Trace model calls like you trace service calls - with correlation IDs, latency, and cost
- Budget for model costs the way you budget for infrastructure - per workflow, not per call
The prompt is the least interesting part of the system. The infrastructure around it determines whether the agent works once in a demo or works reliably at scale.
References
- How Temporal Works - Temporal
- Inngest: Durable Functions for AI - Inngest
- Building Reliable AI Agents with Durable Execution - Anthropic Engineering
- Circuit Breaker Pattern - Martin Fowler
- Exponential Backoff And Jitter - AWS Architecture Blog