# The agentic infrastructure and the new backend

We have talked a lot about agentic frameworks. LangGraph, CrewAI, AutoGen, various SDKs, loops, tool calling, memory, planner, critic, supervisor. All useful words, to be fair. But the more I look at the agents people actually use, the more it seems to me that the interesting part has moved below the framework level.

The question is no longer just: which library do I use to make a model reason in steps?

The real question is: where does this agent live once it stops being a demo?

Because a serious agent is not a function that calls a model and returns text. It's a small distributed system. It must read context, use tools, execute code, touch files, remember decisions, ask permission, fail gracefully, restart, leave logs, stay within budget and not turn into a bulldozer inside the production repository.

The framework is the steering wheel. The infrastructure is the road, the brakes, the garage, the insurance and the person who knows where the keys are.

## Why there's so much talk about it now

In 2023 and 2024 the conversation was very model-centric. Which LLM? How much context? How much does it cost? How good is it at programming?

In 2025 and 2026 the conversation has shifted. The models are good enough to do real work, and that is exactly why the boring bits become visible: runtime, security, connectors, identity, observability, code execution, deployment, rollback.

It's the natural transition from magic to engineering.

When an agent just needs to generate a response, a chat is enough. When it needs to open a pull request, query a database, call a CRM, start a job, navigate a site, read Slack, compile code and update a document, it needs an operating system around it.

Not in a literal sense. In an organizational sense.

## The first piece: a runtime where the agent can last

An agent often works in steps.
It looks at the state, chooses an action, uses a tool, observes the result, updates the plan, and repeats.

If this loop lives inside a single HTTP request, you immediately have a problem. Some actions are slow. Some wait for human input. Some fail and must be retried. Some must survive a deployment or a timeout.

This is where durable workflows, queues, background jobs and state machines come into play. They're not glamorous, but they're the difference between an agent that seems smart in a demo and one you can leave working while you go get coffee.

For me the agentic runtime must answer very concrete questions:

- where do I save the state between one step and the next?
- what happens if the process dies halfway through?
- can I pause and ask for approval?
- can I replay a run to understand why it made that choice?
- can I limit duration, memory, tools and cost?

Vercel is pushing hard on this front with the AI SDK, functions, workflows and tools for building agents within web applications. But the point is not just Vercel. The point is that the agent needs an operational home, not a single endpoint.

## The second piece: sandbox, because the agent must be able to get dirty without breaking things

As soon as an agent writes code or executes commands, a sandbox is needed.

It sounds like a technical word, but the idea is domestic: you give the agent a workbench. It can open files, install dependencies, run tests, do experiments, generate output. If it gets something wrong, you've contained the damage.
If it works, you promote the result.

An agentic sandbox should have some properties:

- isolated filesystem;
- CPU, memory and time limits;
- controlled network;
- secrets mounted only when needed;
- complete logs;
- the ability to export artifacts;
- clean reset between runs, when necessary.

Vercel Sandbox goes exactly in this direction: isolated environments to run code, install dependencies, work with files and produce artifacts without running everything in the main application runtime.

This matters more than it seems. Many agentic prototypes jump directly from the model to the real system. The model can call tools. Tools can do things. It all seems elegant until the first wrong command, the first dependency installed in the wrong place, the first token that ends up in a log.

The sandbox is the adult way of saying: go ahead, but in here.

## The third piece: MCP and the connector problem

The Model Context Protocol has become one of the most interesting parts of the ecosystem because it tries to standardize something that otherwise quickly becomes unmanageable: how a model discovers and uses external tools.

Without a standard, each integration is a small island. A connector for GitHub done one way, one for Slack done another, one for databases with different semantics, one for browser automation that looks like nothing else.

MCP proposes a common language between client and server: tools, resources, prompts, authorization, transport, discovery. It doesn't magically solve governance and security, but it gives a grammar.

And grammar matters. When an agent can connect to many tools, the question is not just "can it do this?". The question is "does it understand what it can do, within what limits, on whose behalf, and leaving what trace?".

For me MCP is not hype because it "does tool calling". We already did that.
It's hype because it shifts the center of gravity from the single integration to the operational catalog of tools.

In a good agentic architecture, MCP becomes a kind of patch panel:

- GitHub for code and issues;
- Slack for conversational context;
- Linear or Jira for planned work;
- a read-only database for analytics;
- a controlled browser or scraper for external sites;
- document storage;
- isolated execution environments;
- internal systems exposed with strict permissions.

The tricky part is that a policy-free tool catalog is just a more elegant way to create chaos.

## The fourth piece: identity and permissions

This is the area where many demos turn a blind eye.

An agent acts on someone's behalf. So it must be clear who the subject of the action is.

Is it using the user's permissions? A service account's? A workspace's? Does it have temporary or permanent access? Can it read everything or just some resources? Can it write? Can it delete? Can it message real people?

If you don't answer these questions well, sooner or later you'll build an assistant with the house keys and no memory of who handed them over.

The rule of thumb I like is this: the agent must be able to do less than the human, not more. And when it has to do something riskier, it has to stop and ask.

This means OAuth, scoped tokens, secret management, audit logs, tool policies, allowlists, approval steps. Not very romantic stuff. Necessary stuff.

## The fifth piece: memory and context, but without accumulating garbage

Agents need memory, but memory is dangerous when it becomes an attic.

There are at least three types of memory:

- run memory: what happened in this execution;
- project memory: conventions, decisions, constraints;
- personal or team memory: preferences, tone, rituals, processes.

Putting everything in the prompt is the shortcut. It works until it doesn't.
Useful memory must be looked after: indexed, updated, expired, verified, made citable.

An agent that remembers badly is worse than an agent that doesn't remember at all. Because it speaks with confidence.

Therefore the infrastructure must include retrieval, instruction files, a knowledge base, embeddings when needed, but also cleanup. We need a culture of memory: what goes in, who approves it, when it decays, how it gets corrected.

## The sixth piece: observability, eval and replay

If an agent makes a mistake, a log line saying "called the model" is not enough.

You want to see the route. What context did it receive? What tools were available? Which tool did it choose? With what arguments? What response did it get? How much did it cost? Where did it get stuck? Did a human approve anything? Is the error in the model, the tool, the prompt, the data or the permissions?

Here agents are more like distributed systems than chatbots.

You need readable traces, not just text logs. You need to be able to replay a run. You need to compare two versions of the same agent on known tasks. You need to measure regressions: not just "it answers better", but "it closes the right ticket without touching files nobody asked it to touch".

Agentic evals are harder than text evals because they include actions. It is not enough to compare against an expected string. You have to look at sequences, side effects, quality of the artifact, time, cost, number of human interventions.

The funny thing is, we always end up back in the same place: software engineering. Tests, environments, traces, rollbacks. Except that the code now also decides what to do next.

## The seventh piece: human interfaces

The agent doesn't have to live only in a chat.

Some agents need a board. Others need a page with status and logs. Others an "approve" button. Others inline comments. Still others a CLI.

The UI changes behavior.
If the only way to control an agent is to write a long message, the user will give it vague instructions. If, instead, the user sees the plan, the diff, the sources, the risks and the next action, they can intervene precisely.

A decent agent infrastructure includes control surfaces:

- current status;
- editable plan;
- produced artifacts;
- diff;
- approval requests;
- timeline;
- stop button;
- retry button;
- visible permissions.

It seems trivial, but it isn't. The difference between "creepy AI" and "reliable assistant" is often just that the latter shows you where its hands are.

## The mental stack

If I were to draw it today, the minimum agent stack would be this:

1. Model: reasoning, generation, tool calling, multimodal if necessary.
2. Orchestration: loop, step, planner, policy, human-in-the-loop.
3. Durable runtime: workflow, queue, retry, pause, resume.
4. Sandbox: code execution, isolated filesystem, limits, artifacts.
5. Tool layer: MCP, internal APIs, browser, database, repository.
6. Identity layer: OAuth, scopes, secrets, audit, policy.
7. Memory layer: project context, retrieval, instructions, expiration.
8. Observability: trace, replay, eval, cost and quality metrics.
9. Product surface: chat when it's enough, dashboard when needed, review when it matters.

The agentic framework mainly covers point 2 and a piece of point 1. The rest is the real work.

## What I would do in practice

If a team told me "we want agents in production," I wouldn't start with ten agents.

I would start with a small, repetitive and observable workflow.
For example: open maintenance PRs, update documentation from closed issues, prepare a weekly review, triage duplicate bugs, generate tests for affected files.

Then I would set very clear limits:

- no writes outside a branch or a sandbox;
- no secrets in the prompt;
- tools on an allowlist;
- human approval for external actions;
- mandatory logs and traces;
- a budget per run;
- output that is always inspectable.

Only then would I expand.

Agents don't fail just because the models get it wrong. They fail because we put them in vague environments, with confusing permissions and theatrical expectations.

## My reading

Agentic infrastructure is boring in the best way.

It's not the part that makes you clap in the demo. It's the part that lets you actually use the demo on Monday morning, with real people, real data, and real consequences.

The future of agents will not be decided only by who has the best model. It will be decided by whoever builds the best place for it to work: isolated when it experiments, connected when needed, always observable, authorized with judgment and humble enough to stop when it doesn't know.

That's where agents stop being a toy and become infrastructure.

## Sources

- [Vercel: How to build AI agents with Vercel and the AI SDK](https://vercel.com/kb/guide/how-to-build-ai-agents-with-vercel-and-the-ai-sdk)
- [Vercel Docs: Sandbox](https://vercel.com/docs/sandbox)
- [Vercel Docs: Working with Sandbox](https://vercel.com/docs/sandbox/working-with-sandbox)
- [Vercel Docs: MCP](https://vercel.com/docs/mcp)
- [Model Context Protocol: Specification](https://modelcontextprotocol.io/specification)
- [OpenAI: New tools for building agents](https://openai.com/index/new-tools-for-building-agents/)
- [Cloudflare Blog: Agents on Cloudflare](https://blog.cloudflare.com/agents-on-cloudflare/)