# Codex and multi-agent workflows: work with agents without losing control

The first time a coding agent actually fixes a bug for you, the reaction is almost always the same: a mixture of enthusiasm and suspicion. Nice, sure. But then you look at the diff and ask yourself: "OK, but what exactly did it touch? Can I trust it? Will it do the same thing tomorrow?"

That's where I think the interesting part begins. Not when the agent writes a function, but when it becomes capable enough to take on entire pieces of work: read the repository, create a patch, run tests, open a PR, come back after a review comment. Codex is moving precisely in that direction: background work, separate worktrees, an integrated browser, automations, plugins, memory and more explicit permission controls.

The point is not to imagine a future where no one reads code anymore. That would be a terrible future, as well as a rather naive one. The point is to figure out how to work with agents that can do a lot without letting them do everything.

## The change of habit

With traditional autocomplete you were always at the wheel. The AI suggested a line, you decided. With an agent, the relationship changes: you give it a goal and it goes through multiple steps on its own.

This is powerful, but it shifts the problem. The question is no longer just "can the model program?". The question becomes:

- Did I give it a small enough scope?
- Do I know how to check the result?
- Am I working in an isolated environment?
- Is the final review still human and careful?

A healthy workflow looks less like a magic wand and more like this:

```mermaid
flowchart LR
    Idea[Human task] --> Scope[Small, verifiable scope]
    Scope --> Agent[Agent in isolated worktree]
    Agent --> Checks[Test, lint, build, browser]
    Checks --> Review[Human review]
    Review --> Merge[Merge or new iteration]
    Review --> Iterate[Precise comments on the diff]
    Iterate --> Agent
```

It sounds less romantic than "the agent builds everything", but it works much better.
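
The four questions above can be turned into a small pre-flight check that runs before a task is handed to an agent. What follows is only a sketch under my own assumptions: `TaskBrief`, its fields, `delegation_blockers` and the five-file threshold are hypothetical names and values invented for illustration, not part of Codex or any other tool.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical description of a task before it goes to an agent."""
    goal: str
    files_in_scope: list[str] = field(default_factory=list)
    verify_commands: list[str] = field(default_factory=list)
    isolated_worktree: bool = False
    human_reviewer: str = ""


def delegation_blockers(brief: TaskBrief, max_files: int = 5) -> list[str]:
    """Return the reasons a brief is not ready; an empty list means go ahead."""
    blockers = []
    # Question 1: is the scope small enough? (max_files is an arbitrary threshold)
    if not brief.files_in_scope or len(brief.files_in_scope) > max_files:
        blockers.append(f"scope: name between 1 and {max_files} files")
    # Question 2: do I know how to check the result?
    if not brief.verify_commands:
        blockers.append("verification: list at least one command")
    # Question 3: am I working in an isolated environment?
    if not brief.isolated_worktree:
        blockers.append("isolation: use a separate worktree or branch")
    # Question 4: is a human still doing the final review?
    if not brief.human_reviewer.strip():
        blockers.append("review: name a human reviewer")
    return blockers


if __name__ == "__main__":
    # A vague brief fails every question; the output lists what is missing.
    vague = TaskBrief(goal="clean up the billing code")
    for reason in delegation_blockers(vague):
        print(reason)
```

The value is less in the code than in the habit: if the list of blockers is not empty, the task probably needs more work from you before it needs an agent.
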
Good teams of humans work the same way: clear tasks, quick feedback, explicit accountability.

## The good prompt is almost a good ticket

The most dangerous prompt is the vague but confident one: "fix the invoices page", "improve the architecture", "clean up the auth module". These requests sound productive and generate huge diffs. Then you find yourself doing archaeology.

A helpful prompt is more boring. For example: implement CSV export for the invoices page, knowing that the table is in `app/(dashboard)/invoices/page.tsx`, the queries are in `src/server/invoices.ts` and there is already a similar pattern in `app/(dashboard)/reports`.

Then add clear constraints: don't change the database schema, don't add dependencies if a small utility is enough, keep the existing UI style. And close with the verification: `npm test -- invoices` and `npm run build`.

This kind of brief is not there to "explain things better to the AI". Above all, it makes clearer to you what you are delegating. If you can't write it down concretely, maybe the task isn't ready for an agent yet.

## Three jobs that I willingly delegate

The first is repetitive but verifiable work: adding tests, migrating calls to a new internal API, updating imports, replacing deprecated components, fixing TypeScript errors. Here the agent can save hours and the risk is controllable.

The second is exploratory work: "find where this total is calculated", "explain to me why this test is fragile", "reproduce the bug and tell me which files seem to be affected". Even when it doesn't produce a patch right away, the agent can do useful reconnaissance.

The third is recurring maintenance work: small dependency updates, cleanup of old feature flags, summaries of blocked PRs, checks for forgotten TODOs. It's not glamorous, but it's exactly the kind of work that tends to pile up.

## Three jobs that I keep human

Product decisions remain human.
If a change affects how a user pays, deletes data, sees prices, or understands a permission, I want a person accountable for it.

Security boundaries also deserve human attention: auth, roles, tokens, logging of sensitive data, database migrations. An agent can help with the implementation, but it must not be the sole decision maker.

Finally, I keep human everything that requires architectural taste. An agent can propose a refactor, but judging whether an abstraction is really necessary, or whether we are just polishing a non-existent problem, remains a human job.

## The review is not optional

The temptation, when an agent is good, is to trust the green of the CI. It's understandable. It's also when the problems start.

I always look at at least five things:

1. Does the patch solve only the requested task?
2. Did it touch files that had nothing to do with it?
3. Do the tests cover the new behavior, or just the happy path?
4. Does the code follow local patterns?
5. Are errors handled as in the rest of the project?

When something is wrong, feedback needs to be specific. "Fix it" is lazy. Better: this utility duplicates `parseMoney` from `src/lib/money.ts`; reuse that function, add a test for the EUR case and don't change the public API of the billing module.

Agents respond much better to small, verifiable comments. Curiously, so do people.

## Guardrails worth the effort

If an agent can read files, write code, and execute commands, it should be treated as a powerful process. You don't need paranoia; you need hygiene.

Use separate worktrees or branches. That way you can compare the diff, throw away failed experiments, and avoid mixing the agent's work with what you were doing.

Limit permissions. Commands like `rg`, `git diff`, `npm test` and `npm run build` can be allowed fairly freely. Deployments, database migrations, access to secrets and destructive commands must require explicit approval.

Reduce network access when you don't need it.
For many tasks, the official documentation, the package registry and a few specific internal services are sufficient. Less surface area, fewer surprises.

Track actions. When a patch arrives in review, you should be able to reconstruct the prompts, the commands executed, the tests that passed and the files modified. Not to create bureaucracy, but to understand what happened if something goes wrong.

## An easy way to get started as a team

If I were to introduce agents into a small team, I would start without major revolutions.

I would create an `agent-ready` label for issues with a clear scope. I would add a template with context, constraints and verification commands. I would ask for small PRs, ideally under a few hundred lines. I would require tests or screenshots for visible changes. And above all I would keep a person responsible for the merge.

After two weeks I would look at the data: which tasks actually got faster, which reviews were heavy, which prompts were confusing, which parts of the codebase are too fragile to delegate.

It's a less spectacular approach than "from today we'll do everything with agents", but it's the one that gets you to the third week without regrets.

## The most human part

The funny thing is that the more autonomous agents become, the more the classic skills matter again: writing a good ticket, cutting work into small pieces, writing tests, reading diffs, communicating trade-offs. The agent accelerates those who already know how to work well. It also amplifies the chaos of those who delegate badly.

So no, I don't see multi-agent workflows as a shortcut to stop doing engineering. I see them as a way to shift more energy to the parts that matter: deciding what to build, making sure it works, keeping the system understandable.

Agents can make great asynchronous colleagues. But an asynchronous colleague, to be useful, needs context, boundaries and review.
Just like everyone else.

## Useful sources

- [Codex for (almost) everything - OpenAI](https://openai.com/index/codex-for-almost-everything/)
- [Running Codex safely at OpenAI](https://openai.com/index/running-codex-safely/)
- [Introducing Codex - OpenAI](https://openai.com/index/introducing-codex/)
- [What's new with GitHub Copilot coding agent](https://github.blog/ai-and-ml/github-copilot/whats-new-with-github-copilot-coding-agent/)