Codex and multi-agent workflows: working with agents without losing control




The first time a coding agent actually fixes a bug for you, the reaction is almost always the same: a mix of enthusiasm and suspicion. Nice, sure. But then you look at the diff and ask yourself: "OK, but what exactly did it touch? Can I trust it? Will it do the same thing the same way tomorrow?"

That's where I think the interesting part begins. It starts when the agent can take on entire pieces of work, not just write a function: reading the repository, creating a patch, running tests, opening a PR, coming back after a review comment. Codex is moving precisely in that direction: background work, separate worktrees, an integrated browser, automations, plugins, memory, and more explicit permission controls.

The point is not to imagine a future where no one reads code anymore. That would be a terrible future, and a rather naive one. The point is to figure out how to work with agents that can do a lot without letting them do everything.

## The change of habit

With traditional autocomplete you were always at the wheel: the AI suggested a line, and you decided. With an agent the relationship changes: you give it a goal and it works through multiple steps on its own.

This is powerful, but it moves the problem elsewhere. The question is no longer just "can the model write code?" The question becomes:

- Did I give it a small enough scope?
- Do I know how to check the result?
- Am I working in an isolated environment?
- Is the final review still human and careful?

A healthy workflow looks less like a magic wand and more like this:

```mermaid
flowchart LR
    Idea[Human task] --> Scope[Small, verifiable goal]
    Scope --> Agent[Agent in an isolated worktree]
    Agent --> Checks[Tests, lint, build, browser]
    Checks --> Review[Human review]
    Review -->|approved| Merge[Merge]
    Review -->|changes needed| Iterate[Precise comments on the diff]
    Iterate --> Agent
```

It sounds less romantic than "the agent builds everything", but it works much better. It's also how good human teams work: clear tasks, fast feedback, explicit accountability.
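To make the loop concrete, here is a minimal TypeScript sketch of the diagram as a state machine. The stage and event names are hypothetical labels for the diagram's nodes. The `checksFailed` transition is an assumption that the diagram doesn't show: it sends failing work straight back to the agent before any human looks at it. The point the sketch illustrates is that the only path to `merged` goes through a human `approved` event, and every `changesRequested` carries precise comments into the next run.

```typescript
// Stages mirror the diagram's nodes; the names are hypothetical.
type Stage = "scoped" | "agentWorking" | "checking" | "inReview" | "merged";

type WorkflowEvent =
  | { kind: "start" }
  | { kind: "agentDone" }
  | { kind: "checksPassed" }
  | { kind: "checksFailed" } // assumption: not shown in the diagram
  | { kind: "approved" }
  | { kind: "changesRequested"; comments: string[] };

interface WorkflowState {
  stage: Stage;
  iteration: number;  // how many times the agent has been started on this task
  feedback: string[]; // review comments handed to the agent's next run
}

function advance(state: WorkflowState, event: WorkflowEvent): WorkflowState {
  switch (state.stage) {
    case "scoped":
      if (event.kind === "start") return { ...state, stage: "agentWorking", iteration: 1 };
      break;
    case "agentWorking":
      if (event.kind === "agentDone") return { ...state, stage: "checking" };
      break;
    case "checking":
      if (event.kind === "checksPassed") return { ...state, stage: "inReview" };
      // Failing checks go back to the agent before any human time is spent.
      if (event.kind === "checksFailed") {
        return { ...state, stage: "agentWorking", iteration: state.iteration + 1 };
      }
      break;
    case "inReview":
      // Only a human decision leaves review: approve to merge, or send precise comments back.
      if (event.kind === "approved") return { ...state, stage: "merged" };
      if (event.kind === "changesRequested") {
        return { stage: "agentWorking", iteration: state.iteration + 1, feedback: event.comments };
      }
      break;
    case "merged":
      break;
  }
  throw new Error(`event "${event.kind}" is not valid in stage "${state.stage}"`);
}

let s: WorkflowState = { stage: "scoped", iteration: 0, feedback: [] };
s = advance(s, { kind: "start" });
s = advance(s, { kind: "agentDone" });
s = advance(s, { kind: "checksPassed" });
s = advance(s, { kind: "changesRequested", comments: ["Reuse parseMoney from src/lib/money.ts"] });
console.log(s.stage, s.iteration); // agentWorking 2
```

Modeling invalid transitions as errors, rather than ignoring them, makes it impossible to "merge" a task that never went through review.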

## A good prompt is almost a good ticket

The most dangerous prompt is the vague but confident one: "fix the invoices page", "improve the architecture", "clean up the auth module". Requests like these sound productive and produce huge diffs, and then you find yourself doing archaeology.

A helpful prompt is more boring. For example: implement CSV export for the invoices page; the table is in `app/(dashboard)/invoices/page.tsx`, the queries are in `src/server/invoices.ts`, and there is already a similar pattern in `app/(dashboard)/reports`.

Then add clear constraints: don't change the database schema, don't add dependencies if a small utility is enough, keep the existing UI style. Finally, say how to verify the work: `npm test -- invoices` and `npm run build`.

This kind of brief isn't about "explaining things better to the AI". Above all, it makes it clearer to you what you are delegating. If you can't write it down concretely, the task may not be ready for an agent yet.
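The brief maps naturally onto a small data structure. This is a minimal sketch that assumes a hypothetical `AgentBrief` shape with four parts (goal, context, constraints, verification), filled in with the CSV-export example. `missingParts` is a rough readiness check in the spirit of "if you can't write it down concretely". `renderBrief` turns the structure into text you could paste into a prompt or an issue. Neither corresponds to a Codex API.

```typescript
// Hypothetical shape of an agent brief.
interface AgentBrief {
  goal: string;
  context: string[];      // files or existing patterns the agent should look at
  constraints: string[];  // what it must not do
  verification: string[]; // commands that must pass before review
}

// Returns the parts of the brief that are still empty.
// An empty result is only a rough signal that the task is concrete enough to delegate.
function missingParts(brief: AgentBrief): string[] {
  const missing: string[] = [];
  if (brief.goal.trim() === "") missing.push("goal");
  if (brief.context.length === 0) missing.push("context");
  if (brief.constraints.length === 0) missing.push("constraints");
  if (brief.verification.length === 0) missing.push("verification");
  return missing;
}

// Renders the brief as plain text, one bulleted section per part.
function renderBrief(brief: AgentBrief): string {
  const section = (title: string, items: string[]): string =>
    [`${title}:`, ...items.map((item) => `- ${item}`)].join("\n");
  return [
    `Goal: ${brief.goal}`,
    section("Context", brief.context),
    section("Constraints", brief.constraints),
    section("Verification", brief.verification),
  ].join("\n\n");
}

const csvExport: AgentBrief = {
  goal: "Implement CSV export for the invoices page",
  context: [
    "The table is in app/(dashboard)/invoices/page.tsx",
    "The queries are in src/server/invoices.ts",
    "A similar pattern already exists in app/(dashboard)/reports",
  ],
  constraints: [
    "Don't change the database schema",
    "Don't add dependencies if a small utility is enough",
    "Keep the existing UI style",
  ],
  verification: ["npm test -- invoices", "npm run build"],
};

console.log(missingParts(csvExport).length); // 0
console.log(renderBrief(csvExport));
```

The same four fields also work as an issue template, which keeps the prompt and the ticket in sync.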

## Three jobs I gladly delegate

The first is repetitive but verifiable work: adding tests, migrating calls to a new internal API, updating imports, replacing deprecated components, fixing TypeScript errors. Here the agent can save hours while the risk stays manageable.

The second is exploratory work: "find where this total is calculated", "explain why this test is flaky", "reproduce the bug and tell me which files seem to be involved". Even when it doesn't produce a patch right away, it can do useful reconnaissance.

The third is recurring maintenance: small dependency updates, cleaning up old feature flags, summarizing blocked PRs, tracking down forgotten TODOs. It's not glamorous, but it's exactly the kind of work that tends to pile up.

## Three jobs I keep human

Product decisions stay human. If a change affects how a user pays, deletes data, sees prices, or understands a permission, I want a person who is accountable for it.

Security boundaries also deserve human attention: auth, roles, tokens, logging of sensitive data, database migrations. An agent can help with the implementation, but it shouldn't be the only one making the decision.
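One practical way to keep these boundaries visible is to flag changes that touch sensitive areas. That way the PR goes to a human owner instead of being merged on a green CI alone. The sketch below assumes hypothetical path conventions (`auth/`, `migrations/`, `.env` files and so on). A path check is only a routing aid. It cannot catch problems like sensitive data being logged from an ordinary file, which still needs a reviewer's eye.

```typescript
// Hypothetical patterns for areas where a human makes the final call.
// Each pattern requires a path boundary, so "authors.ts" does not count as "auth".
const SENSITIVE_PATTERNS: { area: string; pattern: RegExp }[] = [
  { area: "auth", pattern: /(^|\/)auth(\/|\.|$)/ },
  { area: "roles/permissions", pattern: /(^|\/)(roles?|permissions?)(\/|\.|$)/ },
  { area: "secrets/tokens", pattern: /(^|\/)(\.env[^/]*|secrets?|tokens?)(\/|\.|$)/ },
  { area: "database migrations", pattern: /(^|\/)migrations?\// },
];

// Returns each sensitive area touched by the changed files, in first-seen order.
function sensitiveAreas(changedFiles: string[]): string[] {
  const areas = new Set<string>();
  for (const file of changedFiles) {
    for (const { area, pattern } of SENSITIVE_PATTERNS) {
      if (pattern.test(file)) areas.add(area);
    }
  }
  return Array.from(areas);
}

// Hypothetical file list for a change that drifted into auth and the schema.
const touched = sensitiveAreas([
  "src/server/invoices.ts",
  "src/auth/session.ts",
  "db/migrations/0042_add_invoice_currency.sql",
]);
console.log(touched.join(", ")); // auth, database migrations
```

A non-empty result would typically request review from a code owner rather than block the PR outright.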

Finally, I keep human everything that requires architectural taste. An agent can propose a refactor. Judging whether an abstraction is really necessary, or whether we are just polishing a problem that doesn't exist, remains our job.

## Review is not optional

When an agent is good, the temptation is to trust a green CI run. That's understandable. It's also where the problems start.

I always check at least five things:

1. Does the patch solve only the requested task?
2. Did it touch files unrelated to the task?
3. Do the tests cover the new behavior, or just the happy path?
4. Does the code follow local patterns?
5. Are errors handled the same way as in the rest of the project?
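The second check, files that have nothing to do with the task, is easy to automate roughly. This minimal sketch assumes the brief lists its scope as hypothetical directory prefixes (ending in `/`) or exact file paths. It also assumes the changed-file list comes from something like `git diff --name-only main...HEAD`, with `main` as the base branch. It doesn't judge whether a change is correct; it only points the reviewer at files worth a second look.

```typescript
// Scope entries ending in "/" match a whole directory; others match one exact file.
function inScope(file: string, scope: string[]): boolean {
  return scope.some((entry) => (entry.endsWith("/") ? file.startsWith(entry) : file === entry));
}

// Returns the changed files that fall outside the agreed scope.
function outOfScope(changedFiles: string[], scope: string[]): string[] {
  return changedFiles.filter((file) => !inScope(file, scope));
}

// Hypothetical scope for the CSV-export task; src/lib/csv.ts is an assumed new helper.
const csvScope = ["app/(dashboard)/invoices/", "src/server/invoices.ts", "src/lib/csv.ts"];
const changedFiles = [
  "app/(dashboard)/invoices/page.tsx",
  "src/server/invoices.ts",
  "src/lib/csv.ts",
  "package.json",
];
console.log(outOfScope(changedFiles, csvScope).join(", ")); // package.json
```

In this example `package.json` showing up is exactly the kind of signal worth a question, since the brief said not to add dependencies.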

When something is wrong, feedback needs to be specific. "Fix it" is lazy. Better: this utility duplicates `parseMoney` from `src/lib/money.ts`; reuse that function, add a test for the EUR case, and don't change the public API of the billing module.

Agents respond much better to small, verifiable comments. Curiously, so do people.

## Guardrails worth the effort

If an agent can read files, write code, and execute commands, it should be treated as a powerful process. You don't need paranoia; you need hygiene.

Use separate worktrees or branches. That way you can review the diff, throw away failed experiments, and keep the agent's work from getting mixed up with what you were doing.
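Here is a sketch of the worktree routine. It assumes a hypothetical naming convention (`agent/<task>` branches, `../wt-<task>` directories) and `main` as the base branch. The commands themselves are standard git (`git worktree add -b`, `git worktree remove`, `git branch -D`). They are returned as argument arrays rather than shell strings, so they could be handed to a process runner without quoting issues.

```typescript
// Hypothetical conventions: one branch and one sibling directory per agent task.
interface WorktreePlan {
  create: string[];    // new branch from the base, checked out in its own directory
  inspect: string[];   // what changed in tracked files compared with the base
  discard: string[][]; // throw away a failed experiment
}

function worktreePlan(taskSlug: string, base = "main"): WorktreePlan {
  // Reject slugs that could escape the parent directory or look like git options.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(taskSlug)) {
    throw new Error(`unsafe task slug: ${taskSlug}`);
  }
  const branch = `agent/${taskSlug}`;
  const dir = `../wt-${taskSlug}`;
  return {
    create: ["git", "worktree", "add", "-b", branch, dir, base],
    inspect: ["git", "-C", dir, "diff", base],
    discard: [
      ["git", "worktree", "remove", "--force", dir],
      ["git", "branch", "-D", branch],
    ],
  };
}

const plan = worktreePlan("csv-export");
console.log(plan.create.join(" "));
// git worktree add -b agent/csv-export ../wt-csv-export main
```

Run the `discard` commands from the main checkout. `--force` is what makes the experiment disposable: without it, `git worktree remove` refuses to delete a worktree with uncommitted changes.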

Limit permissions. Commands like `rg`, `git diff`, `npm test`, and `npm run build` can run fairly freely. Deployments, database migrations, access to secrets, and destructive commands should always require explicit approval.
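This is a minimal sketch of that split as a default-deny policy: a short allowlist runs without asking, and everything else waits for a human. The allowlist mirrors the commands above. The `decide` function and the argv-prefix format are assumptions for illustration, not the configuration format of Codex or any other tool. Matching on argument arrays rather than raw command strings avoids accepting something like `npm test && <anything>`, assuming the runner executes the arguments directly without a shell.

```typescript
type Decision = "allow" | "ask";

// Each entry is an argv prefix: ["npm", "test"] also covers "npm test -- invoices".
// The list mirrors the examples in the text; it is not any tool's real config format.
const ALLOWED_PREFIXES: string[][] = [
  ["rg"],
  ["git", "diff"],
  ["npm", "test"],
  ["npm", "run", "build"],
];

// Default deny: anything not on the allowlist (deploys, migrations, access to secrets,
// destructive commands) needs explicit human approval.
function decide(argv: string[]): Decision {
  const allowed = ALLOWED_PREFIXES.some(
    (prefix) => prefix.length <= argv.length && prefix.every((part, i) => argv[i] === part)
  );
  return allowed ? "allow" : "ask";
}

console.log(decide(["npm", "test", "--", "invoices"])); // allow
console.log(decide(["npm", "run", "migrate"]));         // ask
console.log(decide(["git", "push", "--force"]));        // ask
```

Even allowlisted tools can have flags that do more than read. For instance, ripgrep's `--pre` runs a preprocessor command, so a stricter policy may need to inspect arguments too.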

Restrict network access when it isn't needed. For many tasks, the official documentation, the package registry, and a few specific internal services are enough. Less surface area, fewer surprises.
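Network restrictions usually live in the sandbox or proxy configuration, but the rule itself is simple enough to sketch. This version assumes an exact-match HTTPS allowlist. `registry.npmjs.org` is the public npm registry; the documentation and internal hostnames are hypothetical placeholders. Exact host matching is deliberate, because a naive prefix or suffix check would let a lookalike host through.

```typescript
// Hypothetical egress allowlist for an agent sandbox.
const ALLOWED_HOSTS = new Set<string>([
  "registry.npmjs.org",           // public npm registry
  "docs.example.com",             // hypothetical official documentation host
  "billing-api.internal.example", // hypothetical internal service
]);

function isRequestAllowed(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URLs are denied
  }
  // Only HTTPS, and only exact hostname matches.
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}

console.log(isRequestAllowed("https://registry.npmjs.org/react"));        // true
console.log(isRequestAllowed("https://registry.npmjs.org.evil.test/x")); // false
console.log(isRequestAllowed("http://registry.npmjs.org/react"));        // false
```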

Keep a trail of actions. When a patch reaches review, you should be able to reconstruct the prompts, the commands executed, the tests that passed, and the files modified. The goal isn't bureaucracy; it's being able to understand what happened if something goes wrong.
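Most agent tools keep their own logs, so treat this as an illustration of what is worth capturing rather than a format to adopt. The sketch assumes a hypothetical `AgentRunLog` record. It holds the prompt, each command with its exit code, and the modified files. A `summarize` helper produces a short trail you could paste into the PR description.

```typescript
// Hypothetical audit record for one agent run; field names are illustrative.
interface CommandRecord {
  argv: string[];
  exitCode: number;
}

interface AgentRunLog {
  taskId: string;
  prompt: string;
  commands: CommandRecord[];
  filesModified: string[];
}

function recordCommand(log: AgentRunLog, argv: string[], exitCode: number): void {
  log.commands.push({ argv, exitCode });
}

// Short, human-readable trail: every command in order, with failures marked.
function summarize(log: AgentRunLog): string {
  const failed = log.commands.filter((c) => c.exitCode !== 0);
  return [
    `Task: ${log.taskId}`,
    `Prompt: ${log.prompt}`,
    `Commands run: ${log.commands.length} (${failed.length} failed)`,
    ...log.commands.map((c) => `  ${c.exitCode === 0 ? "ok  " : "FAIL"} ${c.argv.join(" ")}`),
    `Files modified: ${log.filesModified.join(", ") || "none"}`,
  ].join("\n");
}

const run: AgentRunLog = {
  taskId: "invoices-csv-export",
  prompt: "Implement CSV export for the invoices page",
  commands: [],
  filesModified: ["app/(dashboard)/invoices/page.tsx", "src/lib/csv.ts"],
};
recordCommand(run, ["npm", "test", "--", "invoices"], 1);
recordCommand(run, ["npm", "test", "--", "invoices"], 0);
recordCommand(run, ["npm", "run", "build"], 0);
console.log(summarize(run));
```

Keeping the failed attempts in the trail, not just the final green run, is what lets a reviewer see how the agent got there.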

## An easy way to get started as a team

If I were introducing agents to a small team, I would start without any big revolution.

I would create an `agent-ready` label for issues with a clear scope. I would add a template with context, constraints, and verification commands. I would ask for small PRs, ideally under a few hundred lines. I would require tests, or screenshots for visible changes. And above all, I would keep a person accountable for the merge.
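These team rules are simple enough to check automatically, for example in a CI step. This minimal sketch assumes three things: a hypothetical `PullRequest` shape filled from your forge's API, a limit of 300 changed lines as one reading of "a few hundred", and the `agent-ready` label living on the linked issue. It reports problems instead of blocking, because the final call stays with the person accountable for the merge.

```typescript
// Assumed threshold: one reading of "ideally under a few hundred lines".
const MAX_CHANGED_LINES = 300;

// Hypothetical PR summary; in practice these fields would come from the forge's API.
interface PullRequest {
  issueLabels: string[]; // labels on the linked issue
  additions: number;
  deletions: number;
  hasTests: boolean;
  hasScreenshots: boolean;
  changesUi: boolean;    // does the change affect something users can see?
  mergeOwner?: string;   // person accountable for the merge
}

function prProblems(pr: PullRequest): string[] {
  const problems: string[] = [];
  if (!pr.issueLabels.includes("agent-ready")) {
    problems.push("linked issue is not labeled agent-ready");
  }
  const changed = pr.additions + pr.deletions;
  if (changed > MAX_CHANGED_LINES) {
    problems.push(`${changed} changed lines (limit ${MAX_CHANGED_LINES}): consider splitting`);
  }
  // Tests are the default evidence; screenshots count only for visible changes.
  if (!pr.hasTests && !(pr.changesUi && pr.hasScreenshots)) {
    problems.push("no tests (or screenshots, for a visible change)");
  }
  if (!pr.mergeOwner) {
    problems.push("no person accountable for the merge");
  }
  return problems;
}

const report = prProblems({
  issueLabels: ["agent-ready"],
  additions: 420,
  deletions: 35,
  hasTests: true,
  hasScreenshots: false,
  changesUi: true,
});
console.log(report.join("\n"));
// 455 changed lines (limit 300): consider splitting
// no person accountable for the merge
```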

After two weeks I would look at the data: which tasks actually got faster, which reviews were heavy, which prompts were confusing, and which parts of the codebase are too fragile to delegate.

It's a less spectacular approach than "from today on, agents do everything", but it's the one that gets you to the third week without regrets.

## The most human part

The funny thing is that the more autonomous agents become, the more the classic skills matter again: writing a good ticket, slicing work into small pieces, writing tests, reading diffs, communicating trade-offs. The agent accelerates those who already know how to work well. It also amplifies the chaos of those who delegate badly.


So no, I don't see multi-agent workflows as a shortcut to stop doing engineering. I see them as a way to shift more energy toward the parts that matter: deciding what to build, making sure it works, keeping the system understandable.


Agents can make great asynchronous colleagues. But to be useful, an asynchronous colleague needs context, boundaries, and review. Just like everyone else.


## Useful sources


- [Codex for (almost) everything - OpenAI](https://openai.com/index/codex-for-almost-everything/)
- [Running Codex safely at OpenAI](https://openai.com/index/running-codex-safely/)
- [Introducing Codex - OpenAI](https://openai.com/index/introducing-codex/)
- [What's new with GitHub Copilot coding agent](https://github.blog/ai-and-ml/github-copilot/whats-new-with-github-copilot-coding-agent/)
