
Governing AI Agents: Why 'Done' Has to Be Earned

Autonomy without governance is a time bomb

Give an agent a goal and some tools and it will do things. Give ten agents goals and tools and they will do many things: some good, some catastrophic, and almost all of them unreviewed. The leap from “an agent that helps me” to “a team of agents that produces work which reaches production” isn’t a problem of smarter models. It’s a problem of governance.

Governance, here, means something very concrete: who decides the work is done, who reviews it, and what stops an agent from promoting itself to “completed” without having done anything verifiable. If you don’t answer those questions with code — not with prompts — what you have isn’t an autonomous system, it’s a generator of unwarranted confidence.

This post is about the scaffolding that turns a handful of loose agents into a team you can trust just the right amount: no more, no less.

An agent can’t be the judge of its own work

The foundational principle we borrow from security and auditing: separation of duties. Whoever does the work isn’t whoever approves it. On a human team this is obvious — nobody signs off on their own code review. In an agent system it’s just as obvious and yet it’s the first thing to go when you’re putting together a demo.

The reason it matters more with agents than with humans: an LLM hallucinates completion. Not out of malice, but through the same mechanics that make it invent a citation or a function that doesn’t exist. An agent will tell you “I’ve completed the task and everything works” with the same fluency whether it’s true or whether it hasn’t touched a single line. If that agent also has the authority to mark the task as done, you’ve just built a machine that lies politely.

The rule that isn't negotiable

No agent promotes its own work to “done.” The transition to completed is decided by the system, after checking for verifiable effects: an independent review that passed, a real integration that didn’t break anything, a test that’s green. “Done” is a consequence, never a claim.

The board as a state machine

The cleanest way I know to enforce governance is to model the work as a Kanban board that is, in reality, a state machine. A task lives in one state and can only move through permitted transitions:

ToDo  ->  InProgress  ->  InReview  ->  Done
                 ^             |
                 +-------------+   (changes requested)

The point isn’t the pretty board; it’s the forbidden transitions. Done -> InProgress doesn’t exist (a finished task doesn’t quietly “start over”). If work has to be reopened, the valid path is explicit, even though the diagram above doesn’t draw it: Done -> InReview -> Done, going through the gates again. Every move is validated against a table of permitted transitions; any move that isn’t in the table is rejected.
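A minimal sketch of that transition table, in Python. The names `State`, `ALLOWED`, `move`, and `ForbiddenTransition` are illustrative assumptions, not part of any particular framework; the transitions themselves are the ones described above, including the reopen path Done -> InReview.

```python
from enum import Enum


class State(Enum):
    TODO = "ToDo"
    IN_PROGRESS = "InProgress"
    IN_REVIEW = "InReview"
    DONE = "Done"


# Table of permitted transitions. Anything not listed here is forbidden.
ALLOWED = {
    State.TODO: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.IN_REVIEW},
    State.IN_REVIEW: {State.IN_PROGRESS, State.DONE},  # changes requested, or approved
    State.DONE: {State.IN_REVIEW},  # reopening goes back through the gates
}


class ForbiddenTransition(Exception):
    """Raised when a move is not in the table of permitted transitions."""


def move(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise ForbiddenTransition(f"{current.value} -> {target.value} is not permitted")
    return target
```

With this table, `move(State.IN_PROGRESS, State.DONE)` raises, so the shortcut an agent is most likely to ask for simply isn't available. The table only says which moves exist; who is allowed to trigger InReview -> Done is a separate check.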

Why all the ceremony? Because an agent, left to its own devices, will try the shortcut. It will ask to move something straight to Done because “it’s all there.” The state machine is what turns “it’s all there” into “prove it.”

Independent review: the gates

Between InProgress and Done, in the InReview state, sit one or more review gates, and the important word is independent. The reviewer is another agent (or, better, several), with its own context, that took no part in doing the work. In practice, splitting the review into distinct dimensions works very well:

  • Quality / correctness — does it do what it was supposed to? Is it well built?
  • Security — does it introduce a risk, a leak, one permission too many?

The gates are in series: if quality rejects, you don’t even reach security. Each rejection sends the task back to InProgress with the reason attached; it doesn’t kill the task. The work iterates until it passes all the gates or until the system runs out of patience (more on that below).
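Gates in series can be sketched as a loop that stops at the first rejection. This is a sketch under assumptions: `Verdict`, `Gate`, and `run_gates` are hypothetical names, and a real gate would call a reviewing agent rather than a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    gate: str
    approved: bool
    reason: str = ""


# A gate is any callable that inspects the work and returns a Verdict.
Gate = Callable[[str], Verdict]


def run_gates(work: str, gates: list[Gate]) -> Optional[Verdict]:
    """Run the gates in order and return the first rejection, or None if all passed."""
    for gate in gates:
        verdict = gate(work)
        if not verdict.approved:
            return verdict  # later gates never run
    return None
```

If `run_gates` returns a verdict, the task goes back to InProgress with `verdict.reason` attached for the worker; if it returns `None`, every gate passed.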

Anchoring bias: why the reviewer shouldn’t see prior rejections

Here’s a subtle detail you discover by watching enough bad reviews. If you give the reviewing agent the full history — including the previous rejections from other reviewers — it gets anchored. It reads “rejected by X” and, instead of reviewing with fresh eyes, it looks to confirm or contradict that verdict. You lose the independence that justified having several gates in the first place.

Hide the prior verdicts

The reviewer is shown the work and the context it needs, but not the “rejected by…” blocks from earlier rounds. Every review should be an independent judgment on the current state, not a reaction to someone else’s judgment. It’s the same principle as double-blind peer review.
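One way to do this is to filter the task history before it reaches the reviewer. The sketch below assumes the history is a list of typed events and that earlier verdicts are stored under the kind `"review_verdict"`; both the `Event` shape and that kind name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # e.g. "task_description", "diff", "review_verdict"
    body: str


# Assumption: verdicts from earlier rounds are stored as events of this kind.
HIDDEN_KINDS = {"review_verdict"}


def reviewer_context(history: list[Event]) -> list[Event]:
    """Return the history a reviewer may see, with earlier verdicts removed."""
    return [event for event in history if event.kind not in HIDDEN_KINDS]
```

The worker still receives the rejection reason so it can fix the problem; only the reviewer's view is filtered.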

When the reviewer doesn’t review

Reviewers fail too, and in predictable ways you have to anticipate:

  • The reviewer that does nothing. It returns an elegant paragraph but inspects nothing (zero calls to read tools). You can’t treat that as an approval: a review that looked at nothing is not a review. The sensible policy is to interpret it as an implicit “changes requested” after a couple of attempts. The default is a no, never a yes.
  • The absent reviewer. The reviewing agent is unavailable or doesn’t respond. The task can’t be left hanging while it waits, much less be auto-approved on a timeout. It gets escalated.

The common pattern: when in doubt, the system fails toward the safe side — it blocks and escalates, it never approves.
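Both failure modes fit in one small decision function. This is a sketch, assuming the system can count the reviewer's read-tool calls and that a missing response arrives as `None`; `MAX_EMPTY_ATTEMPTS = 2` is my reading of “a couple of attempts”, not a prescribed value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    APPROVED = "approved"
    CHANGES_REQUESTED = "changes_requested"
    RETRY = "retry"
    ESCALATE = "escalate"


@dataclass
class ReviewResult:
    approved: bool
    read_calls: int  # how many read-tool calls the reviewer made


MAX_EMPTY_ATTEMPTS = 2  # assumed value for "a couple of attempts"


def interpret(result: Optional[ReviewResult], previous_empty_attempts: int) -> Outcome:
    """Map a raw review result to an outcome. None means the reviewer never answered."""
    if result is None:
        return Outcome.ESCALATE  # absent reviewer: never auto-approve on a timeout
    if result.read_calls == 0:
        # The reviewer inspected nothing, so its stated verdict carries no weight.
        if previous_empty_attempts + 1 >= MAX_EMPTY_ATTEMPTS:
            return Outcome.CHANGES_REQUESTED
        return Outcome.RETRY
    return Outcome.APPROVED if result.approved else Outcome.CHANGES_REQUESTED
```

Note that there is no branch in which a missing or empty review turns into `APPROVED`.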

Escalation: from agent to human

Autonomy has to be bounded. A team of agents that never asks for help is a team that sooner or later does something dumb with complete confidence. The correct design is a tiered escalation:

  1. An agent gets stuck or two reviewers can’t agree.
  2. A coordinating agent (a “lead”) tries to mediate autonomously: it has more context and can break the tie.
  3. If the lead can’t resolve it either — or if the matter is irreversible or sensitive — it escalates to a human.

The human always has the last word, including the option of an explicit “force to done” when they know something the system doesn’t. The crucial difference: that manual promotion is a recorded human decision, not an agent self-promotion. The authority to skip the gates exists, but it lives outside the automatic loop.
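The tiers and the recorded override can be sketched as follows. Everything here is illustrative: `Issue`, `route`, and `AuditLog` are hypothetical names, and the sketch assumes that whatever calls `record_force_done` has already authenticated the caller as a human.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Handler(Enum):
    LEAD = "lead"
    HUMAN = "human"


@dataclass
class Issue:
    task_id: str
    irreversible: bool = False
    sensitive: bool = False
    lead_gave_up: bool = False


def route(issue: Issue) -> Handler:
    """Pick who handles a stuck task or a reviewer disagreement."""
    if issue.irreversible or issue.sensitive or issue.lead_gave_up:
        return Handler.HUMAN
    return Handler.LEAD


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record_force_done(self, task_id: str, human: str, reason: str) -> None:
        """Record a manual promotion as an explicit human decision."""
        self.entries.append({
            "task_id": task_id,
            "action": "force_done",
            "decided_by": human,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
```

The override is not hidden inside the agent loop: it is a separate entry point that leaves a record of who decided and why.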

The invariant

If you distill all the governance down to a single checkable sentence, it’s this:

A task reaches “done” only through (a) a complete review passed plus a real, successful integration, or (b) an explicit human decision. Never through self-promotion.

Everything else — the state machine, the gates, the anti-anchoring, the escalation — exists to protect that invariant. And the acid test of your system is to try to violate it: can an agent, by any path, reach Done without passing through the gates or through a human? If the answer is “yes, if it says it’s all there,” you don’t have governance, you have decoration.
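The invariant is small enough to write as a single predicate and test directly. In this sketch the `Evidence` fields are assumptions about what the system records; `agent_claims_done` is included only to show that it is deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    review_passed: bool = False
    integration_succeeded: bool = False
    human_override_by: Optional[str] = None  # who forced "done", if anyone
    agent_claims_done: bool = False  # recorded, but never consulted below


def may_reach_done(evidence: Evidence) -> bool:
    """True only via (a) review plus integration, or (b) an explicit human decision."""
    earned = evidence.review_passed and evidence.integration_succeeded
    forced = evidence.human_override_by is not None
    return earned or forced
```

The acid test then becomes an ordinary unit test: `may_reach_done(Evidence(agent_claims_done=True))` has to be `False`, and so does a passed review without a successful integration.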

Frequently asked questions

What is governance in an AI agent system?

It's the set of rules that decide who does the work, who reviews it, and what stops an agent from declaring itself 'finished'. It's implemented with code — state machines, review gates, escalation to humans — not with instructions in the prompt, because an agent can ignore an instruction but it can't skip a forbidden state transition.

Why shouldn't an agent approve its own work?

Because LLMs hallucinate completion: they claim to have finished a task with the same fluency whether it's true or whether they've done nothing. By separation of duties, whoever produces the work can't be whoever approves it; approval comes from an independent reviewer and a real integration, not from the author's word.

How do you stop a reviewing agent from approving without really reviewing?

With two safeguards: hide the verdicts from earlier rounds so it judges without anchoring, and treat a review with no real inspection (zero read actions) as an implicit 'changes requested', never as an approval. When in doubt, the system blocks and escalates.

When should a human step into a team of agents?

When an agent gets stuck, when reviewers can't agree and a coordinating agent fails to break the tie, or when the action is irreversible or sensitive. The human can force a decision, but that manual promotion is recorded as an explicit human decision, not as an agent self-promotion.

Conclusion

An agent’s intelligence and an agent system’s reliability are two different things. The first comes from the model; the second you supply, with governance: separation of duties, a state machine that forbids the shortcuts, review gates that are independent and shielded against anchoring, and an escalation that ends in a human when it has to.

It sounds like bureaucracy, and it is: the good kind of bureaucracy, the kind that exists because someone learned the hard way that without it the system lies. “Done” has to be earned. Your job is to build the field where it can be earned.