Audit Your Own Code As If You Were the Attacker

You can’t review your own code (and you still have to)

When you review your own work, your brain cheats. You read what you meant to write, not what you wrote. Your implicit goal is to confirm it’s fine — and confirmation bias is very good at finding evidence for what you already believe. That’s why classic self-review catches typos, variable names and formatting, but misses design flaws, security gaps and edge cases. Those don’t show up when you’re looking for confirmation; they show up when you’re looking for blood.

Adversarial review is a deliberate change of goal: instead of “prove this works,” you set out to prove it’s broken. It’s the difference between a test written to pass and a test written to break things. And although it sounds like something reserved for pentesters, it’s a discipline you can, and should, apply to your own code before someone with worse intentions does.

The difference between reviewing and attacking

Reviewing asks “is this done well?”. Attacking asks “how do I break it?”. They’re two distinct mental modes and they produce distinct findings.

The attacker mode assumes hostility by default. The input isn’t the one you expect; it’s the worst possible one. The user isn’t benevolent; they’re an adversary with time on their hands. The network fails at the worst moment, two requests arrive at once exactly where they shouldn’t, and the data that “always” comes populated arrives null. You’re not checking that the happy path works; you’re hunting the unhappy paths nobody wrote on purpose.
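A minimal Python sketch of that posture, assuming a hypothetical `parse_quantity` function under review: instead of one happy-path call, the illustrative `probe` helper feeds it the inputs an attacker would try first and records what actually happens, result or exception. None of these names come from a library.

```python
from typing import Any, Callable


def parse_quantity(raw: str) -> int:
    """Hypothetical function under review: parse an order quantity."""
    value = int(raw.strip())
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


# Inputs an attacker would try first, not the ones you expect.
HOSTILE_INPUTS: list[Any] = [
    None,            # the field that "always" comes populated
    "",              # empty
    "   ",           # whitespace only
    "0",             # zero
    "-1",            # negative
    "9" * 5000,      # huge; recent Python versions cap int() at 4300 digits by default
    "12abc",         # trailing garbage
    "\u0661\u0662",  # Arabic-Indic digits for "12"
    "\x00",          # control character
]


def probe(fn: Callable[[Any], Any], inputs: list[Any]) -> dict[str, str]:
    """Call fn on each input and record either the result or the exception."""
    outcomes: dict[str, str] = {}
    for value in inputs:
        try:
            outcomes[repr(value)] = f"ok: {fn(value)!r}"
        except Exception as exc:  # record every failure instead of stopping at the first
            outcomes[repr(value)] = f"{type(exc).__name__}: {exc}"
    return outcomes


if __name__ == "__main__":
    for key, outcome in probe(parse_quantity, HOSTILE_INPUTS).items():
        print(f"{key[:40]:>42} -> {outcome[:80]}")
```

Running it surfaces two findings a happy-path test would miss: `None` escapes as an `AttributeError` instead of a clean validation error, and `int()` quietly accepts non-ASCII Unicode digits, so `"١٢"` parses as 12.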

The change of question

Before you look at a single line, change the question you ask yourself. Not “is this correct?” but “if I had to trigger a failure, a leak or corrupted data from here, where would I get in?”. That reframing, all by itself, changes what your eyes see.

One pass per hat

The most common mistake when reviewing is trying to see everything at once and, in practice, seeing almost nothing. Adversarial review works much better by lenses: one complete pass with a single hat on, then another with the next.

Correctness — does it do what it says? what about the edge cases, the empty, the zero, the negative?
Security — untrusted input, injection, authorization, secrets in logs?
Concurrency — what happens if this runs twice at the same time? is there shared state left unprotected?
Failure modes — what happens when the dependency this leans on goes down? does the error propagate or get swallowed silently?
Performance — does it scale with the worst case of input size, or is it linear where it should be constant?

Each lens makes you a different person for ten minutes. The concurrency lens isn’t distracted by an ugly name; the security lens doesn’t forgive an empty catch. Mix them and every lens loses its edge.
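The classic bug the concurrency lens hunts is check-then-act. A minimal Python sketch with a hypothetical `Account` class: `withdraw_unsafe` passes under the correctness lens and only breaks when two calls interleave, while `withdraw` closes the gap with a lock. `run_concurrently` is an illustrative helper that releases many withdrawals at the same instant.

```python
import threading


class Account:
    """Hypothetical account whose balance must never go negative."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_unsafe(self, amount: int) -> bool:
        # Check-then-act: two threads can both pass the check before either
        # subtracts, so the balance can go negative. The race depends on
        # timing and may not show up in any given run.
        if self.balance >= amount:
            self.balance -= amount
            return True
        return False

    def withdraw(self, amount: int) -> bool:
        # Holding the lock turns the check and the update into one atomic step.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_concurrently(account: Account, amount: int, n_threads: int) -> int:
    """Start n_threads withdrawals together; return how many succeeded."""
    barrier = threading.Barrier(n_threads)
    results: list[bool] = []
    results_lock = threading.Lock()

    def worker() -> None:
        barrier.wait()  # line all threads up so they hit withdraw() together
        ok = account.withdraw(amount)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With the locked `withdraw`, exactly ten of fifty concurrent withdrawals of 10 from a balance of 100 succeed and the balance ends at 0. The point of the lens is to spot `withdraw_unsafe` by reading it, because a test can pass many times before the interleaving that breaks it ever occurs.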

Don’t trust the first verdict

A single reviewer —human or agent— gets it wrong in two directions: it sees flaws where there are none (false positives) and it misses real flaws (false negatives). The way to tame both is independent verification: more than one reviewer, without them seeing each other, and a decision rule over their votes.

To confirm that a finding is real, demand a majority. If three reviewers look at a supposed bug and only one sees it, it’s probably noise. This trims the false positives that would waste your time.
To declare something safe, be strict: a single reviewer raising their hand is enough to investigate. Here the cost of a false negative (a real bug ignored) is greater than the cost of an investigation that turns up nothing.

Independence is the key, and it’s fragile. If the reviewers see each other’s verdicts, they anchor and stop being independent: the second tends to ratify the first. Each judgment has to be made blind, based on the code itself, not on another reviewer’s opinion.
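The two decision rules are asymmetric over the same set of votes. A minimal Python sketch, assuming a hypothetical reviewer interface where each reviewer is a function that receives only the code and the claimed finding and returns True if it believes the finding is real:

```python
from collections.abc import Callable, Sequence

# Hypothetical reviewer interface: (code, finding) -> "I think this is real".
Reviewer = Callable[[str, str], bool]


def collect_votes(code: str, finding: str, reviewers: Sequence[Reviewer]) -> list[bool]:
    """Ask every reviewer independently.

    Each reviewer receives only the code and the finding, never another
    reviewer's verdict, so no one can anchor on someone else's answer.
    """
    return [review(code, finding) for review in reviewers]


def is_confirmed(votes: Sequence[bool]) -> bool:
    """Accept a finding as real only with a strict majority (trims false positives)."""
    return sum(votes) * 2 > len(votes)


def can_clear(votes: Sequence[bool]) -> bool:
    """Declare the code safe only if no reviewer objected (guards against false negatives)."""
    if not votes:
        # Zero votes means nothing was reviewed; that must never read as "safe".
        raise ValueError("no votes: nothing was reviewed")
    return not any(votes)


if __name__ == "__main__":
    reviewers = [lambda c, f: True, lambda c, f: False, lambda c, f: False]
    votes = collect_votes("def f(): ...", "possible SQL injection", reviewers)
    print(is_confirmed(votes), can_clear(votes))  # False False: not confirmed, still investigate
```

One vote out of three does not confirm the bug, but it also blocks clearing the code, which is exactly the "investigate" outcome described above.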

Classify by severity

A finding without severity is noise you can’t act on. Before closing the audit, give every finding a severity on a four-level scale: Critical, High, Medium, Low. The qualitative ratings of CVSS 3.0 are a good starting standard. Severity isn’t bureaucracy: it’s what tells you what to fix before deploying, what to fix this sprint and what to note down as conscious debt.

Severity	Meaning	Action
Critical	Exploitable, severe impact, no mitigation	Blocks the deployment
High	Serious risk, partial mitigation	Fix it before closing
Medium	Real risk under certain conditions	Plan it, don’t ignore it
Low	Defense in depth, hygiene	Note it and keep an eye on it
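CVSS 3.0 assigns its qualitative ratings by base-score range: None is 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9 and Critical 9.0 to 10.0. A minimal Python sketch that maps a score to that rating and to the action from the table above, assuming you already have a base score per finding (computing it from a CVSS vector is out of scope). `severity_from_cvss`, `triage` and the example findings and scores are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Actions copied from the severity table; NONE means "not a finding".
ACTIONS = {
    Severity.CRITICAL: "Blocks the deployment",
    Severity.HIGH: "Fix it before closing",
    Severity.MEDIUM: "Plan it, don't ignore it",
    Severity.LOW: "Note it and keep an eye on it",
}


def severity_from_cvss(score: float) -> Severity:
    """Map a CVSS 3.0 base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {score}")
    if score == 0.0:
        return Severity.NONE
    if score < 4.0:
        return Severity.LOW       # 0.1 to 3.9
    if score < 7.0:
        return Severity.MEDIUM    # 4.0 to 6.9
    if score < 9.0:
        return Severity.HIGH      # 7.0 to 8.9
    return Severity.CRITICAL      # 9.0 to 10.0


def triage(findings: dict[str, float]) -> list[tuple[str, Severity, str]]:
    """Rate each finding, sort worst first and attach the action to take."""
    ranked = sorted(findings.items(), key=lambda item: item[1], reverse=True)
    report = []
    for name, score in ranked:
        sev = severity_from_cvss(score)
        report.append((name, sev, ACTIONS.get(sev, "No action")))
    return report


if __name__ == "__main__":
    findings = {  # hypothetical findings with made-up scores
        "verbose error log": 5.3,
        "auth bypass on /admin": 9.8,
        "missing rate limit": 3.7,
    }
    for name, sev, action in triage(findings):
        print(f"{sev.name:<8} {name}: {action}")
```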

“Zero criticals” isn’t security

Here’s the trap that sinks the most teams: finishing an audit with zero critical findings and declaring victory. The absence of criticals isn’t the same as security. First, because a critical you didn’t find doesn’t appear in the report — and the report only reflects what you looked at. Second, because medium and low findings compose: three aligned “mediums” —a log that leaks too much, a loose validation and an inherited permission— can chain into the incident that no isolated “critical” would have caused.

The report isn't the territory

An audit report with zero criticals describes what the auditor looked at and didn’t find, not what exists. Treat it as a partial map, not a certificate. The useful question isn’t “did it come out clean?” but “what did we not get to look at?”. Otherwise, whatever you didn’t audit gets mistaken for audited.

Agents as a red team

There’s a modern twist: using AI agents as adversarial reviewers. They fit surprisingly well, because the pattern we already described —several independent reviewers with distinct lenses— parallelizes naturally. You launch several agents at once, give each one a lens and the explicit instruction to refute (not to approve), and apply the majority rule over their verdicts.

The key is the adversarial prompt: ask the agent to assume the code is guilty and that its job is to prove guilt, not innocence. A reviewer —of silicon or of carbon— that you ask to confirm everything is fine will confirm everything is fine. One that you ask to find the flaw really looks. Confirmation bias applies to machines too; the antidote is the same: change the question.
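A minimal Python sketch of the fan-out, assuming a hypothetical `ask_agent` function that wraps whatever model client you use (no specific vendor API is implied). Each lens gets its own prompt that presumes guilt, and each agent runs in parallel with no view of the others' answers. The prompt wording is illustrative, not a tested recipe.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

LENSES = ["correctness", "security", "concurrency", "failure modes", "performance"]

# The prompt presumes guilt and asks for proof, not approval.
ADVERSARIAL_TEMPLATE = (
    "Review the code below through the {lens} lens only.\n"
    "Assume the code is guilty. Your job is to prove it is broken, not to approve it.\n"
    "For each flaw, give the concrete input, sequence or interleaving that triggers it.\n"
    "Only if a genuine attempt to break it fails, answer exactly: NO FINDINGS.\n\n"
    "{code}"
)


def build_prompt(lens: str, code: str) -> str:
    # str.format does not re-parse braces inside `code`, so dict literals are safe.
    return ADVERSARIAL_TEMPLATE.format(lens=lens, code=code)


def red_team(code: str, ask_agent: Callable[[str], str]) -> dict[str, str]:
    """Run one blind adversarial agent per lens, in parallel.

    `ask_agent` is a placeholder for your model client. Each call receives
    only the code and its own lens, never another agent's answer.
    """
    with ThreadPoolExecutor(max_workers=len(LENSES)) as pool:
        futures = {lens: pool.submit(ask_agent, build_prompt(lens, code)) for lens in LENSES}
        return {lens: future.result() for lens, future in futures.items()}


if __name__ == "__main__":
    def fake_agent(prompt: str) -> str:  # stand-in for a real model call
        return "NO FINDINGS" if "performance lens" in prompt else "possible flaw"

    for lens, verdict in red_team("def f(x): return 1 / x", fake_agent).items():
        print(f"{lens}: {verdict}")
```

To apply a majority rule, run several agents per lens and count how many report the same flaw. Matching free-text findings across agents is the hard part and is deliberately left out of this sketch.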

An adversarial checklist

For a concrete session, this list covers most of the value:

Model the threat. Who would want to break this and what would they gain? Without an actor there’s no focus.
Enumerate the trust boundaries. Every point where data crosses from “untrusted” to “trusted” is a place to validate — or to sneak through.
Assume hostile input. Null, empty, huge, with weird characters, out of range. What breaks?
Hunt the silent failures. Look for every catch that doesn’t rethrow or log, every error that gets swallowed, every default value that hides a problem.
Question the isolation. In multi-tenant systems, can one tenant see another’s data through some path? Isolation leaks are almost never obvious.
Double-check what you find. Confirm every finding with a second, independent look before accepting it.
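For the silent-failures item, a minimal Python sketch of what to look for, using a hypothetical rate-limit config loader. The first version is the pattern to hunt: a broad catch that swallows the error and returns a default that hides the problem. The second keeps the catch narrow, logs, and re-raises.

```python
import json
import logging

logger = logging.getLogger("limits")


def load_limits_silent(raw: str) -> dict:
    """What to hunt: the error disappears and a default hides it."""
    try:
        return json.loads(raw)
    except Exception:
        return {}  # a corrupt config quietly becomes "no rate limits at all"


def load_limits(raw: str) -> dict:
    """Same job without hiding the failure: narrow catch, logged, re-raised."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        logger.exception("rate-limit config is not valid JSON")
        raise
```

Whether the right response is to re-raise, fail closed with strict defaults, or page someone depends on the system; what matters is that the failure leaves a trace instead of turning into a plausible-looking empty value.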

FAQ

What is an adversarial code review?

It's a review whose goal is to prove the code is broken, not to confirm that it works. It adopts the attacker's mindset —assuming hostile input, concurrency at the worst moment and dependencies that fail— to find design and security flaws that self-review, biased toward confirmation, lets slip by.

Why is it a bad idea to review your own code looking for it to be fine?

Because of confirmation bias: if your goal is to confirm it works, your brain finds evidence for it and skips what contradicts it. You read what you meant to write, not what you wrote. That's why self-review catches typos but misses design flaws and edge cases.

Why does 'zero critical findings' not mean the code is secure?

Because a report only reflects what was looked at: a critical that wasn't found doesn't appear. On top of that, several medium- or low-severity findings compose —a log that leaks, a loose validation and an extra permission can chain into an incident that no isolated critical would have caused. The useful question is what wasn't audited.

Can AI agents be used for adversarial review?

Yes, and they fit well because the pattern of several independent reviewers with distinct lenses parallelizes naturally. The key is the adversarial prompt: asking each agent to assume the code is guilty and try to prove it, not to confirm it's fine. Then a majority rule is applied over their verdicts.

Conclusion

Adversarial review isn’t distrust, it’s discipline. It’s accepting that your brain is wired to see what it expects to see, and building a process that compensates for that bias: changing the question from “does it work?” to “how do I break it?”, reviewing by lenses instead of everything at once, verifying with several independent looks, classifying by severity and not confusing “zero criticals” with “secure”.

Do it as a permanent practice, not as a pre-release ritual. The attacker is going to audit your code sooner or later. The only question is whether you do it first.