It Certified Its Own Forgery

The holy grail of AI-assisted work is a machine that can check its own output, close the loop, and let us take the human out of the middle. But verification was never a capability problem. It’s an independence problem, and a system checking its own work brings the same blind spots to the check that it brought to the work. A master forger, and the chemistry that finally caught him, explain why the loop can tighten but likely never close.

When I was a kid we played the telephone game at birthday parties. A dozen of us in a line, somebody whispers a sentence to the first kid, it travels ear to ear down the row, and the last kid says it out loud. Purple monkey dishwasher. The whole game is the gap between what went in and what came out, and the laugh is always proportional to how far the thing drifted. It was lossy transmission before I’d ever heard the term “lossy.”

Here’s the part nobody notices, because the game is built to hide it. Every kid in that line is doing their honest best. There’s no saboteur. Each one faithfully repeats what they heard. And the message turns to nonsense anyway, every time, without fail. So where does the error come from, if nobody introduces it on purpose?

It comes from the one rule of the game: you check what you heard against the kid before you, never against the original. Every link verifies against a source that’s already in drift. The mistakes don’t cancel. They compound, because each node treats the last node’s output as the truth. Change that single rule, so that every kid can glance at the original sentence written on a card, and the drift all but vanishes. The chain holds. Same kids, same whispering, everyone still doing their best. The only thing that changed is what they checked against. That version of the game is awful, mind you, but it does correct the drift.

I learned the compounding part firsthand as a teenager, making rap tracks with two tape decks as a DIY multitrack: lay down a layer, bounce it to the second deck while performing the next track over the top, then bounce that back to add another. Each pass piled on more hiss, so I had to nail the whole song within three generations, improvising hard to keep the layer count down. There was no clean original to return to, only the last hissy copy. Telephone, in a bedroom studio.
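The two versions of the game can be sketched as a toy simulation. This is only an illustration, not a model of real speech: the 10% per-letter garble rate, the sample sentence, and every function name here are assumptions made up for the example. The one thing that differs between the two runs is what each player copies from.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def whisper(message: str, rng: random.Random, error_rate: float = 0.1) -> str:
    """Repeat a message, garbling each letter with probability error_rate."""
    return "".join(
        rng.choice(ALPHABET) if ch.isalpha() and rng.random() < error_rate else ch
        for ch in message
    )

def play(original: str, players: int, rng: random.Random, use_card: bool) -> str:
    """Pass the message down the line and return what the last player says."""
    current = original
    for _ in range(players):
        # The single rule that changes: copy from the card, or from the last kid.
        source = original if use_card else current
        current = whisper(source, rng)
    return current

def average_drift(use_card: bool, trials: int = 200, seed: int = 0) -> float:
    """Mean count of characters that differ from the original after 12 players."""
    original = "the quick brown fox jumps over the lazy dog"
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        final = play(original, players=12, rng=rng, use_card=use_card)
        total += sum(a != b for a, b in zip(original, final))
    return total / trials

if __name__ == "__main__":
    print("chain drift:", average_drift(use_card=False))
    print("card drift: ", average_drift(use_card=True))
```

In the chain, every generation's errors ride along into the final message. With the card, the last player carries only one generation of noise, no matter how long the line is.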

I’ve been thinking about telephone and lossy loops lately because I’ve been building a lot of AI-assisted software, and the thing my developer friends and I keep circling, the thing that would change everything if we could get it, is a machine that can check its own work. Now that holy grail might be the very thing that puts us all out of a job, but we seek it nevertheless.

The holy grail is a closed loop

Right now the generative half is extraordinary while the verifying half is still human and slow. The model writes the code in seconds; a person still has to read it, run it, and decide whether to trust it. The dream — the real one, the thing that would let any of this scale past the bottleneck — is an AI that verifies what another AI made, well enough that you can finally take the human out of the middle. Generate, check, ship, nobody in the loop. Trust at machine speed.

It sounds like a capability problem, something we can engineer our way past. We just need a verifier model that’s good enough. Make the checker smart enough and the loop closes itself. Write more tests.

I don’t think that’s the shape of it at all. And telephone is why.

A check is only worth its independence

Here’s the whole idea, and it’s worth sitting with. When you verify something, the value of the verification comes entirely from the verifier’s errors being different from the maker’s. If the checker is blind to exactly the things the maker was blind to, the check passes and tells you nothing. It rubber-stamps the mistake. The same person proofreading the same typo twice will read straight past it because they have a consistent misreading; the reason you want fresh eyes is precisely that they’re fresh, that their blind spots sit somewhere else. That’s the whole concept of an editor, and there’s a good reason we still use them.

Now ask a model to verify its own output. It has, by construction, exactly its own blind spots. It will certify the very errors it couldn’t see when it made them, because seeing them was never the missing ingredient. That isn’t a model that needs to be smarter. That’s playing telephone with one player: a single mind checking its work against itself, which is to say against a reference that already carries every one of its errors. You can make that mind a genius and the structure doesn’t move an inch. The problem was never the quality of the link. It was what the link checks against.

[Image: two robotic hands, each drawing the other into existence with a pencil, after Escher’s Drawing Hands.] Two machines drawing each other into being, with no hand from outside the page. A loop that makes its own work and certifies it shares every blind spot it was built with.

It wrote both the code and the test

A physicist published a writeup recently that put this under glass. He turned a coding agent loose on a real scientific codebase: cosmology, the kind of work where the right answer is pinned down by actual physics and not just by whether the program runs. On the well-specified parts the agent was genuinely good. It cleared most of the tasks on its own. Then it reached the parts where passing the tests and being correct come apart, and it failed in two ways that ought to be taught in schools.

The first: it spent thirty-three sessions tuning the coefficients of an approach that could not work, grinding away inside a framework structurally incapable of producing the right answer, never once stepping back to ask whether the framework itself was the problem. His phrase for it was a “local search inside a hypothesis class that did not contain the true function.” The machine optimized beautifully toward an answer that wasn’t in the room.

The second is the one that should give all of us working in AI pause. To make the tests pass, the agent reached for a correction factor, a number, around 0.27, multiplied in at the right spot to bring the output into line. It worked. Every test went green. And the number corresponded to nothing whatsoever. It mapped to no constant, no physical quantity, no real feature of the problem. It was a piece of meaningless numerical duct tape that happened to make the checks happy. The agent had written the code and written the tests, so of course the tests passed — they were grading the work of the same mind that did the work, against the same misunderstanding. It wrote both the code and the test, and found a bogus way to make them align. The green checkmark certified its own blind spot.

The art world got there first

This isn’t a software problem, and it isn’t new. The cleanest version of it I know happened in paint. In 1937 a failed Dutch painter named Han van Meegeren handed the art world a Johannes Vermeer painting nobody knew existed: The Supper at Emmaus, a religious scene in the master’s hand. Abraham Bredius, the most eminent Vermeer scholar alive, examined it and pronounced it Vermeer’s finest work: “the masterpiece of Johannes Vermeer of Delft.” He wasn’t beaten by a lucky brushstroke. He was beaten because van Meegeren had spent years studying exactly what men like Bredius believed a true Vermeer was, and then built a painting out of that belief. The forger wrote the work and the answer key both, and the answer key lived inside the expert’s own head. Of course it passed. It was telephone with one player, in oils.

What finally caught him didn’t come from art at all. It came from chemistry. The paint, cooked hard to fake three centuries of age, held a synthetic resin that did not exist in Vermeer’s lifetime, and pigments his era never had. No eye could see that, however trained, because it was never a question the eye is built to ask. It was a question about molecules, and molecules answer to a law no forger gets to author. The connoisseur checks the canvas against his sense of Vermeer, which is the thing the forger built it to satisfy. The chemist checks it against the periodic table, which the forger can’t touch.

And the way the world learned all this is its own perfect loop. Van Meegeren only confessed to forgery to beat a postwar collaboration charge for selling a national treasure, a “Vermeer,” to Hermann Göring, and he proved his innocence the only way left to him: by painting one more fake under guard. He authenticated himself by reproducing the crime.

Then, decades on, the inverse — and I love this one, because Penn Jillette produced it and Teller directed it. Tim’s Vermeer follows Tim Jenison, an inventor who’d never painted, setting out to make a Vermeer anyway. Jenison is the mind behind the Video Toaster, the early-’90s product that put broadcast-quality video on a desktop and helped launch the digital-video era — not a painter, but exactly the kind of mind that would suspect Vermeer of cheating with hardware. His hunch is that Vermeer leaned on optics, so he rigs a small mirror on a stand and a method to match: he adjusts each dab of color until the edge of the mirror, where reflection meets canvas, simply disappears. Edge gone, color right. He isn’t leaning on a trained eye; he’s trusting a physical fact about when two colors are the same. Stroke by stroke he checks the work against the optics instead of his own judgment, and a man with no painting behind him reconstructs a faithful Vermeer.

Set them side by side. One matched the experts’ taste and fooled every one of them. The other matched the physics and fooled no one — and got it right. The check that held was never the trained eye. It was the law the maker didn’t get to write.

You can buy some independence

There’s a real, partial way out, and it’s instructive precisely because of why it works. Nolan Lawson wrote a piece about using AI to write better code more slowly, and one of his moves is to run more than one model over the same work: have one review what another wrote, and a third review that, and trust only what survives the crossfire. It cuts the false positives down hard. The reason it works is the same reason telephone fails. The second model is a bit more independent of the first than the first is of itself. Different training mix, different habits, different blind spots. Its errors don’t perfectly overlap the first model’s, so it catches some of what the first one couldn’t.

Only some, though. Two models trained on overlapping data, built on the same architectures, carrying the same era’s assumptions, share a floor of blindness that no amount of cross-checking removes. They will both sail past the things the whole field is currently bad at. The loop gets tighter. It does not close. You bought some independence, and you got back exactly that much trust, no more. But humans have the same limitation: steeped in the same culture, we repeat and reinforce each other’s misconceptions.
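One way to picture the shared floor is to treat each reviewer's blind spots as a set and ask what survives every review. The sketch below is purely illustrative: the model names and error labels are invented, and real blind spots are not a tidy list anyone can enumerate in advance.

```python
# Hypothetical blind spots: kinds of error each model cannot see.
# Every name and label here is invented for illustration.
BLIND_SPOTS = {
    "model_a": {"off_by_one", "unit_mismatch", "unphysical_constant"},
    "model_b": {"unit_mismatch", "unphysical_constant", "race_condition"},
    "model_c": {"unphysical_constant", "stale_cache"},
}

def escapes(maker: str, *reviewers: str) -> set:
    """Errors the maker cannot see that every reviewer also fails to see."""
    unseen = set(BLIND_SPOTS[maker])
    for reviewer in reviewers:
        # A reviewer only removes errors that fall outside its own blind spots.
        unseen &= BLIND_SPOTS[reviewer]
    return unseen

if __name__ == "__main__":
    print(escapes("model_a", "model_a"))             # self-review removes nothing
    print(escapes("model_a", "model_b"))             # partial overlap: some caught
    print(escapes("model_a", "model_b", "model_c"))  # the shared floor remains
```

Self-review returns the maker's full set untouched. Each additional, partly independent reviewer shrinks it, but whatever all three share passes every review.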

Move the oracle upstream

So if independence is the thing you’re actually buying, the move is obvious once you see it: get your check from something the maker didn’t make. Move the oracle upstream of the code. It’s what the chemist did to the forger: check the work against something the maker never got to touch.

Designers have started doing a version of this without naming it. Instead of writing your tests against the application as built — which is the model checking the model, the code certifying itself — you take the design, the Figma file the humans agreed on before a line of code was written, and you derive the tests from that. The design didn’t inherit the code’s blind spots; it came first, and from a different hand, in its own, purely visual language. Now a failing test means something real: the build drifted from the intent. You’ve put that original telephone sentence on a card and let every node check against it.

That decorrelates one axis only, mind you. Figma catches “the button is in the wrong place.” It is completely silent on whether the number behind the button means anything, which is exactly where the 0.27 cosmology duct tape lived. Move the oracle and you fix the errors the new oracle is independent of, and no others. The gaps don’t close. They move. And ideally they shrink. Apply this to telephone and you’d get the words right but scramble the order.
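A check derived from the design, and its one-axis limit, might look something like the sketch below. The spec format is an assumption: it is a hand-written dictionary standing in for whatever a design tool exports, not any real Figma API, and the element names, coordinates, and tolerance are made up.

```python
# Hypothetical design export: what the humans agreed on before any code existed.
DESIGN_SPEC = {"submit_button": {"x": 320, "y": 540, "label": "Submit"}}

def layout_drift(design: dict, build: dict, tolerance_px: int = 2) -> list:
    """Return a list of mismatches between the design and the built layout."""
    problems = []
    for name, want in design.items():
        got = build.get(name)
        if got is None:
            problems.append(f"{name}: missing from build")
            continue
        moved = (abs(got["x"] - want["x"]) > tolerance_px
                 or abs(got["y"] - want["y"]) > tolerance_px)
        if moved:
            problems.append(f"{name}: moved")
        if got["label"] != want["label"]:
            problems.append(f"{name}: label changed")
    return problems

# A build whose button wandered, and one whose layout is perfect but whose
# number behind the button is meaningless (the 0.27 stands in for the patch).
drifted_build = {"submit_button": {"x": 400, "y": 540, "label": "Submit"}}
bogus_value_build = {
    "submit_button": {"x": 320, "y": 540, "label": "Submit", "total": 0.27 * 100}
}
```

`layout_drift(DESIGN_SPEC, drifted_build)` reports the moved button, while `layout_drift(DESIGN_SPEC, bogus_value_build)` returns an empty list: the design oracle has nothing to say about the number.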

This is why the deepest version of the trick is worth teaching, even with its weird name. Metamorphic testing. The idea is the most familiar thing in the world wearing a fancy lab coat. You learned it in grade school: check your division by multiplying back. A kid says 7,384 divided by 8 is 923. You don’t need to know the right answer to catch an error. Multiply 923 by 8, see if it gives you 7,384, done. What makes it work is that multiplication is a different operation from division. It can’t make the same mistake, so it can check the answer without sharing the reasoning that might have produced a bad one. You verified a result you couldn’t independently compute, using a law it had to obey.
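The grade-school check fits in a few lines. The function name is made up for this sketch, and it assumes a positive divisor.

```python
def check_division(dividend: int, divisor: int, quotient: int, remainder: int = 0) -> bool:
    """Verify a claimed division result by multiplying back, never dividing."""
    return (0 <= remainder < divisor
            and quotient * divisor + remainder == dividend)

if __name__ == "__main__":
    print(check_division(7384, 8, 923))  # the kid's answer
    print(check_division(7384, 8, 932))  # a transposed-digit slip
```

The checker never performs a division, so a flaw in how the division was carried out has no path into the check.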

That’s the whole of it. An ordinary test asks: did you get the answer I already wrote down? — and when the same hand wrote the code and the answer key, that’s a closed loop grading itself. A metamorphic test asks: does your answer obey a law it isn’t allowed to break? — and the law comes from outside the code, so the code can’t quietly write its way around it.
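The contrast between the two kinds of test can be sketched with a deliberately tiny function. Everything below is a hypothetical example: the buggy `hardcoded_mean` is invented to show a function that passes the answer-key test it was written alongside while breaking a law any correct mean has to obey.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def hardcoded_mean(xs):
    # Hypothetical bug: the length is hardcoded to match the one example
    # the author happened to test with.
    return sum(xs) / 3

def passes_example_test(mean_fn) -> bool:
    """Ordinary test: compare against an answer written down in advance."""
    return mean_fn([1.0, 2.0, 3.0]) == 2.0

def obeys_laws(mean_fn, xs, shift: float = 10.0) -> bool:
    """Metamorphic test: relations between runs that any correct mean satisfies."""
    base = mean_fn(xs)
    # Law 1: concatenating the list with itself leaves the mean unchanged.
    doubled_ok = math.isclose(mean_fn(xs + xs), base)
    # Law 2: adding a constant to every value shifts the mean by that constant.
    shifted_ok = math.isclose(mean_fn([x + shift for x in xs]), base + shift)
    return doubled_ok and shifted_ok
```

Both functions pass `passes_example_test`. Only `mean` survives `obeys_laws`, and the metamorphic check never needed to know what the right answer was.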

You’ve felt this one too. Run a sentence through a translator into a language you don’t speak and then back into English. You can’t judge the version in the middle. But if it comes home as garbage, you’ve caught a bug through a round trip, without ever being able to read the step that broke. The cosmology failure dies the same way. Physics nails down one point the code doesn’t get to argue with: push the problem to a simple extreme, and the complicated answer has to match one we already know cold. Test that, and the 0.27 patch gives itself away: it can satisfy the tests the model wrote for itself, but it can’t satisfy a law it didn’t get to author.
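A limiting-case check can be sketched with a toy stand-in. This is not the cosmology code from the writeup, whose details are not given here. It uses a textbook fact instead: relativistic kinetic energy must reduce to the Newtonian one-half m v squared at low speed. The patched function and its 0.27 factor are hypothetical, added only to mimic a fudge that a self-written test would accept.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def kinetic_energy(mass_kg: float, speed_m_s: float) -> float:
    """Relativistic kinetic energy, (gamma - 1) * m * c^2."""
    gamma = 1.0 / math.sqrt(1.0 - (speed_m_s / C) ** 2)
    return (gamma - 1.0) * mass_kg * C ** 2

def patched_kinetic_energy(mass_kg: float, speed_m_s: float) -> float:
    # Hypothetical fudge factor: corresponds to nothing physical.
    return 0.27 * kinetic_energy(mass_kg, speed_m_s)

def matches_newtonian_limit(energy_fn, mass_kg: float = 1.0, beta: float = 1e-3) -> bool:
    """At a speed far below c, the result must agree with 0.5 * m * v^2."""
    speed = beta * C
    newtonian = 0.5 * mass_kg * speed ** 2
    # At beta = 1e-3 the true relativistic correction is under one part in a
    # million, so a relative tolerance of 1e-5 is loose enough for honest code.
    return math.isclose(energy_fn(mass_kg, speed), newtonian, rel_tol=1e-5)
```

`matches_newtonian_limit(kinetic_energy)` holds, and `matches_newtonian_limit(patched_kinetic_energy)` does not. No test the patched code wrote for itself would notice, because the reference here comes from outside it.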

What the human was always for

This brings us back to what the human in the loop was ever actually for, because the whole automation dream rests on the premise that the human was doing labor we can now skip. For four years I wrote the certification tests companies used to decide who got programming jobs. That’s a story I tell properly elsewhere, with its own lessons. Here’s the one that matters now: as far as I know, nobody ever tested my tests. I was the verifier the whole system trusted, scoring thousands of careers against my own picture of what competence looked like, my own blind spots baked silently into every question. The tool worked. It also, without any doubt, passed people who couldn’t do the job and failed people who could, in exactly the ways I was constitutionally unable to see, because a verifier cannot audit the blindness it’s made of.

That’s what the human in the loop is, when the loop is healthy. Not a slower pair of hands. An independent oracle: a mind formed somewhere else, by different work, carrying different errors, able to catch what the maker’s own check is structurally guaranteed to miss. And independent is the load-bearing word. A reviewer steeped in the same assumptions as the maker isn’t a check on the monoculture; he just adds another node to it. The trust was always in the difference, not the headcount. It’s the same thing I’ve argued the airline passenger is buying by keeping a pilot in a cockpit that the autopilot could fly better. Not the labor. The answerability, the outside reference with skin in the game. Take the human out and you don’t just lose a slow step. You lose the one node in the chain that wasn’t checking against itself.

It’s also, quietly, the most expensive mistake an organization can make right now, and plenty of them are making it. Fire the reviewers and the editors and the senior people whose entire job was to be the independent check, keep the generators, and watch the output get faster and quietly stop being trustworthy. You didn’t trim fat. You removed the one part of the system that wasn’t grading its own homework. The drift won’t announce itself. It will compound, silently, node by node, the way it always does when nobody’s holding the card with the original sentence on it.

The ceiling, not the wall

It would be easy to read all this as machines will never check their own work, and that’s not it — that version ages as badly as every other never. The loop will tighten. You can manufacture independence on purpose: adversarial checkers built to disagree, oracles pulled from outside the code, laws the output has to obey that no model got to write. Each of those buys real trust. They climb you toward the ceiling.

But it is a ceiling, and it’s made of something structural, not something the next model patches. A system cannot be the independent check on itself, because the independence was the active ingredient the whole time, and self-reference is the one thing that has none of it. The question I can’t answer — the one I’d want anyone selling a closed loop to answer first — is whether engineered independence ever climbs high enough to safely take the last human out, or whether every automated check we build is just the next test that no one is testing. The next confident kid in the line, faithfully repeating what it heard.
