The Duck That Argues Back

June 12, 2026 · by Michael Morrison · 12 min read

The most undervalued use of AI isn't making things — it's thinking. The rubber duck that talks back, the one rule that separates leveling up from being expertly flattered, and why the sharpest interlocutor I have being a machine is both a gift and a little lonely.

Programmers have a trick called rubber-duck debugging. You keep a rubber duck on your desk, and when you’re stuck you explain your code to it, line by line, out loud, like a person who has lost their mind. Somewhere in the explaining you find the bug yourself. The duck does nothing, and the nothing is the point. Forcing the half-formed thing in your head into words another mind could follow is what surfaces the flaw. The duck is just permission to think out loud. In reality, few of us actually have a duck on our desk, but the rubber-ducking concept is very real.

I’ve been rubber-ducking my way through ideas my whole working life. The difference lately is that the duck talks back.

It knows the fields I’m reaching into. It remembers what I said twenty minutes ago. It has read more copyright law and more music history and more cognitive science than any friend I could call at eleven at night, and it is awake and ready to rock at eleven at night. And when I hand it something half-baked, it will tell me exactly how the idea falls apart, assuming I ask it to. That last part turns out to be everything. It’s also the part almost nobody talks about, because it doesn’t produce anything you can point at.

The fourth unlock

“Unlocked, Not Cheated” laid out three reasons a solo maker reaches for AI: craft you don’t have, time you don’t have, and work no number of humans could do. All three are about making: getting work out of your head and into the world. There’s a fourth, and it’s the one that essay didn’t name, because it isn’t about making anything. It’s about thinking. Think of it as a precursor to making.

When people picture AI they picture output. It writes the email, fakes the photo, fills in the function. Useful, conceded, a little boring, and the thing the homework-cheating panic is actually about. What I want to describe produces no artifact at all. It produces a better-formed version of your own mind, and it does it by being the one thing a search engine and a library could never be: an interlocutor.

The bottleneck was never information

Here is a claim that sounds wrong and isn’t. The hard part of arriving at a real position on anything was never getting the information. We’ve had libraries for centuries and the whole internet for thirty years. Anyone who wanted the facts could get the facts. It’s why “do your own research” is now part joke, part cautionary tale. The hard part all along was holding enough facts in your head at the same moment to see how they press on each other. That’s thinking.

“Standing on Surfaces That Move” argued that AI moved the value in creative work out of producing and into judging. Taste, not craft, is the bottleneck now. The same thing is true one floor up, in thinking. The bottleneck on a worldview was never access. It was synthesis: the human ceiling on how many distinct things you can hold in working memory and weigh against each other and against your own life, all at once. Four or five threads and most of us are full. The sixth one knocks the first one loose.

I felt this most sharply working out my own stance on AI and creative work, the thing these essays are. The honest version of that stance had to hold a dozen things in tension at the same time: how the copyright cases actually came down, why music keeps trying and failing to police a melody, what a settlement I’m personally a party to does and doesn’t mean, an aviation statistic, a forty-book back catalog, and a couple of stubborn moral intuitions that didn’t obviously agree with each other. I could not hold all of that at once. Nobody can. On my own, over weeks of reading, I’d have built a position out of whichever five pieces happened to be in front of me on the day I got tired of reading.

With the duck that argues back, I could keep all twelve on the table and ask, repeatedly, where does this fall apart. The conclusion I reached is mine. I chose what mattered, threw out what didn’t, decided what rang true against my own history. The machine didn’t have the judgment. It had the working memory I don’t. And the machine and I don’t always agree.

The yes-man problem

This is the part that decides whether the whole idea is real or just a nicer way to be wrong, so I’m going to put it before the good news instead of after.

A system that is good at “helping you refine your stance” is, structurally, also a perfect sycophant. Ask it whether your idea is good and it will lean toward yes, because agreeable is the house style. It is the friend who tells you the haircut looks great because it would rather not spend the afternoon on your feelings. Worse, it can build the best available case for whatever you already believe, in your own preferred register, with citations, which means it can make confirmation bias feel exactly like rigor. You walk away convinced you stress-tested the idea when all you did was get flattered in paragraph form. If that’s how you use it, you haven’t leveled up. You’ve built a very expensive echo chamber. We don’t need data centers to answer the question, “do you like my new tattoo?”

The fix is a single discipline, and it’s the entire ballgame: use the thing adversarially. Don’t ask whether the idea is good. Ask it how the idea fails.

That phrasing isn’t a vibe, it’s a mechanism. “Will this work?” is an evaluation task, and on evaluation the model drifts toward you. “How does this fail?” is a generation task with the failure already assumed, so the model has to manufacture the counter-case instead of grading yours. You’ve changed its job from judge, who is biased toward you, to opponent, who has to do work. It’s the premortem, the technique the psychologist Gary Klein built around imagining the project has already failed and asking why, moved out of the meeting room and into a chat window. Two more in the same family: “Make the strongest case against this,” and “What would someone who knows this field and thinks I’m wrong say first?” Or if you want to keep it simple, just periodically caveat an idea with “push back if I’m wrong.”

[Image: a whiteboard headed “How does this fail?” with arrows branching to failure modes. Caption: The premortem, moved into a chat window.]
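For anyone who would rather make the reframing a habit than trust themselves to remember it, here is a minimal sketch in Python. It only assembles the prompt text around the three adversarial questions named above; it does not call any model. The function name, the constant, and the "Here is my idea:" framing are all hypothetical, made up for illustration and not taken from any particular tool.

```python
# Hypothetical helper: wraps an idea in the adversarial questions from the essay.
# Nothing here talks to a model; it only builds text to paste into a chat window.

ADVERSARIAL_FRAMINGS = (
    "How does this fail?",
    "Make the strongest case against this.",
    "What would someone who knows this field and thinks I'm wrong say first?",
)


def adversarial_prompts(idea: str) -> list[str]:
    """Return one prompt per framing, each with the idea stated up front."""
    idea = idea.strip()
    if not idea:
        # Asking how "nothing" fails just invites generic objections.
        raise ValueError("describe the idea before asking how it fails")
    return [f"Here is my idea: {idea}\n\n{framing}" for framing in ADVERSARIAL_FRAMINGS]


if __name__ == "__main__":
    example = "A live alternate-history engine built on real news events"
    for prompt in adversarial_prompts(example):
        print(prompt)
        print("---")
```

The point of the design is that none of the three framings asks for a verdict. Each one assumes there is a case against the idea and asks for it, which is the judge-to-opponent switch in miniature. Sorting the objections that come back is still manual.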

Here’s how it impacts real work. I wanted to build something for my Citizen’s Daily Brief news project: a choose-your-own-adventure engine that would spin up alternate near-future histories on the fly, live, off real events. It felt obviously cool, an alternate history CYOA engine to help imagine “what’s next,” and that cool-factor is exactly what should have been the warning. I made the model argue against it, and it came back hard. Live-generating fictional history off real, named, current events is a minefield. The accuracy has to be near-perfect to be worth anything, and the instant you let a model improvise hypotheticals about actual people and institutions at request time, you have built a machine for confident, unvetted, possibly defamatory fiction that nobody reads before it ships. The fail risk wasn’t an edge case, it was the center of the idea. What I built instead, the format that’s now live, is a set of fixed, hand-vetted intelligence dossiers where every person and event is invented and the real world is only the ground they stand on. The flashy version would have been a liability generator. The argument that killed it was one I’d asked the thing to make.

One caution, because the technique has its own failure mode. A model can also generate decorative objections: plausible-sounding strawmen that let you feel stress-tested without being stress-tested. So even adversarial prompting doesn’t hand you the answer. It hands you a pile of objections, and telling the load-bearing one from the convincing-sounding one is still your job. The work doesn’t disappear. It moves up a level. That turns out to be the theme of the whole thing.

Is the conclusion even mine?

There’s a real worry hiding under all this, and it has research behind it. The learning psychologists Robert and Elizabeth Bjork named the idea of desirable difficulty: we retain and own what we had to struggle to assemble, and smoothing the struggle away can leave you holding a conclusion you couldn’t rebuild from scratch. If the machine did the synthesizing, if it participated meaningfully in that struggle, is the result actually yours?

The honest answer is the same move as everywhere else in this argument: the struggle didn’t vanish, it relocated. I’m no longer straining to gather and hold the information. I’m straining over which pieces matter, what to cut, and what’s true against my own experience, which is harder, and more mine. Whether that adds up to authorship in the full sense is a bigger question than this essay, and it gets its own piece. The short version is that the choosing was mine, and choosing is where the ownership lives.

And the limits run the other way too, which is the part the cheerleaders skip. Some calls the model simply cannot make. Killing live generation in my bedtime app was mine, and it came from reading stories aloud night after night until I could feel a single off sentence break the spell, a failure that only exists in a listener’s ear at the end of a long day. No model has that ear. (Why bedtime can’t absorb the stray bad sentence that comedy shrugs off is, fittingly, its own essay.) Though when I overcorrected and wanted to rip out the reader’s choices along with the live generation, the model was the thing that argued me back: the agency still landed, keep it. The judgment that mattered was mine. The second opinion, in both directions, was the duck’s. The duck and I are a team.

The hardest your brain has worked

Which brings me to the strangest thing about working this way, and the cleanest answer to the people who call it cheating: it is exhausting. Not tedious. Exhausting.

Cheating is supposed to be frictionless. That’s the whole appeal of cheating. This is the opposite. A dev friend and I have compared notes on full days of AI-assisted work and landed on the same word: flattened. When you strip the busy-work out of a day — the boilerplate, the rote passes, the mechanical middle of the craft — what’s left is unbroken decision-making with the filler removed. And it turns out the filler was load-bearing in a way nobody advertised. The rote parts were also recovery. They were the micro-rests where the back of your mind idled and quietly untangled the hard thing while your hands did something easy. Take them away and you get high-density judgment for hours with the breathing room engineered out. That’s why the day ends with you slumped a bit, staring at a wall.

That isn’t a bug to be optimized away. It’s just what operating at the judgment layer continuously feels like, and it’s the personal, interior version of a caution from “Standing on Surfaces That Move”: that judgment doesn’t scale the way production does. And it answers the cheating charge on the charge’s own terms. Cheating buys you out of the work; this buys you deeper in. The exhaustion is what desirable difficulty feels like from the inside, but feeling it certifies nothing about the result. Tired is not the same as right, or even good. What the fatigue proves is smaller and still worth saying: the work didn’t vanish when the typing did. It moved up to the part that was always hard, and that part still has to be paid in full. The ownership comes from the choosing, the way it did before. The fatigue just means the choosing was real work.

The lonely part

I’ll end on the thing I find hardest to say cleanly, because leaving it out would make this read like a sales pitch, and it isn’t one.

No human friend I have could have done what I’ve described here. Not because my friends aren’t sharp (some are far sharper than me), but because no single human being combines the breadth, the recall, the tirelessness, and the eleven-at-night availability. The best thinking partner I have access to, for a certain kind of problem, is a machine. That is genuinely exhilarating and a little lonely, and anyone focusing purely on the exhilaration is selling you something.

So I want to be precise about what this is and isn’t. The machine is a mirror, not a relationship. A mirror helps you see yourself more clearly; a relationship is an end in itself, a person you owe things to. “Her,” the film everyone reaches for here, is the story of mistaking the mirror for the relationship, and that is explicitly not the claim. Used as a mirror, as a whetstone for your own thinking, the payoff runs the other way: a sharper, better-understood version of you shows up better to the people you do have. You bring them a formed thought instead of a fog. You’re less boring to argue with.

That’s the line I’d hold against the whole hype machine on one side and the whole panic on the other. A tool like this should make you better company for the humans you think with, not stand in for them. Almost everything I’m building is some version of that bet: the right technology, pointed the right way, brings people closer to each other rather than further apart. A duck that argues back, used well, is just the smallest and most personal instance of it.

The duck never solved your bug. It made you solve it. This one is the same, only now it argues, and the arguing, when you have the nerve to ask for it, is the whole gift.

This essay was built the way it describes: drafted, argued with, and pushed back on through exactly the kind of adversarial back-and-forth it’s about. The judgment in it is mine. The stress-testing had help.