The Benchmark Fallacy

AI beats us on every test. That was never the question.

Jun 11, 2026

[Image: A teenager holding a perfect driving-test score in one hand and a set of ambulance keys in the other, looking unsure.]

AI can now pass almost any test we give it. So people say it has caught up to us.

The sharpest warning about that came from an odd place. Not a tech lab. The Vatican.

Pope Leo XIV wrote an encyclical that captures it exactly. A machine can beat you on every test you can write, and still not be able to replace you.

The trap has a name worth keeping. The benchmark fallacy: treating “AI beat the human on the test” as proof that “AI can replace the human.” (A benchmark is just a scored test.)

A test grades the answer. It never grades what makes an answer good when things go sideways: judgment, and the nerve to own the result. Passing the test and holding the job are two different claims. We keep merging them.

Acing the test is not the same as being trusted with the job.

Picture a sixteen-year-old who aces the written driving test. The score is real. It proves they learned the rules.

It does not mean you would hand them the keys to an ambulance in a snowstorm.

The test and the road are different places. Only one of them has a patient in the back.

AI keeps passing, which tells us less, not more.

The scoreboard looks one-sided. Back in 2024, AI was already clearing 80% on broad knowledge exams. By 2026 it was near 80% on harder tests too, including graduate-level science and fixing real bugs in real code.

The easy read is “the gap is closing.”

But there is an old rule here. Once a test becomes the goal, people aim at the test instead of the skill, and the test stops measuring anything real. Economists call it Goodhart’s Law.

A test everyone passes has stopped being a test. So the more AI aces them, the less the scores tell us about what people are for.

A priest and a tech crowd reached the same verdict.

In May 2026, Pope Leo XIV put it plainly:

“They may imitate language, behavior, and analytical skills, or even simulate empathy and understanding, but they do not understand what they produce.”

AI can copy how we talk. It can act caring. It still does not know what it is saying.

Here is the part I keep coming back to. The tech world reached the same place from the opposite side. Their version: AI does the easy part fast, so the hard part, the judgment, lands back on a person. The work does not vanish. It moves.

A priest and a tech crowd do not usually agree on anything. When they do, it is not a coincidence. It is a finding.

Some of what you would automate is how your team learns.

The Pope’s next point is the one most roadmaps miss:

“humanity flourishes not despite limitations, but often through them.”

In plain terms: the hard parts are often how we grow.

A job works the same way. Some struggle is just tedious, and a machine should take it. Some struggle is how a person becomes good at the work. Automate all of it, and the work ships faster while your people quietly get worse.

This argument has one real failure point, so I will name it. If AI gains lasting memory, a body, and real stakes in long relationships, the line blurs. Watch that line. But notice what would cross it: lived experience, not a higher score.

Before you automate a job, ask what the test never measured.

Stop scoring on “can AI beat the human.” Ask three smaller questions about each thing you plan to automate:

Where does a person have to understand the output, not just pass it on? That step is where your risk lives.
What are you giving up: judgment, a hard-won skill, a human touch? Decide that on purpose, before a postmortem decides it for you.
Which hard parts teach your team the most? Protect those. Automate the busywork.

And one rule for the calls you cannot take back: if you cannot see how AI reached its answer, do not give it the final say.

AI aced the written test. It was always going to.

Whether to hand it the keys was never on the test.

That part is still yours.

Source Notes

The encyclical and its exact words. Magnifica Humanitas, Pope Leo XIV, issued 15 May 2026, “On Safeguarding the Human Person in the Time of Artificial Intelligence.” Paragraphs 99 and 118 are quoted word for word; the irreversible-decision rule is grounded in paragraph 198. External source: https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html

Product Theatre
