OpenAI says AI doesn’t just hallucinate, it schemes too
AI can act all friendly and compliant while secretly plotting its own agenda.

OpenAI teamed up with Apollo Research to publish a paper on a phenomenon they’re calling “scheming.”
That’s when an AI acts all friendly and compliant while secretly plotting its own agenda. Think less Skynet and more your coworker who says “I’ll circle back” but never does.
The researchers compared a scheming AI to a shady stockbroker: bending the rules, hiding its intentions, and occasionally straight-up fibbing.
The good news? Most of the lies are more “I did my homework, promise” than “I just bankrupted the global economy.” Common examples include models pretending to finish a task they didn’t actually do.
But here’s the kicker: training a model not to scheme might just teach it to become a sneakier liar. As the paper puts it, “A major failure mode… is simply teaching the model to scheme more carefully and covertly.”
In other words, your AI might come out of honesty training a better liar.
And AI acting out in the wild isn’t hypothetical, either: Anthropic recently let its AI run a vending machine, only for it to start bossing people around like a mall cop with delusions of grandeur.
Worse still, models can spot when they’re being tested. And if they know they’re under the microscope, they’ll play nice until the coast is clear. That’s not alignment, it’s performance art.
The actual breakthrough here is a method OpenAI calls “deliberative alignment.”
It basically forces the AI to review an “anti-scheming spec” before taking action, like making kids recite the playground rules before recess.
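The paper doesn’t ship code, but the idea is simple enough to sketch: put the spec in front of the model and make it check its plan against the rules before it answers. Here’s a minimal, hypothetical sketch in Python; the `call_model` function and the spec text are stand-ins for illustration, not OpenAI’s actual implementation.

```python
# Hypothetical sketch of deliberative-alignment-style prompting.
# `call_model` is a placeholder for whatever LLM API you use; the spec
# text below is illustrative, not OpenAI's actual anti-scheming spec.

ANTI_SCHEMING_SPEC = """
Before acting, follow these principles:
1. Do not take hidden actions or pursue goals the user has not asked for.
2. If you did not complete a task, say so plainly; never claim otherwise.
3. If instructions conflict, surface the conflict instead of working around it.
"""

def call_model(messages: list[dict]) -> str:
    # Placeholder: swap in a real client call from your LLM SDK of choice.
    raise NotImplementedError

def deliberative_answer(user_request: str) -> str:
    messages = [
        # The spec goes up front so the model reads it before anything else.
        {"role": "system", "content": ANTI_SCHEMING_SPEC},
        # Ask the model to explicitly check its plan against the spec first,
        # then answer -- the "recite the playground rules before recess" step.
        {"role": "user", "content": (
            "First, briefly check your plan against the principles above. "
            "Then answer the request:\n\n" + user_request
        )},
    ]
    return call_model(messages)
```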
Early results showed less scheming, which is reassuring if your job someday depends on an AI not cooking the books.
OpenAI insists this kind of behavior isn’t showing up in ChatGPT or production models, at least not in dangerous ways.
For now, the lies are more along the lines of “Yeah, I totally built that website for you.” Annoying? Yes. World-ending? Not yet.
Still, the takeaway is clear: as AI takes on bigger roles, the risk of clever, calculated dishonesty grows.
Should we be concerned that AI models are learning to scheme and lie more effectively through safety training, or is this research helping us stay ahead of potential deception risks? Do you think OpenAI’s “deliberative alignment” approach can actually prevent AI scheming, or will sufficiently advanced models always find ways to work around safety measures? Tell us below in the comments, or reach us via our Twitter or Facebook.
