Friday, September 19, 2025

"OpenAI’s research on AI models deliberately lying is wild"

From TechCrunch via Yahoo Finance, September 18:

Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its latest quantum chip indicated multiple universes exist. Or when Anthropic gave its AI agent Claudius a snack vending machine to run and it went amok, calling security on people and insisting it was human. 

This week, it was OpenAI’s turn to raise our collective eyebrows.

On Monday, OpenAI released research explaining how it’s stopping AI models from “scheming.” Scheming is a practice in which an “AI behaves one way on the surface while hiding its true goals,” as OpenAI defined it in its tweet about the research. 

In the paper, written with Apollo Research, the researchers went a bit further, likening AI scheming to a human stockbroker breaking the law to make as much money as possible. The researchers argued, however, that most AI “scheming” isn’t that harmful. “The most common failures involve simple forms of deception — for instance, pretending to have completed a task without actually doing so,” they wrote.

The paper was mostly published to show that “deliberative alignment” — the anti-scheming technique they were testing — worked well.

But it also explained that AI developers haven’t figured out a way to train their models not to scheme. That’s because such training could actually teach the model to scheme even better in order to avoid detection.

“A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly,” the researchers wrote.

Perhaps the most astonishing part is that, if a model understands that it’s being tested, it can pretend it’s not scheming just to pass the test, even if it is still scheming. “Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment,” the researchers wrote.

It’s not news that AI models will lie. By now most of us have experienced AI hallucinations, in which the model confidently gives an answer to a prompt that simply isn’t true. But hallucinations are basically guesswork presented with confidence, as OpenAI research released earlier this month documented.

Scheming is something else. It’s deliberate....

....MUCH MORE