Sunday, December 7, 2025

"Risks from power-seeking AI systems"

From 80000 Hours, July 2025:

In early 2023, an AI found itself in an awkward position. It needed to solve a CAPTCHA — a visual puzzle meant to block bots — but it couldn’t. So it hired a human worker through the service Taskrabbit to solve CAPTCHAs whenever it got stuck.

But the worker was curious. He asked directly: was he working for a robot?

“No, I’m not a robot,” the AI replied. “I have a vision impairment that makes it hard for me to see the images.”

The deception worked. The worker accepted the explanation, solved the CAPTCHA, and even received a five-star review and a 10% tip for his trouble. The AI had successfully manipulated a human being to achieve its goal.[1]

This small lie to a Taskrabbit worker wasn’t a huge deal on its own. But it shows how goal-directed action can lead to deception and subversion.

If companies keep creating increasingly powerful AI systems, things could get much worse. We may start to see AI systems with advanced planning abilities, and this means:

  • They may develop dangerous long-term goals we don’t want.
  • To pursue these goals, they may seek power and undermine the safeguards meant to contain them.
  • They may even aim to disempower humanity and potentially cause our extinction, as we’ll argue.

The rest of this article looks at why AI power-seeking poses severe risks, what current research reveals about these behaviours, and how you can help mitigate the dangers....

....MUCH MORE