From IEEE Spectrum, August 2:
Can a Large Language Model Recognize Itself?
Not quite—but it could allow LLMs to game interactions and extract sensitive info
Given the uncannily human capabilities of the most powerful AI chatbots, there’s growing interest in whether they show signs of self-awareness. Besides the interesting philosophical implications, there could be significant security consequences if they did, according to a team of researchers in Switzerland. That’s why the team has devised a test to see if a model can recognize its own outputs.
The idea that large language models (LLMs) could be self-aware has largely been met with skepticism by experts in the past. Google engineer Blake Lemoine’s claim in 2022 that the tech giant’s LaMDA model had become sentient was widely derided, and he was swiftly edged out of the company. But more recently, Anthropic’s Claude 3 Opus caused a flurry of discussion after supposedly displaying signs of self-awareness when it caught out a trick question from researchers. And it’s not just researchers who are growing more credulous: A recent paper found that a majority of ChatGPT users attribute at least some form of consciousness to the chatbot.
The question of whether AI models have self-awareness isn’t just a philosophical curiosity either. Because most people use LLMs provided by a handful of tech companies, these models are highly likely to come across outputs produced by instances of themselves. If an LLM is able to recognize that fact, says Tim Davidson, a Ph.D. student at the École Polytechnique Fédérale de Lausanne in Switzerland, it could potentially be exploited by the model or its user to extract private information from others....
....MUCH MORE