Saturday, July 19, 2025

"Our Spreadsheet Overlords"

From The Ideas Letter, May 29, 2025:

Two years have passed since OpenAI released ChatGPT and the panic set in. Two years of above-the-fold headlines about “AI”—a subaltern specialty topic and the preserve of goofy sci-fi films for some 80 years prior—and two years of confusing, rank speculation about “artificial general intelligence” (AGI), a loosely defined idea of “human-level” yet machinic reasoning. Large Language Models, or LLMs, capture and generate what we have long taken to be an essentially human thing, language, shaking our historical sense of our own species to the core. But their abilities are matched by a lack of intelligence, and even a lack of the consistency we have long expected from computing machines.  

As a new surge of AGI talk has taken over the airwaves in the third year of LLMs, a deeply revealing form of Actually Existing AI speaks against the hype: Elon Musk’s Department of Government Efficiency, a sloppy, violent-yet-banal attack on the codebase and massive personal data dragnet of the federal government. While we wait for AGI—and while we’re distracted by endless, ungrounded debates about it—the reality of modern AI is parading in plain sight in the form of the most boring constitutional crisis imaginable. Rather than machine intelligence, AI is an avant-garde form of digital bureaucracy, one that deepens our culture’s dependence on the spreadsheet.

The discourse is providing cover for this disastrous attack. Kevin Roose, a tech columnist for the New York Times, recently explained why he’s “feeling the AGI.” (Unfortunately, Roose’s reasons seem to boil down to, “I live in San Francisco.”) Similarly, Ezra Klein, of the paper’s Opinion pages, thinks the government knows AGI is coming. And the statistician Nate Silver suggests we have to “come to grips with AI.” The internet ethnographer and journalist Max Read has dubbed this surge of AI believers the “AI backlash backlash,” a reaction to the anti-tech skepticism we’ve seen over the past few years. The position, according to Read, is that AI “is quite powerful and useful, and even if you hate that, lots of money and resources are being expended on it, so it’s important to take it seriously rather than dismissing it out of hand.” That’s a far cry from the derisive characterization of LLMs like ChatGPT as “stochastic parrots” (which remix and repeat human language) or “fancy autocomplete.” These systems are far more capable—and more dangerous—than the skeptics make them out to be. Dispelling the myth of their intelligence does not excuse us from paying close attention to their power.

Rather than providing the much-vaunted innovation and efficiency associated with Silicon Valley, AI systems create more confusion than clarity. They are a coping mechanism for a global society that runs on digital data sets too vast to make sense of, too complex to disentangle manually. Feeding off a staggering amount of digitized data, they are a tool specified to that data and its tabular format. When we think of AI, we should think less of Terminator 2 and more of the TV show Severance, in which office workers search for “bad numbers” on the strength of vibes alone.  

An LLM is nothing more than a distilled matrix of values that represent words. The models we are all familiar with now—ChatGPT, Claude, Gemini, Grok—have many moving parts, but their core element is a large set of rows and columns that is the result of billions of dollars in training. The training data are on the order of 6 trillion to 10 trillion tokens (including words, letters, and other marks like “&,” “-ing,” and “3”)—orders of magnitude more text than humans have ever used for any purpose—and they only exist today because of the planetary sprawl of the internet. Using all this training data, you’ll be able to make a bot that responds to human questions, retrieves information, generates poetry and memos and anything else you like, and effectively feels like magic. You’ll have an AI model that feels like AGI.
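
To make that image concrete, here is a minimal, purely illustrative sketch of what a “matrix of values that represent words” looks like: a toy vocabulary, a small table of made-up numbers with one row per token, and a lookup that treats nearby rows as related words. The vocabulary, the numbers, and the most_similar helper are invented for illustration and say nothing about the scale or architecture of a real model.

```python
# Toy illustration only: the "matrix of values" at the heart of a language model,
# shrunk to five tokens and three made-up dimensions.
import numpy as np

vocab = ["the", "cat", "sat", "mat", "&"]      # token strings
embeddings = np.array([                        # one row of numbers per token
    [ 0.20, -0.10,  0.70],
    [ 0.90,  0.30, -0.20],
    [ 0.10,  0.80,  0.40],
    [ 0.85,  0.25, -0.10],
    [-0.60,  0.00,  0.10],
])

def most_similar(token: str) -> str:
    """Return the other token whose row vector is closest (cosine similarity)."""
    i = vocab.index(token)
    v = embeddings[i]
    sims = embeddings @ v / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v))
    sims[i] = -np.inf                          # ignore the token itself
    return vocab[int(np.argmax(sims))]

print(most_similar("cat"))                     # "mat": nearby rows stand in for related words
```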

If—as happened between early 2023 and late 2024—people stop feeling that magic, you can then tweak your model. Instead of having it simply respond to prompts and queries, you can tell it to generate a bunch of responses and then print off its “thoughts” as it chooses the best one. This new model could do fun things, like fill an Instacart order or book a vacation. And those things are what agents do, so—after a new round of training and a new round of VC funding—everyone will be feeling AGI again.
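
The tweak described above, generating several candidate responses, surfacing their intermediate text as “thoughts,” and keeping the best one, can be sketched as a best-of-n loop. The generate and score functions below are hypothetical stand-ins, not any real model or vendor API.

```python
# Schematic best-of-n sampling: sample several candidates, print the "thoughts,"
# keep the highest-scoring answer. All functions here are illustrative stand-ins.
import random

def generate(prompt: str) -> str:
    """Stand-in for one sampled model response."""
    return f"candidate answer #{random.randint(1, 1000)} to: {prompt}"

def score(response: str) -> float:
    """Stand-in for a learned verifier or reward model ranking a response."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    for c in candidates:                       # the printed "thoughts" a user might see
        print("thinking:", c)
    scored = [(score(c), c) for c in candidates]
    return max(scored)[1]                      # keep whichever candidate scores highest

print(best_of_n("Book me a weekend in Lisbon"))
```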

Two tendencies, alike in error, reign over AI discourse today. The one, as Read observes, is that critics deride AI as a tool of capitalism and a con put on by tech oligarchs, failing to explain its power. The other, which I’m going to call “the performance fallacy,” confuses benchmarks for intelligence. Until we move past this pas de deux of shallow analysis, we will not be able to confront the very real problem of AI today.  

The Performance Fallacy 

In 1950, Alan Turing proposed a simple way to determine if a machine could think: Ask it some questions. If you couldn’t figure out if you were talking to a machine or not, you should concede that it is intelligent. This game became known as the “Turing Test,” and no one, to my knowledge, has ever been satisfied by it. Turing’s idea was that when we decide someone else is intelligent, it’s not that we know this, it’s that we assume it. I don’t ask to see how your brain works to determine if you’re intelligent; I just think of you as a human. The definition of intelligence that comes from this isn’t a definition at all—and that’s why AI has been permanently split between two ways of understanding what Turing meant.  

The first way is according to the benchmark. Every new model that gets released today is tested on an endless series of performance thresholds with fancy acronym titles (ARC-AGI, a series of difficult puzzles, is a popular one these days). Each set of benchmark performances is compared to earlier attempts: A new model is said to score 87% where the previous best was 59%, even if no one can tell you what those percentages mean. If OpenAI’s o3 “reasoning” model scores 87% on ARC-AGI, does that mean it is 87% intelligent? Is “87% intelligent” a coherent idea? In the world of pure benchmark culture, such questions don’t matter and can’t really be asked. The system is optimizing for something that looks like what intelligent beings (humans) do, so there’s little reason for skepticism. The most extreme version of this benchmarking is arguably the Loebner Prize, a competition that ran for 30 years and awarded a large sum to the most convincing chatbot. Its benchmark for “intelligence” was taken from an offhanded comment of Turing’s: that a chatbot that could fool a human interrogator roughly 30 percent of the time would count as intelligent.
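
As for what those percentages literally are: a benchmark score is just the share of tasks a model passes. The toy calculation below uses invented results rather than actual ARC-AGI data; it shows that “87%” is a pass rate over a task set, not a measurement of being “87% intelligent.”

```python
# Invented results, for illustration only: a benchmark score is a pass rate.
tasks = [{"id": f"puzzle-{i}", "solved": i % 100 < 87} for i in range(1000)]

score = sum(t["solved"] for t in tasks) / len(tasks)
print(f"benchmark score: {score:.0%}")         # prints "benchmark score: 87%"
```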

But it’s not clear that Turing really intended for this, or any other, benchmark to determine what intelligence was or who counted as intelligent. In “Computing Machinery and Intelligence,” he concocted several exchanges between himself and a fictional future computer, in which he asked the machine to do math problems, play chess, and compose a poem about the Forth Bridge in Scotland. These transcripts of an imaginary set of conversations—alongside ideas like a machine needing to “enjoy strawberries and cream”—show that Turing was thinking of intelligence holistically. This second way of framing intelligence is negative and, maybe surprisingly, not technical at all. Conversation was the un-benchmarkable threshold. And even though LLMs can’t prove that they can enjoy anything, they can certainly say that they can, and in language that scrambles the very idea of the Turing Test in its benchmark form altogether.  

Benchmark culture adds to the vaudeville quality of tech today, with its demos, entertainer personalities, and gimmicks. All of the showmanship claims to be about performance. Your new iPhone is faster, better, stronger. Analytics makes everything from finance to sports better....

....MUCH MORE 

The author, Leif Weatherby, is an associate professor of German and the director of the Digital Theory Lab at New York University. He is the author of "Language Machines."