From Knowledge@Wharton, January 14, 2025:
New research from Wharton's Philip Tetlock finds that combining predictions from large language models can achieve accuracy on par with human forecasters.
Decision-makers have long relied on the “wisdom of the crowd” — the idea that combining many people’s judgments often leads to better predictions than any individual’s guess. But what if the crowd isn’t human?
New research from Wharton management professor Philip Tetlock finds that combining predictions from multiple artificial intelligence (AI) systems, known as large language models (LLMs), can achieve accuracy on par with human forecasters. This breakthrough offers a cheaper, faster alternative for tasks like predicting political outcomes or economic trends.
“What we’re seeing here is a paradigm shift: AI predictions aren’t just matching human expertise — they’re changing how we think about forecasting entirely,” said Tetlock.
Dubbed the “wisdom of the silicon crowd” by the Wharton academic and his co-authors (Philipp Schoenegger of the London School of Economics, independent researcher Indre Tuminauskaite, and Peter Park of the Massachusetts Institute of Technology), this approach highlights how groups of AI systems can provide reliable predictions about the future.
By pooling predictions from multiple LLMs, the researchers present a practical method for organizations to access high-quality forecasting without relying solely on expensive teams of human prognosticators.
“This isn’t about replacing humans, however,” Tetlock said. “It’s about making predictions smarter, faster, and more accessible.”
How Do AI Predictions Work?
Individually, AI models like GPT-4, made by Microsoft-backed OpenAI, have struggled with forecasting. Previous studies revealed that their predictions were often no better than random guesses. However, Tetlock’s paper, “Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy,” found that combining predictions from multiple models significantly boosted their accuracy.

So how does it work? The magic lies in how errors balance out. Just as human crowds average out individual biases, combining AI models cancels out inconsistencies in their predictions. Each model brings a slightly different perspective, much like human forecasters with varied expertise and experiences. “Just as human crowds balance individual biases, AI ensembles turn competing perspectives into consensus,” Tetlock said.
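The error-balancing idea can be sketched in a few lines of code. This is a minimal illustration, not the paper's exact method: the model names and forecast numbers below are hypothetical, and the aggregation shown (taking the median of the individual probability forecasts) is one common way to pool a crowd's judgments.

```python
# Minimal sketch of a "silicon crowd": pool probability forecasts from
# several LLMs so that individual models' errors partially cancel out.
# The forecast values here are invented for illustration.
from statistics import median

# Hypothetical probability forecasts (0 to 1) from different models
# answering the same binary question, e.g. "Will event X occur by date Y?"
model_forecasts = [0.62, 0.55, 0.70, 0.48, 0.65, 0.58,
                   0.61, 0.50, 0.72, 0.57, 0.66, 0.59]

def ensemble_forecast(forecasts):
    """Combine individual forecasts into one crowd forecast.

    The median is robust to a few badly miscalibrated models;
    the mean is a common alternative.
    """
    return median(forecasts)

print(ensemble_forecast(model_forecasts))  # prints 0.6
```

Each model overshoots or undershoots in its own way, but the pooled estimate sits near the center of the spread, which is the same mechanism that makes human crowds accurate.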
His study also found that AI predictions improved substantially, by 17% to 28%, when informed by human input, such as insights from forecasting tournaments, where people compete to predict future events accurately. These competitions provide valuable, real-time data that AI systems can incorporate into their predictions.
“The best forecasts come when human intuition meets machine precision,” said Tetlock....
....MUCH MORE
Previously on Professor Tetlock, the CIA and the "Shoulda seen it coming" series of posts.