From the University of Chicago, Booth School of Business, Chicago Booth Review, December 9:
Using a Bayesian framework helps explain ‘double descent.’
In recent years, something unexpected has been happening in artificial intelligence. Modern AI appears to be breaking a rule that statisticians have preached for nearly a century: Keep models in a Goldilocks zone. They should be complex enough to capture patterns in the data but still simple enough that they don’t become too tailored to their training examples....
But is it possible the rule isn’t actually being broken? Chicago Booth’s Nicholas Polson and George Mason University’s Vadim Sokolov find that modern AI’s success can be understood within established Bayesian statistical principles.
Traditionally, model performance has followed a predictable U-shaped curve. At first, increasing a model’s size reduces test error—the error that can crop up when a model is applied to unseen data. But as the model becomes more complex and tightly fitted to its training data, the test-error rate will start rising again. Practitioners aim to find the sweet spot for model complexity at the bottom of the U shape before the error begins to rise again.
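The classic U shape is easy to reproduce. Below is a minimal sketch (my own illustration, not from the article) using polynomial regression: training error keeps falling as the degree grows, while test error typically falls and then climbs once the model starts fitting noise.

```python
# Minimal sketch of the classical U-shaped test-error curve (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise=0.3):
    x = rng.uniform(-1, 1, n)
    y = np.sin(2 * np.pi * x) + noise * rng.standard_normal(n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

for degree in [1, 2, 3, 5, 9, 15]:
    # Least-squares polynomial fit of the given degree.
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}  train MSE {train_mse:.3f}  test MSE {test_mse:.3f}")
```

Training error shrinks monotonically with degree; test error is usually lowest at a moderate degree, the bottom of the U.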
As its complexity increases, a model can eventually reach the interpolation threshold, where its parameters—think of them as dials on a recording studio’s mixing board, each shaping a particular element of how the model processes data—equal the number of training examples. At this point, the model essentially memorizes its training data and, by conventional logic, should fail when applied to new data. It acts like a student who memorizes practice test questions but fails to learn concepts.
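A quick way to see the interpolation threshold (again, my own sketch, not the researchers' setup): give a linear model exactly as many parameters as training examples. It can then reproduce the training labels perfectly, yet does poorly on fresh data.

```python
# At the interpolation threshold: parameters == training examples.
import numpy as np

rng = np.random.default_rng(1)
n = 20                                   # training examples
X_train = rng.standard_normal((n, n))    # n parameters for n examples
true_w = np.zeros(n); true_w[:3] = 1.0   # only a few features actually matter
noise = 0.5
y_train = X_train @ true_w + noise * rng.standard_normal(n)

# A square, full-rank design matrix can be solved exactly: training data memorized.
w_hat = np.linalg.solve(X_train, y_train)

X_test = rng.standard_normal((1000, n))
y_test = X_test @ true_w + noise * rng.standard_normal(1000)

print("train MSE:", np.mean((X_train @ w_hat - y_train) ** 2))  # essentially zero
print("test  MSE:", np.mean((X_test @ w_hat - y_test) ** 2))    # much larger
```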
But with recent advances in AI technology adding even more complexity and pushing models far beyond the interpolation threshold, researchers are observing that some models, unexpectedly, see their error rate fall a second time, a phenomenon known as “double descent” that has them flummoxed. It was first formally documented in 2019 by a team of researchers including the University of California at San Diego’s Mikhail Belkin, working with linear regression models, and has since been observed in some generative AI systems. It baffles researchers because it seems to conflict with Occam’s razor, the idea that simpler explanations are usually better.
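The second descent can be demonstrated with a small random-feature experiment in the spirit of the 2019 work, though the particular choices below (random ReLU features, a minimum-norm least-squares fit) are my own illustration rather than the researchers' exact setup. Test error typically worsens as the feature count approaches the number of training points, spikes at the interpolation threshold, then falls again well beyond it.

```python
# Sketch of a double-descent curve with random features and min-norm least squares.
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, d = 40, 1000, 10

def make_data(n):
    X = rng.standard_normal((n, d))
    y = np.sin(X @ np.ones(d)) + 0.1 * rng.standard_normal(n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

# Fixed random projection defining the random-feature map.
W = rng.standard_normal((d, 2000))

def features(X, p):
    return np.maximum(X @ W[:, :p], 0.0)   # p random ReLU features

for p in [5, 10, 20, 35, 40, 45, 80, 200, 1000, 2000]:
    Phi_tr, Phi_te = features(X_tr, p), features(X_te, p)
    # Minimum-norm least-squares solution via pseudoinverse -- the choice that
    # typically produces the second descent past the interpolation threshold.
    w = np.linalg.pinv(Phi_tr) @ y_tr
    test_mse = np.mean((Phi_te @ w - y_te) ** 2)
    print(f"features {p:4d}  test MSE {test_mse:.3f}")
```

With 40 training points, the error usually peaks near 40 features and then declines again as the feature count grows into the hundreds or thousands.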
Polson and Sokolov argue, using Bayesian statistical methods, that this seeming paradox makes mathematical sense when viewed through the right analytical framework....
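One way to get intuition for the Bayesian connection, sketched below under my own assumptions rather than as a reproduction of Polson and Sokolov's analysis: with a Gaussian prior on the weights and Gaussian noise, the posterior mean is a ridge-regression estimate, which remains well-defined no matter how many parameters the model has, and as the prior is made more diffuse it approaches the minimum-norm interpolator used in the double-descent experiment above.

```python
# Posterior mean under a Gaussian prior (ridge regression) in an
# overparameterized linear model -- illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 300                             # far more parameters than data points
Phi = rng.standard_normal((n, p))
y = Phi @ (rng.standard_normal(p) / np.sqrt(p)) + 0.1 * rng.standard_normal(n)

def posterior_mean(Phi, y, prior_var, noise_var):
    # Solve (Phi'Phi + (noise_var/prior_var) I) w = Phi'y, the ridge normal equations.
    lam = noise_var / prior_var
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

min_norm = np.linalg.pinv(Phi) @ y          # minimum-norm interpolator

for prior_var in [1.0, 100.0, 1e4]:
    w = posterior_mean(Phi, y, prior_var, noise_var=0.01)
    print(f"prior variance {prior_var:>8.1f}  "
          f"distance to min-norm solution {np.linalg.norm(w - min_norm):.4f}")
```

As the prior variance grows, the posterior mean converges to the minimum-norm solution, which is one informal way to see how an interpolating, heavily overparameterized model can still sit comfortably inside a Bayesian framework.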