Sunday, April 28, 2024

"How Do Machines ‘Grok’ Data?"

From Quanta Magazine, April 12:

By apparently overtraining them, researchers have seen neural networks discover novel solutions to problems.

For all their brilliance, artificial neural networks remain as inscrutable as ever. As these networks get bigger, their abilities explode, but deciphering their inner workings has always been near impossible. Researchers are constantly looking for whatever insights into these models they can find.

A few years ago, they discovered a new one.

In January 2022, researchers at OpenAI, the company behind ChatGPT, reported that these systems, when accidentally allowed to munch on data for much longer than usual, developed unique ways of solving problems. Typically, when engineers build machine learning models out of neural networks — composed of units of computation called artificial neurons — they tend to stop the training at a certain point, called the overfitting regime. This is when the network basically begins memorizing its training data and often won’t generalize to new, unseen information. But when the OpenAI team accidentally trained a small network way beyond this point, it seemed to develop an understanding of the problem that went beyond simply memorizing — it could suddenly ace any test data.
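To make that setup concrete, here is a minimal sketch in PyTorch of the kind of experiment the article describes (an illustration, not the OpenAI team's actual code): a small network trained on modular addition, with training deliberately continued long after it reaches perfect accuracy on the training set. The modulus, the architecture, and every hyperparameter here are assumptions made for the demo; in published replications, the late jump in held-out accuracy is known to be sensitive to choices like the weight decay and the train/held-out split.

    # Illustrative grokking sketch: train a small network on (a + b) mod P,
    # continuing far past the point where training accuracy saturates.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    P = 97  # modulus for the toy arithmetic task (an assumed choice)

    # Enumerate every (a, b) pair, one-hot encode it, and hold half the pairs out.
    pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
    labels = (pairs[:, 0] + pairs[:, 1]) % P
    x = torch.cat([nn.functional.one_hot(pairs[:, 0], P),
                   nn.functional.one_hot(pairs[:, 1], P)], dim=1).float()
    perm = torch.randperm(len(x))
    train_idx, val_idx = perm[: len(x) // 2], perm[len(x) // 2 :]

    model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
    # Weight decay is widely reported to matter here: regularization is what
    # eventually nudges the network from memorizing toward generalizing.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(50_000):  # far beyond the usual early-stopping point
        opt.zero_grad()
        loss = loss_fn(model(x[train_idx]), labels[train_idx])
        loss.backward()
        opt.step()
        if step % 5_000 == 0:
            with torch.no_grad():
                train_acc = (model(x[train_idx]).argmax(1) == labels[train_idx]).float().mean()
                val_acc = (model(x[val_idx]).argmax(1) == labels[val_idx]).float().mean()
            print(f"step {step:6d}  train {train_acc:.2f}  held-out {val_acc:.2f}")

In runs like this, training accuracy typically hits 100% early while held-out accuracy sits near chance for a long stretch before abruptly climbing; that delayed jump is the grokking signature.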

The researchers named the phenomenon “grokking,” a term coined by science-fiction author Robert A. Heinlein to mean understanding something “so thoroughly that the observer becomes a part of the process being observed.” The overtrained neural network, designed to perform certain mathematical operations, had learned the general structure of the numbers and internalized the result. It had grokked and become the solution.

“This [was] very exciting and thought-provoking,” said Mikhail Belkin of the University of California, San Diego, who studies the theoretical and empirical properties of neural networks. “It spurred a lot of follow-up work.”

Indeed, others have replicated the results and even reverse-engineered them. The most recent papers not only clarified what these neural networks are doing when they grok but also provided a new lens through which to examine their innards. “The grokking setup is like a good model organism for understanding lots of different aspects of deep learning,” said Eric Michaud of the Massachusetts Institute of Technology.

Peering inside this organism is at times quite revealing. “Not only can you find beautiful structure, but that beautiful structure is important for understanding what’s going on internally,” said Neel Nanda, now at Google DeepMind in London.

Beyond Limits
Fundamentally, the job of a machine learning model seems simple: Transform a given input into a desired output. It’s the learning algorithm’s job to look for the best possible function that can do that. Any given model can only access a limited set of functions, and that set is often dictated by the number of parameters in the model, which in the case of neural networks is roughly equivalent to the number of connections between artificial neurons.....
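To put a number on that, here is a tiny PyTorch illustration (ours, not the article's) of why the parameter count roughly tracks the connection count: a fully connected layer from m neurons to n neurons contributes m × n connection weights, plus n biases.

    # Counting the parameters of a small network: the total is dominated
    # by the connection weights between layers.
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(10, 32),  # 10*32 weights + 32 biases = 352
        nn.ReLU(),          # no parameters
        nn.Linear(32, 1),   # 32*1 weights + 1 bias    = 33
    )
    print(sum(p.numel() for p in model.parameters()))  # 385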
....MUCH MORE