Saturday, January 29, 2022

"Researchers Build AI That Builds AI"

From Quanta Magazine:

By using hypernetworks, researchers can now preemptively fine-tune artificial neural networks, saving some of the time and expense of training.

Artificial intelligence is largely a numbers game. When deep neural networks, a form of AI that learns to discern patterns in data, began surpassing traditional algorithms 10 years ago, it was because we finally had enough data and processing power to make full use of them.

Today’s neural networks are even hungrier for data and power. Training them requires carefully tuning the values of millions or even billions of parameters that characterize these networks, representing the strengths of the connections between artificial neurons. The goal is to find nearly ideal values for them, a process known as optimization, but training the networks to reach this point isn’t easy. “Training could take days, weeks or even months,” said Petar Veličković, a staff research scientist at DeepMind in London.

That may soon change. Boris Knyazev of the University of Guelph in Ontario and his colleagues have designed and trained a “hypernetwork” — a kind of overlord of other neural networks — that could speed up the training process. Given a new, untrained deep neural network designed for some task, the hypernetwork predicts the parameters for the new network in fractions of a second, and in theory could make training unnecessary. Because the hypernetwork learns the extremely complex patterns in the designs of deep neural networks, the work may also have deeper theoretical implications.

For now, the hypernetwork performs surprisingly well in certain settings, but there’s still room for it to grow — which is only natural given the magnitude of the problem. If they can solve it, “this will be pretty impactful across the board for machine learning,” said Veličković.

Getting Hyper

Currently, the best methods for training and optimizing deep neural networks are variations of a technique called stochastic gradient descent (SGD). Training involves minimizing the errors the network makes on a given task, such as image recognition. An SGD algorithm churns through lots of labeled data to adjust the network’s parameters and reduce the errors, or loss. Gradient descent is the iterative process of climbing down from high values of the loss function to some minimum value, which represents good enough (or sometimes even the best possible) parameter values.

But this technique only works once you have a network to optimize. To build the initial neural network, typically made up of multiple layers of artificial neurons that lead from an input to an output, engineers must rely on intuitions and rules of thumb. These architectures can vary in terms of the number of layers of neurons, the number of neurons per layer, and so on....

....MUCH MORE