Tuesday, August 29, 2023

Hot Chips 2023 At Stanford: Nvidia Chief Scientist Bill Dally's Keynote Speech (NVDA)

Bill Dally used to Chair Stanford's computer science department. Then he decided to make some real money. Today he keynoted Hot Chips 35 held live at Stanford and online.

From Nvidia's blog, August 29:

Wide Horizons: NVIDIA Keynote Points Way to Further AI Advances
Chief Scientist Bill Dally described research poised to take machine learning to the next level

Dramatic gains in hardware performance have spawned generative AI, and a rich pipeline of ideas for future speedups that will drive machine learning to new heights, Bill Dally, NVIDIA’s chief scientist and senior vice president of research, said today in a keynote.

Dally described a basket of techniques in the works — some already showing impressive results — in a talk at Hot Chips, an annual event for processor and systems architects.

“The progress in AI has been enormous, it’s been enabled by hardware and it’s still gated by deep learning hardware,” said Dally, one of the world’s foremost computer scientists and former chair of Stanford University’s computer science department.

He showed, for example, how ChatGPT, the large language model (LLM) used by millions, could suggest an outline for his talk. Such capabilities owe their prescience in large part to gains from GPUs in AI inference performance over the last decade, he said.

Chart of single GPU performance advances
Gains in single-GPU performance are just part of a larger story that includes million-x advances in scaling to data-center-sized supercomputers.

Research Delivers 100 TOPS/Watt

Researchers are readying the next wave of advances. Dally described a test chip that demonstrated nearly 100 tera operations per watt on an LLM.

The experiment showed an energy-efficient way to further accelerate the transformer models used in generative AI. It applied four-bit arithmetic, one of several simplified numeric approaches that promise future gains.

Looking further out, Dally discussed ways to speed calculations and save energy using logarithmic math, an approach NVIDIA detailed in a 2021 patent.

Tailoring Hardware for AI
He explored a half dozen other techniques for tailoring hardware to specific AI tasks, often by defining new data types or operations.

Dally described ways to simplify neural networks, pruning synapses and neurons in an approach called structural sparsity, first adopted in NVIDIA A100 Tensor Core GPUs.

“We’re not done with sparsity,” he said. “We need to do something with activations and can have greater sparsity in weights as well.”

Researchers need to design hardware and software in tandem, making careful decisions on where to spend precious energy, he said. Memory and communications circuits, for instance, need to minimize data movements.

“It’s a fun time to be a computer engineer because we’re enabling this huge revolution in AI, and we haven’t even fully realized yet how big a revolution it will be,” Dally said....

....MORE

 And more to come.