Thursday, November 21, 2024

"Jensen Huang says the 3 elements of AI scaling are all advancing. Nvidia's Blackwell demand will prove it." (NVDA)

This will be a serious issue over the next couple of years. Some recent links after the jump.

From Business Insider, November 21:

  • Reports on an AI progress slowdown raised concerns about model scaling on Nvidia's earnings call.
  • An analyst questioned if models are plateauing and if Nvidia's Blackwell chips could help.
  • Huang said there are three elements in scaling and that each continues to advance.

If the foundation models driving the panicked rush toward generative AI stop improving, Nvidia will have a problem. Silicon Valley's whole value proposition is the continued demand for more and more computing power.

Concerns about scaling laws started recently with reports that OpenAI's progress in improving its models was slowing. But Jensen Huang isn't worried.

The Nvidia CEO got the question Wednesday, on the company's third-quarter earnings call. Has progress stalled? And could the power of Nvidia's Blackwell chips start it up again?

"Foundation model pre-training scaling is intact and it's continuing," Huang said.

He added that scaling isn't as narrow as many think.

In the past, it may have been true that models only improved with more data and more pre-training. Now, AI can generate synthetic data and check its own answers to, in a way, train itself. But we're running out of data that hasn't already been ingested by these models, and the impact of synthetic data for pre-training is debatable.

As the AI ecosystem matures, tools for improving models are gaining importance. The first generation of post-training improvement for models came from armies of humans checking AI's responses one by one.

Huang shouted out OpenAI's Strawberry, or o1, model, which uses more modern strategies like "chain of thought reasoning" and "multi-path planning." Both are tactics that encourage models to think longer and in a more step-by-step fashion so that their responses are more considered.

"The longer it thinks, the better and higher quality answer it produces," Huang said.

Pre-training, post-training improvements, and new reasoning strategies all improve models, Huang said. Of course, if the model is doing more computing to answer the same fundamental question, that's where higher-powered compute is necessary, especially since users want their responses just as fast, if not faster....

....MORE