Wednesday, February 21, 2024

The Next Platform Mocked Nvidia: "Even Nvidia Can’t Get Enough H100s For Its Supercomputer" (NVDA)

Re: today's earnings release, as the analysts say on the conference calls, "Great quarter, guys."

The two key takeaways. First: data center (mainly AI chips) revenue grew from $3.62 billion in the fourth quarter of fiscal 2023 to the just-reported $18.4 billion for Q4 of fiscal 2024.

Secondly: gross margins are approaching those of software publishers or chip designers like ARM (okay, that's hyperbole; software and ARM can tag 90%+): 76.0% in Q4 '24 vs. 63.3% in Q4 '23.

That combination of incredibly fast top-line growth and expanding margins resulted in this little line: 
"GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago."
"Great quarter, guys."
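
For anyone who wants to check the arithmetic, here is a quick sketch in Python using only the rounded figures quoted above. The function names are my own, purely for illustration, and the prior-period EPS values are backed out from the stated percentage increases rather than taken from Nvidia's filings, so treat them as approximations.

```python
# Sanity-check the headline numbers using the rounded figures quoted in the post.
# Function names are illustrative; nothing here comes from Nvidia's own materials.

def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


def implied_base(current: float, pct_up: float) -> float:
    """Back out the earlier value from the current value and the stated % increase."""
    return current / (1.0 + pct_up / 100.0)


# Data center revenue, in billions of dollars (year-ago quarter vs. latest quarter).
dc_growth = pct_change(3.62, 18.4)

# Margin expansion, in percentage points (not percent).
margin_gain = 76.0 - 63.3

# Implied earlier EPS, derived from "up 33%" and "up 765%" on $4.93.
eps_prior_quarter = implied_base(4.93, 33)
eps_year_ago = implied_base(4.93, 765)

print(f"Data center revenue growth: {dc_growth:.1f}%")
print(f"Margin expansion: {margin_gain:.1f} percentage points")
print(f"Implied prior-quarter EPS: ${eps_prior_quarter:.2f}")
print(f"Implied year-ago EPS: ${eps_year_ago:.2f}")
```

On these rounded inputs the data center line works out to roughly a fivefold increase, and the margin gain to 12.7 percentage points.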
And from TheNextPlatform, February 15:
 
Half Eos’d: Even Nvidia Can’t Get Enough H100s For Its Supercomputer

UPDATED: Getting your hands on an Nvidia “Hopper” H100 GPU is probably the most difficult thing in the world right now. And even the company that makes them is having to parcel them out carefully for its own internal uses, to the point where it looks like half of the Eos supercomputer that Nvidia was showing off last November running the MLPerf benchmarks was repurposed for some other machine, leaving the Eos machine that Nvidia is bragging about today in its original configuration but half of its peak.

It’s a weird time in the AI datacenter these days.

Apropos of nothing, Nvidia put out a blog and a video describing the Eos machine in what it considers detail to the regular press but which we think of as a coloring book version that comes with black, green, and yellow crayon, like the menus for the kids at Cracker Barrel.

The Eos machine was discussed as part of the Hopper GPU accelerator launch in March 2022, was installed later that year, and made it into the Number 9 position on the Top500 supercomputer rankings when the High Performance LINPACK benchmark was certified in a run for the November 2023 list.

The latest MLPerf machine learning benchmark for datacenter training and inference test was also unveiled at this time, and Nvidia bragged about having a machine with 10,752 H100 GPUs all lashed together with 400 Gb/sec Quantum-2 NDR InfiniBand interconnects and called this the Eos system.

And we quote Nvidia itself: “Among many new records and milestones, one in generative AI stands out: Nvidia Eos — an AI supercomputer powered by a whopping 10,752 Nvidia H100 Tensor Core GPUs and Nvidia Quantum-2 InfiniBand networking — completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in just 3.9 minutes.”....

....MUCH MORE

I assume, based on the way Nvidia presented the story, that this is not some scarcity marketing ploy.