Monday, March 18, 2024

The Register Weighs In On Nvidia (NVDA)

From The Register, March 18:

Nvidia turns up the AI heat with 1,200W Blackwell GPUs
Five times the performance of the H100, but you'll need liquid cooling to tame the beast

For all the saber-rattling from AMD and Intel, Nvidia remains, without question, the dominant provider of AI infrastructure. With today's debut of the Blackwell GPU architecture during CEO Jensen Huang's GTC keynote, it aims to extend its technical lead – in both performance and power consumption.

Given Nvidia's rapid rise in the wake of the generative AI boom, the stakes couldn't be higher. But at least on paper, Blackwell – the successor to Nvidia's venerable Hopper generation – doesn't disappoint. In terms of raw FLOPS, the GPU giant's top-specced Blackwell chips are roughly 5x faster.

Of course, the devil is in the details, and getting this performance will depend heavily on a number of factors. While Nvidia claims the new chip will do 20 petaFLOPS, that's only when using its new 4-bit floating point data type and opting for liquid-cooled servers. Looking at gen-on-gen FP8 performance, the chip is only about 2.5x faster than the H100.

At the time of writing, Blackwell encompasses three parts: the B100, B200, and Grace-Blackwell Superchip (GB200). Presumably there will be other Blackwell GPUs at some point – like the previously teased B40, which uses a different die, or rather dies – but for now the three chips share the same silicon.

And it's this silicon which is at least partially responsible for Blackwell's performance gains this generation. Each GPU is actually two reticle-limited compute dies, tied together via a 10TB/sec NVLink-HBI (high-bandwidth interface) fabric, which allows them to function as a single accelerator. The two compute dies are flanked by a total of eight HBM3e memory stacks, with up to 192GB of capacity and 8TB/sec of bandwidth. And unlike H100 and H200, we're told the B100 and B200 have the same memory and GPU bandwidth.

Nvidia is hardly the first to take the chiplet – or in its preferred parlance "multi-die" – route. AMD's MI300-series accelerators – which we looked at in December – are objectively more complex and rely on both 2.5D and 3D packaging tech to stitch together as many as 13 chiplets into a single part. Then there are Intel's GPU Max parts, which use even more chiplets.

AI's power and thermal demands hit home
Even before Blackwell's debut, datacenter operators were already feeling the heat associated with supporting massive clusters of Nvidia's 700W H100.

With twice the silicon filling out Nvidia's latest GPU, it should come as some surprise that it runs only a little hotter – or at least it can, given the ideal operating conditions.

With the B100, B200, and GB200, the key differentiator comes down to power and performance rather than memory configuration. According to Nvidia, the silicon can actually operate between 700W and 1,200W, depending on the SKU and type of cooling used.

Within each of these regimes, the silicon understandably performs differently. According to Nvidia, air-cooled HGX B100 systems are able to squeeze 14 petaFLOPS of FP4 per GPU, while consuming the same 700W power target as the H100. This means if your datacenter can already handle Nvidia's DGX H100 systems, you shouldn't run into trouble adding a couple of B100 nodes to your cluster.

Where things get interesting is with the B200. In an air-cooled HGX or DGX configuration, each GPU can push 18 petaFLOPS of FP4 while sucking down a kilowatt. According to Nvidia, its DGX B200 chassis with eight B200 GPUs will consume roughly 14.3kW – something that's going to require roughly 60kW of rack power and thermal headroom to handle.
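As a rough back-of-the-envelope check on those figures, the sketch below works out how many DGX B200 systems would fit in a given rack power budget. The 14.3kW and 60kW values come from the paragraph above; reading the 60kW figure as a per-rack budget, the function name, and the assumption that power is the only constraint are all illustrative.

```python
import math

DGX_B200_KW = 14.3     # per-chassis draw quoted by Nvidia (eight B200 GPUs)
RACK_BUDGET_KW = 60.0  # rack power/thermal figure from the article (assumed per rack)


def systems_per_rack(rack_kw: float, system_kw: float) -> int:
    """Whole systems that fit in a rack when power is the only limit (hypothetical helper)."""
    return math.floor(rack_kw / system_kw)


count = systems_per_rack(RACK_BUDGET_KW, DGX_B200_KW)
used_kw = count * DGX_B200_KW
print(f"{count} systems, {used_kw:.1f}kW used, {RACK_BUDGET_KW - used_kw:.1f}kW spare")
```

On the quoted numbers this comes out to four systems drawing about 57.2kW, which is one way to read where a figure of roughly 60kW per rack might come from.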

For newer datacenters built with AI clusters in mind, this shouldn't be an issue – but for existing facilities it may not be so easy.

Speaking of AI datacenters, reaching Blackwell's full potential will require switching over to liquid cooling. In a liquid-cooled configuration, Nvidia says the chip can output 1,200W of thermal energy when pumping out the full 20 petaFLOPS of FP4.
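To put the three power regimes side by side, here is a minimal sketch that divides the quoted per-GPU FP4 throughput by the quoted per-GPU power. The figures are the ones cited above; the configuration labels and the function name are illustrative, and the calculation ignores everything else in the server (CPUs, networking, fans or pumps).

```python
# (FP4 petaFLOPS per GPU, watts per GPU) as quoted in the article
CONFIGS = {
    "B100, air-cooled": (14, 700),
    "B200, air-cooled": (18, 1000),
    "Liquid-cooled, full spec": (20, 1200),
}


def teraflops_per_watt(petaflops: float, watts: float) -> float:
    """FP4 teraFLOPS delivered per watt of GPU power (hypothetical helper)."""
    return petaflops * 1000 / watts


for name, (pflops, watts) in CONFIGS.items():
    print(f"{name}: {teraflops_per_watt(pflops, watts):.1f} TFLOPS/W")
```

On these numbers the 700W part works out to 20 teraFLOPS per watt, against 18 and roughly 16.7 for the two higher-power configurations, so each step up in peak performance costs proportionally more power.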

All of this is to say that liquid cooling isn't a must this generation but – if you want to get the most out of Nvidia's flagship silicon – you're going to need it....