From The Register, January 2:
2024 sure looks like an exciting year for datacenter silicon
Loads of chips from Nvidia, AMD, and Intel on the way – and very probably some surprises as well
Comment The new year is already shaping up to be one of the most significant for datacenter silicon we've seen in a while. Every major chip house is slated to refresh its CPU and/or GPU product lines over the coming twelve months.
Nvidia has a slew of new accelerators, GPU architectures, and networking kit planned for 2024. Intel will launch arguably its most compelling Xeons in years alongside new Habana Gaudi AI chips. Meanwhile, AMD, riding high on its MI300-series launch, is slated to bring its 5th-gen Epyc processors to market.
In no particular order, let's dig into some of the bigger datacenter chip launches on our radar in 2024. Oh, and if you think we missed one, let us know in the comments or by email.
Nvidia's HBM3e-toting H200 AI chips arrive
Among the first new chips to hit the market in 2024 will be Nvidia's H200 accelerators. The GPU is essentially a refresh of the venerable H100. You might expect the latest chip to offer a performance uplift over its older sibling, yet it won't in the conventional sense: dig through the spec sheet and you'll see the floating point performance is identical to that of the H100. Instead, the part's performance uplift – Nvidia claims as much as double the perf for LLMs such as Llama 2 70B – is down to the chip's HBM3e memory stacks.
We're promised the H200 will be available with up to 141 GB of HBM3e memory that's good for a whopping 4.8 TB/s of bandwidth. With the rise in popularity of LLMs and other generative models – such as Meta's Llama 2, Falcon 40B, and Stable Diffusion – memory capacity and bandwidth have an outsized impact on inference performance: namely, how big a model you can fit into a single accelerator or server, and how many requests it can handle simultaneously.
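To see why memory, not compute, sets the pace, a little back-of-the-envelope arithmetic helps. The sketch below is our own, not anything from Nvidia's spec sheets – the parameter count, 16-bit precision, and the assumed ~989 TFLOPS of dense FP16 compute for an H100-class part are all illustrative figures. The gist: a 70-billion-parameter model needs roughly 140 GB just for its weights, and generating each token at batch size 1 means streaming all of those weights out of memory, while the matching compute cost is a comparative rounding error.

```python
# Back-of-the-envelope arithmetic for why LLM inference is memory-bound.
# All numbers below are illustrative assumptions, not vendor specifications.

PARAMS = 70e9            # Llama 2 70B parameter count
BYTES_PER_PARAM = 2      # FP16/BF16 precision
HBM_CAPACITY_GB = 141    # H200 memory capacity, per the article
HBM_BW_GBS = 4800        # H200 memory bandwidth (4.8 TB/s), per the article
FP16_TFLOPS = 989        # assumed dense FP16 throughput, H100/H200-class GPU

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB (vs {HBM_CAPACITY_GB} GB of HBM3e)")

# Generating one token at batch size 1 streams every weight once...
t_memory = weights_gb / HBM_BW_GBS              # seconds, bandwidth-bound
# ...but needs only ~2 FLOPs per parameter of compute.
t_compute = 2 * PARAMS / (FP16_TFLOPS * 1e12)   # seconds, compute-bound

print(f"Memory-bound time/token:  {t_memory*1e3:.1f} ms "
      f"(~{1/t_memory:.0f} tokens/s)")
print(f"Compute-bound time/token: {t_compute*1e3:.2f} ms")
# The memory term dominates by more than 100x, so HBM bandwidth,
# not FLOPS, is the practical ceiling on single-stream decode speed.
```

On these assumed numbers the weights barely squeeze into the H200's 141 GB, and the bandwidth-bound ceiling works out to roughly 34 tokens per second per stream – which is why more and faster HBM translates directly into the inference gains Nvidia is claiming.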
As we recently explored in our analysis of AMD and Nvidia's benchmarking debacle, FLOPS aren't nearly as important as memory capacity and bandwidth when it comes to these kinds of AI workloads....
....MUCH MORE