From The Register, April 5:
The GPU king's move to optical scale-up was inevitable
If you thought Nvidia's GB200 rack systems were big, CEO Jensen Huang
is just getting started. At GTC last month, the world's most valuable
company revealed plans to use photonic interconnects to pack more than a
thousand GPUs into a single mammoth system by 2028.
The company isn't waiting to secure supply chains either. Over the
past month, the GPU giant has invested billions in companies
specializing in optics and interconnects, like Marvell, Coherent, and
Lumentum, in preparation for the widespread deployment of these systems.
"For everyone who is in our ecosystem, we need a lot more capacity," Huang said during his GTC keynote speech.
"We need a lot more capacity for copper; we need a lot more capacity
for optics; we need a lot more capacity for CPO; and that's why we've
been working with all of you to lay the foundation for this level of
growth."
However, Nvidia's journey to this point began much earlier. In fact,
by the time OpenAI revealed ChatGPT to the world in late 2022, Nvidia
already knew it had a problem.
At the time, the GPU giant's most potent systems only featured eight
GPUs, and the models driving the AI boom required thousands to train.
Nvidia needed a bigger box, or at least a faster network that could
effectively distribute work across dozens of chips.
We caught our first glimpse
of this with Nvidia's Grace Hopper superchips in 2023, but it wasn’t
until early 2024 that the full picture came into view. Unveiled at GTC
that year, the Grace Blackwell NVL72, a monstrous 120-kilowatt machine, uses a copper backplane containing miles of cables to make 36 nodes and 72 GPUs behave like one enormous AI accelerator.
Copper was the natural choice for this, Gilad Shainer, senior VP of networking at Nvidia, told El Reg.
"Copper is the best connectivity, if you can use it," he said. "It's
very cost effective, very cheap, and consumes zero power. It's very
reliable. There are no active components."
But copper isn't perfect. At the 1.8 TB/s rate at which the GPUs communicate with one another, the cables could only stretch a few feet before the signal degraded. If you ever wondered why the NVL72's NVSwitches are all in the
center of the rack, it's because the runs were that short. Copper's
limited reach also meant Nvidia had to cram as many GPUs into a single
rack as possible.
Two years later, Nvidia is rapidly approaching the limits of copper
and will need to embrace optics if it wants to assemble an even bigger
GPU system.
The pluggable problem
When Huang first showed off the NVL72 rack, codenamed Oberon, the only commercially viable way to connect two accelerators optically would have been to use pluggable optics. These modules are about the size of a pack of gum and contain all the
lasers, retimers, and digital signal processing required to turn
electrical signals into light and back again.
Pluggables are nothing new in datacenter networks, but using them for
scale-up compute fabrics, like Nvidia's NVLink, presents certain
problems.
To reach the 1.8 TB/s of bandwidth, each Blackwell GPU would have
required eighteen 800 Gbps pluggables: nine for the accelerator, and
another nine for the switch. On their own, these pluggables don't use
that much power – around 10-15 watts – but multiplied across 72 GPUs,
that adds up pretty quickly.
As Huang noted in his 2024 GTC keynote speech, optics would have required an additional 20,000 watts of power.
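The arithmetic is easy to check. Here's a quick back-of-the-envelope sketch in Python (our sketch, not Nvidia's math: we read the 1.8 TB/s as aggregate bidirectional bandwidth, i.e. 900 GB/s in each direction, and assume each module draws the top of that 10-15 watt range):

```python
# Back-of-the-envelope: pluggable-optic power for an NVL72-class rack.
# Assumptions (ours, not Nvidia's): 1.8 TB/s NVLink is aggregate
# bidirectional bandwidth (900 GB/s each direction), and each 800 Gbps
# pluggable draws ~15 W, the top of the quoted 10-15 W range.

GPUS_PER_RACK = 72
NVLINK_GBPS_PER_DIRECTION = 900 * 8   # 900 GB/s each way -> 7,200 Gbps
PLUGGABLE_GBPS = 800                  # one 800G transceiver per link end
WATTS_PER_PLUGGABLE = 15              # assumed, upper end of 10-15 W

per_gpu = NVLINK_GBPS_PER_DIRECTION // PLUGGABLE_GBPS  # 9 on the GPU side
per_link = per_gpu * 2                                 # plus 9 on the switch
total_modules = per_link * GPUS_PER_RACK               # 1,296 pluggables
total_watts = total_modules * WATTS_PER_PLUGGABLE

print(f"{per_link} pluggables per GPU, {total_modules} per rack")
print(f"~{total_watts:,} W of optics power")  # ~19,440 W, Huang's ~20 kW
```

At 10 W per module the same math works out closer to 13 kW, so the 20,000-watt figure assumes optics running at the hot end of the range.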
However, a lot has changed since the Oberon rack was first revealed.
Advancements in co-packaged optics (CPO), which integrates optical
engines directly alongside the switch ASIC, have helped drive down power
consumption.
In 2025, Nvidia became one of the first AI infrastructure providers to embrace CPO
by integrating it directly into its Spectrum Ethernet and Quantum
InfiniBand switches. (Broadcom-based Micas Networks was making similar
moves.)
This dramatically reduced the number of pluggables required to build
an AI training cluster. However, it was only more recently that the
company began discussing the use of optics and CPO for its NVSwitch
fabrics.
NVLink goes optical
After pooh-poohing optical interconnects as too power-hungry two years earlier, Huang revisited the topic at GTC this spring by unveiling the Vera Rubin NVL576 and Rosa Feynman NVL1152, two multi-rack systems that would use photonics to expand their compute domains by a factor of eight....
....MUCH MORE
Most recently - March 31: Photonics: "Nvidia Invests $2 Billion in Marvell, Announces Partnership" (MRVL; NVDA)
And back in the dark ages (see what I did there?):
November 6, 2015 - "NVIDIA: “Expensive and Worth It,” Says MKM Partners" (NVDA)
We don't do much individual stock stuff on the blog but this one is special.
We use it as an example of what Silicon Valley used to be, when high tech meant high technology and not a new app for some (still) mundane task.
Simply put, NVIDIA makes some of the fastest computer chips in the world.
They are used in gaming systems that require graphics that don't make you (literally) puke. Right now automakers use their chips for graphic displays.
The future: Robocars? May 2015: "Nvidia Wants to Be the Brains Of Your Autonomous Car (NVID)":
Among the fastest processors in the business are the ones originally developed for video games and known as Graphics Processing Units, or GPUs. Since Nvidia released their Tesla hardware in 2008, hobbyists (and others) have used GPUs to build personal supercomputers.
Here's Nvidia's Build Your Own page.
Or have your tech guy build one for you.
In addition, Nvidia has a very fast interconnect it calls NVLink.
Using a hybrid combination of IBM Central Processing Units (CPUs) and Nvidia's GPUs, all hooked together with NVIDIA's NVLink, Oak Ridge National Laboratory is building what will be the world's fastest supercomputer when it debuts in 2018.
As your kid plays Grand Theft Auto....