Wednesday, April 8, 2026

Photonics: "How Nvidia learned to embrace the light in its quest for scale" (NVDA)

From The Register, April 5:

The GPU king's move to optical scale-up was inevitable 

If you thought Nvidia's GB200 rack systems were big, CEO Jensen Huang is just getting started. At GTC last month, the world's most valuable company revealed plans to use photonic interconnects to pack more than a thousand GPUs into a single mammoth system by 2028.

The company isn't waiting to secure supply chains either. Over the past month, the GPU giant has invested billions in companies specializing in optics and interconnects, like Marvell, Coherent, and Lumentum, in preparation for the widespread deployment of these systems.

"For everyone who is in our ecosystem, we need a lot more capacity," Huang said during his GTC keynote speech. "We need a lot more capacity for copper; we need a lot more capacity for optics; we need a lot more capacity for CPO; and that's why we've been working with all of you to lay the foundation for this level of growth."

However, Nvidia's journey to this point began much earlier. In fact, by the time OpenAI revealed ChatGPT to the world in late 2022, Nvidia already knew it had a problem.

At the time, the GPU giant's most potent systems only featured eight GPUs, and the models driving the AI boom required thousands to train. Nvidia needed a bigger box, or at least a faster network that could effectively distribute work across dozens of chips.

We caught our first glimpse of this with Nvidia's Grace Hopper superchips in 2023, but it wasn’t until early 2024 that the full picture came into view. Unveiled at GTC that year, the Grace Blackwell NVL72, a monstrous 120 kilowatt machine, uses a copper backplane containing miles of cables to make 36 nodes and 72 GPUs behave like one enormous AI accelerator.

Copper was the natural choice for this, Gilad Shainer, senior VP of networking at Nvidia, told El Reg.

"Copper is the best connectivity, if you can use it," he said. "It's very cost effective, very cheap, and consumes zero power. It's very reliable. There are no active components."

But copper isn't perfect. At 1.8 TB/s, the cables could only stretch a few feet before the signal degraded. If you ever wondered why the NVL72's NVSwitches all sit in the center of the rack, it's because the runs were that short. Copper's limited reach also meant Nvidia had to cram as many GPUs into a single rack as possible.

Two years later, Nvidia is rapidly approaching the limits of copper and will need to embrace optics if it wants to assemble an even bigger GPU system.

The pluggable problem 
When Huang first showed off the NVL72 rack, codenamed Oberon, the only commercially viable way to connect two accelerators optically would have been to use pluggable optics.

These modules are about the size of a pack of gum and contain all the lasers, retimers, and digital signal processing required to turn electrical signals into light and back again.

Pluggables are nothing new in datacenter networks, but using them for scale-up compute fabrics, like Nvidia's NVLink, presents certain problems.

To reach the 1.8 TB/s of bandwidth, each Blackwell GPU would have required eighteen 800 Gbps pluggables: nine for the accelerator, and another nine for the switch. On their own, these pluggables don't use that much power – around 10-15 watts – but multiplied across 72 GPUs, that adds up pretty quickly.

As Huang noted in his 2024 GTC keynote speech, optics would have required an additional 20,000 watts of power. 
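That figure is easy to sanity-check from the numbers above. A quick back-of-the-envelope sketch (the 15 W draw is an assumption on my part, taken from the high end of the 10-15 W range quoted):

```python
# Rough check of the "additional 20,000 watts" Huang cited for an
# all-pluggable NVL72, using the article's numbers.
PLUGGABLES_PER_GPU = 18   # nine on the GPU side, nine on the switch side
GPUS = 72                 # one NVL72 rack
WATTS_PER_PLUGGABLE = 15  # assumed: upper end of the quoted 10-15 W range

total_pluggables = PLUGGABLES_PER_GPU * GPUS                 # 1,296 modules
optics_power_w = total_pluggables * WATTS_PER_PLUGGABLE      # 19,440 W
print(f"{total_pluggables} pluggables drawing ~{optics_power_w:,} W")
```

Roughly 19.4 kW, in line with the 20,000 W Huang quoted, and on top of the rack's 120 kW budget.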

However, a lot has changed since the Oberon rack was first revealed. Advancements in co-packaged optics (CPO), which integrates optical engines directly alongside the switch ASIC, have helped drive down power consumption.

In 2025, Nvidia became one of the first AI infrastructure providers to embrace CPO by integrating it directly into its Spectrum Ethernet and Quantum InfiniBand switches. (Broadcom-based Micas Networks was making similar moves.)

This dramatically reduced the number of pluggables required to build an AI training cluster. However, it was only more recently that the company began discussing the use of optics and CPO for its NVSwitch fabrics.

NVLink goes optical 
After pooh-poohing optical interconnects as too power-hungry two years earlier, Huang revisited the topic at GTC this spring by unveiling the Vera Rubin NVL576 and Rosa Feynman NVL1152, two multi-rack systems that would use photonics to expand their compute domains by a factor of eight....

....MUCH MORE 

Most recently - March 31: Photonics: "Nvidia Invests $2 Billion in Marvell, Announces Partnership" (MRVL; NVDA)

And back in the dark ages (see what I did there?):

November 6, 2015 - "NVIDIA: “Expensive and Worth It,” Says MKM Partners" (NVDA)  

We don't do much individual stock stuff on the blog but this one is special.
We use it as an example of what Silicon Valley used to be, when high tech meant high technology and not a new app for some (still) mundane task.

Simply put, NVIDIA makes some of the fastest computer chips in the world.
They are used in gaming systems that require graphics that don't make you (literally) puke. Right now, automakers use their chips for graphics displays.

The future: Robocars? May 2015: "Nvidia Wants to Be the Brains Of Your Autonomous Car" (NVDA):

Among the fastest processors in the business are the ones originally developed for video games, known as Graphics Processing Units or GPUs. Since Nvidia released its Tesla hardware in 2008, hobbyists (and others) have used GPUs to build personal supercomputers.
Here's Nvidia's Build your Own page.
Or have your tech guy build one for you.

In addition, Nvidia has very fast interconnects it calls NVLink.
Using a hybrid combination of IBM Central Processing Units (CPUs) and Nvidia's GPUs, all hooked together with NVIDIA's NVLink, Oak Ridge National Laboratory is building what will be the world's fastest supercomputer when it debuts in 2018.

As your kid plays Grand Theft Auto....