Saturday, March 21, 2026

"Nvidia Finally Admits Why It Shelled Out $20 Billion For Groq" (and Senator Warren swings by) NVDA

From The Next Platform, March 17:

Back in late December, Nvidia did a $20 billion “acquihire” of most of the development team at Groq and licensed the technology underlying its LPU dataflow engines for doing AI inference. We expected Nvidia to move fast to deploy the tensor streaming processors created by Jonathan Ross, the ex-Googler who built a fully scheduled, programmable tensor processing unit after he left the search engine giant. When the GenAI boom took off, these were renamed Language Processing Units, but the architecture did not change. Now, Nvidia is working with Samsung to bring the third generation LP30 chips to market, which Nvidia co-founder and chief executive officer Jensen Huang said in his opening keynote at the GTC 2026 conference would happen in the second half of this year, and very likely in the third quarter.

Nvidia is not wasting any time, and that is because it does not have time to waste. Groq was starting to get traction in low latency inference, just as Cerebras Systems has and as SambaNova Systems can, given their shared bet on pairing ultra-high bandwidth SRAM with more modest compute to deliver zippy inference across a large number of compute engines. Where speed matters, these system makers and the dozens of upstarts trying to tackle inference at scale are so many piranhas swarming towards a fat cow standing in the Amazon (the river, not the bookseller and cloud utility). So Nvidia had to moooooooove. . . . 

Hence, the dramatic $20 billion acquihire of Groq, which could not be an outright acquisition because that might take a year or two and might not pass muster with the world’s antitrust regulators. And hence its immediate absorption into the Vera-Rubin platform. Which arguably should be called the Vera-Rubin-Groq platform, given that Huang said during his keynote that low latency, premium priced token generation should represent somewhere on the order of 25 percent of the compute in an AI cluster.

Remember that Rubin CPX large context compute engine that Nvidia previewed back in September 2025? The one based on a variant of the Rubin architecture and equipped with cheaper and more readily available GDDR7 graphics memory?

“We discovered a great idea,” Ian Buck, vice president of AI and HPC at Nvidia, said on a call going over the systems announcements ahead of GTC 2026. “Integrating the LPU and LPX into our Rubin platform to optimize the decode. That's where we're focused right now, and we're excited to be bringing that to market.”

In other words, scratch Rubin CPX.

Huang stacked up what we presume is the “Rubin” R200 GPU accelerator beside what we presume is called the “Alan-3” Groq LP30 inference accelerator. One is a general purpose, dynamically scheduled compute engine that is pretty good at batching up lots of inferences and pipelining them through HBM stacked memory with reasonable latency while supporting many concurrent users. (That would be the GPU.) The other is a rack or more of fairly modest, inference-specific, statically scheduled, deterministic compute engines that work in concert to support a small number of users – that number is likely one most of the time – and distribute model weights (not data) across their aggregate SRAM in such a way that the response time for token generation scales down as you add more machines. The GPU is a thresher, the LPU is a speed demon. They can work together with the Dynamo inference stack to provide a more balanced Pareto curve for inference performance across a range of throughput and latency.
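To make that split concrete, here is a toy latency model in Python. Every constant is invented for illustration – neither vendor publishes figures in this form – but it captures the two behaviors described above: the statically scheduled LPU rack shards weights across pooled SRAM, so per-token latency drops as chips are added (until deterministic communication overhead flattens the curve), while the batched GPU holds per-token latency roughly steady and scales throughput instead.

```python
import math

# All constants below are made-up illustrations, not published
# Nvidia or Groq specifications.

def lpu_token_latency_ms(num_chips: int,
                         flops_per_token: float = 2e12,   # hypothetical model
                         flops_per_chip: float = 750e12,  # hypothetical chip
                         hop_ms: float = 0.02) -> float:
    """Statically scheduled LPU rack: weights are sharded across the chips'
    aggregate SRAM, so compute time per token divides by the chip count,
    while a small fixed communication cost grows with the depth of the
    reduction tree (log2 of the chip count)."""
    compute_ms = flops_per_token / (flops_per_chip * num_chips) * 1e3
    comms_ms = hop_ms * math.log2(num_chips)
    return compute_ms + comms_ms

def gpu_token_latency_ms(batch_size: int,
                         base_ms: float = 20.0,
                         per_request_ms: float = 0.1) -> float:
    """Batched GPU: per-token latency stays roughly flat as more requests
    are packed into a batch; aggregate throughput is what scales."""
    return base_ms + per_request_ms * batch_size

if __name__ == "__main__":
    print("LPU rack: per-token latency falls as chips are added")
    for n in (8, 64, 256, 512):
        print(f"  {n:4d} chips -> {lpu_token_latency_ms(n):5.2f} ms/token")
    print("GPU: latency roughly flat, throughput scales with batch size")
    for b in (1, 8, 32, 128):
        ms = gpu_token_latency_ms(b)
        print(f"  batch {b:3d} -> {ms:5.2f} ms/token, "
              f"~{1000 * b / ms:6.0f} tokens/s aggregate")
```

On these invented numbers, the two curves are exactly the Pareto tradeoff the Dynamo framing implies: below some latency target only the LPU side of the rack can serve the request at a premium, while everything else batches onto the GPUs.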

Here are the feeds and speeds of the R200 and the LP30 chips:....

....MUCH MORE 

 And from the office of Senator Elizabeth Warren, March 19:

Jensen Huang
President and Chief Executive Officer
NVIDIA Corporation
2788 San Tomas Expressway
Santa Clara, CA 95051 

Dear Mr. Huang: 

We write to request additional information regarding the terms of NVIDIA’s recent deal with Groq, an artificial intelligence (AI) chip startup and NVIDIA competitor, to assess the agreement’s implications for competition in the AI chip sector. On December 24, 2025, NVIDIA and Groq announced an agreement under which NVIDIA will pay Groq $20 billion to acquire a non-exclusive license for Groq’s inference chip design technology and hire many of Groq’s key employees, including its CEO and president.1 The deal will give NVIDIA “all of Groq’s assets,”2 and appears to be structured to evade scrutiny by antitrust regulators.3 We are concerned that this takeover could stifle competition, further entrenching NVIDIA’s dominance in the AI chip industry and ceding our technological leadership to China.

NVIDIA currently dominates the market for the powerful graphics processing units (GPUs) used for developing and deploying advanced AI models.4 NVIDIA controls roughly 90% of the market for high-end data center GPUs, with buyers forced to wait because enterprise demand far exceeds supply.5 As of the end of Q3 2025, NVIDIA also controlled 92% of the market for personal computer (PC) GPUs used for computationally intensive tasks such as video games and hosting smaller AI models, with Advanced Micro Devices (AMD) controlling 7% and Intel possessing just 1% market share.6

And competition in the marketplace has shrunk over time. AMD controlled 12% of the PC GPU market at the end of Q1 2024,7 8% at the end of Q1 2025,8 and just 7% as of the most recent quarter.9 Because GPUs are essential for advanced AI development, NVIDIA effectively controls which companies can compete in AI, and the entire AI industry is held hostage to NVIDIA’s product decisions and priorities. This market dominance, combined with the recent explosion of interest and investment in AI, has seen NVIDIA reach a market capitalization of $5 trillion, making it the most valuable company in history.10....

....MUCH MORE (6 page PDF)