Thursday, April 1, 2021

Chips: "Arm’s v9 Architecture Explains Why Nvidia Needs To Buy It"

From The Next Platform, March 30:

Many of us have been racking our brains over why Nvidia would spend a fortune – a whopping $40 billion – to acquire Arm Holdings, a chip architecture licensing company that generates on the order of $2 billion in sales, ever since the deal was rumored back in July 2020. As we sat and listened to the Arm Vision Day rollout of the Armv9 architecture, which will define processors ranging from tiny embedded controllers in IoT devices all the way up to massive CPUs in the datacenter, we may have figured it out.

There are all kinds of positives, as we pointed out in our original analysis ahead of the deal, in our analysis the day the deal was announced in September 2020, and in a one-on-one conversation with Nvidia co-founder and chief executive officer Jensen Huang in October 2020.

We have said for a long time that we believe that Nvidia needs to control its own CPU future, and we even joked with Huang that it didn’t need to buy all of Arm Holdings to make the best Arm server CPU. He responded that this was truly a once-in-a-lifetime opportunity to create value and push all of Nvidia’s technologies – its own GPUs for compute and graphics and Mellanox network interface chips, DPU processors, and switch ASICs – through an Arm licensing channel to make them all as malleable and yet standardized as the Arm licensing model not only allows, but encourages.

Huang would be the first to tell you that Nvidia can’t create every processor for every situation, and indeed no single company can. And that is why the Arm ecosystem not only needs to be protected, it needs to be cultivated and extended in a way that only a relatively big company like Nvidia can make happen. (SoftBank is too distracted by the financial woes of its investments around the globe that have gone bad and basically has to sell Arm to fix its balance sheet. That is a buying opportunity for Nvidia, which is really only spending $12 billion in cash to get control of Arm; the rest is funny money from stock market capitalization, which in a sense is “free” money that Nvidia can spend to fill in the remaining $28 billion.)

We have sat through these interviews, and chewed on all of this, and chalked it up to yet another tech titan having enough dough to do a big thing. But, as we watched the Vision Day presentations by Arm chief executive officer Simon Segars and the rest of the Arm tech team, they kept talking about pulling more vector math, matrix math, and digital signal processing onto the forthcoming Armv9 architecture. And suddenly, it all finally became clear: Nvidia and Arm both believe that in a modern, massively distributed world, all kinds of compute are going to be tailored to run analytics, machine learning, and other kinds of data manipulation and transaction processing or preprocessing as locally as possible, and that a single, compatible substrate is going to be the best answer to creating this malleable compute fabric for a lot of workloads. What this necessarily means is that both companies absolutely believe that in many cases, a hybrid CPU-GPU compute model will not and cannot work.

In other words, Nvidia’s GPU compute business has a limit to its expansion, and perhaps it is a lot lower than many of us have been thinking. The pendulum will be swinging back to purpose-built CPUs that have embedded vector and matrix capabilities, highly tuned for specific algorithms. This will be especially true for intermediate edge computing and endpoint IoT devices that need to compute locally because shipping data back to be processed in a datacenter doesn’t make sense at all, either technically or economically.

Jem Davies, an Arm Fellow and general manager of its machine learning division, gave a perfect example of the economic forces that are driving compute out of the datacenter and into a more diffuse data galaxy, as we called it three years ago.

“In the Armv9 decade, partners will create a future enabled by Arm AI with more ML on device,” Davies explained. “With over eight billion voice assistive devices, we need speech recognition on sub-$1 microcontrollers. Processing everything on a server just doesn’t work, physically or financially. Cloud computing and bandwidth aren’t free, and recognition on device is the only way. A voice activated coffee maker using cloud services used ten times a day would cost the device maker around $15 per year per appliance. Computing ML on device also benefits latency, reliability and, crucially, security.”

To bring this home: if the coffee maker with voice recognition were in use for four years, the cost of chewing on speech recognition data back in the Mr Coffee datacenter, roughly $60 at $15 per year, would wipe out the entire revenue stream from that coffee maker. That same function, if implemented on a device specifically tuned for this very precise job, could be done for under $1 and would not affect the purchase price significantly. And, we think, the coffee maker manufacturer could probably charge a premium for the voice recognition and recoup some or all of the investment in the technology added to the pot over a reasonably short time until it just became normal, like having a clock and timer in a coffee maker did several decades ago, allowing us all to wake up to a hot cup of java or joe or whatever you call it in the morning by staging the ground coffee beans and water the night before....
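The coffee maker arithmetic is easy to check. The sketch below is only a back-of-the-envelope calculation using the figures quoted above ($15 per year, ten uses a day, four years of service, under $1 on device). The function names are made up for illustration, the 365-day year is an assumption, and the "under $1" figure is treated as an upper bound on a one-time on-device cost, which the article does not spell out.

```python
# Back-of-the-envelope check of the voice-activated coffee maker example.
# Figures come from the quoted article; everything else is an assumption.

CLOUD_COST_PER_YEAR_USD = 15.0     # "around $15 per year per appliance"
USES_PER_DAY = 10                  # "used ten times a day"
SERVICE_LIFE_YEARS = 4             # "in use for four years"
ON_DEVICE_COST_CEILING_USD = 1.0   # "under $1", treated here as an upper bound

DAYS_PER_YEAR = 365                # assumption: leap years ignored


def lifetime_cloud_cost(cost_per_year: float, years: int) -> float:
    """Total cloud speech recognition cost over the appliance's service life."""
    return cost_per_year * years


def implied_cost_per_request(cost_per_year: float, uses_per_day: int) -> float:
    """Cloud cost of one voice request implied by the yearly figure."""
    return cost_per_year / (uses_per_day * DAYS_PER_YEAR)


cloud_total = lifetime_cloud_cost(CLOUD_COST_PER_YEAR_USD, SERVICE_LIFE_YEARS)
per_request = implied_cost_per_request(CLOUD_COST_PER_YEAR_USD, USES_PER_DAY)
# How many times more the cloud route costs than the on-device ceiling.
cloud_to_device_ratio = cloud_total / ON_DEVICE_COST_CEILING_USD

print(f"Cloud cost over {SERVICE_LIFE_YEARS} years: ${cloud_total:.2f}")
print(f"Implied cloud cost per request: ${per_request:.4f}")
print(f"Cloud vs. on-device ceiling: at least {cloud_to_device_ratio:.0f}x")
```

On these numbers the cloud route comes to $60 per appliance over four years, or a bit under half a cent per voice request, against a one-time on-device cost of less than $1.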

....MUCH MORE