Friday, January 5, 2024

AI: "Inside Tesla’s Innovative And Homegrown 'Dojo' AI Supercomputer" (TSLA)

It really is a big deal that a company can afford to spend over a billion dollars to build its own supercomputer. It really is a big deal that the same company has all the training data from billions of miles of real-world driving. And it really is a great example of the concept of advantage flywheels and the hyper-Pareto distribution of rewards, i.e. the rich get richer.

Whether it is going to open up the $10 trillion addressable market and add the $500 billion of market cap that Morgan Stanley foresees is still an open question.

From The Next Platform, August 23, 2022:

How expensive and difficult does hyperscale-class AI training have to be for a maker of self-driving electric cars to take a side excursion to spend how many hundreds of millions of dollars to go off and create its own AI supercomputer from scratch? And how egotistical and sure would the company’s founder have to be to put together a team that could do it?

Like many questions, when you ask these precisely, they tend to answer themselves. And what is clear is that Elon Musk, founder of both SpaceX and Tesla as well as a co-founder of the OpenAI consortium, doesn’t have time – or money – to waste on science projects.

Just like the superpowers of the world underestimated the amount of computing power it would take to fully simulate a nuclear missile and its explosion, perhaps the makers of self-driving cars are coming to the realization that teaching a car to drive itself in a complex world that is always changing is going to take a lot more supercomputing. And once you reconcile yourself to that, then you start from scratch and build the right machine to do this specific job.

That, in a nutshell, is what Tesla’s Project Dojo chip, interconnect, and supercomputer effort is all about.

At the Hot Chips 34 conference, the chip, system, and software engineers who worked on the Dojo supercomputer unveiled many of the architectural features of the machine for the first time, and promised to talk about the performance of the Dojo system at the Tesla AI Day 2 event on September 30.

Emil Talpes, who worked at AMD for nearly 17 years on various Opteron processors as well as on the ill-fated “K12” Arm server chip, gave the presentation on the Dojo processor that his team created. Debjit Das Sarma, who was a CPU architect at AMD for nearly as long and is currently the autopilot hardware architect at Tesla, was credited on the presentation, as was Douglas Williams, about whom we know nothing. Bill Chang, principal system engineer at the car maker, spent a decade and a half at IBM Microelectronics designing IP blocks and working on manufacturing processes before helping Apple move off of X86 processors to its own Arm chips, and Rajiv Kurian, who has been working at Tesla and then Waymo on autonomous car platforms, talked about the Dojo system. As far as we know, Ganesh Venkataramanan, who spoke at Tesla AI Day 1 last August and who is senior director of autopilot hardware at Tesla, is in charge of the Dojo project. Venkataramanan was also a leader on the CPU design teams at AMD for nearly a decade and a half.

So in a weird way, Dojo represents an alternate AI future that might have been at AMD had Tesla come to it to help design a custom AI supercomputer from scratch, from the vector and integer units inside a brand new core all the way out to a full exascale system designed for scale and ease of programming for the AI training use case.

Like a number of other relatively new platforms from AI startups, the Dojo design is elegant and thorough. And what is most striking are the things that Tesla’s engineers threw out as they focused on scale.

“The defining goal of our application is scalability,” Talpes explained at the end of his presentation. “We have de-emphasized several mechanisms that you find in typical CPUs, like coherency, virtual memory, and global lookup directories just because these mechanisms do not scale very well when we scale up to a very large system. Instead, we have relied on a very fast and very distributed SRAM storage throughout the mesh. And this is backed by an order of magnitude higher speed of interconnect than what you find in a typical distributed system.”

And with that, let’s dive into the Dojo architecture.

The Dojo core has an integer unit that borrows some instructions from the RISC-V architecture, according to Talpes, and has a whole bunch of additional instructions that Tesla created itself. The vector math unit was “mostly implemented” by Tesla from scratch, and Talpes did not elaborate on what this means. He did add that this custom instruction set was optimized for running machine learning kernels, which we take to mean that it would not do a very good job running Crysis....

....MUCH MORE