From Motley Fool Transcribing – Feb 26, 2025 at 9:30PM
....Stewart Stecker -- Senior Director, Investor Relations
Thank you. Good afternoon, everyone, and welcome to NVIDIA's conference call for the fourth quarter of fiscal 2025. With me today from NVIDIA are Jensen Huang, president and chief executive officer; and Colette Kress, executive vice president and chief financial officer. I'd like to remind you that our call is being webcast live on NVIDIA's Investor Relations website.
The webcast will be available for replay until the conference call to discuss our financial results for the first quarter of fiscal 2026. The content of today's call is NVIDIA's property. It can't be reproduced or transcribed without prior written consent. During this call, we may make forward-looking statements based on current expectations.....
*****
Colette M. Kress -- Chief Financial Officer, Executive Vice President
Thanks, Stewart. Q4 was another record quarter. Revenue of $39.3 billion was up 12% sequentially, up 78% year on year, and above our outlook of $37.5 billion. For fiscal 2025, revenue was $130.5 billion, up 114% from the prior year.
Let's start with Data Center. Data center revenue for fiscal 2025 was $115.2 billion, more than doubling from the prior year. In the fourth quarter, Data Center revenue of $35.6 billion was a record, up 16% sequentially and 93% year on year as the Blackwell ramp commenced and H200 continued sequential growth. In Q4, Blackwell sales exceeded our expectations.
We delivered $11 billion of Blackwell revenue to meet strong demand. This is the fastest product ramp in our company's history, unprecedented in its speed and scale. Blackwell production is in full gear across multiple configurations, and we are increasing supply quickly, expanding customer adoption. Our Q4 Data Center compute revenue jumped 18% sequentially and over 2x year on year.
Customers are racing to scale infrastructure to train the next generation of cutting-edge models and unlock the next level of AI capabilities. With Blackwell, it will be common for these clusters to start with 100,000 GPUs or more. Shipments have already started for multiple infrastructures of this size. Post-training and model customization are fueling demand for NVIDIA infrastructure and software as developers and enterprises leverage techniques such as fine-tuning, reinforcement learning, and distillation to tailor models for domain-specific use cases.
Hugging Face alone hosts over 90,000 derivatives created from the Llama foundation model. The scale of post-training and model customization is massive and can collectively demand orders of magnitude more compute than pretraining. Our inference demand is accelerating, driven by test-time scaling and new reasoning models like OpenAI's o3, DeepSeek-R1, and Grok 3. Long-thinking reasoning AI can require 100x more compute per task compared to one-shot inference.
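For readers who want to see what that customization actually looks like in code, here is a minimal sketch of parameter-efficient fine-tuning (LoRA) of a Llama-family model with the Hugging Face transformers and peft libraries. The checkpoint name, target modules, and hyperparameters are illustrative assumptions, not anything disclosed on the call:

```python
# Illustrative only: LoRA fine-tuning of a Llama-family model for a domain-specific use case.
# The checkpoint name and hyperparameters below are assumptions for this sketch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # placeholder Llama checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)

# Low-rank adapters on the attention projections; the frozen base weights stay untouched,
# which is how one foundation model can spawn tens of thousands of cheap derivatives.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights
```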
Blackwell was architected for reasoning AI inference. Blackwell supercharges reasoning AI models with up to 25x higher token throughput and 20x lower cost versus H100. Its revolutionary Transformer Engine is built for LLM and mixture-of-experts inference. And its NVLink domain delivers 14x the throughput of PCIe Gen 5, ensuring the response time, throughput, and cost efficiency needed to tackle the growing complexity of inference at scale.
Companies across industries are tapping into NVIDIA's full-stack inference platform to boost performance and slash costs. Now, tripled inference throughput and cut costs by 66% using NVIDIA TensorRT for its screenshot feature. Perplexity sees 435 million monthly queries and reduced its inference costs 3x with NVIDIA Triton Inference Server and TensorRT-LLM. Microsoft Bing achieved a 5x speedup and major TCO savings for visual search across billions of images with NVIDIA TensorRT and acceleration libraries.
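Since Kress name-checks Triton Inference Server and TensorRT-LLM, here is a minimal sketch of what a client request to a Triton endpoint looks like using the tritonclient Python package. The model name ("ensemble") and tensor names ("text_input", "text_output") follow common TensorRT-LLM backend conventions but are assumptions here; a real deployment's model repository defines its own names:

```python
# A minimal sketch of querying a Triton Inference Server endpoint over HTTP.
# Model and tensor names are assumed conventions from the TensorRT-LLM backend,
# not details from the earnings call.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

prompt = np.array([["Summarize NVIDIA's Q4 FY2025 data center results."]], dtype=object)
text_input = httpclient.InferInput("text_input", prompt.shape, "BYTES")
text_input.set_data_from_numpy(prompt)

requested = httpclient.InferRequestedOutput("text_output")
result = client.infer(model_name="ensemble", inputs=[text_input], outputs=[requested])

print(result.as_numpy("text_output"))
```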
Blackwell is in great demand for inference. Many of the early GB200 deployments are earmarked for inference, a first for a new architecture. Blackwell addresses the entire AI market, from pretraining and post-training to inference, across cloud, on-premise, and enterprise. CUDA's programmable architecture accelerates every AI model and over 4,400 applications, protecting large infrastructure investments against obsolescence in rapidly evolving markets....
*****
And jumping ahead to the analyst Q&A:
Operator
Thank you. [Operator instructions] And your first question comes from C.J. Muse with Cantor Fitzgerald. Please go ahead.
C.J. Muse -- Analyst
Yeah. Good afternoon. Thank you for taking the question. I guess for me, Jensen, as test-time compute and reinforcement learning shows such promise, we're clearly seeing an increasing blurring of the lines between training and inference.
What does this mean for the future of potentially inference-dedicated clusters? And how do you think about the overall impact to NVIDIA and your customers? Thank you.
Jensen Huang -- President and Chief Executive Officer
Yeah, I appreciate that, C.J. There are now multiple scaling laws. There's the pre-training scaling law, and that's going to continue to scale because we have multimodality, we have data that came from reasoning that are now used to do pretraining. And then the second is post-training scaling, using reinforcement learning with human feedback, reinforcement learning with AI feedback, and reinforcement learning with verifiable rewards.
The amount of computation you use for post-training is actually higher than pretraining. And it's kind of sensible in the sense that you could, while you're using reinforcement learning, generate an enormous amount of synthetic data or synthetically generated tokens. AI models are basically generating tokens to train AI models. And that's post-training.
And the third part, this is the part that you mentioned, is test-time compute or reasoning, long thinking, inference scaling. They're all basically the same ideas. And there, you have a chain of thought, you have search. The amount of tokens generated, the amount of inference compute needed, is already 100x more than the one-shot examples and the one-shot capabilities of large language models in the beginning.
And that's just the beginning. This is just the beginning. The idea that the next generation could use thousands of times more compute, and that, hopefully, extremely thoughtful, simulation-based, and search-based models could use hundreds of thousands or millions of times more compute than today, is in our future. And so, the question is, how do you design such an architecture? Some of it -- some of the models are auto-regressive.
Some of the models are diffusion-based. Some of it -- some of the times you want your data center to have disaggregated inference. Sometimes, it is compacted. And so, it's hard to figure out what is the best configuration of a data center, which is the reason why NVIDIA's architecture is so popular.
We run every model. We are great at training. The vast majority of our compute today is actually inference and Blackwell takes all of that to a new level. We designed Blackwell with the idea of reasoning models in mind.
And when you look at training, it's many times more performant. But what's really amazing is that for long-thinking, test-time scaling, reasoning AI models, we're tens of times faster, with 25x higher throughput. And so, Blackwell is going to be incredible across the board. And when you have a data center that allows you to configure and use your data center based on whether you're doing more pretraining now, post-training now, or scaling out your inference, our architecture is fungible and easy to use in all of those different ways.
And so, we're seeing, in fact, much, much more concentration of a unified architecture than ever before.
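To put Huang's 100x figure in perspective, here is a back-of-the-envelope sketch. The model size and token counts are illustrative assumptions, not numbers from the call; the point is that the generated-token count is what explodes when a model reasons with a long chain of thought:

```python
# Rough FLOP arithmetic for one-shot vs. long-thinking (chain-of-thought) inference.
# All numbers below are illustrative assumptions, not figures from the call.

PARAMS = 70e9                 # assume a 70B-parameter dense model
FLOPS_PER_TOKEN = 2 * PARAMS  # ~2 FLOPs per parameter per generated token (decode rule of thumb)

one_shot_tokens = 500         # short, direct answer
reasoning_tokens = 50_000     # long chain of thought plus search/sampling overhead

one_shot_flops = one_shot_tokens * FLOPS_PER_TOKEN
reasoning_flops = reasoning_tokens * FLOPS_PER_TOKEN

print(f"one-shot:      {one_shot_flops:.2e} FLOPs")
print(f"long-thinking: {reasoning_flops:.2e} FLOPs")
print(f"ratio:         {reasoning_flops / one_shot_flops:.0f}x")  # ~100x, the order of magnitude cited
```

At a fixed cost per token, a 100x jump in generated tokens is a 100x jump in inference compute per query, which is the demand driver both executives keep returning to.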
Operator
Your next question comes from the line of Joe Moore with JPMorgan. Please go ahead.
Joe Moore -- Morgan Stanley -- Analyst
Morgan Stanley, actually. Thank you. I wonder if you could talk about GB200. At CES, you sort of talked about the complexity of the rack-level systems and the challenges you have. And then as you said in the prepared remarks, we've seen a lot of general availability.
Where are you in terms of that ramp? Are there still bottlenecks to consider at a systems level above and beyond the chip level? And just have you maintained your enthusiasm for the NVL72 platforms?
Jensen Huang -- President and Chief Executive Officer
Well, I'm more enthusiastic today than I was at CES. And the reason for that is because we've shipped a lot more since CES. We have some 350 plants manufacturing the 1.5 million components that go into each one of the Blackwell racks, Grace Blackwell racks. Yes, it's extremely complicated.
And we successfully and incredibly ramped up Grace Blackwell, delivering some $11 billion of revenues last quarter. We're going to have to continue to scale as demand is quite high, and customers are anxious and impatient to get their Blackwell systems. You've probably seen on the web a fair number of celebrations about Grace Blackwell systems coming online, and we have them, of course. We have a fairly large installation of Grace Blackwell for our own engineering and our own design teams and software teams.
CoreWeave has now been quite public about the successful bring-up of theirs. Microsoft has, of course, OpenAI has, and you're starting to see many come online. And so, I think the answer to your question is nothing is easy about what we're doing, but we're doing great, and all of our partners are doing great....
....MUCH MUCH MORE
There is, to put it bluntly, no other company in the world that is doing what Nvidia is doing.
That being said, the stock see-sawed after the earnings release, with the final after-market print at $129.32, down $1.96 (-1.49%) from the regular-session close.