From EE Times, June 26:
The latest round of MLPerf training benchmarks includes, for the first time, GPT-3, the model ChatGPT is based on. The GPT-3 training crown was claimed by cloud provider CoreWeave using more than 3,000 Nvidia H100 GPUs. More surprising is who did not show up: there were no entries from previous training submitters such as Google and Graphcore, nor from other competitors like AMD. That left Intel's Habana Labs, with its Gaudi2 accelerator, as the only challenger to Nvidia on GPT-3.
CoreWeave used 3,584 Nvidia H100 GPUs to train a representative portion of GPT-3 in 10.94 minutes (this is the largest number of GPUs the cloud provider could make available at one time, not the full size of its cluster). The benchmark uses only a portion of GPT-3 because it would be impractical to insist submitters train the model in its entirety, which could take months and cost millions of dollars. Instead, submitters start from a particular checkpoint of a partially trained GPT-3 and train until it converges to a target accuracy. That portion represents about 0.4% of GPT-3's total training workload; extrapolating from CoreWeave's 10.94-minute score, the same 3,584 GPUs would take almost two days to train the whole thing.
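A quick back-of-the-envelope check of that extrapolation (a minimal sketch: the 10.94-minute score and the ~0.4% figure are from the article; the straight-line scaling assumption is ours):

    # Extrapolate full GPT-3 training time from the MLPerf benchmark slice.
    # Assumes throughput stays constant over the whole run, a simplification.
    benchmark_minutes = 10.94     # CoreWeave's GPT-3 benchmark score
    benchmark_fraction = 0.004    # the slice is ~0.4% of the full workload

    full_run_minutes = benchmark_minutes / benchmark_fraction  # 2,735 min
    full_run_days = full_run_minutes / (60 * 24)               # ~1.9 days
    print(f"Estimated full training run: {full_run_days:.1f} days")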
Nvidia H100s were used for the bulk of the GPT-3 submissions; the H100 is currently the leading AI training hardware on the market. Its software stack includes Nvidia's Transformer Engine, designed specifically to speed up training and inference of transformer-based networks like GPT-3 by lowering precision to FP8 to improve throughput wherever possible....
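For readers curious what "lowering precision to FP8" looks like in practice, here is a minimal sketch using the Python API of Nvidia's open-source Transformer Engine library (transformer_engine.pytorch). It follows the library's published quickstart pattern, not any submitter's actual training code, and the layer dimensions are hypothetical:

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Hypothetical layer dimensions, for illustration only.
    model = te.Linear(768, 3072, bias=True)      # FP8-capable linear layer
    inp = torch.randn(2048, 768, device="cuda")

    # HYBRID recipe: E4M3 FP8 format for the forward pass, E5M2 for gradients.
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

    # Matrix multiplies inside this context run in FP8 where possible,
    # with per-tensor scaling factors tracked to preserve accuracy.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = model(inp)

    out.sum().backward()   # the backward pass also uses FP8 where supported

The "delayed scaling" recipe is the key trick: because FP8 has so little dynamic range, the library keeps a history of tensor magnitudes and rescales values into the representable range, which is how throughput gains come without giving up convergence.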
....MUCH MORE