Thursday, June 28, 2018

New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500

The 51st (25th anniversary) Top500 list was released June 25, 2018, in Frankfurt, Germany.
Here's some of Top500's thinking on the latest list and the state of the art.

From Top500, June 26:

For the first time in history, most of the flops added to the TOP500 list came from GPUs instead of CPUs. Is this the shape of things to come?
In the latest TOP500 rankings announced this week, 56 percent of the additional flops were a result of NVIDIA Tesla GPUs running in new supercomputers – that according to the Nvidians, who enjoy keeping track of such things. In this case, most of those additional flops came from three top systems new to the list: Summit, Sierra, and the AI Bridging Cloud Infrastructure (ABCI).

Summit, the new TOP500 champ, pushed the previous number one system, the 93-petaflop Sunway TaihuLight, into second place with a Linpack score of 122.3 petaflops. Summit is powered by IBM servers, each one equipped with two Power9 CPUs and six V100 GPUs. According to NVIDIA, 95 percent of Summit’s peak performance (187.7 petaflops) is derived from the system’s 27,686 GPUs.
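That 95 percent claim can be sanity-checked with back-of-the-envelope arithmetic using only the figures quoted above (a rough sketch, not NVIDIA's own accounting):

```python
# Back-of-the-envelope check using only the figures quoted in the article.
summit_peak_pf = 187.7  # Summit's theoretical peak, petaflops
gpu_share = 0.95        # NVIDIA's stated GPU share of that peak
num_gpus = 27_686       # V100 GPUs in Summit

gpu_peak_pf = summit_peak_pf * gpu_share
# Implied average FP64 contribution per GPU (petaflops -> teraflops)
per_gpu_tf = gpu_peak_pf * 1000 / num_gpus

print(f"GPU-derived peak: {gpu_peak_pf:.1f} PF")
print(f"Implied per-GPU contribution: {per_gpu_tf:.2f} TF")
```

This puts the GPU-derived peak at roughly 178 petaflops, or about 6.4 teraflops per V100 on average as implied by these figures.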

NVIDIA did a similar calculation for the less powerful, and somewhat less GPU-intense Sierra, which now ranks as the third fastest supercomputer in the world at 71.6 Linpack petaflops. Although very similar to Summit, it has four V100 GPUs in each dual-socket Power9 node, rather than six. However, the 17,280 GPUs in Sierra still represent the lion’s share of that system’s flops.
Likewise for the new ABCI machine in Japan, which is now that country’s speediest supercomputer and is ranked fifth in the world. Each of its servers pairs two Intel Xeon Gold CPUs with four V100 GPUs. Its 4,352 V100s deliver the vast majority of the system’s 19.9 Linpack petaflops.

As dramatic as that 56 percent number is for new TOP500 flops, the reality is probably even more impressive. According to Ian Buck, vice president of NVIDIA's Accelerated Computing business unit, more than half the Tesla GPUs they sell into the HPC/AI/data analytics space are bought by customers who never submit their systems for TOP500 consideration. Although many of these GPU-accelerated machines would qualify for a spot on the list, these particular customers either don’t care about all the TOP500 fanfare or would rather not advertise their hardware-buying habits to their competitors.

It’s also worth mentioning that the Tensor Cores in the V100 GPUs, with their specialized 16-bit matrix math capability, endow these three new systems with more deep learning potential than any previous supercomputer. Summit alone boasts over three peak exaflops of deep learning performance. Sierra’s performance in this regard is more in the neighborhood of two peak exaflops, while the ABCI number is around half an exaflop. Taken together, these three supercomputers represent more deep learning capability than the other 497 systems on the TOP500 list combined, at least from the perspective of theoretical performance.
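Those deep learning figures line up with a simple multiplication: each V100's Tensor Cores deliver a nominal 125 teraflops of mixed-precision (FP16) throughput. A quick check against the GPU counts quoted above:

```python
# Sanity-check the quoted deep learning peaks: GPU count times the
# nominal 125 TFLOPS Tensor Core (FP16 mixed-precision) peak per V100.
V100_TENSOR_TFLOPS = 125  # NVIDIA's nominal per-GPU Tensor Core peak

systems = {"Summit": 27_686, "Sierra": 17_280, "ABCI": 4_352}

for name, gpus in systems.items():
    exaflops = gpus * V100_TENSOR_TFLOPS / 1_000_000  # TF -> EF
    print(f"{name}: {exaflops:.2f} peak deep learning exaflops")
```

The results, about 3.46, 2.16, and 0.54 exaflops respectively, match the "over three," "in the neighborhood of two," and "around half" characterizations in the text.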

The addition of AI/machine learning/deep learning into the HPC application space is a relatively new phenomenon, but the V100 appears to be acting as a catalyst. “This year’s TOP500 list represents a clear shift towards systems that support both HPC and AI computing,” noted TOP500 author Jack Dongarra, a professor at the University of Tennessee and Oak Ridge National Laboratory....MORE
Working backward, here is the Top500 press release announcing the new list, June 25:

US Regains TOP500 Crown with Summit Supercomputer, Sierra Grabs Number Three Spot 
FRANKFURT, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn.—The TOP500 celebrates its 25th anniversary with a major shakeup at the top of the list. For the first time since November 2012, the US claims the most powerful supercomputer in the world, leading a significant turnover in which four of the five top systems were either new or substantially upgraded.

Summit, an IBM-built supercomputer now running at the Department of Energy’s (DOE) Oak Ridge National Laboratory (ORNL), captured the number one spot with a performance of 122.3 petaflops on High Performance Linpack (HPL), the benchmark used to rank the TOP500 list. Summit has 4,356 nodes, each one equipped with two 22-core Power9 CPUs and six NVIDIA Tesla V100 GPUs. The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.

Sunway TaihuLight, a system developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, drops to number two after leading the list for the past two years. Its HPL mark of 93 petaflops has remained unchanged since it came online in June 2016.

Sierra, a new system at the DOE’s Lawrence Livermore National Laboratory, took the number three spot, delivering 71.6 petaflops on HPL. Built by IBM, Sierra’s architecture is quite similar to that of Summit, with each of its 4,320 nodes powered by two Power9 CPUs plus four NVIDIA Tesla V100 GPUs and using the same Mellanox EDR InfiniBand as the system interconnect.

Tianhe-2A, also known as Milky Way-2A, moved down two notches into the number four spot, despite receiving a major upgrade that replaced its five-year-old Xeon Phi accelerators with custom-built Matrix-2000 coprocessors. The new hardware increased the system’s HPL performance from 33.9 petaflops to 61.4 petaflops, while bumping up its power consumption by less than four percent. Tianhe-2A was developed by China’s National University of Defense Technology (NUDT) and is installed at the National Supercomputer Center in Guangzhou, China.
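The quoted figures imply a substantial efficiency gain from the upgrade. A rough calculation (treating the "less than four percent" power bump as exactly 4 percent, which is an assumption, so the real gain is slightly higher):

```python
# Rough Tianhe-2A upgrade efficiency gain from the quoted figures.
hpl_before_pf = 33.9   # petaflops before the Matrix-2000 upgrade
hpl_after_pf = 61.4    # petaflops after the upgrade
power_increase = 1.04  # assumed: "less than four percent" taken as 4%

speedup = hpl_after_pf / hpl_before_pf
efficiency_gain = speedup / power_increase  # flops-per-watt improvement

print(f"HPL speedup: {speedup:.2f}x")
print(f"Flops-per-watt gain: at least {efficiency_gain:.2f}x")
```

That works out to roughly a 1.8x HPL speedup and at least a 1.74x improvement in flops per watt.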

The new AI Bridging Cloud Infrastructure (ABCI) is the fifth-ranked system on the list, with an HPL mark of 19.9 petaflops. The Fujitsu-built supercomputer is powered by 20-core Xeon Gold processors along with NVIDIA Tesla V100 GPUs. It’s installed in Japan at the National Institute of Advanced Industrial Science and Technology (AIST).

Piz Daint (19.6 petaflops), Titan (17.6 petaflops), Sequoia (17.2 petaflops), Trinity (14.1 petaflops), and Cori (14.0 petaflops) move down to spots six through ten, respectively.

General highlights
Despite the ascendance of the US at the top of the rankings, the country now claims only 124 systems on the list, a new low. Just six months ago, the US had 145 systems. Meanwhile, China improved its representation to 206 total systems, compared to 202 on the last list. However, thanks mainly to Summit and Sierra, the US did manage to take the lead back from China in the performance category. Systems installed in the US now contribute 38.2 percent of the aggregate installed performance, with China in second place with 29.1 percent. These numbers are a reversal compared to six months ago.
The next most prominent countries are Japan, with 36 systems, the United Kingdom, with 22 systems, Germany with 21 systems, and France, with 18 systems. These numbers are nearly the same as they were on the previous list....MORE
And the list:
June 2018
The TOP500 celebrates its 25th anniversary with a major shakeup at the top of the list. For the first time since November 2012, the US claims the most powerful supercomputer in the world, leading a significant turnover in which four of the five top systems were either new or substantially upgraded....MUCH MORE
Earlier today:
"NVIDIA Chief Scientist Bill Dally on How GPUs Ignited AI, and Where His Team’s Headed Next" (NVDA)