Monday, February 19, 2018

Oak Ridge National Laboratory's Report On China's Fastest Supercomputer in the World

Sunway TaihuLight is the computer ORNL will surpass later this year with its new Summit system, which more than doubles TaihuLight's speed.
However... China is already planning an exascale machine, five times faster than Summit and capable of a billion billion calculations per second.
This piece is a personal bookmark I'll probably be referring back to. 

From Oak Ridge National Laboratory:

June 24, 2016
Report on the Sunway TaihuLight System
The Sunway TaihuLight System was developed by the National Research Center of Parallel Computer Engineering & Technology (NRCPC), and installed at the National Supercomputing Center in Wuxi (a joint team with the Tsinghua University, City of Wuxi, and Jiangsu province), which is in China's Jiangsu province. The CPU vendor is the Shanghai High Performance IC Design Center. The system is in full operation with a number of applications implemented and running on the system. The Center will be a public supercomputing center that provides services for public users in China and across the world. The complete system has a theoretical peak performance of 125.4 Pflop/s with 10,649,600 cores and 1.31 PB of primary memory. It is based on a processor, the SW26010 processor, that was designed by the Shanghai High Performance IC Design Center. The processor chip is composed of 4 core groups (CGs), see figure 1, connected via a NoC, see figure 2, each of which includes a Management Processing Element (MPE) and 64 Computing Processing Elements (CPEs) arranged in an 8 by 8 grid. Each CG has its own memory space, which is connected to the MPE and the CPE cluster through the MC. The processor connects to other outside devices through a system interface (SI).

Each CPE Cluster is composed of a Management Processing Element (MPE) which is a 64-bit RISC core which is supporting both user and system modes, a 256-bit vector instructions, 32 KB L1 instruction cache and 32 KB L1 data cache, and a 256 KB L2 cache. The Computer Processing Element (CPE) is composed of an 8x8 mesh of 64-bit RISC cores, supporting only user mode, with a 256-bit vector instructions, 16 KB L1 instruction cache and 64 KB Scratch Pad Memory (SPM)....
...MUCH MORE (24 page PDF)
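
A quick sanity check on the numbers in the excerpt above, for my own reference. This is only a back-of-the-envelope sketch in Python: the constants are the figures quoted from the report, while the function names are my own and the per-chip results are derived, not quoted. It assumes every one of the 10,649,600 cores sits on an SW26010 chip (4 core groups, each with 1 MPE plus 64 CPEs) and that PB and Pflop/s use decimal prefixes.

```python
# Figures quoted in the report excerpt.
TOTAL_CORES = 10_649_600
PEAK_PFLOPS = 125.4          # theoretical peak, Pflop/s
MEMORY_PB = 1.31             # primary memory, PB (assumed decimal: 1 PB = 10**15 bytes)
CORE_GROUPS_PER_CHIP = 4
MPES_PER_GROUP = 1
CPES_PER_GROUP = 64          # 8 by 8 grid


def cores_per_chip() -> int:
    """Cores on one SW26010: 4 core groups of (1 MPE + 64 CPEs)."""
    return CORE_GROUPS_PER_CHIP * (MPES_PER_GROUP + CPES_PER_GROUP)


def chip_count() -> int:
    """Number of chips implied by the total core count (derived, not quoted)."""
    chips, remainder = divmod(TOTAL_CORES, cores_per_chip())
    if remainder:
        raise ValueError("total core count is not a whole number of chips")
    return chips


def peak_tflops_per_chip() -> float:
    """Theoretical peak per chip in Tflop/s (1 Pflop/s = 1000 Tflop/s)."""
    return PEAK_PFLOPS * 1000 / chip_count()


def memory_gb_per_chip() -> float:
    """Primary memory per chip in GB (1 PB = 1,000,000 GB, decimal)."""
    return MEMORY_PB * 1_000_000 / chip_count()


if __name__ == "__main__":
    print(f"cores per chip:      {cores_per_chip()}")
    print(f"implied chip count:  {chip_count():,}")
    print(f"peak per chip:       {peak_tflops_per_chip():.2f} Tflop/s")
    print(f"memory per chip:     {memory_gb_per_chip():.1f} GB")
```

The core count divides evenly into 260-core chips, which is a decent sign that the quoted totals and the chip layout are consistent with each other.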

January 29
"China set to launch its new supercomputer"
NVIDIA watches, some commentary after the jump...
January 11
With the Summit Supercomputer, U.S. Could Retake Computing’s Top Spot (NVDA)
September 2017
"The Astonishing Engineering Behind America's Latest, Greatest Supercomputer"

And many more, including:
June 2016