IBM’s New Do-It-All Deep Learning Chip
IBM's new chip is designed to do both high-precision learning and low-precision inference across the three main flavors of deep learning
The field of deep learning is still in flux, but some things have started to settle out. In particular, experts recognize that neural nets can get a lot of computation done with little energy if a chip approximates an answer using low-precision math. That’s especially useful in mobile and other power-constrained devices. But some tasks, especially training a neural net to do something, still need higher precision. IBM recently revealed its newest solution, still a prototype, at the IEEE VLSI Symposia: a chip that handles both equally well.
The disconnect between the needs of training a neural net and the needs of having that net execute its function, a step called inference, has been one of the big challenges for those designing chips that accelerate AI functions. IBM’s new AI accelerator chip is capable of what the company calls scaled precision. That is, it can do both training and inference at 32-bit, 16-bit, or even 1- or 2-bit precision.
“The most advanced precision that you can do for training is 16 bits, and the most advanced you can do for inference is 2 bits,” explains Kailash Gopalakrishnan, the distinguished member of the technical staff at IBM’s Yorktown Heights research center who led the effort. “This chip potentially covers the best of training known today and the best of inference known today.”
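To get a feel for what dropping to 16, 2, or 1 bits means, here is a minimal sketch in Python. It is not IBM's scheme, which the article does not describe. It assumes a generic uniform quantizer that snaps each weight to one of 2^bits evenly spaced levels between -1 and 1, and the function names and sample numbers are made up for illustration.

```python
def quantize(values, bits):
    """Snap each value to the nearest of 2**bits evenly spaced levels in [-1, 1].

    Generic uniform quantizer, used here only as an illustration;
    it is an assumption, not the method used on IBM's chip.
    """
    levels = 2 ** bits
    step = 2.0 / (levels - 1)          # spacing between adjacent levels
    quantized = []
    for v in values:
        clipped = max(-1.0, min(1.0, v))          # keep the value in range
        index = round((clipped + 1.0) / step)     # nearest level index
        quantized.append(-1.0 + index * step)
    return quantized


def dot(a, b):
    """Plain dot product, the core operation inside a neural-net layer."""
    return sum(x * y for x, y in zip(a, b))


# Made-up weights and inputs for a single neuron.
weights = [0.91, -0.42, 0.07, -0.88]
inputs = [0.5, 0.25, -1.0, 0.75]

exact = dot(weights, inputs)
for bits in (16, 2, 1):
    approx = dot(quantize(weights, bits), inputs)
    print(f"{bits:>2}-bit weights: result {approx:+.4f}, error {abs(approx - exact):.4f}")
```

At 16 bits the result is nearly indistinguishable from the full-precision answer, while at 2 bits or 1 bit each weight collapses to one of four or two values. That coarseness is why such low precision is used for inference, where a rough answer is often good enough, rather than for training.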
The chip’s ability to do all of this stems from two innovations that are both aimed at the same outcome—keeping all the processor components fed with data and working.
“One of the challenges that you have with traditional [chip] architectures when it comes to deep learning is that the utilization is typically very low,” says Gopalakrishnan. That is, even though a chip might be capable of a very high peak performance, typically only 20 to 30 percent of its resources can really be brought to bear on a problem. IBM aimed for 90 percent, for all tasks, all the time.
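The arithmetic behind that goal is simple: useful throughput is peak throughput multiplied by utilization. The short Python sketch below works it through. The peak figure is a hypothetical placeholder, not a number from IBM; only the utilization percentages come from the article.

```python
def effective_throughput(peak_tops, utilization):
    """Useful work per second: peak rate scaled by the fraction of hardware kept busy."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_tops * utilization


# Hypothetical peak rate, in trillions of operations per second (an assumption).
PEAK_TOPS = 10.0

for utilization in (0.20, 0.30, 0.90):
    useful = effective_throughput(PEAK_TOPS, utilization)
    print(f"{utilization:.0%} utilization: {useful:.1f} of {PEAK_TOPS:.1f} peak TOPS")

# Gain from moving a typical chip up to the 90 percent target.
for typical in (0.20, 0.30):
    print(f"{typical:.0%} -> 90%: {0.90 / typical:.1f}x more useful work")
```

Whatever the real peak number is, going from 20 or 30 percent utilization to 90 percent means roughly three to four and a half times more useful work from the same silicon.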