AI Research Targets Nvidia, Mobile
Training chip could beat V100 using mobile DRAM
A researcher from the University of Texas at Austin described a chip for training deep neural networks that he said can outperform an Nvidia V100 — even using low-cost mobile DRAM. At the same event, Arm discussed research on a chip that can significantly increase efficiency for computer vision jobs run on mobile systems.
The papers were among more than 30 presented at the second annual SysML, a gathering at Stanford of top researchers grappling with systems-level issues in deep learning. Their work showed that it is still early days for the fast-moving field, as engineers continue to uncover fundamental techniques and applications for this new form of computing.
Speakers showed a willingness to talk candidly about their techniques, prototype chips, and applications in the interests of moving the emerging field forward.
IBM presented techniques for reducing neural-net precision down to 2 bits without significant loss of accuracy. For its part, Facebook showed an approach for cutting costs by storing recommendation models on solid-state drives rather than in DRAM.
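IBM's paper concerns very low-precision arithmetic; the details of its scheme are not given here, but uniform low-bit quantization in general maps floating-point values onto a handful of discrete levels. The snippet below is a minimal, generic sketch of symmetric 2-bit quantization, not IBM's method; the function name and scaling choice are illustrative assumptions.

```python
import numpy as np

def quantize_2bit(x: np.ndarray) -> np.ndarray:
    """Generic symmetric 2-bit quantizer (illustrative, not IBM's scheme).

    Maps each value to one of four levels {-3, -1, +1, +3} * scale,
    where scale is set from the tensor's maximum magnitude.
    """
    # Scale so the largest magnitude maps to the outermost level (3).
    scale = np.max(np.abs(x)) / 3.0
    if scale == 0:
        return np.zeros_like(x)
    # Snap each value to the nearest odd integer in {-3, -1, 1, 3}.
    q = np.clip(np.round((x / scale - 1) / 2) * 2 + 1, -3, 3)
    return q * scale

# Example: quantize a small weight matrix.
w = np.random.randn(4, 4).astype(np.float32)
w_q = quantize_2bit(w)
```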
In one of the most noteworthy papers, a researcher from UT Austin described mini-batch serialization (MBS), a method that slashes the memory accesses needed to train convolutional neural networks (CNNs) so that more of the work fits in on-chip buffers. Implemented on the group's WaveCore chip, the technique reduced DRAM traffic by 75%, improved performance by 53%, and cut system energy by 26% compared with conventional approaches and accelerators.
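The core idea, as described, is to keep intermediate activations on-chip by running a mini-batch as a sequence of smaller sub-batches through a group of consecutive layers, rather than pushing the full batch through one layer at a time and spilling every layer's activations to DRAM. The sketch below is a conceptual illustration under that reading, with hypothetical names and a toy forward pass; it is not the authors' implementation.

```python
import numpy as np

def forward_layer(x, layer):
    """Placeholder for a conv layer: here just a scale plus ReLU stand-in."""
    return np.maximum(x * layer["scale"], 0.0)

def forward_serialized(batch, layer_group, sub_batch_size):
    """Conceptual mini-batch serialization (illustrative only).

    Each small sub-batch traverses the entire layer group while its
    activations are small enough to stay in an on-chip buffer, instead
    of writing every layer's full-batch activations out to DRAM.
    """
    outputs = []
    for start in range(0, len(batch), sub_batch_size):
        x = batch[start:start + sub_batch_size]  # small enough to stay "on chip"
        for layer in layer_group:                # serialize layers per sub-batch
            x = forward_layer(x, layer)
        outputs.append(x)                        # only final results leave the chip
    return np.concatenate(outputs, axis=0)

# Example: a 64-sample batch, a three-layer group, sub-batches of 8.
batch = np.random.randn(64, 32, 32, 3).astype(np.float32)
layers = [{"scale": 0.5}, {"scale": 1.5}, {"scale": 0.8}]
out = forward_serialized(batch, layers, sub_batch_size=8)
```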