From IEEE Spectrum:
In January’s special Top Tech 2017 issue, I wrote about various efforts to produce custom hardware tailored for performing deep-learning calculations. Prime among those is Google’s Tensor Processing Unit, or TPU, which Google has deployed in its data centers since early in 2015.
We'll be back with some of the other commentary but in the meantime here are a few (we have hundreds) NVDA posts that may be of interest.
In that article, I speculated that the TPU was likely designed for performing what are called “inference” calculations. That is, it’s designed to quickly and efficiently calculate whatever it is that the neural network it’s running was created to do. But that neural network would also have to be “trained,” meaning that its many parameters would be tuned to carry out the desired task. Training a neural network normally takes a different set of computational skills: In particular, training often requires the use of higher-precision arithmetic than does inference.
Yesterday, Google released a fairly detailed description of the TPU and its performance relative to CPUs and GPUs. I was happy to see that the surmise I had made in January was correct: The TPU is built for doing inference, having hardware that operates on 8-bit integers rather than higher-precision floating-point numbers.
Yesterday afternoon, David Patterson, an emeritus professor of computer science at the University of California, Berkeley, and one of the co-authors of the report, presented these findings at a regional seminar of the National Academy of Engineering, held at the Computer History Museum in Mountain View, Calif. The abstract for his talk summed up the main point nicely. It reads in part: “The TPU is an order of magnitude faster than contemporary CPUs and GPUs and its relative performance per watt is even larger.”
Google’s blog post about the release of the report shows how much of a difference in relative performance there can be, particularly in regard to energy efficiency. For example, compared with a contemporary GPU, the TPU is said to offer 83 times the performance per watt. That might be something of an exaggeration, because the report itself claims only that there’s a range of between 41 times and 83 times. And that’s for a quantity the authors call incremental performance. The range of improvement for total performance is less: from 14 to 16 times better for the TPU compared with that of a GPU.
The benchmark tests used to reach these conclusions are based on a half dozen of the actual kinds of neural-network programs that people are running at Google data centers. So it’s unlikely that anyone would critique these results on the basis of the tests not reflecting real-world circumstances. But it struck me that a different critique might well be in order.
The problem is this: These researchers are comparing their 8-bit TPU with higher-precision GPUs and CPUs, which are just not well suited to inference calculations. The GPU exemplar Google used in its report is Nvidia’s K80 board, which performs both single-precision (32-bit) and double-precision (64-bit) calculations. While they’re often important for training neural networks, such levels of precision aren’t typically needed for inference.
In my January story, I noted that Nvidia’s newer Pascal family of GPUs can perform “half-precision” (16-bit) operations and speculated that the company may soon produce units fully capable of 8-bit operations, in which case they might be much more efficient when carrying out inference calculations for neural-network programs....MORE
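For readers wondering what "8-bit inference" means in practice, here is a minimal sketch of the general idea: weights and inputs are mapped to small integers, the multiply-and-add work is done on those integers, and the result is scaled back to a real number at the end. The symmetric scaling scheme and the names `quantize` and `int8_dot` are our own assumptions for illustration; this is not a description of how the TPU or any Nvidia part actually does it.

```python
# Illustrative only: symmetric 8-bit quantization of a tiny dot product.
# The scheme and every name here are assumptions, not the TPU's actual design.

def quantize(values, num_bits=8):
    """Map floats to signed integers in [-127, 127] using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    largest = max(abs(v) for v in values)
    scale = largest / qmax if largest else 1.0
    return [round(v / scale) for v in values], scale


def int8_dot(weights, inputs):
    """Dot product with integer multiply-accumulate, rescaled at the end."""
    qw, w_scale = quantize(weights)
    qx, x_scale = quantize(inputs)
    acc = sum(w * x for w, x in zip(qw, qx))  # integer arithmetic only
    return acc * w_scale * x_scale


# Made-up numbers standing in for one neuron's weights and inputs.
weights = [0.42, -1.30, 0.07, 0.88]
inputs = [1.5, 0.25, -2.0, 0.6]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(weights, inputs)
print(f"float result: {exact:.4f}")
print(f"8-bit result: {approx:.4f}")
```

The two printed values should land close to each other, which is the point: a small loss of precision is often tolerable when running an already-trained network, while training tends to need the finer-grained arithmetic.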
Nvidia's Stock Is Not Yet Out Of The Woods (NVDA)
We focus on the stock, not the company. The company should be fine for at least the next couple years until the artificial intelligence biz catches up to NVIDIA and either takes a different approach or a really different approach and goes quantum computer....
BMO Capital On Potential Competition For NVIDIA (NVDA)
Artificial Intelligence: What Could Derail NVIDIA? A Lab in Shenzhen; A Basement in Moscow; An Office in Bristol (NVDA)