…But Google insists specialized chips are necessary for neural networks

So Microsoft and Intel want to replicate the data center models of the past in the AI cloud world – with standard processors, even if those are now supplemented by programmable FPGAs.

But Google insists dedicated hardware is essential for deep machine learning and neural network applications, as Jeff Dean, a Google senior fellow and leader of the Google Brain deep learning research project, argued in a keynote speech at last week’s Hot Chips conference.

Dean explained how deep neural nets are supporting major developments in speech and vision recognition, search, robotics and healthcare, among other things. He outlined how neural networks can help solve all 14 Grand Challenges for Engineering in the 21st Century, identified by the US National Academy of Engineering in 2008. He focused particularly on five of these – restoring and improving urban infrastructure; advancing health informatics; engineering better medicines; reverse engineering the human brain; and engineering the tools for scientific discovery.

But he said that optimized chips and hardware would make far more powerful neural networks, especially as the vast majority of machine learning models use just a small number of specific operations. That reduces the need for programmability and favors a chip which is fully optimized for that handful of operations, says Google.
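To see why so few operations cover so much of the workload, consider that a fully connected neural-network layer reduces to multiply-accumulates, a bias add and a simple nonlinearity. The pure-Python sketch below is illustrative only – the shapes and values are made up, and no real framework or TPU instruction set is implied:

```python
def dense_layer(x, weights, bias):
    """One layer: matrix-vector multiply, bias add, ReLU.

    Everything here is built from the same handful of primitive
    operations (multiply, accumulate, add, max) that a specialized
    chip can optimize for.
    """
    out = []
    for row, b in zip(weights, bias):
        acc = sum(w * xi for w, xi in zip(row, x))  # multiply-accumulate
        out.append(max(acc + b, 0.0))               # bias add + ReLU
    return out

# Toy example: 3 inputs, 2 outputs
x = [0.5, -1.0, 2.0]                      # input activations
w = [[1.0, 0.0, 0.5], [0.0, -1.0, 1.0]]   # 2x3 weight matrix
b = [0.1, -0.2]
y = dense_layer(x, w, b)
print(y)
```

Stacking such layers gives a deep network, but the operation mix barely changes – which is the property Dean says makes a fixed-function ASIC attractive.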

“People have woken up to the idea that we need more computational power for a lot of these problems,” he said. “Building specialized computers for the properties that neural nets have makes a lot of sense. If you can produce a system that is really good at doing very specific operations, that’s what we want.”

Google has been working on this silicon itself, and is now on to the second generation of its Tensor Processing Unit (TPU), a machine learning ASIC which delivers 180 teraflops of computation, with 64GB of High Bandwidth Memory (HBM), when four devices are combined on a custom board.

The chips are designed to be connected into larger groups, called ‘TPU pods’, which could feature 64 second-generation TPUs, capable of 11.5 petaflops and 4TB of HBM.
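As a back-of-the-envelope check, the pod figures follow directly from the per-device numbers quoted above. A quick sketch (the decimal unit conversion is an assumption; Google rounds to 11.5 petaflops and 4TB):

```python
# Per-device specs quoted for the second-generation TPU board
DEVICES_PER_POD = 64
TFLOPS_PER_DEVICE = 180
HBM_GB_PER_DEVICE = 64

# Aggregate pod figures (decimal conversion: 1 petaflop = 1000 teraflops)
pod_pflops = DEVICES_PER_POD * TFLOPS_PER_DEVICE / 1000
pod_hbm_tb = DEVICES_PER_POD * HBM_GB_PER_DEVICE / 1000

print(pod_pflops, pod_hbm_tb)  # 11.52 4.096
```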

Google is making 1,000 TPUs available, as a cloud service, for free to top researchers in the ML field.

When TPU development began in 2013, Google initially focused on FPGAs, before deciding these were not competitive with contemporary GPUs. The TPU designers then moved from GPUs to the ASIC approach, citing the GPUs’ high power draw and aiming to create a chip with far lower power consumption, to reduce running costs for data centers and cloud providers.

In 2015, Google open sourced its TensorFlow software library for machine learning, aiming to set a de facto standard for ML systems. The company claims TensorFlow is the most popular such library and has a large community around it.

Dean told the Hot Chips audience that he wrote a thesis about neural networks in 1990. He believed at the time that they would need about 60 times more compute power than was available then. “It turned out that what we really needed was about a million times more compute power, not 60,” he said.

The TPU is designed to aid that effort of making massive amounts of compute power usable and affordable – and in doing so, to establish Google’s preferred architectures and software as the norm, as the firm has often done with previous Internet platforms.

In April, almost a year after the first TPU had been unveiled, Google claimed the first generation of its custom ASIC was already hitting 15 times the speed, and 30 times the performance per watt, of Intel’s Xeons and Nvidia’s GPUs (though both Intel and Nvidia have upgraded their offerings since then, and Intel also has its FPGA play).

Google’s paper argues that researchers have become too focused on convolutional neural networks (CNNs), which account for only around 5% of Google’s AI cloud workload. It argues that they should turn their attention to the kinds of machine learning that will be used in real-world applications, such as multilayer perceptrons (MLPs), which account for around 61% of the Google AI workload in the data center. Currently, researchers favor the CNNs more typically used on edge devices over these cloud-centric models.
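The structural difference between the two model families is easy to show in miniature: an MLP layer applies a dense matrix multiply in which every output sees every input, while a CNN slides a small shared kernel across the input. These are toy 1-D versions for illustration only, not any production model:

```python
def mlp_layer(x, weights):
    """Dense (MLP) layer: every output depends on every input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def conv1d(x, kernel):
    """Convolutional layer: one small kernel, shared across positions."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

x = [1.0, 2.0, 3.0, 4.0]
print(mlp_layer(x, [[1.0, 1.0, 1.0, 1.0]]))  # [10.0]
print(conv1d(x, [1.0, -1.0]))                # [-1.0, -1.0, -1.0]
```

The dense case is dominated by large matrix multiplies – exactly the workload the TPU’s hardware is built around.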