Google has announced a release candidate for version 1.0 of its open source machine-learning framework, Tensorflow. The deep learning library can be thought of as something like a programming language, and a new compiler has opened it up to running on mobile devices. IBM has wasted no time in announcing Tensorflow support for its PowerAI compute platform, for servers running its Power chips and Nvidia’s GPUs.
Google already uses Tensorflow internally to power many of its applications, including Search, Gmail, Photos, Street View, Translate, and speech recognition for Android. Initially released under an Apache 2.0 license in 2015, the still-developing software library is now set for a big step forward with the new release’s features.
Tensorflow is a collection of algorithms that have been designed to replicate the kinds of cognitive reasoning and computation that human brains carry out – a bit like a very complicated set of flow charts, or stateful dataflow graphs, to use the proper term. Google uses it to better tailor its applications to its users’ behavior, and began developing the Tensorflow predecessor DistBelief back in 2011, under its Google Brain division.
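To make the "stateful dataflow graph" idea concrete, the concept can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not Tensorflow's actual API: operations are nodes, values flow along the graph's edges, and nothing is computed until the graph is evaluated.

```python
class Node:
    """One operation in a toy dataflow graph."""
    def __init__(self, op, inputs=()):
        self.op = op          # callable that combines the input values
        self.inputs = inputs  # upstream nodes this one depends on

    def evaluate(self):
        # Evaluate dependencies first, then apply this node's operation.
        return self.op(*(n.evaluate() for n in self.inputs))

def constant(value):
    # A leaf node that simply produces a fixed value.
    return Node(lambda: value)

# Build the graph (x * y) + z -- note that nothing runs yet.
x, y, z = constant(2.0), constant(3.0), constant(4.0)
mul = Node(lambda a, b: a * b, (x, y))
add = Node(lambda a, b: a + b, (mul, z))

# Only now, on evaluation, do values actually flow through the graph.
print(add.evaluate())  # 10.0
```

Deferring execution like this is what lets a framework analyze the whole graph first, then distribute or optimize it for whatever hardware is available.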
Now, Tensorflow is proving a very popular method for carrying out machine-learning and AI functions in neural networks – computers designed along the lines of the human brain. While the reference design for Tensorflow runs on a single computer, most instances are run across multiple machines working in unison. With the new Tensorflow Accelerated Linear Algebra (XLA) compiler, as well as an experimental Java API and improvements to Python support, it looks like Tensorflow is being adapted to run on mobile devices.
The XLA compiler better manages the available resources, which is a boon to mobile devices that don’t have additional hardware to fall back on. While it is still in a very experimental phase, the new compiler is poised to open the door to improved machine-learning applications running on smartphones and IoT devices. With hardware partners including Qualcomm, IBM, and Raspberry Pi, Tensorflow is moving into the world of IoT hardware.
For developers, the new release should be easier to work with, thanks to new Docker images and pip packages that are compatible with standard Python 3 – meaning that regular Python can be used instead of a separate distribution like Anaconda, which has traditionally offered smoother machine-learning support. Reduced memory requirements are another boon to mobile devices, and Tensorflow currently supports Android, iOS, and the Raspberry Pi.
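Assuming a standard Python 3 environment, installation then comes down to a single pip command (package names as of the 1.0 release candidate; the exact wheel varies by platform, so the Tensorflow site remains the authoritative reference):

```shell
# CPU-only build, into a stock Python 3 environment -- no Anaconda required.
pip3 install tensorflow

# GPU-enabled build (requires Nvidia's CUDA and cuDNN to be set up separately).
pip3 install tensorflow-gpu
```
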
Moving forward, Tensorflow is looking to become somewhat hardware agnostic – able to run on any architecture, without the need for an always-on cloud backend service. That would fully open it up to the kinds of resource- and bandwidth-constrained devices needed in the IoT, but for now, Tensorflow is still a resource-intensive system.
The process of teaching a machine to carry out a learned function is incredibly resource-intensive in the early stages, as the system churns through raw data and develops a rules-based model for responding to it. Google famously fed its system stills from YouTube videos, whereupon the neural network learned, unprompted, to recognize the cats in them. The pattern-recognition capability that such a system learns can eventually be boiled down to code small enough to run on a smartphone as part of an application, but the smartphone has nowhere near enough resources to learn that behavior in the first place – hence the need for powerful machines and clouds.
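That asymmetry can be sketched with a toy model (plain Python, not Tensorflow): training even a tiny classifier means many passes over the data, while the finished model is just a few learned numbers that are trivial to apply on a constrained device.

```python
import math
import random

random.seed(0)

# Synthetic "raw data": 2D points, labeled 1 if x0 + x1 > 1, else 0.
data = []
for _ in range(200):
    x0, x1 = random.random(), random.random()
    data.append(((x0, x1), 1.0 if x0 + x1 > 1.0 else 0.0))

w = [0.0, 0.0]  # weights to be learned
b = 0.0         # bias to be learned
lr = 0.5        # learning rate

def predict(x, w, b):
    # Logistic regression: weighted sum squashed through a sigmoid.
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

# TRAINING -- the expensive part: 500 full passes over the data,
# nudging the weights after every example (stochastic gradient descent).
for _ in range(500):
    for x, y in data:
        err = predict(x, w, b) - y
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

# INFERENCE -- the cheap part: three learned numbers and one weighted
# sum, easily small enough for a phone.
print(round(predict((0.9, 0.9), w, b)))  # point well above the boundary
print(round(predict((0.1, 0.1), w, b)))  # point well below it
```

Scale the same pattern up to millions of parameters and billions of examples and the training stage demands a data center, while the inference stage can still fit in an app.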
IBM views Tensorflow as a way to sell more of its server capacity to businesses looking to use the framework in their applications. Offering it alongside rival machine-learning frameworks like Caffe, Theano, and Torch, as well as Nvidia’s cuDNN library, IBM is hoping to lure businesses into its PowerAI platform offerings on the back of that wide-ranging machine-learning support – either as a cloud offering, or as a server appliance installed on-premises.
In terms of the hardware that powers that PowerAI platform, it comes as no surprise that IBM’s Power processor architecture is prominent. With its Power8 CPUs, Big Blue is staving off Intel’s AI ambitions, and embracing another company that is giving Intel nightmares – Nvidia.
In the data center, Intel still enjoys an absolutely dominant market share; in general-purpose server chips, its Xeons have all but cornered the market. However, the looming threat of ARM-based server chips, combined with GPU-based workloads stealing share from its CPU platforms, has pushed Intel to treat the IoT as a portfolio diversification opportunity.
With Nvidia specifically, Intel is worried that Nvidia’s GPU accelerator cards could damage the growth of its enterprise CPU business – if software allows for tasks to be run more efficiently on something like a dual-socket Intel server running a handful of Nvidia accelerators, instead of requiring handfuls of CPUs across multiple racks.
In traditional workloads, the software favors the CPU architecture, but in emerging applications like machine-learning and AI, GPUs are currently the darlings of the software developers behind projects like Tensorflow. Nvidia leads this market and shows no real signs of slowing down, which worries Intel immensely.
So IBM’s PowerAI model removes Intel from this niche but lucrative market entirely, with IBM supplying the CPUs and Nvidia providing the currently favored GPU horsepower. With Nvidia’s current mainstay, the Tesla P100 accelerator card, featuring heavily in PowerAI, Nvidia looks to be integral to IBM’s plans – although Nvidia rival AMD has similar products that could yet gain traction.
However, IBM remains exposed to competition from other cloud computing providers, such as Amazon and Microsoft, which can provide access to GPU compute resources via general purpose cloud servers. IBM is hoping that its dedicated platform will entice those with bleeding-edge performance requirements, but a price war between cloud resources and dedicated iron appliances would make for interesting viewing.
Google, meanwhile, has its own dedicated silicon. The Tensor Processing Unit, or TPU, is an application-specific integrated circuit (ASIC) designed specifically for Tensorflow. Google used TPUs in its AlphaGo matches, but also uses them in its day-to-day operations – processing around 100m photos per day per TPU, and augmenting search results. Norm Jouppi, a Google Distinguished Hardware Engineer, has elaborated on the uses of the TPU in a company blog post. Should Google choose to push the TPU as a product, it would prove rather disruptive to the likes of IBM and Nvidia.