
13 December 2016

AMD rails against Intel and Nvidia with Instinct GPUs for data centers

AMD entered the accelerator card market this week with the launch of its new Radeon Instinct hardware. It’s a blow to Intel, which finds itself struggling to keep up with another GPU-based rival in this emerging market, but the initial benchmarks suggest that AMD will not dethrone Nvidia.

With Alibaba as a major cloud customer, Google deploying AMD hardware in its data centers for customers, and specialized deployments like the one at the Canadian Hydrogen Intensity Mapping Experiment, AMD can certainly carve out a nice niche for itself – especially for workloads that might run better on its open OpenCL software stack than on Nvidia’s proprietary CUDA design.
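To make that contrast concrete, below is a minimal OpenCL sketch in C – a SAXPY-style kernel plus the host boilerplate to run it. Nothing in it is AMD-specific: the same source should run on any vendor’s device with OpenCL drivers, which is the portability argument in a nutshell. It’s an illustrative sketch with error handling omitted, not production code.

/* Minimal, vendor-neutral OpenCL example: y = a*x + y on the GPU. */
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void saxpy(float a, __global const float *x,\n"
    "                    __global float *y) {\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

int main(void) {
    /* Pick the first available GPU, whatever the vendor. */
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* The kernel is compiled from source at runtime, per device. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "saxpy", NULL);

    float xs[1024], ys[1024];
    for (int i = 0; i < 1024; i++) { xs[i] = 1.0f; ys[i] = 2.0f; }

    cl_mem xbuf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                 sizeof xs, xs, NULL);
    cl_mem ybuf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                 sizeof ys, ys, NULL);

    float a = 3.0f;
    clSetKernelArg(k, 0, sizeof a, &a);
    clSetKernelArg(k, 1, sizeof xbuf, &xbuf);
    clSetKernelArg(k, 2, sizeof ybuf, &ybuf);

    size_t global = 1024;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, ybuf, CL_TRUE, 0, sizeof ys, ys, 0, NULL, NULL);

    printf("y[0] = %f\n", ys[0]); /* expect 5.0 = 3*1 + 2 */
    return 0;
}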

However, AMD’s main competitive disadvantage is the size of its R&D budget compared to those of its main rivals, Intel (CPU) and Nvidia (GPU). The big two are able to sit on product improvements made in their labs and release them when AMD unveils a new design – perpetually keeping AMD on the back foot.

So the new cards appear to offer promising performance that might let AMD leapfrog the current offerings from Intel and Nvidia – effectively catching up by way of this potential generational leap. With shipments due to commence in the first half of the new year, AMD will be worried that chief rival Nvidia – given the age of its silicon – is simply sitting on the next version of the P100, ready to unleash a new iteration that crushes the new Instinct range. For competition’s sake, we hope that is not the case.

As well as the new cards, AMD announced two open source projects that will make use of the Instinct hardware. The first is MIOpen, a free library due to launch in Q1 2017 that will provide GPU implementations of machine intelligence routines – including convolution, pooling, activation functions, normalization, and tensor format operations, according to AMD.
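For a sense of what one of those routines actually computes, here is a plain-C reference for a direct 2D convolution over a single-channel image – the cross-correlation form that deep-learning frameworks typically use. This is an illustrative CPU sketch of the math MIOpen is slated to accelerate, not MIOpen’s own API, which had not shipped at the time of writing.

#include <stddef.h>

/* "Valid" 2D convolution (no padding, stride 1): out has dimensions
 * (h - kh + 1) x (w - kw + 1). All buffers are row-major. */
void conv2d(const float *img, size_t h, size_t w,
            const float *kern, size_t kh, size_t kw,
            float *out) {
    for (size_t y = 0; y + kh <= h; y++) {
        for (size_t x = 0; x + kw <= w; x++) {
            float acc = 0.0f;
            /* Slide the kernel over the image, accumulating products. */
            for (size_t ky = 0; ky < kh; ky++)
                for (size_t kx = 0; kx < kw; kx++)
                    acc += img[(y + ky) * w + (x + kx)]
                         * kern[ky * kw + kx];
            out[y * (w - kw + 1) + x] = acc;
        }
    }
}

Every output pixel can be computed independently, which is exactly the kind of massive data parallelism these accelerator cards are built to exploit.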

The second is ROCm, the Radeon Open Compute platform – which has been optimized to accelerate the more popular deep-learning frameworks, with Caffe, Torch 7, and Google’s TensorFlow named in the announcement. The goal is to let programmers focus on their applications rather than on managing the underlying hardware and software – and to this end, AMD is also working to support next-gen interconnect technologies for interfacing its cards with the host servers, via the CCIX, Gen-Z, and OpenCAPI projects.

The three new accelerator cards are the MI6, the MI8, and the MI25 – the last of which houses AMD’s new Vega GPU design. The MI6, built on AMD’s Polaris GPU design, claims 5.7 TFLOPS of peak FP16 performance, with 16GB of GPU memory, a memory bandwidth of 224GBps, and a 150W board power draw.

The MI8 is a more compact design, aimed at small form-factor HPC installations, using the Fiji Nano GPU with just 4GB of high-bandwidth GPU memory – providing a memory bandwidth of 512GBps. It claims 8.2 TFLOPS of peak FP16 performance, using 175W of board power.

The MI25 has been optimized for deep-learning training workloads, which teach software how to spot objects by feeding it images and providing feedback on the patterns it finds. Powered by the new Vega design, the MI25 has no TFLOPS figure or memory capacity disclosed by AMD, but its power draw is much higher, at 300W. Given the correlation between the model numbering and peak FP16 performance (5.7 TFLOPS for the MI6, 8.2 TFLOPS for the MI8), it appears that the MI25 will land somewhere in the 25 TFLOPS ballpark.
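That ballpark follows from a back-of-the-envelope formula: peak FLOPS is roughly shader cores × clock × FLOPS per core per cycle. Plugging in the Fiji Nano’s published 4,096 stream processors, with an assumed ~1GHz clock and one fused multiply-add (two FLOPS) per core per cycle – our assumptions, not AMD’s figures – reproduces the MI8’s claimed 8.2 TFLOPS, suggesting the model numbers track peak FP16 TFLOPS.

#include <stdio.h>

int main(void) {
    double cores = 4096.0;        /* Fiji Nano stream processors */
    double clock_ghz = 1.0;       /* assumed peak clock, approx. */
    double flops_per_cycle = 2.0; /* one fused multiply-add per cycle */

    /* cores * GHz * FLOPS/cycle gives GFLOPS; divide by 1000 for TFLOPS. */
    double tflops = cores * clock_ghz * flops_per_cycle / 1000.0;
    printf("Estimated peak FP16: %.1f TFLOPS\n", tflops); /* ~8.2 */
    return 0;
}

By the same pattern, an MI25 would need roughly 25 TFLOPS of peak FP16 – plausible for a new architecture at a 300W board power.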

As for rivals, Intel’s Xeon Phi CPU-based accelerator cards range from 68-core configurations with 215W power requirements up to 72-core, 230W options. When plugged into a Xeon-based server, Intel says each Phi coprocessor card provides up to 1.2 TFLOPS of performance.

That figure is a lot lower than AMD’s claimed new potential, and that’s problematic for Intel. Accelerator cards are going to cannibalize Intel’s server shipments, as they allow a user to buy a simple dual-socket Xeon server and fill it with accelerators, achieving performance comparable to dozens of Xeon cores spread across multiple servers. These cards are a very disruptive presence in the cloud market, and AMD’s new Naples platform, for pairing up to 16 of the cards with one of its Zen-based CPUs in a server, is emblematic of the struggles that lie ahead for Intel.

As such, Intel is stepping into the accelerator market in an attempt to keep its Xeons at the heart of machine-learning and AI workloads, but given its benchmark figures, it doesn’t look capable of competing with the GPU-based designs. However, the requirements of individual workloads may favor different architectures, so the TFLOPS figure is not the sole consideration for a deployment.

Nvidia, meanwhile, provides the P100 accelerator card, a GPU-based solution targeted at the emerging AI and ML applications that need horsepower a pure CPU can’t provide at a comparable cost. Nvidia’s P100 claims 21 TFLOPS of FP16 performance, monstering the new MI6 and MI8 AMD cards – although the MI25 (if the naming convention is to be believed) could take a scalp here, but only until the inevitable Nvidia refresh.

The P100 has a maximum power consumption of 250W, and is sold through Cisco, Dell, HPE, IBM, Lenovo, and Supermicro – among a plethora of other names that didn’t make the Elite OEMs partner list.

AMD hasn’t released pricing for the Instinct range, while Intel charges between $2,585 and $6,401 for its Xeon Phi cards. Nvidia’s prices are similarly not publicly available.