MediaTek bites at Qualcomm’s heels, but the US giant wins on AI and 5G

Qualcomm is used to being first across every line when it comes to new modem technologies, and there has certainly been a bright spotlight on its latest flagship, the Snapdragon 855. However, MediaTek, whose place in the smartphone market has weakened in the past couple of years, followed hard on its arch-rival’s heels, announcing its Helio P90 smartphone system-on-chip just a week after the 855 made its debut.

Made in a 12nm node, the P90 runs on two ARM Cortex-A75 cores running at up to 2.2 GHz and six A55 cores at 2 GHz. These specs are outshone by those of the 7nm Snapdragon 855, which uses four large cores, one running up to 2.84 GHz and three at 2.42 GHz, and four A55 cores at 1.8 GHz.

The P90 is not MediaTek’s 5G offering – it supports LTE Category 13 with triband carrier aggregation, promising download speeds of 600Mbps, and also supports 802.11ac WiFi with 2 × 2 MIMO.

Snapdragon 855 integrates a Category 20 LTE modem supporting 7× carrier aggregation and two flavors of WiFi — pre-standard 802.11ax and, in the 60 GHz band, 802.11ay, both claiming to be capable of 10Gbps data rates. In addition, the 855 can be implemented with a separate 5G modem, and is already powering first generation 5G handsets from Samsung and OnePlus, due to launch early next year in sub-6 GHz and millimeter wave bands.

MediaTek has an M70 standalone 5G modem, but “we’re not certain how widely it will be made available commercially … we’re looking at going directly to an integrated SoC product rather than a standalone modem and focus on sub-6 GHz, aiming for customer products launching in late 2019 or early 2020,” said Russ Mestechkin, senior director of sales and business development. “Our focus is to commercialize our SoC for mainstream handsets at $500+ as opposed to ultra-premium models at $1,000+.”

Phil Solis, handset analyst at IDC, believes Qualcomm has been winning back share from MediaTek among Chinese phonemakers, although the latter is “starting to win back share with Xiaomi and Vivo … Oppo and Vivo are also using its P50 in mid- to high-end products, which will bear fruit in 4Q18,” he said. But he believes MediaTek is “far behind” in 5G and will instead look for growth outside the smartphone market, in “smart speakers, TVs, IoT devices, and other consumer products for their processors and connectivity”.

Meanwhile, Qualcomm’s claim to have launched the world’s first AI-based Image Signal Processor (ISP) must be seen in the context of wider trends in optimizing silicon design for algorithms and, inevitably, of the hype around the AI field. The claim is a bit of a stretch, because what Qualcomm has done is pull the compute units of the CPU, GPU and DSP (digital signal processor) of its Snapdragon 855 mobile platform out into its latest Spectra 380 ISP.

It should really be seen, then, as part of the overall architectural evolution of Snapdragon, highlighting the importance and computational demand of image processing in smartphones today. It frees up cycles from the CPU, GPU and DSP for non-image tasks, while accelerating computer vision (CV) processing by locating the relevant components close together.

This tighter integration of hardware-accelerated CV capabilities allows the Spectra 380, and therefore the whole Snapdragon mobile platform, to classify and segment visual objects much faster and with greater depth sensing, while cutting power consumption by up to 75%, according to the company. The power saving helps meet the ever-present challenge of keeping a smartphone going through a day without a recharge, given that image processing is a major source of power drain.

This can help render special effects such as fuzzy ‘bokeh’ video, which is computationally intensive because it is essentially faking the effect of a large camera lens, as well as 4K HDR (High Dynamic Range) video capture. The object segmentation will also allow real-time background swaps, which are a prerequisite for augmented reality (AR) and virtual reality (VR).

This integration of CV components into the ISP resonates with the trend for chips to be designed for running intensive AI and machine learning algorithms. Qualcomm’s Snapdragon 855 also brings an improved DSP, the Hexagon 690, which doubles the number of vector accelerators and brings it closer to being a neural processor for machine learning.

Such trends raise a point which is perhaps missed by many in the AI field, to judge from the recent AI World conference held in Boston. Among numerous presentations involving over 200 speakers there was just one panel debate devoted to AI hardware and that was sparsely attended by only about 20 people. Yet the current renaissance in AI is largely the result of the spectacular advance in hardware and especially GPU performance. This has improved so much that the bottleneck retarding execution of many AI and especially machine learning algorithms is now I/O rather than compute.

The latest GPUs, such as Nvidia’s Tesla T4 AI chip, boast impressive specifications, running almost 12 times faster than the preceding P4 chip at the half precision FP16 floating point arithmetic relevant for AI calculations. But the problem can lie in keeping them fed continuously with data during the training of machine learning algorithms in particular. If data does not arrive like a production line, ready to be used at the optimal time, the chip ends up idling, and in practice this can account for a remarkable proportion of the time taken for active algorithm training.
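The idling effect can be illustrated with a deliberately simple model. The sketch below assumes every batch takes a fixed time to load and a fixed time to compute, and compares a strictly serial loop with one where loading of the next batch overlaps with compute on the current one. The function name and the timings are hypothetical, chosen only for illustration, and are not drawn from any vendor's figures.

```python
def gpu_utilization(load_s: float, compute_s: float, overlapped: bool) -> float:
    """Steady-state fraction of wall-clock time the GPU spends computing.

    load_s     -- assumed time to deliver one batch to the GPU (seconds)
    compute_s  -- assumed time for the GPU to process one batch (seconds)
    overlapped -- True if the next batch is loaded while the current one computes
    """
    if overlapped:
        # With prefetching, each step takes as long as the slower of the two stages.
        step_s = max(load_s, compute_s)
    else:
        # Without prefetching, the GPU waits for each load before it can compute.
        step_s = load_s + compute_s
    return compute_s / step_s


# Hypothetical I/O-bound case: loading is 9x slower than compute.
serial = gpu_utilization(load_s=9.0, compute_s=1.0, overlapped=False)
prefetched = gpu_utilization(load_s=9.0, compute_s=1.0, overlapped=True)

# Hypothetical case where storage keeps up: loading is faster than compute.
fed = gpu_utilization(load_s=0.5, compute_s=1.0, overlapped=True)

print(f"serial: {serial:.0%}, prefetched: {prefetched:.0%}, fast storage: {fed:.0%}")
```

In this toy model, overlapping alone barely helps when loading dominates. The GPU is only kept busy once data delivery is at least as fast as compute, which is the case faster I/O is meant to address.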

According to Andy Watson, CTO of WekaIO, which has developed a parallel file system optimized for flash memory, a GPU can be idle for up to 99% of the training time, with obvious scope therefore for slashing the duration of training. Many in the field have come to accept training times in the order of weeks without realizing that the GPUs involved are performing very inefficiently. Watson cited a case of a self-driving car system where the training time was cut by 80x, bringing the total down from two weeks to four hours, giving much greater scope for experimentation as a result of eliminating excessive wait states.
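Watson's figures can be sanity-checked with some quick arithmetic. The sketch below assumes that the compute time itself is unchanged and that only idle time is removed, so an idle fraction f caps the achievable speedup at 1 / (1 - f). The helper names are illustrative only.

```python
HOURS_PER_WEEK = 7 * 24


def max_speedup_from_idle(idle_fraction: float) -> float:
    """Upper bound on speedup if all idle time were eliminated (compute unchanged)."""
    return 1.0 / (1.0 - idle_fraction)


def observed_speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of training time before and after the change."""
    return before_hours / after_hours


# Figures quoted in the article: two weeks cut to four hours.
before_hours = 2 * HOURS_PER_WEEK          # 336 hours
after_hours = 4
speedup = observed_speedup(before_hours, after_hours)

# A GPU idle for 99% of the time leaves room for roughly a 100x gain at most.
ceiling = max_speedup_from_idle(0.99)

print(f"observed speedup: {speedup:.0f}x, ceiling at 99% idle: {ceiling:.0f}x")
```

Two weeks down to four hours works out at 84x, in line with the quoted 80x and within the roughly 100x ceiling implied by a GPU that is idle 99% of the time.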

WekaIO’s file system, called MatrixFS, aggregates local SSDs (Solid State Drives) inside the servers into one logical pool, which is then presented to host applications as a distributed and massively parallel file system. This has attracted the attention of a few partners involved in high-performance computing, including the San Diego Supercomputer Center, used for applications that are part of the grand challenges of science outlined by the US National Science Foundation (NSF). This fast I/O is still a work in progress, but WekaIO can at least point to impressive utilization gains achieved by streamlining delivery of data from such fast flash layers to GPUs. We anticipate this being a growing field spanning AI and high-performance computing research.