Artificial intelligence (AI) is nothing new, but it is having its latest day in the sun because, at last, there is sufficient affordable compute and storage power to make it viable outside specialized labs. With associated disciplines like machine learning (ML), it has the potential to make sense of the vast quantities of data generated by connected people, cars and ‘things’, and to transmit instructions back to those objects so that they can act autonomously.
Those processes appear to present a significant opportunity for mobile operators, since the connected robots, cars and sensors will have to communicate with the central AI engine in the cloud, often over mobile links. But those links are themselves challenging, since many AI-enabled applications will require very high quality of service and very low latency. That has led to high interest in pushing as much AI processing as possible to the edge of the network, to reduce the distance to the engine, and even to the device itself.
That could reduce the need for ultra-low latency, one of the principal justifications for investment in 5G, and some AI platforms are adopting a form of ‘batch processing’, with a high percentage of activity going on at the edge and selected results uploaded to the central cloud periodically to enrich the machine learning base. This makes it important for mobile operators that they can monetize AI even when it runs at the edge, for instance by supporting shorter-range links, whether from a gateway to a device or between an autonomous car and roadside infrastructure.
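The ‘batch processing’ pattern described above – act locally, upload selectively – can be sketched in outline. Everything below (the class, the confidence rule, the upload interval) is a hypothetical illustration of the split between edge inference and periodic cloud enrichment, not any vendor's actual API.

```python
import time
from collections import deque

def _stub_model(sample):
    """Stand-in for an on-device neural network; returns (label, confidence)."""
    return ("anomaly" if sample > 0.8 else "normal", abs(sample))

class EdgeNode:
    """Hypothetical edge node: local inference plus batched cloud uploads."""

    def __init__(self, model=_stub_model, upload_interval_s=3600.0,
                 confidence_floor=0.5):
        self.model = model
        self.buffer = deque()           # results awaiting upload
        self.upload_interval_s = upload_interval_s
        self.confidence_floor = confidence_floor
        self.last_upload = time.monotonic()
        self.uploaded_batches = []      # stands in for the cloud uplink

    def infer_locally(self, sample):
        label, confidence = self.model(sample)
        # Only low-confidence results are queued for the cloud: these are
        # the samples most useful for enriching the central learning base.
        if confidence < self.confidence_floor:
            self.buffer.append((sample, label, confidence))
        return label

    def maybe_upload(self, now=None):
        # Periodic, batched upload instead of per-event cloud traffic.
        now = time.monotonic() if now is None else now
        if now - self.last_upload >= self.upload_interval_s and self.buffer:
            self.uploaded_batches.append(list(self.buffer))
            self.buffer.clear()
            self.last_upload = now

node = EdgeNode(upload_interval_s=0.0)
for s in (0.9, 0.2, 0.95, 0.4):
    node.infer_locally(s)
node.maybe_upload()
print(len(node.uploaded_batches[0]))  # 2 low-confidence samples uploaded
```

The key design choice is that the device acts on every inference immediately, while only the uncertain minority of results ever crosses the wide-area link.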
The right balance between the economies of scale and efficiency of a centralized, cloud-based system, and the responsiveness of a distributed, edge-based approach, is one of the key decisions facing enterprises and service providers – and that, in turn, will affect how mobile operators need to plan their networks. Elements of both will be required, of course. Some AI-driven processes will remain too compute-intensive, security-sensitive or unusual to be distributed out to smaller gateways and devices. But the balance of investment and enthusiasm currently seems to be tipped towards the edge, and there is a race to support AI/ML applications on small platforms, one which is shifting the definition of where the network edge actually lies.
Qualcomm and Intel are battling for dominance here, as in so many other areas of the networked society and the Internet of Things. In the past month, Qualcomm has made its Snapdragon Neural Processing Engine software development kit available, providing programmers with tools to create on-device neural network-driven applications. And it has acquired Dutch start-up Scyfer, a spin-off from the University of Amsterdam, which has been developing a deep learning platform that has been used in several vertical markets such as manufacturing, healthcare and finance, and is also heavily edge-focused.
Qualcomm argues that carrying out most of the AI processing on the device can improve reliability, privacy protection and bandwidth efficiency, compared to solutions which have to transmit data to the cloud for processing.
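The bandwidth-efficiency part of that argument is easy to quantify with rough numbers. The frame size, rates and result size below are our illustrative assumptions, not Qualcomm figures; the point is the order-of-magnitude gap between streaming raw sensor data to the cloud and sending only on-device inference results.

```python
# Rough uplink comparison: cloud inference (ship every frame) versus
# on-device inference (ship only results). All numbers are assumptions
# chosen for scale, not vendor benchmarks.
frame_bytes = 1920 * 1080 * 3        # one uncompressed 1080p RGB frame
fps = 30

cloud_bps = frame_bytes * fps * 8    # stream every frame to the cloud
result_bytes = 64                    # label + confidence + timestamp
edge_bps = result_bytes * fps * 8    # send only inference results

print(f"cloud: {cloud_bps / 1e6:.0f} Mbps, edge: {edge_bps / 1e3:.1f} kbps")
print(f"reduction: ~{cloud_bps // edge_bps:,}x")
```

Even with video compression narrowing the gap considerably, keeping inference on the device turns a continuous megabit-class uplink into kilobit-class telemetry.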
“We started fundamental research a decade ago, and our current products now support many AI use cases from computer vision and natural language processing to malware detection on a variety of devices — such as smartphones and cars — and we are researching broader topics, such as AI for wireless connectivity, power management and photography,” said Matt Grob, EVP of technology at Qualcomm.
Meanwhile, Intel has shown off the latest design from Movidius, the machine vision processor start-up it acquired last year to strengthen its IoT and AI strategies.
The Myriad X vision processing unit (VPU) made its debut last week, and Intel claims it is the first system-on-chip with a dedicated Neural Compute Engine. It is “specifically designed to run deep neural networks at high speed and low power without compromising accuracy, enabling devices to see, understand and respond to their environments in real time,” said Intel.
The low power SoC is a successor to the Myriad 2 but makes a significant leap in functionality and performance. It can handle four trillion operations per second (up from a peak of 1.5 trillion) on an edge-based platform or device, enabling it to sense change in its environment and take action accordingly. It can support challenging tasks such as instant haptic feedback for remote surgery, or making inferences in gaming or decision support – rather than having to communicate with the central engine every time.
The SoC runs on 16 of Movidius’s homegrown cores (up from 12 in Myriad 2). Movidius calls its DSP (digital signal processor) core architecture SHAVE (Streaming Hybrid Architecture Vector Engine). On top of that, Myriad X adds the new Neural Compute Engine to provide localized deep learning capabilities. This engine is built on over 20 enhanced hardware accelerators which perform specific tasks optimally, without adding to the load on the compute cores. Examples include depth mapping to help drones land, or optical flow for very high performance motion estimation (for surveillance cameras that need to track large numbers of people or objects).
Intel Movidius exec Remi El-Ouazzane told EETimes: “Your architecture needs to deal with new types of workloads in hardware microarchitecture, while DSPs are super-useful in running new types of computer vision and deep learning algorithms.”
The memory architecture, as well as the accelerators, helps to achieve the low power consumption which is critical to doing AI at the edge. Minimizing off-chip data transfer helps keep the power budget to 1W – on-chip memory is 2.5MB, up from 2MB in Myriad 2.
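Taken together, the figures quoted above imply a rough efficiency budget. The arithmetic below is ours, for illustration; it simply combines the stated Myriad X and Myriad 2 numbers rather than reporting any Intel benchmark.

```python
# Back-of-envelope efficiency figures from the Myriad X numbers quoted
# above; the arithmetic is illustrative, not an Intel benchmark.
myriad_x_tops = 4.0      # trillion operations per second (Myriad X)
myriad_2_tops = 1.5      # peak for the previous Myriad 2
power_w = 1.0            # stated power budget

speedup = myriad_x_tops / myriad_2_tops      # generational throughput leap
tops_per_watt = myriad_x_tops / power_w      # the edge-AI efficiency metric
sram_growth = (2.5 - 2.0) / 2.0              # on-chip memory increase

print(f"~{speedup:.1f}x throughput, {tops_per_watt:.0f} TOPS/W, "
      f"{sram_growth:.0%} more on-chip SRAM")
```

On these numbers the generational story is roughly a 2.7x throughput jump and 4 TOPS per watt, with a quarter more on-chip SRAM to keep data off the power-hungry external memory bus.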
“With this faster, more pervasive intelligence embedded directly into devices, the potential to make our world safer, more productive and more personal is limitless,” El-Ouazzane wrote in a blog post. The main target applications are currently in connected vehicles, as well as drones, robotics, smart surveillance and virtual reality headsets.
Movidius is one of a string of start-ups Intel has acquired in the computer vision and AI fields. They include RealSense; deep learning chip provider Nervana Systems; Saffron, which has developed a cognitive computing platform; and Ascending Technologies, a German drone company with sense-and-avoid algorithms.
This sees Intel taking on Qualcomm’s growing range of silicon for this space including its Snapdragon Flight platform for drones and its Zeroth mobile ‘brain chip’, which is heavily focused on machine vision, as well as the Snapdragon NPE. Jim McGregor, principal analyst at TIRIAS Research, told EETimes that the embedded market is becoming “more competitive, especially with the entrance of the mobile chip vendors, who are accustomed to providing complete reference designs for applications”.
The area where the right balance between centralized and localized may be most critical is the connected car, especially as that becomes more autonomous. Some of the vehicle’s applications will rely on wide area connectivity to the cloud, but for safety-critical decisions, it will need to depend heavily on communications with nearby infrastructure, small cells and other vehicles, and have a great deal of AI processing taking place on its own dashboard.
Qualcomm aims to give operators, drivers or carmakers the flexibility to support all types of in-car communications, with its new cellular ‘vehicle-to-everything’ (V2X) chipset, the 9150 C-V2X. This will support 4G and 5G, but also specialized networks and ADAS (advanced driver assistance systems).
This will enable cars to access data and services across cellular networks when appropriate, but also use low latency connections with nearby connected infrastructure and other items (including those carried by pedestrians). It will also provide data about the vehicle’s surroundings to enrich the picture built up by ADAS components such as cameras, radar and LIDAR.
The chipset enables “two transmission modes of direct communications and network-based communications”, according to Qualcomm. Specifically, as well as LTE and 5G, it will access Intelligent Transportation Systems (ITS) bands to communicate with nearby connected items, without needing a cellular network, SIM card or subscription. The main ITS band in the US and some other regions is 5.9 GHz, used for the DSRC (dedicated short-range communications) technology.
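The two modes amount to a selection rule: latency-critical messages to nearby peers go direct over the ITS band, everything else over the cellular network. The sketch below illustrates that split; the function, thresholds and latency figures are our hypothetical assumptions, not Qualcomm's actual stack logic.

```python
from enum import Enum

class V2XMode(Enum):
    DIRECT = "ITS 5.9 GHz band, no cellular network, SIM or subscription"
    NETWORK = "cellular uplink over LTE/5G"

def select_mode(latency_budget_ms, peer_range_m,
                direct_range_m=300, network_latency_ms=100):
    """Hypothetical mode selection for a C-V2X-style stack.

    Use the direct link when the peer is within radio range and the
    latency budget is tighter than the cellular round trip can meet;
    otherwise route via the network. All thresholds are illustrative.
    """
    if peer_range_m <= direct_range_m and latency_budget_ms < network_latency_ms:
        return V2XMode.DIRECT
    return V2XMode.NETWORK

# A collision warning to a car 50m ahead, versus a map update from the cloud:
print(select_mode(latency_budget_ms=10, peer_range_m=50).name)     # DIRECT
print(select_mode(latency_budget_ms=500, peer_range_m=5000).name)  # NETWORK
```

The subscription-free direct mode is what distinguishes C-V2X from ordinary cellular IoT: the safety-critical path keeps working even where there is no network coverage at all.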
“With its strong synergy with telematics and an evolution towards 5G, C-V2X offers benefits to the automotive industry by developing new capabilities for improving road safety, and enhancing autonomous driving and advanced connected services, while building on the ITS momentum and investments made over the last decade,” said Nakul Duggal, Qualcomm’s VP of product management, in a statement.
A reference design based on the chipset will include an application processor running the ITS V2X stack and a hardware security module. The chipset will be available for commercial sampling in the second half of next year.
One of the US firm’s automotive partners, Audi, endorsed the new product. “Qualcomm Technologies’ anticipated 9150 C-V2X chipset serves as a major milestone in paving the road for 5G and safer autonomous driving,” said Thomas Muller, Audi’s head of electronics. Other Qualcomm partners in this sector include Ford, the PSA Group (Peugeot, Citroen and DS) and China’s SAIC Motor.
Qualcomm describes custom core at heart of its server chip
Both Qualcomm and Intel also want to support the centralized aspect of connected AI – the huge server farms which will run the machine learning engines from which intelligence and data can be distributed to the edge. Intel is in pole position here, though it has had its confidence knocked by Nvidia’s success in promoting graphical processor units (GPUs) as the optimal solution for AI/ML. Intel has hit back with more powerful solutions harnessing its CPUs as well as the programmable FPGA technology it acquired with Altera, though Google is arguing that dedicated, specialized chips are required for top end AI (see Wireless Watch August 28 2017).
Qualcomm aims to mount a challenge to Intel’s Xeon with its first data center processor, the Centriq. At last week’s Hot Chips conference, it described the Falkor CPU core at the heart of this product. Though ARM-based, like rival offerings from Cavium and others, the 64-bit Falkor core has been heavily customized under Qualcomm’s architectural licence. Centriq 2400, which will ship later this year, runs on 48 cores and is implemented in 10nm for low power and size (Intel’s new Xeon Skylake is 14nm).
When Qualcomm first announced its server chip plans, it was assumed the company would not go head-to-head with Intel but would target newer applications such as cloud coprocessors or Cloud-RAN; now, however, it says Centriq will go after large data centers too. Microsoft has been testing ARM-based server chips from both Qualcomm and Cavium, and Google was also reported to be working with Qualcomm.
There are still few details about key factors such as performance, power consumption or price, but Qualcomm did reveal that Falkor’s 48 cores are actually arranged as 24 dual-core processors, each pair sharing power control, L2 cache and the ring bus interface with more than 250Gbps of aggregate bandwidth. The cores and L2 caches can run at different power states according to the task at hand, with idle blocks dropping down to save energy. The L3 cache is centralized on the ring bus.
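The disclosed numbers imply a simple topology, which can be laid out as arithmetic. The even-split division of ring bandwidth is our illustration, not a Qualcomm figure.

```python
# Centriq 2400 / Falkor topology as disclosed: 48 cores arranged as
# dual-core pairs, each pair sharing power control, L2 cache and a ring
# bus stop. The per-pair bandwidth figure assumes an even split.
total_cores = 48
cores_per_pair = 2
pairs = total_cores // cores_per_pair        # dual-core stops on the ring

ring_bw_gbps = 250                           # ">250 Gbps aggregate"
per_pair_gbps = ring_bw_gbps / pairs         # even-split assumption

print(f"{pairs} dual-core pairs, ~{per_pair_gbps:.1f} Gbps each "
      f"if the ring bandwidth divided evenly")
```

That works out to 24 ring stops at roughly 10 Gbps apiece under the even-split assumption, although a shared ring means any one pair can burst well above its average share.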
The SoC supports ARM’s virtualization, TrustZone security and instruction extensions to accelerate crypto operations.