
8 June 2021

Even with Nvidia looming, ARM unveils its largest ever release of new core IP

All eyes are on ARM’s infrastructure processor cores at the moment, as licensees such as Marvell try to disrupt Intel’s dominance in key segments like vRAN and AI/ML, and as the potential acquisition by Nvidia looms. But the core IP provider has not forgotten its heartland market, and a few weeks after refreshing its Neoverse line of CPU cores for servers, it has also updated its mobile family of products, as well as cores targeted at PCs.

This was ARM’s largest ever IP release, with new families of cores for CPUs, graphics processors (GPUs), DynamIQ shared units (DSUs) and interconnect. The only family not to have been refreshed in recent weeks is the newest range, the Ethos neural processing units (NPUs) for AI acceleration.

The new Cortex CPU cores are based on the latest ARMv9 architecture and come with a new generation of Scalable Vector Extensions, SVE2, which ARM claims boosts performance and power efficiency compared with the older Neon extensions. In particular, SVE2 promises far faster performance for machine learning and other parallel processing workloads, said ARM. All future ARM Cortex CPUs will implement SVE2.
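To give a flavour of what this means for developers, below is a minimal sketch of a scaled vector add written in the vector-length-agnostic style shared by SVE and SVE2, using the ACLE intrinsics from <arm_sve.h>. The function name and build flags are illustrative assumptions, not ARM sample code; the same source runs unmodified on any SVE/SVE2 vector width.

```c
/* Vector-length-agnostic scaled add: y[i] += a * x[i].
 * Illustrative sketch; build with something like -march=armv9-a
 * or -march=armv8-a+sve2 on a compiler with SVE ACLE support. */
#include <arm_sve.h>
#include <stdint.h>

void saxpy_sve(float a, const float *x, float *y, int64_t n) {
    svfloat32_t va = svdup_f32(a);                /* broadcast the scalar */
    for (int64_t i = 0; i < n; i += svcntw()) {   /* step by elements per vector */
        svbool_t pg = svwhilelt_b32(i, n);        /* predicate masks the tail */
        svfloat32_t vx = svld1(pg, &x[i]);        /* predicated loads */
        svfloat32_t vy = svld1(pg, &y[i]);
        vy = svmla_x(pg, vy, vx, va);             /* vy += vx * va */
        svst1(pg, &y[i], vy);                     /* predicated store */
    }
}
```

Because the loop asks the hardware how wide its vectors are at run time, the same binary scales across implementations with different SVE2 vector lengths.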

At the top end came the Cortex-X2, with a new CPU cluster, up to 16MB of L3 cache, and up to 32MB of unified system-level cache. ARM claims a 16% increase in instructions per clock (IPC) and a doubling of machine learning performance compared with the predecessor, the Cortex-X1. The X2 is particularly targeted at the PC and Chromebook segments, where ARM-based processors remain a minority compared with Intel’s x86 parts. However, both Apple and Qualcomm are developing new PC platforms in an effort to drive the ARM-based connected notebook market at last.

There are also new products in the Cortex-A range, aimed at smartphones and other applications such as automotive. The first new cores are the Cortex-A710 and the Cortex-A510 – as always in recent ARM mobile architectures, these are companion cores for big.LITTLE deployment (the 710 handling processor-intensive tasks, the 510 operating at low power for more routine tasks, to strike the best balance of performance and power consumption).
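Operating systems normally spread work across such big.LITTLE pairings automatically, but developers can also steer placement explicitly. As a rough illustration only, and assuming a Linux system with a hypothetical layout in which CPUs 4-7 are the Cortex-A710 cluster (real layouts vary by SoC and should be read from sysfs), a latency-sensitive thread could be pinned to the big cores with sched_setaffinity:

```c
/* Hypothetical big.LITTLE thread placement sketch for Linux. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to the given CPU indices. */
static int pin_current_thread(const int *cpus, int count) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = 0; i < count; i++)
        CPU_SET(cpus[i], &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = calling thread */
}

int main(void) {
    const int big_cores[] = {4, 5, 6, 7};   /* assumed Cortex-A710 cluster */

    if (pin_current_thread(big_cores, 4) != 0)
        perror("sched_setaffinity");

    /* ...run performance-critical work here; background threads could be
     * confined to CPUs 0-3 (the assumed Cortex-A510 cluster) the same way. */
    return 0;
}
```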

ARM claims that various microarchitectural changes have resulted in 30% better energy efficiency in the A710 compared with the previous generation, the Cortex-A78, along with a doubling of ML performance and a 10% overall performance improvement.

The company also claimed 35% better overall performance for the successor to its Cortex-A55, which targets embedded and consumer electronics applications. The new version, the Cortex-A510, features a dual-core complex with a shared L2 cache to save on size, as well as the SVE2 vector processing unit.

In addition, ARM launched a whole new family of GPU cores in its Mali range – the Mali-G710, -G610, -G510 and -G310. The entire family is based on the Valhall GPU architecture, which now reaches entry-level GPUs for the first time. At the high end, the G710 claims roughly 20% better energy efficiency, 35% better ML processing and 20% better gaming performance.

ARM’s DynamIQ Shared Unit (DSU) acts as a bridge between compute cores, with a shared L3 cache and I/O interface that reduce size and power in system-on-chip designs. The latest product, the DSU-110, has been enhanced to support up to eight Cortex-X2 cores and 16MB of L3 cache, with higher performance and/or lower latency. The most significant change is a microarchitecture that processes multiple requests simultaneously, with data paths optimized to reduce latency and increase bandwidth.

Finally, there are two new interconnect designs – CoreLink CI-700 coherent interconnect and CoreLink NI-700 network interconnect.

The various CPU, GPU and Ethos NPU cores can be combined with the DSU and CoreLink interconnects in a wide range of configurations to suit different devices. And all the new CPU cores support a key security enhancement, ARM’s Memory Tagging Extension (MTE). MTE mitigates memory safety vulnerabilities, such as buffer overflows and use-after-free bugs, by tagging each memory allocation; subsequent accesses must carry the matching tag to succeed.
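To show what tagging looks like from the software side, here is a minimal sketch that assigns a random tag to an allocation, assuming the ACLE memory tagging intrinsics from <arm_acle.h>, MTE-capable hardware and an OS that has tag checking enabled; the helper name tag_allocation is hypothetical.

```c
/* Illustrative MTE sketch; build with something like -march=armv8.5-a+memtag.
 * Requires MTE-capable hardware and OS support for tag checking. */
#include <arm_acle.h>
#include <stddef.h>

#define MTE_GRANULE 16  /* MTE tags memory in 16-byte granules */

/* Give a 16-byte-aligned region its own random tag and return the tagged
 * pointer; only accesses made through a pointer carrying that tag will
 * pass the hardware check. */
void *tag_allocation(void *p, size_t size) {
    void *tagged = __arm_mte_create_random_tag(p, 0);  /* pick a random tag */
    for (size_t off = 0; off < size; off += MTE_GRANULE)
        __arm_mte_set_tag((char *)tagged + off);       /* write it to each granule */
    return tagged;
}
```

An out-of-bounds or use-after-free access through a stale or foreign pointer carries the wrong tag, so the hardware flags it rather than silently corrupting memory.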