The AV1 codec from the Alliance for Open Media (AOM) has passed another significant milestone on the road towards widespread adoption with Intel’s launch of the first open source CPU-based encoder for the emerging standard. This complements the reference decoders already available, giving device makers and streaming providers a full tool set to start developing fully AV1-compliant products and services. The AV1 code base was finally frozen in March 2018, but it has taken until now for the first fully compliant encoder to arrive, and even that is a software tool running on the CPU, ahead of dedicated AV1 encoders due for release later in 2019.
The Intel CPU-based encoder, which supports Linux, macOS and Windows, requires a hefty system with at least a Skylake-generation processor or one of the very latest Xeon processors, plus at least 48GB of RAM for 10-bit 4K video encoding. This confines its use for now to service providers, product developers or enthusiasts willing to fork out for the hardware. We do not expect general availability of affordable AV1 encoding hardware until early 2020 at least.
However, Intel’s launch is a significant step forward at a time when AV1 is gaining ground, even though its main rival, HEVC from the MPEG/ITU stable, is well entrenched after early adoption by many broadcasters. AOM, on the other hand, now has huge backing from all the big industry hitters, with founding members Amazon, Cisco, Google, Intel, Microsoft, Mozilla and Netflix later joined by Apple among others. Of these, only Apple and Intel really have a foot in both camps, reflecting their broad customer bases. Intel’s Apollo Lake technology, for example, is used in Exterity’s GPU-based 4K HEVC encoders, and there are other examples.
This rivalry began in the era of MPEG/ITU’s H.264, otherwise known as AVC (Advanced Video Coding), when Google developed VP8 rather than pay royalties to use the former. Both enjoyed widespread adoption in browsers and video players, but at that stage H.264/AVC emerged as the clear winner for two related reasons. Firstly, it was supported by just about every operating system, browser, player and encoding device, and secondly, Apple shunned VP8, which kept it out of Apple’s extensive ecosystem.
This battle carried through to the next generation of each, H.265/HEVC and VP9, which again offered very comparable compression efficiency for a given level of visual quality on the same hardware. But there was the crucial difference that the licensing situation around HEVC had become more complex and confusing, deterring many of the major technology companies and service providers from adopting it, Netflix being one. Since even more companies had contributed to HEVC’s development, there were more parties demanding royalties, and they coalesced into three separate patent pools, MPEG LA, HEVC Advance and Velos Media, along with some other independent IP holders. The situation was a mess and HEVC was only saved by a number of major broadcasters having already made commitments and not being able to afford at that stage to start again.
Meanwhile, Google had been creating a successor to VP9 in VP10 but decided to open this up to a larger community as AV1, creating the AOM in September 2015. This was a key move because it eventually won support from all the big technology companies except one or two hardware makers, notably Samsung. It also meant that VP10 was consolidated with two other open source codecs, Cisco’s Thor and Mozilla’s Daala, which could otherwise have competed with it and fragmented the field.
There was nonetheless some concern over potential patent litigation around AV1, given that Nokia sued Apple in 2016 over alleged infringement of H.264/AVC patents in a case settled out of court. However, those fears have subsided since the AOM submitted AV1 to an IP review.
In the early days of HEVC, the main competition came from VP9 if we discount a few new kids on the block bringing radically different technologies that need not be specific to video compression, such as V-Nova’s Perseus. VP9 gained a much larger potential user base than VP8, or indeed than HEVC, since decoding support was incorporated within all the leading browsers, notably Firefox, Edge and Chrome. HEVC was only adopted by Edge and Safari, which – given that every Android phone comes with Chrome – left a significant deficit.
However, HEVC was adopted by major manufacturers for the next generation of contribution codecs after H.264, often for live remote productions, and this gave it a base among broadcasters. But for OTT, consumer products, and even set tops, VP9 moved ahead of HEVC.
Generally, VP9 has prevailed among pure play OTT and SVoD providers and its successor AV1 is poised to consolidate that.
HEVC has only gained traction on the OTT front through broadcasters’ catch-up portals, with the BBC for example testing it for coverage of the Royal Wedding, FIFA World Cup and Wimbledon tennis championships in 2018. Viewers with compatible TVs and set tops could opt for a 4K Ultra HD HDR stream of certain games on the BBC iPlayer, although access was capped at “tens of thousands of people” per game, on a first come first served basis, to limit network costs. Live HEVC was used but bit rates were still around 36 Mbps, which restricted access to viewers with at least a 40 Mbps connection. For recorded streams, where there is more time for efficient encoding, the BBC was able to get the bit rate down to 22.8 Mbps. But that is still quite high and is one reason Netflix has insisted that AV1 should not be released until it is at least 20% more efficient than HEVC. Having evolved from VP10, AV1 could be said to be half a generation ahead of HEVC: less mature at this stage but with greater scope for improvement, and it looks as if that 20% gain over HEVC will be delivered. In fact, tests indicate AV1 is 30% ahead of VP9, which is on a par with HEVC.
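As a rough illustration of what those percentages would mean for the BBC figures quoted above, the following sketch applies a 20% and a 30% bit rate saving to the 36 Mbps live and 22.8 Mbps recorded streams. It assumes the saving applies uniformly at equal visual quality, which is a simplification, and the function name is purely illustrative.

```python
def reduced_bitrate(bitrate_mbps, saving_fraction):
    """Bit rate after a fractional saving, assuming equal visual quality."""
    return bitrate_mbps * (1.0 - saving_fraction)


# HEVC bit rates reported for the BBC's 2018 Ultra HD trials (from the text above)
hevc_rates = {"live": 36.0, "recorded": 22.8}

for label, rate in hevc_rates.items():
    for saving in (0.20, 0.30):
        new_rate = reduced_bitrate(rate, saving)
        print(f"{label}: {rate} Mbps -> {new_rate:.2f} Mbps at {saving:.0%} saving")
```

On these assumptions a live Ultra HD stream would still need roughly 25 to 29 Mbps, and a recorded one about 16 to 18 Mbps.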
Even AV1 is not considered sufficiently efficient for encoding Ultra HD content at 60 fps, let alone at higher frame rates of 100 fps or 120 fps if those become supported. For that reason, both AOM and MPEG are pushing ahead with their next-generation codecs much faster than they have historically, even before AV1 is supported in commercially available products or services.
The AOM’s AV2 is building on features already supported in AV1, such as motion warping, which allows for the changing shapes and perspectives of objects such as animals or human figures as they move from frame to frame. The technique was developed in the 1990s to help animators portray the natural motion of such objects, allowing for the distortions that occur in walking, for example, or in turning a corner. Applied to compression, it allows more accurate motion prediction than traditional block-based approaches, which do not identify objects or model in any way how they move.
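The difference between the two kinds of motion model can be sketched in a few lines. The example below compares a single motion vector shared by a whole block with a generic six-parameter affine transform, used here as a stand-in for warping. It is not taken from the AV1 specification, and the function names and parameter values are assumptions chosen for illustration.

```python
def translate(points, dx, dy):
    """Block-based model: every point in the block shares one motion vector."""
    return [(x + dx, y + dy) for x, y in points]


def affine_warp(points, a, b, c, d, e, f):
    """Warped model: x' = a*x + b*y + c and y' = d*x + e*y + f."""
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


# Corners of an 8x8 block in the previous frame
corners = [(0, 0), (8, 0), (0, 8), (8, 8)]

# Translation keeps the block square and simply shifts it by (2, 1)
shifted = translate(corners, 2, 1)

# The same shift plus a horizontal shear turns the square into a parallelogram
warped = affine_warp(corners, 1.0, 0.25, 2.0, 0.0, 1.0, 1.0)

print(shifted)
print(warped)
```

With translation alone the block can only slide, whereas the affine version lets its outline lean or stretch, which is closer to how a limb or a turning figure actually changes between frames.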
Further enhancements in AV2 will include more high-level content-adaptive encoding, which is already being offered as a bolt-on to existing codecs by some optimization vendors such as California’s Beamr. This firm has claimed bandwidth savings of 20% to 40% from combining HEVC with content-aware encoding, compared with HEVC alone.
Other enhancements will exploit increased computational power and AI techniques based on neural networks to analyze video more deeply at the object level. The aim will be to break objects such as dogs down into constituent parts such as the head, then further down to eyes and ears, and on to even smaller levels of detail if the resolution is sufficient. It would also embrace “neuro fuzzy segmentation”, where geometrical relationships between objects can be encoded for further efficiency, knowing for example where a nose is relative to the eyes. This involves first clustering an image into groups of similar pixels, in perhaps 4×4 blocks, and then assembling them into larger areas. Objects are then detected by analyzing these clusters, for example identifying faces by picking up on their characteristic chrominance (color) values and limited range of luminance (light intensity). Finally, refinement is applied to help identify objects in less clear areas of a frame, such as the edges, where only part of an object such as a face is showing. This principle can be extended across time so that relationships between objects within the content can be exploited for motion prediction across a number of frames.
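The first stage described above, grouping pixels into small blocks and then assembling similar blocks into larger areas, can be sketched with ordinary region growing. This is a minimal illustration rather than neuro fuzzy segmentation itself: the function names, the single-channel toy frame and the similarity threshold are all assumptions made for the example.

```python
def block_means(image, size=4):
    """Average value of each size x size block (dimensions assumed multiples of size)."""
    rows, cols = len(image) // size, len(image[0]) // size
    return [[sum(image[r * size + i][c * size + j]
                 for i in range(size) for j in range(size)) / (size * size)
             for c in range(cols)]
            for r in range(rows)]


def merge_blocks(means, threshold=10.0):
    """Give neighbouring blocks the same label when their means differ by <= threshold."""
    rows, cols = len(means), len(means[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # block already belongs to a region
            labels[r0][c0] = next_label
            stack = [(r0, c0)]
            while stack:  # grow the region through similar neighbours
                r, c = stack.pop()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and abs(means[nr][nc] - means[r][c]) <= threshold):
                        labels[nr][nc] = next_label
                        stack.append((nr, nc))
            next_label += 1
    return labels


# Toy 8x8 single-channel frame: dark left half, bright right half
frame = [[20] * 4 + [200] * 4 for _ in range(8)]
means = block_means(frame)
labels = merge_blocks(means)
print(labels)
```

A real detector would then examine each region's chrominance and luminance statistics, as the text describes; that step is left out here.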
There is also scope for refining well established and proven techniques, which is the primary focus of the ITU-T Video Coding Experts Group (VCEG) and MPEG in developing their sequel to HEVC, Versatile Video Coding (VVC). They consider this work urgent, and it should yield the first versions for testing in 2021. The aim is to double again the encoding efficiency of HEVC for a given video quality.
Part of this will be achieved through further refinement of block partitioning, which has been central to earlier codecs going back to MPEG-2.
Most codecs split video into ever smaller blocks in a recursive process for prediction of successive frames. With HEVC, a single tree structure is supported for dividing each square block into smaller square subblocks and so on. The process stops when a block cannot be split further using that tree, yielding a limited set of rectangular predictions from that frame. VVC has elaborated the tree structure so that blocks can now split in multiple ways, starting with a four-way split as in HEVC but then being able to split further in either the horizontal or the vertical direction, dividing into either two or three parts. This greater geometric versatility allows finer-grained analysis of the video, exploiting the greater hardware power now available, although the extra computation may constrain live encoding efficiency even more than the BBC found with its HEVC trials in 2018.
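The split options described above can be made concrete with a small sketch. Blocks are represented as (x, y, width, height) tuples, and the function and mode names are illustrative only. The quarter, half, quarter proportions used for the three-way split are an assumption; the text says only that a block can divide into two or three parts.

```python
def split(block, mode):
    """Return the sub-blocks of block = (x, y, width, height) for one split mode.

    Dimensions are assumed to be divisible by 4 so that every split is exact.
    """
    x, y, w, h = block
    if mode == "quad":  # four equal squares, the only split in a pure quadtree
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "binary_vertical":  # left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "binary_horizontal":  # top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "ternary_vertical":  # assumed quarter, half, quarter of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "ternary_horizontal":  # assumed quarter, half, quarter of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


root = (0, 0, 64, 64)
quads = split(root, "quad")                    # four-way split first
thirds = split(quads[0], "ternary_vertical")   # then a three-way split of one quadrant
print(quads)
print(thirds)
```

An encoder has to evaluate many more candidate partitions when every block can be split in five ways rather than one, which is where the extra computational cost comes from.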
It looks like it could be a game of leapfrog if VVC does succeed in doubling the efficiency of HEVC by 2021 only for AV2 to come along with a further gain two years later. However, codec technology firms such as Ateme are predicting that AV1 will mature into a codec that outperforms HEVC by an ever-greater degree as engineers gain more experience with the tools and that AOM will move ahead in the race.