10 May 2018

AOMedia looks ahead to AV2 as AV1 picks up momentum

When AV1, the first version of the Alliance for Open Media (AOMedia) codec, finally arrived with a frozen code base late in March 2018, the debate over how great its impact would be continued, fueled by contradictory performance results.

One problem is that the relevant codecs are all at different stages of development, making meaningful comparisons difficult. That applies to the legacy H.264 and its successor HEVC from the ISO MPEG camp, to VP9, which provided a foundation for AOMedia, and to AV1 itself. Meanwhile AOMedia already has its next version, AV2, waiting in the wings in R&D, while MPEG has its sequel to HEVC, called JEM (Joint Exploration Model), available for early testing, as is being done by the European Broadcasting Union (EBU) alongside AV1 and HEVC.

What is becoming clear is that while some test results are worth assessing, they can become smokescreens when bandied about by rival camps as arguments in favor of their codec. Given that substantial progress in codec technology is now being made on all fronts, the commercial forces that will ultimately determine success are far more important. The odds are starting to favor AOMedia as the ultimate winner simply because of the great industry force behind it, although we should note both that broadcasters have already made substantial investments in HEVC for emerging UHD services and that AV1 will not be widely available in silicon and client devices until 2020. AV1 is also slightly hampered by its encoding overhead, although that will be a diminishing burden given progress in hardware. After all, AV1 was designed to trade more hardware resources for greater compression efficiency.

At least it is now possible to glimpse a roadmap and implementation timeline for both HEVC and AV1 over the next two or three years, now that the latter is here and implementation work is underway.

Taking broadcasters first, having invested substantially in HEVC for early 4K/UHD deployments and not usually being awash with cash, they are not going to abandon this path for AV1 at this stage. AV1 is available via an open source model, which brings scope for differentiation and innovation but makes it harder to create turnkey, off-the-shelf solutions for broadcasters. Many are put off by the risks involved and the likely costs of development, maintenance and testing. These barriers will disappear if and when AV1 gains momentum, since that would make it worthwhile for relevant developers to commit the resources, but that is a year or two off.

At present most viewing is still in full 1080p HD at best in any case, and for those services broadcasters are mostly sticking with H.264 and are reluctant to invest in any new codec, whether HEVC, AV1 or one of the more niche alternatives. For that reason H.264 will stick around for a while yet, until services have migrated to UHD on a large scale. Even service providers that do want to migrate to AV1 will have to wait a year or two for production workflows to be upgraded to support the codec on legacy hardware, or alternatively engage in expensive hardware upgrades. Therefore AV1’s initial adoption is likely to be confined to consumer applications such as YouTube and video on social networks, rather than traditional broadcast workflows.

That is exactly where the focus of AV1 development is. We saw at NAB last month how enthusiastic Facebook has become over AV1, having recently published its own test results using 400 popular Facebook videos, filmed primarily on smartphones at resolutions of 1080p or lower. Facebook expanded these videos back to full size and re-compressed them to recreate more realistic real-world conditions, but then, curiously perhaps, compared AV1 only with legacy H.264 variants and VP9, which it would surely beat anyway, rather than with HEVC, which Facebook never had any intention of using because of the licensing costs. At any rate AV1 came out more than 30% more efficient than VP9, racking up gains of 50.3%, 46.2% and 34.0% compared to x264 main profile, x264 high profile and libvpx-vp9, respectively. libvpx-vp9 is the VP9 video encoder for the open, royalty-free media file format WebM.
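To put those percentages in concrete terms, here is a minimal sketch that reads each reported gain as a bitrate saving at equal quality, which is an assumption about how the figures were measured. The 4000 kbps reference stream and the function name `equivalent_bitrate` are hypothetical, chosen only for illustration.

```python
# Gains reported by Facebook for AV1, read here as bitrate savings at
# equal quality (an assumption, not something the test report is quoted on).
REPORTED_SAVINGS_PCT = {
    "x264 main profile": 50.3,
    "x264 high profile": 46.2,
    "libvpx-vp9": 34.0,
}


def equivalent_bitrate(reference_kbps: float, savings_pct: float) -> float:
    """Bitrate needed to match the reference encoder at a given saving."""
    if not 0 <= savings_pct < 100:
        raise ValueError("savings_pct must be in the range [0, 100)")
    return reference_kbps * (1 - savings_pct / 100)


if __name__ == "__main__":
    reference_kbps = 4000  # hypothetical stream bitrate for illustration
    for encoder, pct in REPORTED_SAVINGS_PCT.items():
        av1_kbps = equivalent_bitrate(reference_kbps, pct)
        print(f"{encoder}: {reference_kbps} kbps -> about {av1_kbps:.0f} kbps with AV1")
```

On this reading, a stream that needs 4000 kbps with libvpx-vp9 would need roughly two thirds of that with AV1, and about half if the starting point is x264 main profile.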

The Facebook tests also highlighted the encoding overhead of AV1, which will hold it back for low latency live stream encoding. Nevertheless, in the immediate future the codec world will split largely between HEVC for broadcasting, at least for UHD variants including High Dynamic Range (HDR), and AV1, which will sweep the board for OTT. This can be seen from Apple’s dual stance, having first announced support for HEVC in Safari and iOS in mid-2017 and then joined AOMedia in January 2018. While each camp claimed this as an endorsement of its own codec, all it meant was that Apple wanted to avoid OTT services having to support multiple codecs to reach all their target platforms. Now they would be able to reach iOS as well as Android and other devices using AV1.

Apple also saw that Netflix had declared that, after initially supporting AV1 in browsers as it became available, it would move on to connected devices once chips had been developed, perhaps 18 months later. Netflix had decided it was unwilling to pay the higher and unpredictable HEVC royalties and joined AOMedia to avoid these costs, but was anxious to ensure quality of experience was maintained. Yet another point is that Apple was planning to launch its own video services, and encoding in AV1 would ensure these reached non-Apple devices.

Then, given the tide towards AV1, Apple was also keen to be involved from the outset in development of the successor, AV2, which is perhaps the one destined to oust HEVC from the broadcasting domain. French encoding vendor ATEME is among those appearing to favor this outcome, having pointed out recently that even AV1 comes with new optional tools that have the potential for substantial improvement in efficiency as experience in tuning the codec grows. One of those is motion warping, which allows for the changing shapes and perspectives of objects such as animals or human figures as they move from frame to frame. The technique was developed in the 1990s to help animators portray natural motion of such objects, allowing for the distortions in walking, for example, or turning a corner. Applied in the different context of compression, it can allow more accurate motion prediction than traditional block-based approaches that do not identify objects or understand in any way how they move.
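The gap between a single block motion vector and a warped prediction can be shown with a toy example. The sketch below is illustrative only and is not AV1's actual implementation: it assumes warping can be modeled as an affine transform, uses a hypothetical 8 x 8 block whose content zooms by 10% while shifting, and hands the affine model the true parameters, which a real encoder would have to estimate.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def apply_affine(p: Point, m: Sequence[float]) -> Point:
    """Map (x, y) through the affine model (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)


def apply_translation(p: Point, mv: Point) -> Point:
    """Classic block-based prediction: shift every pixel by one vector."""
    return (p[0] + mv[0], p[1] + mv[1])


def best_translation(src: List[Point], dst: List[Point]) -> Point:
    """Least-squares single motion vector, i.e. the mean displacement."""
    n = len(src)
    return (
        sum(d[0] - s[0] for s, d in zip(src, dst)) / n,
        sum(d[1] - s[1] for s, d in zip(src, dst)) / n,
    )


def max_error(predicted: List[Point], actual: List[Point]) -> float:
    """Largest distance in pixels between predicted and actual positions."""
    return max(hypot(p[0] - a[0], p[1] - a[1]) for p, a in zip(predicted, actual))


# Hypothetical scene: the corners of an 8 x 8 block that zooms by 10%
# while shifting 2 pixels right and 1 pixel up.
CORNERS: List[Point] = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0), (8.0, 8.0)]
TRUE_MOTION = (1.1, 0.0, 2.0, 0.0, 1.1, -1.0)
TARGETS = [apply_affine(p, TRUE_MOTION) for p in CORNERS]

mv = best_translation(CORNERS, TARGETS)
translation_error = max_error([apply_translation(p, mv) for p in CORNERS], TARGETS)
# The affine model is given the true parameters here, so its error is zero
# by construction; estimating them is the hard part in practice.
warp_error = max_error([apply_affine(p, TRUE_MOTION) for p in CORNERS], TARGETS)

print(f"best single motion vector: {mv}")
print(f"max corner error, translation only: {translation_error:.3f} px")
print(f"max corner error, affine warp: {warp_error:.3f} px")
```

Even the best single vector leaves every corner more than half a pixel out, and that residual has to be coded as extra data. The affine model describes the same motion with six numbers and no residual in this idealized case.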

Further enhancements in subsequent codecs such as AV2 may involve more high-level content adaptive techniques, which are already being offered as a bolt-on to existing codecs by some optimization vendors such as California’s Beamr. This firm has claimed 20% to 40% savings in bandwidth with a combination of HEVC and content aware encoding, compared with HEVC alone.

Meanwhile Netflix, as has been quite widely reported, has re-encoded its entire content catalog for more efficient distribution. Netflix referred to its “key ladders”, which associate specific bit rates with resolutions, where for example 5.8 Mbps is the minimum acceptable bit rate for full HD or 1080p using the established H.264 codec. Netflix realized that its key ladders were not optimal for content at either end of the complexity spectrum. For fast moving sports content quality could dip, while talking heads consumed unnecessary bandwidth.
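The idea of replacing a fixed ladder with a content-dependent one can be sketched as follows. This is not Netflix's method. It assumes each title can be summarized by a single hypothetical complexity factor (1.0 for average content), and only the 1080p rung at 5800 kbps comes from the figure quoted above. The other rungs and the function name `adapt_ladder` are invented for illustration.

```python
from typing import List, Tuple

Rung = Tuple[int, int]  # (vertical resolution, bitrate in kbps)

FIXED_LADDER: List[Rung] = [
    (1080, 5800),  # the 5.8 Mbps figure quoted for 1080p H.264
    (720, 3000),   # hypothetical rung
    (480, 1750),   # hypothetical rung
]


def adapt_ladder(ladder: List[Rung], complexity: float) -> List[Rung]:
    """Scale every rung by a per-title complexity factor.

    A factor below 1.0 suits simple content such as talking heads,
    a factor above 1.0 suits demanding content such as fast sports.
    """
    if complexity <= 0:
        raise ValueError("complexity must be positive")
    return [(height, round(kbps * complexity)) for height, kbps in ladder]


if __name__ == "__main__":
    for label, factor in (("talking heads", 0.6), ("fast sports", 1.3)):
        print(label, adapt_ladder(FIXED_LADDER, factor))
```

A production system would derive the adjustment from trial encodes and a quality metric rather than a single multiplier, but the effect is the same in outline: simple titles give bandwidth back and complex ones are allowed more.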

Other enhancements will exploit increased computational power and AI techniques based on neural networks to analyze video more deeply at the object level. The aim will be to break objects such as, say, dogs down into constituent parts such as the head, then further down to eyes and ears, and on to even smaller levels of detail if the resolution is sufficient. The approach would also embrace “neuro fuzzy segmentation”, where geometrical relationships between objects can be encoded for further efficiency, exploiting for example the knowledge that a face sits above a neck. This involves first clustering an image into groups of similar pixels, in perhaps 4 x 4 blocks, and then assembling them into larger areas. Detection of objects then takes place by analyzing these clusters, identifying faces by picking up on their characteristic values for chrominance, or color, and their limited range of luminance, or light intensity. Finally, refinement is applied to help identify objects in less clear areas of a frame, such as the edges where only part of an object such as a face is showing. This principle can be extended across time so that relationships between objects within the content can be exploited for motion prediction across a number of frames.
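The first two stages described above, clustering into 4 x 4 blocks and then picking out face candidates by chrominance, can be sketched without any neural network at all. The code below is a simplified illustration and not the neuro fuzzy method itself: it assumes 8-bit Cb and Cr planes supplied as lists of rows, and its skin-tone thresholds are commonly cited values used here as an assumption rather than anything taken from a codec specification.

```python
from typing import List, Tuple

BLOCK = 4              # block size mentioned in the text
CB_RANGE = (77, 127)   # assumed skin-tone range for Cb (8-bit)
CR_RANGE = (133, 173)  # assumed skin-tone range for Cr (8-bit)

Plane = List[List[int]]


def block_means(plane: Plane, block: int = BLOCK) -> List[List[float]]:
    """Average each block x block tile of a plane (edge remainders ignored)."""
    rows, cols = len(plane) // block, len(plane[0]) // block
    return [
        [
            sum(plane[by * block + dy][bx * block + dx]
                for dy in range(block) for dx in range(block)) / (block * block)
            for bx in range(cols)
        ]
        for by in range(rows)
    ]


def skin_mask(cb: Plane, cr: Plane) -> List[List[bool]]:
    """Flag blocks whose mean chrominance falls inside the assumed ranges."""
    return [
        [CB_RANGE[0] <= b <= CB_RANGE[1] and CR_RANGE[0] <= r <= CR_RANGE[1]
         for b, r in zip(cb_row, cr_row)]
        for cb_row, cr_row in zip(block_means(cb), block_means(cr))
    ]


def connected_regions(mask: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """Assemble flagged blocks into larger areas using 4-neighbour flood fill."""
    seen = set()
    regions = []
    for y, row in enumerate(mask):
        for x, flagged in enumerate(row):
            if not flagged or (y, x) in seen:
                continue
            stack, region = [(y, x)], []
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < len(mask) and 0 <= nx < len(mask[0])
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(region)
    return regions
```

Calling `connected_regions(skin_mask(cb, cr))` returns one list of block coordinates per candidate area. The refinement stage and the tracking of relationships across frames would build on these regions and are not attempted here.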

Such enhancements will be supported by all next generation codecs, whether AV2 or JEM, but will still be overshadowed by commercial and royalty considerations. While the AV camp looks like winning in the end, HEVC has a hardware encoder base of 2 billion once all those Samsung Galaxy phones and Apple devices are added up. It will be around for some years yet.