
8 July 2021

Netflix encoding experts revel in per-shot techniques

There was one final workshop at last week’s Picture Coding Symposium 2021 that Faultline could not resist dipping into. After all, who can ignore what Netflix has to say about the future of video compression – at such a critical juncture in streaming history?

Kyle Swanson, an encoding engineer at Netflix, posed a simple yet strange question to attendees of such a highly technical industry event – is improving video compression still relevant?

He produced a satellite image of the sun rising over the earth that showed considerable banding – artifacts you would not see with the naked eye looking at the real scene, but which the Netflix encoding team observes daily in compressed videos. Answering his own question, Swanson argued that improving video compression is still relevant today: video traffic is increasing at a rapid pace, bandwidth availability varies widely, and not everyone is lucky enough to boast a robust network.

Looking at the Netflix encoding pipeline, the SVoD royalty takes an HTTP Adaptive Streaming approach to encoding, meaning it provides the exact same content at a variety of resolutions and bitrates. “We do that to give clients options when it’s time to stream. Putting the decision on the client side avoids rebuffering, usually informed by size of chunks to download, and bandwidth available at the time, and maybe buffer that the client maintains,” said Swanson.
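To illustrate the kind of client-side decision Swanson is describing, here is a minimal sketch of an adaptive bitrate selection step. The ladder, bandwidth estimate and thresholds are our own hypothetical values, not Netflix player logic.

```python
# Minimal sketch of a client-side adaptive bitrate decision: pick the highest
# rung of the ladder that the measured throughput can sustain, backing off
# when the playback buffer runs low. All values are hypothetical.

def pick_rung(ladder_kbps, est_bandwidth_kbps, buffer_seconds,
              safety=0.8, min_buffer=10.0):
    """Return the bitrate rung (kbps) to request for the next chunk."""
    budget = est_bandwidth_kbps * safety   # leave headroom for estimation error
    if buffer_seconds < min_buffer:        # low buffer: be extra conservative
        budget *= 0.5
    candidates = [r for r in ladder_kbps if r <= budget]
    return max(candidates) if candidates else min(ladder_kbps)

print(pick_rung([235, 750, 1750, 3000, 5800],
                est_bandwidth_kbps=4200, buffer_seconds=25))  # -> 3000
```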

An example demo showed two titles at a fixed 1050 kbps bitrate. One showed a 2D animation with a perfect picture, while the second showed an unwatchable action shot. Trying the same again at 5800 kbps, the 2D animation looked no different, while the action scene was transformed from a blurry, blocky mess into something pleasant to the eye. “Hopefully, this convinces you enough not to build a bitrate ladder using fixed bitrates. It’s not sufficient. Fixed bitrate is extremely content dependent – simple content looks great at low bitrates, while complex content looks bad. A high fixed bitrate ladder would mean everything looks great but is very wasteful,” added Swanson.

Netflix is also a fan of content-adaptive encoding, using tools to probe the input signal and run complexity analysis on it. Based on that analysis, Netflix then tunes the encoder and selects parameters to maximize quality while minimizing bitrate. Building on this, Netflix took its experience in per-title encoding and moved into per-shot encoding, a term we had not come across before.

According to Swanson, it was a reasonable assumption to make the jump to a shot-by-shot basis, so that a character’s face on screen, for example, gets a less aggressive bitrate and those bits can instead be spent on shots that need them. He described per-shot encoding as a “really powerful concept”.

These tools have been packed together by Netflix in a framework called Dynamic Optimizer (DO). The DO allows Netflix to chunk movies according to scene boundaries, carry out complexity analysis on a shot-by-shot basis, and then optimize for its objective.
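To make the idea concrete, here is a toy sketch (our own simplification, not Netflix’s production algorithm) of the kind of per-shot selection such an optimizer can perform: each shot has a handful of trial encodes, and a Lagrangian rate-quality trade-off picks one encode per shot.

```python
# Toy per-shot optimization sketch: for each shot, pick the trial encode that
# maximizes quality - lam * bitrate (a standard Lagrangian formulation).
# Shots, candidate points and lam are made-up illustration values.

shots = [
    [(300, 88), (800, 93), (2000, 95)],   # simple 2D animation shot
    [(800, 55), (2500, 78), (6000, 92)],  # complex action shot
]  # each candidate is (bitrate_kbps, quality score 0-100)

def allocate(shots, lam):
    return [max(cands, key=lambda c: c[1] - lam * c[0]) for cands in shots]

choice = allocate(shots, lam=0.005)
avg_kbps = sum(c[0] for c in choice) / len(choice)
avg_quality = sum(c[1] for c in choice) / len(choice)
print(choice, avg_kbps, avg_quality)  # the complex shot gets ~3x the bits
```

Spending the same average bitrate as a fixed rate on every shot would either starve the action shot or waste bits on the animation, which is exactly the fixed-ladder problem Swanson demonstrated.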

“We end up with something that looks as good as it can for a video quality metric, for as cheap as possible, by which we mean lowest possible bitrate. We have reduced bitrate pretty dramatically since,” continued Swanson.

As well as being a household name, Netflix is also famous in the industry for technological innovations including its VMAF (Video Multimethod Assessment Fusion) video quality metric (credit to Beamr too). VMAF is a measure of human-perceived video quality in the presence of encoding artifacts, expressed on a scale of 0-100.

VMAF is meant to be quite robust, predicting subjective quality consistently across scenes and genres. The word fusion in VMAF is key – it is where human vision meets machine learning, with several elementary video quality metrics fused by a support vector machine (SVM) trained on subjective scores, and studies show such fusion methods outperforming individual metrics.
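As a rough illustration of the fusion idea – not the actual VMAF model or its features – a handful of elementary quality measurements can be regressed against subjective scores with an SVM, which then predicts a single 0-100 quality number for new clips. The feature names and training data below are invented for the example.

```python
# Sketch of metric fusion: elementary quality features are combined by an SVM
# regressor trained on subjective scores. Features and scores are made up;
# the real VMAF model is trained on large subjective-score datasets.

import numpy as np
from sklearn.svm import SVR

# Hypothetical per-clip features, e.g. [fidelity term, detail-loss term, motion]
X_train = np.array([
    [0.95, 0.98, 0.10],
    [0.80, 0.85, 0.40],
    [0.60, 0.55, 0.70],
    [0.30, 0.25, 0.90],
])
y_train = np.array([95.0, 80.0, 55.0, 25.0])  # hypothetical mean opinion scores

model = SVR(kernel="rbf", C=10.0)
model.fit(X_train, y_train)

new_clip = np.array([[0.70, 0.65, 0.60]])
print("predicted quality:", float(model.predict(new_clip)[0]))
```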

VMAF has been open source since day one and, according to Swanson, is now twice as fast as it was a year ago. There is also a new mode for codec evaluation called NEG (no enhancement gain), which measures the gain from compression and ignores gain one might get from processing the video with something like sharpening.
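For reference, recent libvmaf releases ship a standalone vmaf tool and a NEG variant of the default model. The invocation below is a hedged sketch – flag spellings and the model identifier vary between versions, so check the documentation of the build you have installed.

```python
# Hedged sketch of scoring an encode with the VMAF NEG model via the
# standalone vmaf tool from libvmaf. Flags and the "vmaf_v0.6.1neg" model
# name reflect recent releases and may differ in other versions.

import subprocess

cmd = [
    "vmaf",
    "--reference", "source.y4m",          # pristine source
    "--distorted", "encode.y4m",          # decoded output under test
    "--model", "version=vmaf_v0.6.1neg",  # NEG: ignore enhancement gain
    "--output", "vmaf_neg.json", "--json",
]
subprocess.run(cmd, check=True)
```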

Not stopping there, Swanson also wants to see VMAF used in codec evaluation, with the NEG mode cancelling out enhancement gain even when the encoder being tested is a black box proprietary one.

Refreshingly, he admitted VMAF is not perfect, having come across some corner cases where it fails to predict quality, leading to visual artifacts in the encodes.

Swanson then handed the baton to Mariana Afonso, Research Scientist for video algorithms at Netflix, who explained how Netflix has recently introduced the contrast-aware multiscale banding index (CAMBI), an algorithm that supports inputs at multiple bit depths and handles images with spatial dithering. CAMBI came about because VMAF doesn’t currently capture banding artifacts (also known as false contouring), which appear even on high PSNR/VMAF encodes.
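To see why banding slips past signal-level metrics, consider this small illustration of ours (not from the talk): quantizing a smooth gradient to a step of 8 code values produces clearly visible contours, yet the PSNR against the smooth original still comes out at around 40 dB.

```python
# Why banding hides behind good PSNR numbers: a smooth ramp quantized to a
# step of 8 code values shows obvious contouring but still scores ~40 dB.

import numpy as np

def psnr(ref, test, peak=255.0):
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

ref = np.tile(np.linspace(16, 80, 1920), (1080, 1))  # dark, sky-like ramp
banded = np.round(ref / 8) * 8                        # visibly contoured copy

print(f"PSNR of banded gradient: {psnr(ref, banded):.1f} dB")  # ~40.9 dB
```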

Netflix is hoping to open source CAMBI very soon. Based on a dataset of 86 videos, Afonso presented plots showing CAMBI scores correlating strongly with mean opinion scores, unlike PSNR and VMAF. Compared with other open source algorithms, CAMBI was able to outperform two of them – BBAND (Blind BANding Detector) and DBI (deep banding index).

Turning to video codec standards at Netflix, Afonso talked proudly of her team’s active contributions to AV1, in areas including providing source material from the Netflix content catalog and working on a decoder model. Even though the AV1 specification is finalized, encoders can keep improving in efficiency and speed, which is why Netflix worked with Intel on the development of SVT-AV1 (Scalable Video Technology for AV1), to increase AV1 adoption and provide a platform for experimentation and research. This is the first mention of SVT-AV1 we have heard for some time.
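For those wanting to kick the tires, SVT-AV1 is commonly driven through ffmpeg’s libsvtav1 wrapper. The command below is a sketch rather than a recipe – option names (and whether a constant-quality -crf control is exposed) depend on the ffmpeg and SVT-AV1 versions installed.

```python
# Hedged example of encoding a clip with SVT-AV1 via ffmpeg's libsvtav1
# wrapper; option availability varies by ffmpeg/SVT-AV1 version.

import subprocess

cmd = [
    "ffmpeg", "-i", "shot.y4m",
    "-c:v", "libsvtav1",
    "-preset", "8",   # speed/quality trade-off (lower = slower, better)
    "-crf", "35",     # constant-quality target
    "shot_av1.mkv",
]
subprocess.run(cmd, check=True)
```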

Netflix is actively trying to expand the reach of AV1, carrying out testing on more clients (hardware and software), as well as expanding support for AV1 in the content archive itself, while also improving and optimizing encoding algorithms.

“Typically, from one generation of standards to the next, the goal is to increase compression efficiency by 50%, at the cost of as much as 10x encoding complexity and 2x decoding complexity. However, achieving this is getting harder and harder. New tools mean larger encoder search space, meaning more and more coding tools in the standards, which leads to very asymmetric encoding/decoding complexity,” explained Afonso.

If a tool or codec is too complex, software decoders may only be able to support low resolutions – and that is not ideal, she added.

With PCS being a research-focused event, Afonso signed off by highlighting a few great opportunities for research and collaboration for attendees, in areas including intra coding, improved residual coding, neural-network-based coding tools, film grain synthesis, in-loop resolution adaptation, artificial content-specific encoding, HDR-specific encoding, and banding artifact reduction.