When it comes to bandwidth efficiency, the big OTT streamers find themselves between a rock and a hard place. On the one hand they are under mounting pressure to deliver the best possible quality over limited bandwidth; on the other they have been hounded by ISPs accusing them of swamping their networks with traffic. Netflix in particular has been at loggerheads with major US broadband service providers such as Comcast and Verizon. So compressing its content into the least possible bandwidth while maintaining baseline quality offers an element of goodwill, as well as sharpening its competitive position against rivals such as Amazon, HBO and Hulu. After all, Netflix usage on its own accounts for almost 40% of all data consumed during peak times in the USA, while it is constrained on quality in some countries it has recently entered, where broadband bit rates are still low.
That is why Netflix has invested heavily in re-encoding its entire catalog over the two years or more since December 2015, to take advantage of advances in content-aware compression algorithms that complement established codecs such as H.264 and now HEVC/H.265. This followed four years of R&D into content-aware encoding in partnership with the University of Southern California, the University of Nantes and the University of Texas at Austin, so it is clear how highly Netflix values the exercise. It is not alone of course, with its rivals working on similar projects, although usually on a more ad hoc basis focusing just on selected content.
The Netflix project is by far the biggest to date and we hear it is close to completion. But the world’s biggest SVoD provider has indicated that this will not be the last move, and that the whole process will then be repeated to take advantage of yet further improvements in content-aware encoding, going down to the level of individual scenes rather than whole titles. The idea will be the same: to exploit the vast difference in the amount of data needed to represent a given perceptual quality, both within and between frames. Some frames comprise little more than a few blobs of color, while others have rich texture with large subtle variations that are much more complex to encode at high quality, and fast-moving action obviously requires more data to represent the inter-frame changes.
The challenge has always been in matching human visual perception to encoding schemes and deriving a surrogate measure that can be applied automatically without the expense and time involved in testing with focus groups. In any case human perception is by definition subjective and so liable to yield inconsistent results unless very carefully controlled and not always even then. For example, if people are shown the same clip at a specified quality alongside others at a lower quality they are likely to rank it higher. There are also factors such as ambient lighting, screen size and viewing distance to take into account.
This means that the tide has swung towards objective testing, with human perception recruited just to ensure that the objective measure represents the actual visual experience as closely as possible, rather than merely technical measures of quality. This is not only much faster and more cost-effective but will also produce more accurate and consistent results, provided the objective tests are aligned with human perception. For example, the scores will always be independent of the context and avoid any distortions resulting from mood or the environment.
A key objective measure that has evolved over recent years is the Video Quality Breakdown Bitrate (VQBB), the bitrate at which artifacts just become noticeable when a video is watched under ideal subjective viewing conditions. Ideal conditions include relatively low lighting and an optimum ratio between viewing distance and screen size.
Determining the VQBB itself is not an exact science because viewers will differ somewhat over when quality starts to deteriorate noticeably, but they will normally agree over a relatively narrow range of bitrates. At high bitrates above this VQBB range, the compressed video will look pristine and be considered effectively visually lossless to everyone. Then at bitrates below that VQBB range all viewers with normal eyesight will agree artifacts are present.
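As a rough illustration of how such a range might be summarised, the sketch below takes one threshold per viewer, meaning the bitrate at which artifacts just become noticeable to that viewer, and reports the band they span. The function names `vqbb_range` and `classify` and the sample figures are hypothetical, chosen only for illustration, and are not drawn from any published test.

```python
def vqbb_range(thresholds_kbps):
    """Return the (low, high) band spanned by per-viewer thresholds.

    thresholds_kbps holds one value per viewer: the bitrate at which
    artifacts just become noticeable to that viewer (hypothetical data).
    """
    if not thresholds_kbps:
        raise ValueError("need at least one viewer threshold")
    return min(thresholds_kbps), max(thresholds_kbps)


def classify(bitrate_kbps, band):
    """Label a bitrate relative to the VQBB band."""
    low, high = band
    if bitrate_kbps <= low:
        # At or below every viewer's threshold: all of them see artifacts.
        return "artifacts visible to every viewer"
    if bitrate_kbps > high:
        # Above every viewer's threshold: nobody sees artifacts.
        return "visually lossless for every viewer"
    return "inside the VQBB range (viewers disagree)"


# Hypothetical thresholds for four viewers, in kbps.
band = vqbb_range([1800, 2000, 2100, 1950])
print(band, classify(3000, band))
```

Reporting a band rather than a single number reflects the point that viewers agree only to within a relatively narrow range of bitrates.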
To understand what Netflix has just done, we need to track back to its first round of encoding up until about 2011, based on the key ladder that gained traction in the early days of adaptive bitrate streaming. This was a one-size-fits-all scheme applied to the whole catalog irrespective of content type, encoding each title at a variety of bitrates so that the stream could adapt to varying network conditions or client playback capabilities. The key ladder equated bitrate to resolution, typically with up to 10 steps, starting at 235 Kbps for the lowest 320×240 resolution and ranging up to 5.8 Mbps for 1920×1080 “full HD” at the top. Since then, as we know, 4K has come along, requiring additional rungs at the top of the ladder.
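To make the fixed-ladder idea concrete, here is a minimal sketch of how a player might choose a rung from such a ladder. Only the 235 Kbps and 5.8 Mbps rungs are taken from the figures above; the intermediate rungs, the name `FIXED_LADDER` and the function `pick_rung` are illustrative assumptions, not Netflix's actual ladder or player logic.

```python
# One-size-fits-all ladder: (bitrate in kbps, width, height), sorted ascending.
# Only the first and last rungs come from the article; the middle
# rungs are illustrative placeholders.
FIXED_LADDER = [
    (235, 320, 240),
    (750, 512, 384),     # illustrative
    (1750, 720, 480),    # illustrative
    (3000, 1280, 720),   # illustrative
    (5800, 1920, 1080),
]


def pick_rung(throughput_kbps, ladder=FIXED_LADDER):
    """Return the highest rung whose bitrate fits the measured throughput.

    Falls back to the lowest rung when even that one does not fit.
    """
    best = ladder[0]
    for rung in ladder:
        if rung[0] <= throughput_kbps:
            best = rung
    return best


print(pick_rung(2000))  # picks the 1750 kbps rung from the ladder above
```

The point to notice is that the same table is consulted for every title, whether it is a simple cartoon or a grainy action film.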
For its “second generation” encode Netflix has given each title a bespoke bitrate ladder tailored to its specific level of complexity, which has been the focus of the R&D. This has involved settling several fundamental questions, such as how many quality levels should be encoded for each title so that each step registers a just-noticeable difference (JND) for viewers, as well as what the best resolution-bitrate pair is for each quality level. Another question is the highest bitrate required to achieve the best perceivable quality for a given title. The overall challenge then was to develop a production system capable of resolving these questions scalably and reliably for each of almost 10,000 titles.
The resulting algorithm was bound by various constraints, such as backwards compatibility, to ensure all streams are playable on previously certified Netflix devices. This limited the possible resolutions to the same set as before, that is 1920×1080, 1280×720, 720×480, 512×384, 384×288 and 320×240. The available bitrates were likewise restricted to a fixed set, in which each is about 5% higher than the one below. In any case it is obviously essential that the bitrates be spaced such that each successive one yields a perceptual increase in quality, which Netflix decided should fall just below a single JND. This means that whenever the next bitrate up is chosen, viewers can just about notice the difference.
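The spacing rule can be sketched as follows. This is not Netflix's algorithm: the quality model `quality_jnd` is a purely hypothetical saturating curve, the `complexity` parameter and the 0.9 JND step are assumptions, and the greedy selection is just one simple way to pick rungs that are roughly one JND apart from a grid of candidate bitrates rising in 5% increments.

```python
import math


def candidate_bitrates(lo_kbps=235.0, hi_kbps=5800.0, growth=1.05):
    """Candidate bitrates from lo to hi, each about 5% above the previous."""
    rates = []
    rate = lo_kbps
    while rate <= hi_kbps:
        rates.append(round(rate))
        rate *= growth
    return rates


def quality_jnd(bitrate_kbps, complexity):
    """HYPOTHETICAL model: perceived quality in JND units, saturating at 10.

    A larger complexity value means the title needs more bits
    to reach the same quality.
    """
    return 10.0 * (1.0 - math.exp(-bitrate_kbps / complexity))


def per_title_ladder(complexity, step_jnd=0.9):
    """Greedily keep each candidate that is at least step_jnd better
    than the last rung kept, starting from the lowest candidate."""
    rates = candidate_bitrates()
    ladder = [rates[0]]
    last_quality = quality_jnd(rates[0], complexity)
    for rate in rates[1:]:
        quality = quality_jnd(rate, complexity)
        if quality - last_quality >= step_jnd:
            ladder.append(rate)
            last_quality = quality
    return ladder


simple_title = per_title_ladder(complexity=400)     # e.g. flat animation
complex_title = per_title_ladder(complexity=3000)   # e.g. grainy action
print(len(simple_title), simple_title[-1])
print(len(complex_title), complex_title[-1])
```

Under this toy model a simple title ends up with fewer rungs and a much lower top bitrate than a complex one, which is the essence of the per-title approach: bits are not spent where viewers could not perceive the improvement.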
Of course, the obvious question then is how much bandwidth all this effort has saved. Netflix has yet to reveal the answer, but it is likely to be close to or above 20%. This may not sound ground-breaking but will make a huge difference in reducing congestion on clogged US networks, as well as elevating quality to acceptable levels in some emerging markets. In subjective tests conducted by video compression technology vendor EuclidIQ, where 14 video clips were scored by 30 subjects under formal test conditions, the average bandwidth saving achieved while maintaining a given MOS (Mean Opinion Score) was 22.6%. In other words, the firm’s own content-aware adaptations saved 22.6% of bandwidth compared with the H.264 codec on its own. MOS evolved for testing voice quality in the earlier days of telecoms, but has since been adapted for video. It is defined as the arithmetic mean of individual scores on a scale of 1 to 5, from bad to excellent.
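The two quantities involved here are simple to compute. The sketch below shows MOS as the arithmetic mean of 1-to-5 ratings and a bandwidth saving as the fractional reduction in bitrate at equal MOS. The function names and all sample numbers are hypothetical; the bitrates were picked only so that the result matches the 22.6% figure quoted above.

```python
from statistics import mean


def mos(scores):
    """Mean Opinion Score: arithmetic mean of ratings on a 1 to 5 scale."""
    if not scores:
        raise ValueError("need at least one score")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie between 1 (bad) and 5 (excellent)")
    return mean(scores)


def bandwidth_saving(baseline_kbps, optimized_kbps):
    """Fractional saving of the optimized encode over the baseline,
    assuming both were judged to deliver the same MOS."""
    return 1.0 - optimized_kbps / baseline_kbps


# Hypothetical ratings from four subjects for one clip.
print(mos([4, 5, 3, 4]))
# Hypothetical bitrates at equal MOS: plain H.264 vs content-aware encode.
print(f"{bandwidth_saving(3000, 2322):.1%}")
```

A saving only means something when the comparison is made at equal perceived quality, which is why the MOS has to be held constant between the two encodes.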
Now, as Netflix comes towards the end of this re-encode, it is considering embarking on a third generation that goes down to the scene level, which would account for the variation in visual information density within content. This would be even more computationally intensive but could potentially yield greater bandwidth savings, because the variation in complexity within some content, especially movies but even sports to an extent, is greater than the variation between content.
There is also scope in the OTT world for reducing bitrate by adapting to specific viewing conditions and display devices. A subjective test conducted by the University of California, San Diego found that when people were viewing on tablets on their laps, bandwidth savings of 29% on top of the H.264 codec could be achieved by filtering to allow for those conditions, without sacrificing perceptual quality. Of course, this requires knowledge of the viewing conditions, which can in principle be obtained through feedback from the device, but the adaptation would then have to be done on the fly rather than as an advance encode.
What is clear is that as Ultra HD is rolled out over OTT services, conservation of bandwidth will continue to be a major focus for some time yet.