Need for convergence in Ultra HD technology and standardization

The second in our series on UHD development by Philip Hunter

The masterclass run by the Ultra HD Forum at IBC 2017 in Amsterdam this month was supposed to set the stage for the rollout of the latest Phase B specifications over the next five years, with the understanding that most broadcasters and operators will be preoccupied with Phase A for almost the first half of this period. In almost the same breath the Forum was indicating that ever closer cooperation with the UHD Alliance was needed to ensure full interoperability from camera to TV and delivery of the best experience possible. There is little point, after all, in optimizing the infrastructure for Phase B if the viewing devices are unable to render the full experience or the cameras cannot capture it.

Broadly, the Ultra HD Forum is responsible for orchestrating standards development and deployment within the infrastructure, while the UHD Alliance is in charge of the CE and camera ends, as well as the content itself. The Ultra HD Forum has been eager to promote the Phase B specifications and educate both vendors and operators over the implications and issues involved in deployment.

But the Forum is very much B2B-focused, so the UHD Alliance, as the more consumer-facing organization, is charged with getting the message across to users as well as CE makers. This is all enshrined in the Alliance’s Ultra HD Premium brand, which was developed as a guarantee that not just TVs, but also connected devices, would be capable of rendering UHD content at full quality. The developers of this brand were acutely aware that consumers had been let down by earlier so-called 4K branding, which covered screen resolution only and resulted in sales of TVs incompatible with the more recently introduced features, notably Wide Color Gamut, High Dynamic Range (HDR) and High Frame Rate (HFR), and even Next Generation Audio (NGA).

There is an issue here of mission creep in that the remit of UHD has been expanding, making it hard for all participants in the value chain to converge on a common set of ingredients at any one time. There is little sign of that ending with the Forum’s Phase B, since there are other technologies that could be embraced within its remit in a future Phase C, including Virtual Reality (VR), Augmented Reality (AR) and possibly even voice integration. What is clearly essential now is close alignment between the Forum’s Phase B and the Alliance’s Ultra HD Premium. This was largely accomplished in the case of Phase A, with both sets of specifications including 4K resolution (3840 x 2160), 10-bit color depth, BT.2020 color space representation and HDR, both HDR10 and Dolby Vision. The Alliance branding specified 10-bit capture, distribution and playback for content, which implied that cameras should be capable of recording 10-bit footage to meet the standard.

Both standards were a bit vague about audio, but that was understandable at a time when NGA was not clearly defined. There was some confusion over what surround sound really meant and an expectation at first that it would be delivered over constellations of multiple speakers. But now it is clear that only a minority of users will want to deploy multiple speakers, although there will be demand for good quality sound bars given that standard TV speakers in flat screen displays are fairly poor. The focus now is on separation of sound into different elements which can be manipulated and turned up or down independently.
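The idea of independently adjustable sound elements can be sketched in a few lines. This is an illustrative stand-in, not any actual NGA renderer: each named object (dialogue, music, effects) carries its own listener-adjustable gain, and the renderer sums the weighted objects into the output mix. The object names and sample lists are hypothetical.

```python
def render_mix(objects, gains):
    """Minimal sketch of object-based mixing: sum each audio object into the
    output, scaled by its own gain. objects maps name -> list of samples;
    gains maps name -> linear gain (default 1.0 if unspecified)."""
    n = max(len(samples) for samples in objects.values())
    mix = [0.0] * n
    for name, samples in objects.items():
        g = gains.get(name, 1.0)
        for i, s in enumerate(samples):
            mix[i] += g * s
    return mix

# A viewer muting the music bed while leaving dialogue untouched:
mix = render_mix({"dialogue": [1.0, 1.0], "music": [0.5, 0.5]},
                 {"music": 0.0})
```

The point of the sketch is that the mix decision moves from the production suite to the playback device, which is exactly what makes object-based audio attractive for personalization.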

With the Forum making NGA one of the core planks of its Phase B specifications, the Alliance clearly felt it had some catching up to do, and in August 2017 admitted Xperi Corporation, formerly Tessera Holding Corporation, as a member of its board. Based in San Jose, California, Xperi is best known for its chip packaging technology, but also owns several technology firms including DTS, which has developed a variety of multichannel audio and surround sound technologies. The Alliance said this was in recognition of the importance of NGA to UHD, belatedly some might say.

In some cases, the Alliance has taken the lead, as with its Mobile HDR Premium specification and logo licensing specifically for battery-operated devices, an addition to its certification and logo program for TVs, prerecorded content and Ultra HD Blu-ray players. This specification recognized three categories, smartphones, tablets and laptops, in all cases differing from the TV and Blu-ray equivalents to cater for small displays and allow for power-saving features. It has to cope with the automatic adjustment of smartphone brightness to ambient lighting, for example, as well as the lower power budget.

The dynamic range specifies maximum brightness of 540 nits for mobiles, or up to 600 nits for some laptops, while HDR10 for TVs supports 4000 nits peak brightness, with 1000 nits being the current target. The nit is the common currency for emitted brightness, equal to one candela per square meter, where a candela is defined in terms of the luminous intensity of a monochromatic light source at a frequency of 540 × 10¹² hertz. Mobile HDR Premium is already supported by some smartphones, including the LG G6 with Dolby Vision support, and the Samsung Galaxy S8.
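To put those nit figures in perspective, dynamic range is often expressed in photographic "stops," the base-2 logarithm of the ratio of peak to black luminance. The following sketch uses the peak figures quoted above; the black levels are hypothetical round numbers chosen for illustration, not values from any specification.

```python
import math

def stops(peak_nits: float, black_nits: float) -> float:
    """Dynamic range in stops: log2 of the peak-to-black luminance ratio."""
    return math.log2(peak_nits / black_nits)

# Assuming an illustrative black level of 0.005 nits for both displays:
tv_range = stops(1000, 0.005)      # 1000-nit HDR10 target display, ~17.6 stops
mobile_range = stops(540, 0.005)   # Mobile HDR Premium peak for phones/tablets
```

The comparison shows why a lower peak brightness on mobiles costs less than a full stop of range relative to a 1000-nit TV, which is part of what makes HDR viable on battery-powered screens.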

This has implications for content producers, who must in turn make sure they adhere to the specification when transmitting UHD content to mobiles. Netflix is a pioneer here, having been streaming HDR to phones that support the specification.

Another key area where alignment is needed between the two Ultra HD standards groups, as well as content producers, is color volume. The idea is to match displays to the capabilities of the human eye at different levels of brightness or luminance. The human eye is remarkably sensitive to color, able to recognize up to 7 million shades by some estimates under ideal conditions, but it is much more sensitive at intermediate to high levels of brightness than at peak or low levels. The term color volume was coined to describe the varying ranges of colors that the eye can detect (and that are therefore worth displaying) at the widely differing levels of luminance on an HDR display.

But this introduces complexities, in particular the need to maintain consistency across the wide variety of consumer displays, as well as during the production and distribution processes. Clearly this requires the Ultra HD Forum and UHD Alliance to work closely together and bring in all the relevant players, including content producers, broadcasters and device makers. The core standard here is SMPTE ST 2094-10, covering dynamic metadata for color volume transformation. This describes how dynamic metadata, varying in principle down to frame level, should optimize color mapping, for example when mapping HDR down to SDR (Standard Dynamic Range). The goal is to enable mapping from any production color volume to any consumer device as a foundation for future-proofing HDR content.

The main point about ST 2094-10, and implementations of color volume mapping in general, is that they should cater for differences in luminance levels across the ecosystem while preserving as much of the quality and experience as possible. For example, HDR content may extend from very low luminance levels up to 10,000 nits and cover the full range of primary colors specified by the BT.2020 standard, while a display may have a narrower primary range and only reach 1000 nits.

The process of transforming the range of luminance values in the content to those available in the display is known as tone mapping and is crucial for ensuring optimum quality. This highlights the difference between static and dynamic metadata. Static metadata is cast in stone and so can only really be optimized for mapping scenes at a given luminance, typically the brightest. In other words, it can only exploit a 2D color volume, being unable to cater for the third dimension of luminance. Dynamic metadata as supported by ST 2094-10 can in principle optimize every scene across all luminances. This will even improve the quality of content after conversion down to SDR displays, because it will preserve the variable optimization by luminance, albeit within a lower dynamic range.
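The static-versus-dynamic distinction can be sketched with a simple Reinhard-style roll-off curve. This is an illustrative stand-in, not the ST 2094-10 transform: the key point is that the curve's white point is set from a per-scene peak luminance, which is exactly the kind of value dynamic metadata carries, whereas static metadata would pin it to the content container's fixed maximum for every scene.

```python
def tone_map(l: float, scene_max: float, display_peak: float) -> float:
    """Map a content luminance l (nits) into a display's range using an
    extended Reinhard curve whose white point is the scene's peak, so that
    scene_max lands exactly on display_peak. Illustrative sketch only."""
    if scene_max <= display_peak:
        return l  # the scene already fits the display; pass it through
    x = l / display_peak
    w = scene_max / display_peak
    # Roll off highlights while leaving low and mid tones nearly unchanged.
    return display_peak * x * (1.0 + x / (w * w)) / (1.0 + x)

# A 10,000-nit highlight compressed onto a 1000-nit display:
peak = tone_map(10000.0, scene_max=10000.0, display_peak=1000.0)
# A dim 500-nit scene passes through untouched on the same display:
dim = tone_map(400.0, scene_max=500.0, display_peak=1000.0)
```

With static metadata, scene_max would stay fixed at the container maximum, so dim scenes would be compressed as aggressively as bright ones; feeding the curve per-scene values is what lets dynamic metadata preserve mid-tone detail scene by scene.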

So color volume is a good example of where close collaboration is needed right across the content lifecycle.