SMPTE unbundles audio, video and data in latest workflow standard

SMPTE (Society of Motion Picture and Television Engineers) has reached an important milestone in the standardization of uncompressed video over IP by specifying how to carry video, audio and ancillary data as separate streams over IP within production and editing workflows. This will help unleash the power and flexibility of IP infrastructure at the studio and contribution level, especially by removing the constraints of traditional SDI (Serial Digital Interface) transport for video.

The newly published standard is ST 2110, Professional Media Over Managed IP Networks, although its first three parts, ST 2110-10, 2110-20 and 2110-30, covering timing and synchronization, video and audio respectively, were announced at IBC 2017. The big deal is flexibility through separation of streams, which in turn will encourage engineers to exploit the capabilities of IP networks to introduce new features and improve the quality of the audio and video separately. It will make it easier to introduce immersive features, for example exploiting object-based audio for personalization.

To some extent the new standard is a catch-up, since the migration from SDI to IP has not so far brought great benefits to production engineers, because the old environment had been tried and tested over many years. This was acknowledged by SMPTE President Matthew Goldman, who is also senior VP of technology, TV and media at Ericsson, when he said, “Professional media is a uniquely challenging field because of its real-time nature and high quality-of-service requirements. The standardization of SMPTE ST 2110 documents provides broadcasters, producers, and media technology suppliers with the tools they need to meet these requirements while working in the IP realm.” The prize is the ability to use standard IP networks and IT infrastructure across the end-to-end video chain, blurring the distinction at the production and contribution end between studios and remote facilities, including outside broadcast setups.

This story began in 1989 when SMPTE introduced SDI in the first place to transport uncompressed and unencrypted digital video signals. SDI came to be used to carry video embedded not just with the audio, but also all the ancillary data including captions, subtitles, active format description, time codes and dynamic range parameters among others, known collectively as VANC (Vertical Ancillary). One big advantage was that synchronization came automatically with SDI since the embedded video, audio and VANC were tightly coupled from a timing perspective.

At first this was preserved over IP networks by simply encapsulating SDI within IP packets. In the old SDI days the video, audio and VANC payload was unpacked on the destination device for various functions within the workflow, such as monitoring, testing and management.

For years not much changed, until IP started to penetrate contribution and workflow a few years into the new millennium. As IP contribution emerged, JPEG and MPEG-2 came into use for compression at this level and standards were agreed for transporting SDI signals over IP, with SMPTE 2022 the main contender. But this merely encapsulated the SDI signal in IP packets so that Ethernet could be used as the link-level transport mechanism.

At the receiving end, SDI was unpacked in the same way as before, preserving synchronization but failing to exploit the flexibility of IP, given that each packet is after all an independent entity and can be part of any stream. SMPTE 2022 was too rigid and inefficient.

That is why SMPTE conceived 2110 as a new mode of transportation designed from the ground up for IP. It does not affect the way video, or for that matter audio and data, are created at source or processed at destination, but just changes the packaging. The receiving device simply strips the payload, whether video, audio or VANC, from the IP packet stream, but this time without SDI. Having lost the tight coupling that SDI provided automatically, SMPTE 2110 therefore has to impose timing itself.

This is achieved with the companion standard ST 2059, which uses the IEEE 1588 Precision Time Protocol (PTP) to distribute a common time reference to all devices. The streams themselves are carried using the Real-time Transport Protocol (RTP), developed originally as the foundation for transport of audio in Voice over IP (VoIP), with each packet carrying a time stamp derived from that common reference. RTP runs over the User Datagram Protocol (UDP) and can work in conjunction with the RTP Control Protocol (RTCP) to monitor QoS and, crucially, to help synchronize different streams, whether video, audio or VANC.
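To make the role of RTP more concrete, here is a minimal Python sketch of how a receiver might read the fixed 12-byte RTP header (as laid out in IETF RFC 3550) to recover the sequence number and time stamp before stripping off the payload. The names RtpHeader and parse_rtp_header are illustrative assumptions, not part of any SMPTE document, and the sketch ignores padding and the media-specific payload headers that the individual ST 2110 parts add.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the fixed RTP header (hypothetical helper type)."""
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload_offset: int  # index of the first byte after all RTP headers


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")

    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit time stamp, 32-bit synchronization source identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])

    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")

    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count  # skip any contributing-source identifiers

    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        _profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words

    if len(packet) < offset:
        raise ValueError("packet shorter than its declared headers")

    return RtpHeader(
        version=version,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        payload_offset=offset,
    )


# Usage with a hand-built packet (all values are made up for illustration).
example = struct.pack("!BBHII", 0x80, 0x80 | 96, 1234, 90000, 0xDEADBEEF) + b"essence"
header = parse_rtp_header(example)
print(header.sequence_number, header.timestamp, example[header.payload_offset:])
```

A receiver handling separate video, audio and ancillary streams would compare the time stamps recovered in this way to line the streams up again, which is the job SDI used to do implicitly.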

The continued reliance on RTP might raise a few concerns at a time when another protocol, SRT (Secure Reliable Transport), is emerging as a favorite for low latency streaming. As we reported last week, SRT incorporates more efficient error correction and operates better in a streaming environment. However, ST 2110 is designed specifically for uncompressed video where stream latency is less of a concern. RTP was originally adopted for encapsulation of SDI over IP and was cited in an IETF RFC (Request for Comments) in 2005. It looks like a case of persisting with the devil you know.

There is some confusion over the extent of the standard, which is supposed to cover just transport but also touches on higher levels relating to discovery and interoperability within a modern distributed production environment. It has five main parts. First is SMPTE ST 2110-10, dealing with system timing and synchronization and building on SMPTE ST 2059 parts 1 and 2. Second is SMPTE ST 2110-20, handling uncompressed video, also known as VSF TR-03 or IETF RFC 4175. Third is SMPTE ST 2110-30, doing the same for uncompressed digital audio, based on AES67 from the Audio Engineering Society (AES).

These last two, SMPTE ST 2110-20 for video and SMPTE ST 2110-30 for audio, have evolved into sequels, SMPTE ST 2110-21 and SMPTE ST 2110-31 respectively, to optimize transport over managed IP networks and take full account of the packetized structure, supporting traffic shaping. SMPTE ST 2110-31 can handle additional digital formats for representing uncompressed audio in addition to the most common, PCM (Pulse Code Modulation).

Then SMPTE ST 2110-40 does the same job for the ancillary data, carrying SMPTE ST 291 ancillary data packets over RTP, with no equivalent to the enhancements for video and audio. Finally, SMPTE ST 2110-50 is simply a new badge for the original SMPTE 2022 part 6 for encapsulating SDI over IP, also known as VSF TR-04.

Arguably, after all this, the more important piece of the standards jigsaw for IP migration in production environments is only just emerging within the SMPTE ST 2110 standards set, Professional Media Over Managed IP Networks. This specifies carriage, synchronization and description of the separate elementary essence streams over IP, whose transport is defined by the newly released standards discussed above.

Only when these final standards are ratified will it be possible to achieve that goal of separately routing and breaking out audio, video and ancillary data streams to simplify tasks like adding captions, subtitles and teletext, as well as processing of multiple audio languages and types.

Confusingly, this last objective, although part of the 2110 standards set, is based on a related but distinct standards activity called the Joint Task Force on Networked Media (JT-NM). This is sponsored by SMPTE along with the Advanced Media Workflow Association (AMWA), European Broadcasting Union (EBU) and the Video Services Forum (VSF). It is concerned with interoperability above the transport layer and has two main parts. The first is IS-04 for Discovery and Registration, which is essentially a peer-to-peer mechanism allowing devices to find each other on a link-by-link basis. The second, IS-05, Connection Management, allows connections to be prepared and scheduled singly or in batches in a more efficient way.
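As a rough illustration of what discovery and connection management look like in practice, the Python sketch below builds, without sending, the kind of HTTP requests a controller might issue. It assumes the registry-based mode that IS-04 also defines, with its Query API, and the IS-05 single-receiver staging endpoint. The host names, identifiers and function names are hypothetical, and the request body is reduced to a few fields, so treat it as a sketch rather than a complete client.

```python
import json
from urllib.parse import urlencode


def query_senders_url(registry_base: str, version: str = "v1.2", **filters: str) -> str:
    """Build an IS-04 Query API URL listing the senders known to a registry.

    `filters` are optional basic query parameters (for example a label).
    """
    url = f"{registry_base.rstrip('/')}/x-nmos/query/{version}/senders"
    if filters:
        url += "?" + urlencode(sorted(filters.items()))
    return url


def staged_receiver_request(node_base: str, receiver_id: str, sender_id: str,
                            version: str = "v1.0") -> tuple[str, str, str]:
    """Build an IS-05 request asking one receiver to connect to one sender.

    Returns (HTTP method, URL, JSON body). Transport parameters and the
    transport file are omitted here for brevity.
    """
    url = (f"{node_base.rstrip('/')}/x-nmos/connection/{version}"
           f"/single/receivers/{receiver_id}/staged")
    body = {
        "sender_id": sender_id,
        "master_enable": True,
        "activation": {"mode": "activate_immediate"},
    }
    return "PATCH", url, json.dumps(body, sort_keys=True)


# Usage with made-up hosts and identifiers.
print(query_senders_url("http://registry.example:8080", label="cam1"))
print(staged_receiver_request("http://node.example", "rx-1", "tx-1"))
```

Keeping the request construction separate from the network call makes the shape of the two interfaces easy to inspect: one is a read-only query for what exists, the other a staged change to what is connected.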

It is therefore premature to talk about completion of the tool set for migration to an all-IP production environment, given that the JT-NM’s roadmap has several years at least to run. The discovery, registration and connection management are part of what it calls phase 3 concerning auto-provisioning and automated resource management, set to run until 2020. The transport of uncompressed essences just completed was phase 2. Phase 4 is the final phase currently envisaged and deals inevitably with extension of production into the cloud, under what is called “Dematerialized Facilities”.