The SRT protocol gets its timing right to support the live video streaming boom

Live video streaming is increasingly done over mobile devices, placing many strains on the network and on the user’s quality of experience. The fuss over Verizon’s updated deal with the US NFL (National Football League) highlights the importance of this type of content to consumers and their service providers. But behind the scenes of these big negotiations, carriers are also racing to improve the efficiency of delivering live video, and may find an answer in a new open source protocol called Secure Reliable Transport (SRT).

As for Verizon, the operator is reported to have paid more than $2bn for the rights to distribute live professional football games to mobile apps, on Monday, Thursday and Sunday nights, as well as post-season and Sunday afternoon games. In fact, the content will be available to customers of other operators too, as Verizon will distribute it over-the-top via its Yahoo, Yahoo Sports and Go90 mobile services, as well as on its own cellular network. The aim is to bolster its Oath digital content and advertising unit, which includes the Yahoo and AOL acquisitions, as it seeks to add new revenue streams and customers to its traditional business model. Oath generated about $2bn in revenue in the latest quarter, but its long term contribution to Verizon’s growth is still in doubt.

As part of the NFL deal, Verizon has given up its former exclusive rights to all NFL streaming content, but it seems to believe the new terms will give it more opportunities in mobile advertising, which will be critical to its digital media expansion but pits it against Google and Facebook. Jennifer Fritzsche, an analyst at Wells Fargo Securities, wrote in a client note: “In our view, this announcement seems to be more about developing its mobile advertising platform—which has clearly been a top focus for Verizon. Between all its mobile and digital properties, Verizon estimates it reaches more than 200m monthly unique users in the US. Verizon has stated its goal for Oath to contribute ~$20bn in revenue by 2020.”

One of the key technical decisions for operators focusing heavily on video streaming will be which protocol to support, to prevent over-burdening of the network and to deliver a strong user experience. Providers of OTT live services are certainly not stuck for choice over streaming protocols, but they may find that few, if any, combine the low latency and high quality they want without buffering, especially over mobile links.

One protocol, Secure Reliable Transport (SRT), has emerged as a promising candidate for many live streaming services, which is why it is gaining growing support from key technology vendors in the OTT arena, such as Harmonic, Limelight, Kaltura and Brightcove, as well as founder members of the alliance promoting it, Haivision and Wowza.

Other big players are pursuing different avenues, but given the growing concern over latency for live online delivery, SRT has a good chance of becoming dominant. Of course, to succeed in contemporary OTT ecosystems a protocol has to be integrated into relevant platforms and software, so an important move came in April when the SRT Alliance was founded to make the protocol available to developers as open source.

This led to several early adoptions of SRT, with Canadian platform LiveScale incorporating the technology in the latest version of its enterprise SaaS (software-as-a-service) platform in October. This also involved using Haivision encoders to integrate SRT with HEVC (also known as H.265 and MPEG-H Part 2), the latest video compression standard from the MPEG and ITU-T bodies – an important step, as HEVC is gaining traction and looks likely to become the dominant codec.

So SRT is the latest in a line of protocols that attempt to solve an old problem for IP networks – how to combine the best of the two higher level network mechanisms for transporting packets from source to destination across the network via multiple nodes. These are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

TCP is connection-oriented, establishing an end-to-end session between sender and receiver, while UDP is connectionless: packets are released into the network without any handshake and each router decides which route a packet should take next, on the basis of traffic conditions or other priorities.

The former has the advantage of built-in error correction, with packets being re-transmitted if they are dropped or arrive corrupted, but at the expense of latency. It is also a compromise in that, unlike transport mechanisms that evolved for non-IP or hybrid networks (such as the ill-fated ATM), while it does establish an end-to-end connection for the duration of a session, it does not guarantee the bandwidth. It is therefore susceptible to congestion and needs mechanisms to respond to sudden spikes. As a result it is non-deterministic: it is impossible to tell in advance what the latency will be or how long it will take to re-transmit packets, so a maximum time window has to be set for that to happen.

By contrast, UDP has no in-built error correction in its original form and so is faster, but often at the cost of unacceptable packet loss with serious impact on quality. There is no guarantee of delivery, no flow control and, unlike TCP, no sequencing, so packets may be delivered out of order.
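The contrast is easy to see at the socket level. The following Python sketch (loopback only, so illustrative rather than representative of real network conditions) shows UDP firing off a datagram with no handshake, while TCP must first establish a connection:

```python
import socket

# UDP: connectionless -- a datagram can be sent with no handshake,
# and nothing guarantees it arrives, arrives once, or arrives in order.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))            # OS picks a free port
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"frame-1", rx.getsockname())
data, _ = rx.recvfrom(1024)
print(data)                          # on loopback, loss is unlikely

# TCP: connection-oriented -- a handshake sets up the session and the
# stack retransmits lost segments, at the cost of unpredictable latency.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()
cli.sendall(b"frame-1")
echoed = conn.recv(1024)
print(echoed)
for s in (tx, rx, cli, conn, srv):
    s.close()
```

On a loopback interface both transfers succeed, which is exactly why the UDP half of this demo understates the problem: across a congested mobile link the datagram might simply never arrive.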

Numerous attempts have been made to blend UDP and TCP to retain the advantages of both while mitigating the downsides as far as possible, with real time traffic or high speed file transfer particularly in mind. All have only partially succeeded and been overtaken by increasing demand for performance and quality, as well as changing requirements. The latest obsession is low latency, driven by the needs of live OTT content, especially sports.

Earlier attempts to unite TCP and UDP have all largely failed on that count, but a few are worth mentioning briefly for context. One of the first was Reliable UDP (RUDP), developed by Bell Labs in the late 1990s as an Internet draft and later adopted in a different form by Microsoft in its Mediaroom IPTV platform, which lives on under Ericsson. It could be called TCP-Lite, in that it took UDP and added some TCP-like error correction features. These helped Microsoft improve the reliability of its IPTV service in the early days, but the protocol did not gain much traction because it was too slow and unreliable over unmanaged IP networks, in a sense inheriting weaknesses of both UDP and TCP.

Meanwhile the Real-time Transport Protocol (RTP) had emerged as the favored IETF (Internet Engineering Task Force) protocol, again based on UDP but with more robust error mitigation. It became the technical foundation for Voice over IP, but eventually ran out of steam for high resolution live video. It can compensate for jitter (variation in delay) and re-assemble packets into the right order at the destination, which is highly desirable for video, but at the expense of some delay.
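RTP’s re-ordering behavior amounts to a playout buffer: hold a few packets, release them in sequence-number order, and accept the added delay. A minimal Python sketch of the principle (not real RTP, and the buffer size here is an arbitrary illustrative choice):

```python
import heapq

def reorder(packets, max_buffer=4):
    """Tiny sketch of an RTP-style playout buffer: hold up to `max_buffer`
    packets and release them in sequence-number order, trading extra
    delay for correct ordering at the destination."""
    heap, out = [], []
    for seq, payload in packets:
        heapq.heappush(heap, (seq, payload))
        if len(heap) > max_buffer:          # buffer full -> play out earliest
            out.append(heapq.heappop(heap))
    while heap:                             # flush at end of stream
        out.append(heapq.heappop(heap))
    return out

# Packets arrive out of order because of jitter; the buffer restores order.
arrived = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d")]
print(reorder(arrived))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
```

The trade-off the article describes is visible in the parameter: a larger `max_buffer` tolerates more jitter but holds every packet back longer, adding directly to end-to-end delay.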

However, RTP did not enjoy universal take-up. It was overlooked by Adobe, which instead adopted Real-Time Messaging Protocol (RTMP) as a result of its acquisition of that protocol’s developer, Macromedia. This became an integral part of Adobe’s Flash player which gained widespread adoption for early OTT services.

But use of RTMP has declined in line with the decline of Flash and this has helped create a vacuum for new protocols to fill. One is WebRTC, which is being promoted by Google and enjoys backing from much of the open source community. It was designed for peer-to-peer communication, chiefly among small groups, and is based on RTP, although with an option to fall back to TCP if errors exceed a set threshold.

Given the association with Google, it is no surprise that it works well with the VP8 and VP9 codecs (challengers to HEVC and the MPEG family), although also with H.264. Although it can achieve very low latency, it inherits some weaknesses of RTP, and does not scale up well to very large numbers of users on a multicast basis. Also, it does not operate efficiently in a video streaming environment where HLS or MPEG DASH are used to break the video into chunks for transmission at multiple bit rates to cater for varying network and sometimes client capacities.

This is why momentum has built so quickly behind SRT, which represents the best effort yet to strike the right balance between latency, video quality, bandwidth and available computational power. There is only so far that latency can be reduced, given the delay imposed by switching and the speed of light or electronic signal transmission. Equally, it is impossible to avoid the trade-off between latency, quality, power and avoidance of buffering. A receive buffer insulates against temporary drops in network bandwidth, avoiding visible re-buffering stalls, but inevitably adds to the latency budget. Similarly, improving compression quality adds to processing overhead, which again increases delay for live transmissions. Encryption is a further overhead that can increase delay.
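The latency budget can be made concrete with some back-of-envelope arithmetic. The stage names and figures below are purely illustrative assumptions, not measurements from any real deployment, but they show how the receive buffer typically dominates the total and why shrinking it is the main lever for low-latency streaming:

```python
# Hypothetical end-to-end latency budget for a live stream.
# All figures are illustrative assumptions, not measured values.
budget_ms = {
    "capture_and_encode": 120,   # heavier compression -> more delay here
    "encryption":           5,   # small but non-zero overhead
    "network_propagation":  40,  # bounded below by distance / speed of light
    "receive_buffer":      250,  # insulates against bandwidth dips
    "decode_and_render":    60,
}
total = sum(budget_ms.values())
print(f"end-to-end latency = {total} ms")  # 475 ms in this sketch
```

Under these assumed numbers, more than half the delay sits in the receive buffer, which is exactly the term a protocol like SRT tries to keep small without triggering stalls.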

In fact, SRT accounts for these processes better than its predecessors, for example by supporting standards-based encryption directly. It strips the processes that protect against jitter, packet loss and bandwidth fluctuation down into small modules wrapped directly around the UDP stream, which tests have shown minimizes the associated delays. Since it acts just as a wrapper around the content, it works efficiently with all the principal codecs, including MPEG-2, H.264 and HEVC.

A key point is the efficiency of error recovery, which has frustrated previous efforts to improve sufficiently on TCP performance. Here SRT has done well to build on the work of yet another earlier protocol, again an enhancement to UDP, called UDP-based Data Transfer Protocol (UDT). This was designed for transporting large datasets efficiently over IP networks, a task for which TCP was inefficient because of the cumulative latency of multiple small retransmissions to compensate for packet loss. SRT then incorporates further improvements for live video transmission, including encryption and measures to reduce link latency.

The UDT technology groups re-transmission of dropped packets into batches with just periodic acknowledgements to confirm delivery and report loss. This improved throughput of large files also provides some basis for improved error handling for video in SRT, because it is not necessary to respond to every dropped packet, since a few can be tolerated.
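The batching principle can be sketched in a few lines of Python. This is an illustration of the idea, not UDT’s or SRT’s actual control-packet format, and the `tolerate` parameter is a hypothetical knob standing in for the notion that live video can shrug off a few losses:

```python
def batch_report(expected, received, tolerate=0):
    """Sketch of UDT/SRT-style periodic loss reporting: instead of reacting
    to every dropped packet individually, the receiver scans one reporting
    interval and names all missing sequence numbers in a single batch.
    If losses are within `tolerate`, no retransmission is requested at all."""
    missing = sorted(set(range(expected)) - set(received))
    if len(missing) <= tolerate:
        return []          # within tolerance: let the decoder conceal it
    return missing         # one batched report covering the whole interval

# 10 packets expected in this interval; 7 and 8 were dropped.
print(batch_report(10, [0, 1, 2, 3, 4, 5, 6, 9]))               # [7, 8]
print(batch_report(10, [0, 1, 2, 3, 4, 5, 6, 9], tolerate=2))   # []
```

One batched report per interval replaces a flurry of per-packet reactions, which is the throughput gain the paragraph above describes; the tolerance path is what makes the same machinery suitable for live video.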

Furthermore, SRT supports some FEC (forward error correction), which enables almost real-time compensation for a certain level of errors at the cost of a little extra bandwidth consumption. This FEC is combined with an adjustable receive buffer, enabling performance, reliability and latency to be tuned to varying network conditions. In fact, the ability to detect and adapt to real-time network conditions between the two end points is one of the factors cited by the SRT Alliance as critical for keeping latency almost constant while minimizing the risk of buffering, although allowing for variations in quality, which after all is handled by the overlying streaming mechanisms such as HLS and DASH.
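The principle behind FEC can be shown with the simplest possible scheme, an XOR parity packet over a group: one extra packet’s worth of bandwidth buys back any single loss in that group, with no retransmission round-trip. SRT’s actual FEC is more sophisticated and configurable; this is just the underlying idea:

```python
from functools import reduce

def xor_parity(packets):
    """Byte-wise XOR over a group of equal-length packets (simplest FEC)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def recover(surviving, parity):
    """Rebuild the one missing packet: XOR of the survivors and the parity
    cancels every packet that arrived, leaving only the lost one."""
    return xor_parity(surviving + [parity])

group = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(group)            # transmitted alongside the group
lost = group[1]                       # pretend packet 1 was dropped en route
rebuilt = recover([group[0], group[2]], parity)
print(rebuilt == lost)                # True: recovered without retransmission
```

The cost-benefit is visible directly: one parity packet per group of three is about 33% bandwidth overhead for single-loss protection, and real schemes tune that ratio, which is exactly the "little extra bandwidth" trade-off described above.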

Of course, SRT is unlikely to be the last streaming transmission protocol, but it is probably the first to give such high priority to low latency and therefore looks set to play an important role in emerging live OTT services.

Meanwhile, an ETSI group, Next Generation Protocol (NGP), is looking at future replacements for TCP, especially for 5G networks. It is examining the options for a protocol that would support continuous mobile connectivity, high throughput and ultra-low latency, in order to support a range of use cases, including live video streaming and live virtual reality 360-degree video, and eventually the Tactile Internet, which will require latency below 1ms.

In its initial white paper, published last year, the ETSI NGP group wrote: “Standards bodies such as IETF and 3GPP have been the driving force behind the success of the Internet and its integration with mobile communications; however, such organizations tend to solve problems in a segmented manner, focusing on a specific protocol layer or service requirement. NGP, on the other hand, emphasizes a holistic approach and broader scope across various aspects of the current network functions and operations.”

It has its eyes on 5G as the catalyst for a new generation of low latency services, including streaming. “New network and protocol architectures will help reduce the overall end-to-end latency and enable new products and services. This applies equally to fixed networks; however, the 5G timeline is an interesting opportunity for implementing NGP and interworking with TCP/IP as a first step,” wrote the team.

The aim is to overcome the “fundamental limitations of TCP/IP and associated protocols (such as 3GPP GTP)” and the first specifications were published a year ago, though the process of seeing them incorporated into 5G platforms will be a complex one. The NGP group aims to influence the key communications and Internet standards bodies (3GPP, ETSI, IEEE, IETF and ITU-T) to adopt its recommendations.

The group defines key scenarios to evolve the current IP suite architecture, whose protocols were defined in the 1970s, with a very different vision of the Internet in mind. ETSI now aims to drive harmonized requirements which will be optimized for multi-access communication including wireless, wired and cellular.

“Current and future use cases include 4K videos on various devices, massive IoT, drone control or virtual reality to name but a few: use cases that have nothing to do with those of the 70s,” said Andy Sutton, chairman of the NGP ISG. “A modernized network protocols architecture had to be triggered and this is why NGP ISG was created.”

Sutton, who is also principal network architect at BT/EE in the UK, added: “The TCP/IP protocol suite has undoubtedly enabled the evolution of connected computing and many other developments since its invention during the 1970s. NGP ISG aims to gather opinions on how we can build on this momentum by evolving communication systems architectures and networking protocols to provide the scale, security, mobility and ease of deployment required for the connected society of the twenty first century.”