SRT moves ahead in low latency protocol race

Providers of OTT live services are certainly not stuck for choice over streaming protocols, but they may find that few, if any, so far combine the low latency and high quality they want without any buffering. One protocol, Secure Reliable Transport (SRT), has emerged as a promising candidate for many live streaming services, which is why it is gaining support from key technology vendors in the OTT arena, such as Harmonic, Limelight, Kaltura and Brightcove, as well as from Haivision and Wowza, the founder members of the alliance promoting it.

Other big players are pursuing different avenues, but given the growing concern over latency for live online delivery, SRT has a good chance of becoming dominant. Of course, to succeed in contemporary OTT ecosystems a protocol has to be integrated into relevant platforms and software, so a key move was making SRT available to developers as open source at the time the SRT Alliance was founded in April 2017. This led to several early adoptions of SRT, with Canadian live platform LiveScale announcing in October 2017 its incorporation in the latest version of its enterprise SaaS (Software as a Service) package. This also involved integration of SRT with HEVC via Haivision encoders, which is an important requirement for streaming protocols as HEVC gains more traction.

SRT is the latest in a line of protocols that attempt to solve an old problem for IP networks, which is how to combine the best of the two transport-level mechanisms for carrying packets from source to destination across the network via multiple nodes. These are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). TCP is connection oriented, establishing an end-to-end session between the two endpoints before any data flows, while UDP is connectionless in that packets are released into the network without any such handshake, leaving each router to decide which route they should take next if there is a choice, on the basis of traffic conditions or other priorities.

TCP has the advantage of built-in error correction, with packets being retransmitted if they are dropped or arrive corrupted, but at the expense of latency. It is also a compromise in that, unlike other transport mechanisms that evolved for non-IP or hybrid networks, such as the ill-fated ATM, it establishes an end-to-end connection for the duration of a session but does not guarantee the bandwidth. It is therefore susceptible to congestion and needs mechanisms to respond to sudden spikes. As a result it is still non-deterministic: it is impossible to tell in advance what the latency will be or how long it will take to retransmit packets, so a maximum time window has to be set for that to happen.
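To make the cost of retransmission concrete, here is a deliberately simplified model in Python. It assumes a fixed retransmission timeout, which real TCP does not have (it adapts the timeout and can also retransmit early), and the function names and figures are hypothetical, chosen only for illustration.

```python
def delivery_latency_ms(one_way_ms, rto_ms, lost_attempts):
    """Time until a packet finally arrives, in a fixed-timeout model.

    Each lost attempt costs one full retransmission timeout (rto_ms)
    before the sender tries again; the successful attempt then takes
    one_way_ms to cross the network.
    """
    return one_way_ms + lost_attempts * rto_ms


def within_window(one_way_ms, rto_ms, lost_attempts, window_ms):
    """True if the packet still arrives inside the allowed time window."""
    return delivery_latency_ms(one_way_ms, rto_ms, lost_attempts) <= window_ms


if __name__ == "__main__":
    for lost in range(4):
        total = delivery_latency_ms(one_way_ms=40, rto_ms=200, lost_attempts=lost)
        ok = within_window(40, 200, lost, window_ms=250)
        print(f"{lost} lost attempt(s): {total} ms, inside 250 ms window: {ok}")
```

Even in this crude model a single loss multiplies the delay several times over, which is why the latency of a retransmitting protocol cannot be predicted in advance.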

UDP by contrast has no built-in error correction in its original form and so is faster, but often at the cost of unacceptable packet loss with serious impact on quality. There is no guarantee of delivery and, unlike TCP, no sequencing or flow control, so packets may be lost or delivered out of order.

Numerous attempts have been made to blend UDP and TCP to retain the advantages of both while mitigating the downsides as far as possible, with real time traffic or high-speed file transfer particularly in mind. All have only partially succeeded and been overtaken by increasing demand for performance and quality, as well as changing requirements. The latest obsession is low latency driven by the needs of live OTT content, especially sports.

Earlier attempts to unite TCP and UDP have all largely failed on that count. Just a few are worth mentioning briefly for context. One of the first was Reliable UDP (RUDP), developed by Bell Labs in the late 1990s as an Internet draft and later adopted in a different form by Microsoft in its Mediaroom IPTV platform, which still lives on under Ericsson. This could be called TCP-Lite, in that it took UDP and added some TCP-like error correction features. These helped Microsoft to improve reliability of its IPTV service in the early days, but RUDP did not gain much traction because it was too slow and unreliable over unmanaged IP networks, so in a sense inheriting weaknesses of both UDP and TCP.

Meanwhile Real-time Transport Protocol (RTP) had emerged as the favored IETF (Internet Engineering Task Force) protocol, again based on UDP but with more robust error mitigation. It became the technical foundation for Voice over IP, but eventually ran out of steam for high resolution live video. It has the ability to compensate for jitter (variation in delay) and can also reassemble packets into the right order at the destination, which is highly desirable for video, but at the expense of some delay.
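The reordering described above, and the delay it costs, can be illustrated with a small Python sketch. This is not RTP code: the class name, the depth parameter and the plain integer sequence numbers are all assumptions made for the example.

```python
import heapq


class ReorderBuffer:
    """Hold back up to `depth` packets and release them in sequence order."""

    def __init__(self, depth=3):
        self.depth = depth   # packets held back; more depth means more delay
        self._heap = []      # min-heap keyed on sequence number

    def push(self, seq, payload):
        """Add one arriving packet; return any packets now released."""
        heapq.heappush(self._heap, (seq, payload))
        released = []
        while len(self._heap) > self.depth:
            released.append(heapq.heappop(self._heap))
        return released

    def flush(self):
        """Release everything still held, lowest sequence number first."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap))
        return released


def reorder(packets, depth=3):
    """Run (seq, payload) pairs through the buffer; return output seq order."""
    buf = ReorderBuffer(depth)
    out = []
    for seq, payload in packets:
        out.extend(buf.push(seq, payload))
    out.extend(buf.flush())
    return [seq for seq, _ in out]


if __name__ == "__main__":
    arrivals = [(s, b"") for s in (1, 3, 2, 5, 4, 6)]
    print(reorder(arrivals, depth=0))  # no buffering: order is as received
    print(reorder(arrivals, depth=2))  # two packets of delay restores order
```

With a depth of zero the packets pass straight through in arrival order. Holding back two packets is enough to restore the order in this example, at the price of that much extra delay.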

RTP did not enjoy universal take up however. It was overlooked by Adobe, which instead adopted Real-Time Messaging Protocol (RTMP) as a result of its acquisition of that protocol’s developer Macromedia. This became an integral part of Adobe’s Flash player which gained widespread adoption for early OTT services. There were several versions, including RTMPE, which is RTMP encrypted using Adobe’s own proprietary security mechanism.

However, use of RTMP has declined in line with Flash and this has helped create the vacuum for new protocols to come in. One is WebRTC, which is being promoted by Google and enjoys backing from much of the open source community. It was designed for peer-to-peer communication, chiefly among small groups, and is based on RTP, although with an option to default to TCP if errors exceed a set threshold. Given the association with Google, it works with the VP8 and VP9 codecs, although also with H.264. Although it can achieve very low latency, it inherits weaknesses of RTP and does not scale up well to very large numbers of users on a multicast basis. Nor does it operate efficiently in a video streaming environment where HLS or MPEG-DASH are used to break the video into chunks for transmission at multiple bit rates to cater for varying network and sometimes client capacities.

This is why momentum has built so quickly behind SRT, representing the best effort yet to strike the right balance between latency, video quality, bandwidth and available computational power. There is only so far that latency can be reduced given the delay imposed by switching and the speed of light or electronic signal transmission. Equally, it is impossible to avoid the trade-off between latency, quality, power and avoidance of buffering. A cache buffer insulates playback against temporary drops in network bandwidth, avoiding stalls, but inevitably adds to the latency budget. Similarly, improving compression quality adds to processing overhead, which again increases delay for live transmissions. Encryption is also an overhead that can increase delay.
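The arithmetic behind that latency budget can be sketched in a few lines of Python. The only physical constant used is the rough figure of 200 km per millisecond for light in optical fibre; the function names and the encode, buffer and decode figures are hypothetical and will vary widely between deployments.

```python
FIBRE_KM_PER_MS = 200.0  # light in optical fibre covers roughly 200 km per ms


def propagation_ms(distance_km):
    """Unavoidable one-way delay from signal propagation alone."""
    return distance_km / FIBRE_KM_PER_MS


def latency_budget_ms(distance_km, encode_ms, buffer_ms, decode_ms):
    """Sum the main contributors to end-to-end delay for a live stream."""
    return propagation_ms(distance_km) + encode_ms + buffer_ms + decode_ms


def survives_outage(buffer_ms, outage_ms):
    """A buffer hides a network stall only if it holds at least that much video."""
    return buffer_ms >= outage_ms


if __name__ == "__main__":
    for buffer_ms in (100, 500, 2000):
        total = latency_budget_ms(6000, encode_ms=100, buffer_ms=buffer_ms, decode_ms=50)
        safe = survives_outage(buffer_ms, outage_ms=400)
        print(f"buffer {buffer_ms} ms: total {total} ms, rides out 400 ms stall: {safe}")
```

In this toy calculation propagation over 6,000 km accounts for only 30 ms, so the buffer dominates the budget: enlarging it to survive longer stalls pushes up the total delay by the same amount.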

SRT takes account of these processes better than its predecessors, supporting standards-based encryption directly, for example. It strips down the processes that protect against jitter, packet loss and bandwidth fluctuation into small modules which are wrapped directly around the UDP stream, an approach that tests have shown minimizes the associated delays. Since it acts just as a wrapper around the content, it can work efficiently with all the principal codecs, including MPEG-2, H.264 and HEVC.

A key point is the efficiency of error recovery, which has frustrated previous efforts to improve sufficiently on TCP performance. On this count SRT has done well to build on the work of yet another earlier protocol, again an enhancement to UDP, called UDP-based Data Transfer Protocol (UDT). This was designed for transporting large datasets efficiently over IP networks, where TCP was inefficient because of the cumulative latency of multiple small retransmissions to compensate for packet loss. SRT then incorporates further improvements for live video transmission, including the encryption and measures to reduce direct link latency.

UDT batched retransmission of dropped packets into groups with just periodic acknowledgements to confirm delivery and report loss. This improved throughput of large files and also provided some basis for improved error handling for video in SRT, because it is not necessary to respond to every dropped packet, since a few can be tolerated.
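The idea of a periodic report that both acknowledges delivery and lists losses in batches can be sketched as follows. This is an illustration of the principle only, not UDT's or SRT's actual message format; the function name and the convention that sequence numbers start at zero are assumptions.

```python
def loss_report(received, highest_sent):
    """Summarise one reporting interval as (ack, missing_ranges).

    ack            -- next sequence number expected in order; everything
                      below it has arrived
    missing_ranges -- list of inclusive (first, last) gaps above the ack point
    """
    got = set(received)

    # Advance the cumulative acknowledgement past every in-order arrival.
    ack = 0
    while ack in got:
        ack += 1

    # Collapse individual losses into ranges so one report covers them all.
    missing = []
    start = None
    for seq in range(ack, highest_sent + 1):
        if seq not in got:
            if start is None:
                start = seq
        elif start is not None:
            missing.append((start, seq - 1))
            start = None
    if start is not None:
        missing.append((start, highest_sent))
    return ack, missing


if __name__ == "__main__":
    ack, missing = loss_report([0, 1, 2, 5, 6, 9], highest_sent=10)
    print("ack up to:", ack)
    print("missing ranges:", missing)
```

One report of this kind replaces what would otherwise be a separate exchange for each dropped packet, and the sender is free to ignore gaps that are no longer worth filling.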

Furthermore, SRT supports some FEC (Forward Error Correction), which enables almost real-time compensation for some level of errors at the cost of a little extra bandwidth consumption. This FEC is combined with an adjustable receive buffer, enabling performance, reliability and latency to be tuned to varying network conditions. In fact the ability to detect and adapt to real-time network conditions between the two endpoints is one of the factors cited by the SRT Alliance as critical for keeping latency almost constant while minimizing risk of buffering. This allows for variations in quality, which after all is already embodied in overlying streaming mechanisms such as HLS and DASH.
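The principle of forward error correction can be shown with the simplest possible scheme, a single XOR parity packet per group. This is a generic illustration and not a description of the FEC scheme SRT itself uses; the function names and the assumption of equal-length packets are introduced here for the example.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Build one parity packet for a group of equal-length packets."""
    return reduce(xor_bytes, packets)


def recover(packets, parity):
    """Rebuild a group in which exactly one packet (marked None) was lost."""
    missing = [i for i, p in enumerate(packets) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one loss per group")
    present = [p for p in packets if p is not None]
    # XOR of the parity with every surviving packet leaves the lost one.
    rebuilt = reduce(xor_bytes, present, parity)
    result = list(packets)
    result[missing[0]] = rebuilt
    return result


if __name__ == "__main__":
    group = [b"AAAA", b"BBBB", b"CCCC"]
    parity = make_parity(group)            # sent as a fourth, extra packet
    damaged = [b"AAAA", None, b"CCCC"]     # second packet dropped in transit
    print(recover(damaged, parity) == group)
```

Here one extra packet in every four is the bandwidth cost, and in return a single loss per group is repaired at the receiver without waiting for a retransmission.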

SRT is unlikely to be the last streaming transmission protocol, but it is probably the first to give such high priority to low latency and therefore looks set to play an important role in emerging live OTT services.