Adaptive bitrate (ABR) video delivery has become an extensively researched topic, given how much quality of experience (QoE) matters to OTT video services. ABR algorithms already make use of various modern machine learning techniques, and they may soon be joined by a new kid on the scene called Fugu – a continual learning algorithm for bitrate selection that is showing early promise.
Fugu is capable of establishing the most efficient way to send streaming video to diverse real-world clients, according to a recent academic paper published by computer science researchers at Stanford University. Fugu was paired with Puffer, a public website developed to stream live TV, as part of a nine-day study of 8,131 hours of video streamed to 3,719 users. Fugu was found to reduce video stalls by a factor of between 5 and 13 compared with other algorithms used in ABR video delivery, while also improving picture quality and reducing quality variation.
When live streaming TV on Puffer, users who were randomly assigned Fugu streamed for longer on average before quitting or reloading content, when compared with buffer-based control, MPC (model predictive control), RobustMPC and Pensieve, a system that generates ABR algorithms using reinforcement learning. Crucially, the study was carried out in real-world environments, using real internet connections from a multitude of ISPs dotted around the US. Such an in situ endeavor sets this research apart, as network algorithm research is typically carried out using simulated data.
The creation of Puffer should therefore be heralded as a research breakthrough. Without the live streaming website, Fugu alone may not have garnered such widespread acclaim.
Both Fugu and Puffer are open source and available to the community. The research team believes they will serve as a helpful “medium-scale” stepping-stone for new algorithms in which continual learning is a crucial component of improving internet video services.
So, what is continual learning and how does it work? In this case, Fugu retrained a neural network each day of the nine-day study, using its experience in deployment over the prior week. The retrained neural network then predicted how long it would take to transfer each available version of the upcoming video chunks, based on recent history and internal TCP (transmission control protocol) statistics.
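The daily retraining loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Fugu's implementation: a one-parameter least-squares fit stands in for the neural network, and the names `ContinualTrainer`, `end_of_day` and `fit_seconds_per_byte` are hypothetical.

```python
from collections import deque

WINDOW_DAYS = 7  # "the prior week", per the article


def fit_seconds_per_byte(samples):
    """Stand-in for the neural network (an assumption made for brevity):
    least-squares fit of transmit_seconds ~ rate * chunk_size_bytes."""
    num = sum(size * secs for size, secs in samples)
    den = sum(size * size for size, _ in samples)
    return num / den


class ContinualTrainer:
    def __init__(self):
        # One entry per day of (chunk_size_bytes, transmit_seconds) pairs.
        # The deque drops the oldest day once the window is full.
        self.days = deque(maxlen=WINDOW_DAYS)
        self.rate = None

    def end_of_day(self, todays_samples):
        """Add today's observations and refit on the sliding window."""
        self.days.append(list(todays_samples))
        window = [sample for day in self.days for sample in day]
        self.rate = fit_seconds_per_byte(window)
        return self.rate

    def predict_seconds(self, chunk_size_bytes):
        """Predict the transmit time of an upcoming chunk."""
        return self.rate * chunk_size_bytes


trainer = ContinualTrainer()
trainer.end_of_day([(1000, 10.0)])      # one slow day
for _ in range(WINDOW_DAYS):
    trainer.end_of_day([(1000, 1.0)])   # a week of faster days
print(trainer.predict_seconds(2000))    # the slow day has aged out of the window
```

The point of the sliding window is that stale observations age out, so the predictor tracks the network conditions its users are actually seeing.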
Among the findings, Fugu mitigated the risk of model mismatch by learning in the same environment where it was deployed. “To counter dataset shift, Fugu learns continuously from its users, producing testable predictions whose accuracy can be monitored. Fugu’s key innovations lie in explicitly considering the dynamic nature of today’s networks at several time scales,” states the report.
For some background, Fugu and Puffer were developed to address a fundamental issue with video streaming, which accounts for approximately two thirds of internet traffic. An OTT video service wants to deliver a live or on-demand stream to a variety of clients, yet the connections to those clients differ greatly in their time-varying throughput capacity. With so many clients, the service cannot adjust the encoder configuration for each one in real time, so it instead encodes the video into several compressed versions, each at a different quality, target bitrate or resolution.
The report notes that most commercial services offer each client session a menu of between five and eight alternative encodings. The service divides the video into chunks, typically 2 to 6 seconds each, and encodes each version of each chunk independently. Each chunk can therefore be decoded without access to any other chunk, giving clients the chance to switch between versions at each chunk boundary.
Streaming services today generally use variable bitrate encoding, where chunks vary in compressed size within each stream. This yields better quality than constant bitrate encoding, according to the research.
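To make the chunk-and-version structure concrete, here is a small Python sketch. It is not Fugu: the picker is a deliberately naive throughput-based rule, and the `Version` type, rung names, sizes and quality scores are all hypothetical. It only shows why variable bitrate sizes can lead a client to pick a different version at each chunk boundary, even when its throughput estimate does not change.

```python
from dataclasses import dataclass

CHUNK_SECONDS = 2.0  # the article cites chunks of 2 to 6 seconds


@dataclass(frozen=True)
class Version:
    name: str        # hypothetical rung label
    size_bytes: int  # with VBR, this differs from chunk to chunk
    quality: float   # any quality score where higher is better


def pick_version(versions, est_bytes_per_sec):
    """Naive rule: the best quality that downloads within one chunk duration.
    Falls back to the smallest version if nothing fits."""
    budget = est_bytes_per_sec * CHUNK_SECONDS
    fitting = [v for v in versions if v.size_bytes <= budget]
    if not fitting:
        return min(versions, key=lambda v: v.size_bytes)
    return max(fitting, key=lambda v: v.quality)


# Two consecutive chunks on the same ladder; the second is a busier scene,
# so every version of it compresses to a larger size.
calm_chunk = [
    Version("low", 100_000, 0.90),
    Version("mid", 300_000, 0.95),
    Version("high", 600_000, 0.98),
]
busy_chunk = [
    Version("low", 250_000, 0.88),
    Version("mid", 650_000, 0.94),
    Version("high", 1_400_000, 0.97),
]

for chunk in (calm_chunk, busy_chunk):
    print(pick_version(chunk, est_bytes_per_sec=350_000).name)
```

Because each version of each chunk is encoded independently, the client in this sketch can take the top rung for the calm chunk and drop a rung for the busy one without any decoding dependency between them.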
Within the study, video quality was measured using structural similarity (SSIM), an index whose core algorithm was first published in 2004 and which has more recently become associated with SSIMWave, a newer pioneer in that field. Fugu was found to have the highest average quality and the lowest rebuffering time, as summarized in the table below, which shows results for streams using BBR (a TCP congestion control algorithm developed by Google).
Fugu’s contributions to quantitatively improving QoE of internet video streaming can be summarized as follows:
- Training in the same environment as deployment.
- Retraining daily from experience over the prior week.
- Predicting how long it would take to transmit a chunk instead of estimating throughput.
- Making probabilistic (“fuzzy”) predictions instead of point estimates.
- Combining model-based control and a neural network (known as model-based reinforcement learning).
- Incorporating information not previously used in ABR, such as internal TCP and congestion-control statistics.
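A toy Python example can illustrate why the list above stresses transmission-time predictions that are probabilistic rather than point estimates, and how such predictions feed a controller. Everything here is an assumption made for illustration: the one-chunk lookahead, the scoring formula, the `STALL_PENALTY` weight and the hand-written distributions, which stand in for a neural network's output. It is not the controller described in the paper.

```python
STALL_PENALTY = 1.0  # hypothetical weight: quality lost per second of stall


def expected_score(quality, time_dist, buffer_seconds):
    """Score one version. time_dist is a list of (transmit_seconds,
    probability) pairs whose probabilities sum to 1."""
    expected_stall = sum(
        p * max(0.0, t - buffer_seconds) for t, p in time_dist
    )
    return quality - STALL_PENALTY * expected_stall


def choose(options, buffer_seconds):
    """options maps a version name to (quality, time_dist)."""
    return max(
        options,
        key=lambda name: expected_score(*options[name], buffer_seconds),
    )


def as_point_estimate(time_dist):
    """Collapse a distribution to its mean, as a point predictor would."""
    mean = sum(t * p for t, p in time_dist)
    return [(mean, 1.0)]


# "high" is usually fast but has a 20% chance of taking 10 seconds.
options = {
    "mid": (0.8, [(1.0, 0.8), (5.0, 0.2)]),
    "high": (1.0, [(2.0, 0.8), (10.0, 0.2)]),
}
point_options = {
    name: (q, as_point_estimate(dist)) for name, (q, dist) in options.items()
}

print(choose(options, buffer_seconds=4.0))        # weighs the slow tail
print(choose(point_options, buffer_seconds=4.0))  # sees only the mean
```

With a four-second buffer, the mean transmit time of the "high" version looks safe, so the point-estimate controller picks it. The probabilistic controller accounts for the slow tail and settles on "mid", which is the kind of stall avoidance the article credits to Fugu.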
The June 2019 paper from Stanford University researchers, in collaboration with Beijing-based Tsinghua University, has received recent praise from the likes of Thierry Fautier, Harmonic’s VP of Video Strategy and Chair of the Ultra HD Forum, who said there are “a lot of good things coming.”
“Based on Fugu’s results, we believe that continual learning of ABR and congestion-control algorithms in situ is a compelling research direction. Accordingly, we plan to operate Puffer for several years and will open it for use by the research community, letting researchers train and test new algorithms on randomized subsets of its traffic,” the report says, in an apparent bid to encourage adoption.