Progress towards lightweight prediction of QoE for OTT services

It is well known that OTT services require different mechanisms for measuring and ensuring QoE (Quality of Experience) than legacy broadcast. For one thing, the factors affecting QoE differ: jerkiness, freezing and buffering matter, in addition to pixelation and other visual artefacts. Indeed, modern OTT services using adaptive bitrate streaming exploit their more reliable transport protocol to eliminate many visual artefacts, but at the expense of buffering-induced delays and stalls, and of increased latency. This means that OTT services require a different set of KPIs (Key Performance Indicators) as surrogates for the Mean Opinion Scores (MOS) that represent actual viewer experience.


The challenge for providers of video QoE measurement tools has always been to find effective surrogate KPIs that track viewer perception as closely as possible, given that there is neither time nor resources to obtain direct feedback. For this reason, the Streaming Video Alliance has introduced a set of four primary metrics that define the OTT experience. First is video start-up time, the lag between initiating playback and rendering of the first frame. Second is re-buffer ratio, the rebuffering time divided by the total playback time, with a score of 0 being perfect. Third is average media bitrate, and fourth is video start failure, a condition flagged when the first chunk of a video is not received within a specified cut-off time.

Of course, in addition to these OTT-specific KPIs, there are the measures of image quality itself, where the aim is to estimate MOS scores. Several models have been developed for IP-based delivery in general, including IPTV services, falling into two main categories: bitstream evaluation and direct image analysis. Bitstream evaluation can sometimes be restricted to the IP headers alone, without inspecting the video payload at all, which takes surrogacy to its ultimate level. At the other extreme, full reference image-based QoE evaluation is the gold standard, comparing all decoded frames against a reference source, typically the original.
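To make the full-reference idea concrete, a classic (if crude) frame-level comparison is peak signal-to-noise ratio, which scores a decoded frame against the original pixel by pixel. This is only a sketch of the general approach, not the metric any particular tool uses; frames are modelled here as flat lists of 8-bit pixel values.

```python
import math

def psnr(reference: list[int], decoded: list[int], max_val: int = 255) -> float:
    """Peak signal-to-noise ratio between two equal-length frames,
    given as flattened lists of 8-bit pixel values. Higher means the
    decoded frame is closer to the reference."""
    assert len(reference) == len(decoded), "frames must match in size"
    # Mean squared error across all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * math.log10(max_val ** 2 / mse)
```

The catch for OTT, as the next paragraph notes, is that full-reference methods need the source frames at the measurement point, which is exactly what a lightweight field protocol cannot assume.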

In practice, for OTT services it is infeasible to analyze the source directly or even inspect IP packet payloads, and so attempts have been made to extract the bare essentials of several methodologies and resynthesize them into a lightweight QoE evaluation protocol. Such an exercise has just been conducted by the EBU (European Broadcasting Union), with promising initial results.

This has involved analysis to derive mathematical relationships between combinations of seven parameters and MOS scores. The first three are packet loss, jitter (measured as the average difference between each sample's latency and the mean latency) and start-up delay. Fourth comes underflow time ratio, defined as the cumulative duration of stalls, or the latency added as a result of them. Fifth is the number of stalls, sixth the duration of stalls, and seventh resolution switches, that is the number of transitions between two different segment qualities. This represents quite a broad spectrum of quality measures, so it is hard to see how it can equate consistently to viewer perception, especially as the impact of some artefacts is so content dependent, or even frame dependent, as well as varying across diverse connected devices.
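Most of these parameters can likewise be computed from a session trace. The helper below is illustrative only: the names and formulas follow the informal descriptions above (jitter as mean absolute deviation from the mean latency, underflow time ratio as stall time over playback time), not the EBU's published definitions, and packet loss and start-up delay are assumed to be measured elsewhere.

```python
def stream_qoe_params(latencies_s: list[float],
                      stalls_s: list[float],
                      segment_qualities: list[int],
                      playback_s: float) -> dict:
    """Hypothetical helper computing several of the EBU-style input
    parameters from a session trace (latency samples, stall durations,
    per-segment quality levels, total playback time in seconds)."""
    mean_latency = sum(latencies_s) / len(latencies_s)
    # Jitter: average absolute difference between each sample's latency
    # and the mean latency over the session.
    jitter = sum(abs(l - mean_latency) for l in latencies_s) / len(latencies_s)
    # Resolution switches: transitions between two different segment qualities.
    switches = sum(1 for a, b in zip(segment_qualities, segment_qualities[1:])
                   if a != b)
    return {
        "jitter_s": jitter,
        # Underflow time ratio: cumulative stall duration over playback time.
        "underflow_time_ratio": sum(stalls_s) / playback_s,
        "stall_count": len(stalls_s),
        "stall_duration_s": sum(stalls_s),
        "resolution_switches": switches,
    }
```

A model-fitting step would then regress MOS scores against combinations of these values, which is where the content-dependence problem raised above bites hardest.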

After all, a short glitch is more tolerable when the ball is out of play than when a goal is being scored. Nevertheless, initial experience of this lightweight protocol is promising and at least it gives something to work on and optimize in the field.