
2 May 2019

Augmented Reality still stuck on UI

There is no doubt Augmented Reality (AR) is going to make a big impact on TV, just as it has already done in gaming and will do in various spheres including education and architecture. But the starry-eyed prophecies of Mark Zuckerberg and others tend to fall down over considerations of user comfort or experience, which is what sank the ill-fated 3D revolution in TV around six years ago. Take Zuckerberg’s assertion that AR glasses or contact lenses are the logical end game for AR, with the conventional TV being toast.

Not everyone wants to don glasses, never mind headsets, even if that avoids the need for a TV set at all, while many people baulk at wearing contact lenses and are often allergic to them. 3D TV failed partly because the benefits were slight for many users and partly because the experience gave many people headaches or made them feel sick. AR delivered through glasses or contact lenses may not do that, but will still be uncomfortable or inconvenient for many people, certainly for general viewing. It is precisely this UI factor that has stopped AR and VR from making as great an impact on eSports as had been predicted, given that many gamers spend hours on end at their screens and so will not embrace a device that is really only usable in short bursts.

So, for now, we can discount the notion that AR is going to displace current devices such as TVs, tablets and smartphones. AR is, however, making a great impact as an adjunct to the UI of such devices and, more than that, ushering in significant new use cases. Categories include sports, interactive advertising, mapping and eSports, as well as vertical applications such as architecture, education and industrial design.

Despite the caveat just mentioned, AR has entered public consciousness through its application in gaming and especially Pokémon Go, where the AR mode exploits the camera and gyroscope on the player’s mobile device to display an image of a Pokémon as if it were in the real world. But it is the imminent addition of AR to Google Maps that is likely to complete the technology’s coming of age, guiding users towards their destination through arrows and instructions overlaid on the real image of their surroundings as seen through their smartphone camera.

In TV, AR is making its presence felt first in live sports production by allowing commentators or pundits to superimpose possible scenarios that never occurred onto real footage, as well as allowing overlaying of graphics and information. The possibility of exploring “what if” scenarios, such as what might have happened if an event such as a foul had not taken place, is also being explored by sports broadcasters such as Sky, Fox, NBC and ESPN. It is a big deal for pay TV operators because they see investment in AR as a defense of their position against both OTT insurgents and sports leagues going D2C. By offering sophisticated AR facilities that enhance the viewer experience and keep sports fans engaged, operators hope they can offer more than just the highest bid for sports rights.

It was not surprising, therefore, that AR dominated the Sports Video Group’s (SVG) fifth annual Sports Graphics Forum in New York earlier this year, as well as featuring strongly at the recent NAB 2019. It also figured strongly around the same time at SVG Europe’s Football Summit in Paris, where AR sports graphics were overlaid on the conference’s AV output for delegates, although this was a well-tried application already demonstrated at the FIFA World Cup in 2018.

AR is defined as the overlaying of digital content on the “real world”, but that can range from insertion of basic graphics to extrapolation, generating virtual or fictitious content married with footage obtained from a camera in the normal way. The latter can involve blurring fantasy with reality, and that is where the appeal lies for gaming.

Currently, however, mainstream applications will be confined to combining two sources of real content: actual live footage and some digital imposition. That is the case with Google Maps, where the user’s smartphone camera produces the live footage just by direct viewing, serving two purposes. First, and critically, the image taken by the camera is used by the application to improve the accuracy of GPS, which on its own can be out by up to a few meters, too much for guided navigation. Google uses the GPS data to determine position roughly, as before, but then applies machine learning algorithms to the images captured by the user’s camera to refine this down to under a meter. That in turn enables the second function of the camera: displaying arrows and pointers identifying exactly where the user should proceed.
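
The general idea, a coarse satellite fix refined by a second, more confident estimate, can be sketched as a simple weighted blend. This is purely illustrative: the function name and the confidence weighting are assumptions for the sketch, not Google’s actual pipeline, which relies on matching camera imagery against Street View data.

```python
# Illustrative sketch of position refinement: a coarse GPS fix
# (metres of error) is blended with a visual localization estimate
# whose confidence would come from an image-matching model.
# Hypothetical helper, not Google's actual algorithm.
def refine_position(gps_fix, visual_fix, visual_confidence):
    """Weighted blend of two (x, y) estimates; confidence in [0, 1]."""
    gx, gy = gps_fix
    vx, vy = visual_fix
    w = visual_confidence
    return ((1 - w) * gx + w * vx, (1 - w) * gy + w * vy)

# GPS says (10.0, 5.0) with metres of uncertainty; a confident
# visual match places the user nearer (12.0, 4.0).
x, y = refine_position((10.0, 5.0), (12.0, 4.0), 0.9)
print(x, y)  # close to (11.8, 4.1), dominated by the visual estimate
```

The higher the visual confidence, the more the blended position leans toward the camera-derived estimate, which is the essence of using imagery to tighten a loose GPS fix.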

Whether this will be as revolutionary as the original Google Maps is doubtful, as it does not add to the fundamental utility of following a route on a smartphone, but it is an impressive display of AR implemented in a small device. One of the technical challenges lay in running compute-intensive machine learning algorithms on smartphones, which Google has achieved with the help of TensorFlow Lite, a stripped-down version of its TensorFlow framework designed for efficient manipulation of the multi-dimensional arrays of data, called tensors, that occur in neural network computations. Tensors became famous for their use by Einstein in developing special and general relativity, involving four-dimensional arrays with time in addition to 3D space. But in machine learning computations there can be even more than four dimensions, so an effective way of handling these arrays was critical.
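
In code, a tensor is simply an array whose rank (number of axes) can grow beyond the four dimensions of relativity. A minimal sketch using NumPy rather than TensorFlow itself, since the arrays behave the same way:

```python
import numpy as np

# A tensor is a multi-dimensional array; its rank is the number of axes.
scalar = np.array(5.0)                     # rank 0: a single value
vector = np.array([1.0, 2.0, 3.0])         # rank 1: one axis
image_batch = np.zeros((32, 224, 224, 3))  # rank 4: a batch of 32 RGB images
                                           # (batch, height, width, channel)

print(scalar.ndim, vector.ndim, image_batch.ndim)  # 0 1 4
```

A batch of colour images is already a rank-4 tensor, and adding a time axis for video pushes it to rank 5, which is why handling high-rank arrays efficiently matters for neural networks.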

Then in 2016, Google introduced its Tensor Processing Unit (TPU), an ASIC designed specifically to run TensorFlow calculations and now available within the Google Cloud Platform. But some of the calculations have to run on the device, and smartphones do not have TPUs. To cater for this, Google announced TensorFlow Lite in May 2017 as a stripped-down software stack for mobile applications such as its mapping. Essentially this employs techniques for compressing models to increase processing speed by four times or more, such as quantization, where variables are truncated in various ways, for example by rounding or by converting floating-point numbers to integers. It can also reduce the number of variables, aiming to do so with minimal impact on accuracy or inferential power.

Although smartphones lack TPUs, they do have GPUs (Graphics Processing Units) for visual rendering, and these run machine learning algorithms much more efficiently than CPUs, having been optimized for handling arrays of data aligned in columns and rows. To exploit this capability, Google released a developer preview of a mobile GPU inference engine for TensorFlow Lite in January 2019, which should open the door to more powerful AR applications on smartphones. Such developments will make machine learning-based AR applications more robust, able to anchor digital data to the real world accurately in more challenging conditions, such as when lighting is poor or when objects of interest such as faces are partly occluded or viewed at oblique angles.