Facebook demystifies smartphone AI with open source building blocks

While Qualcomm, Intel and others battle for market advantage based on closely guarded R&D secrets, Facebook is pushing advanced technologies via a range of open source activities. From its OpenCellular base stations to its support for the OpenGL graphics API, the social media giant is keen to shake up the economics of hi-tech platforms to accelerate adoption. This, it hopes, will enable brand new ways of interacting with web services, which it will aim to define and monetize, in the same way that Google defined the current generation of internet usage with its search box.

At Facebook’s most recent @Scale event for software engineers, OpenGL was on show as a low cost way to deliver AI-enabled effects and user experiences to mobile devices. The aim of the event, which it holds in several US cities throughout the year, is to foster an open, collaborative ecosystem, to try to solve the challenges of hyperscale data centers, big data and machine learning through open source solutions.

The company frequently contributes its own developments to move the process along, as it also does in OpenCellular and its Telecom Infra Project (TIP). For instance, it demonstrated image recognition running on smartphone cameras at up to 45 frames per second, using inhouse OpenGL-based inference code.

Facebook claimed these frame rates are higher than those for Qualcomm’s new Snapdragon neural network SDK, even though the social media firm has used only open source technology – and not the most modern option at that. Some developers are already urging it to adopt a newer API (application programming interface) which would be easier to program. Khronos’ Vulkan and Apple’s Metal make mobile graphics programming simpler but, unlike OpenGL, are supported only on a few high end smartphones.

Facebook expects to rely mainly on OpenGL for smartphone-based inference code for at least 2-3 years. With these techniques it can deliver effects created using machine learning (ML) to standard handsets, something it first showed off at its F8 developer conference in April.

The comment about Qualcomm might have emphasized the potential of low cost, open software, but it was not a hostile one. Qualcomm gave a presentation about its Neural Processing Engine (NPE) at the event, and Facebook has actually signed up to use the technology in its smartphone application. It says that using Qualcomm’s Adreno GPUs (part of Snapdragon), instead of a generic CPU design, will provide a fivefold performance improvement for augmented reality applications based on Facebook’s Caffe2 deep learning software (another technology it has open sourced). As well as Qualcomm, it has partnered with Nvidia, MediaTek and, reportedly, Intel to support Caffe2 on optimized hardware, as well as with cloud platform providers Amazon and Microsoft.

Both Qualcomm and Facebook are developing tools that will help to create the trained models that spot patterns in data, and so power the ML applications – and to adapt those models to meet a new requirement, without having to train them from scratch again. Eventually, just the model, and not the massive accompanying data set, can be exported to the end device – able to run on optimized hardware and software, instead of on a massive cloud-based cluster.
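The “adapt rather than retrain” idea is essentially transfer learning: the expensively pretrained feature extractor is kept frozen, and only a small output layer is retrained on the new task. A minimal sketch in plain numpy – the shapes, data and single frozen layer here are invented for illustration, not Caffe2 or NPE API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

rng = np.random.default_rng(0)

# Pretend these weights came from a big pretrained model; they stay frozen.
W_frozen = rng.standard_normal((16, 8))

def features(x):
    # Frozen feature extractor: one fixed layer with a ReLU.
    return np.maximum(x @ W_frozen, 0.0)

# A small labelled set for the new task -- far too little to train from scratch.
X = rng.standard_normal((64, 16))
y = (X.sum(axis=1) > 0).astype(float)

# Only this final classifier layer is (re)trained for the new requirement.
w, b = np.zeros(8), 0.0

F = features(X)                       # extracted once; the backbone never updates
loss_before = log_loss(sigmoid(F @ w + b), y)

for _ in range(500):                  # plain gradient descent on logistic loss
    p = sigmoid(F @ w + b)
    w -= 0.1 * F.T @ (p - y) / len(X)
    b -= 0.1 * (p - y).mean()

loss_after = log_loss(sigmoid(F @ w + b), y)
```

Because only the 9 parameters of the output layer are updated, adaptation is cheap enough to run far from the original training cluster.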

Pre-prepared models from the Caffe2 ‘Model Zoo’ can be run with only a few lines of code, and so are suited to handsets, Raspberry Pis and other low power objects. Facebook said its collaborations with the chip and cloud providers “will allow the machine learning community to rapidly experiment using more complex models and deploy the next generation of AI-enhanced apps and services to optimize Caffe2 for both cloud and mobile environments.”
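Caffe2’s own entry point for this is its Predictor API over serialized protobuf nets; as a self-contained illustration of the underlying pattern – ship only the serialized weights, reload them on the device, run a forward pass in a few lines – here is a hypothetical numpy sketch, with the two-layer network and every name invented for the example:

```python
import io
import numpy as np

# "Training" side: export only the model parameters, never the training data.
rng = np.random.default_rng(1)
params = {"W1": rng.standard_normal((4, 5)), "b1": np.zeros(5),
          "W2": rng.standard_normal((5, 3)), "b2": np.zeros(3)}

buf = io.BytesIO()
np.savez(buf, **params)          # the entire deployable "model" is this blob
blob = buf.getvalue()

# "Device" side: a few lines to load the model and run inference.
m = np.load(io.BytesIO(blob))

def predict(x):
    h = np.maximum(x @ m["W1"] + m["b1"], 0.0)   # hidden layer with ReLU
    return (h @ m["W2"] + m["b2"]).argmax(axis=-1)

label = predict(rng.standard_normal(4))          # one of 3 made-up classes
```

The blob is a few kilobytes here; a real Model Zoo network is larger, but the point stands that the handset or Raspberry Pi only ever sees the weights, not the data set they were trained on.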

At F8, and now at @Scale, it was clear that Facebook’s efforts to extend its platform into every aspect of a user’s life, and even into the telco market, have included an increasingly large dose of artificial intelligence (AI). It aims to move well beyond social media and make its apps a hub for a user’s entire digital experience, and AI will be central to that.

At the 2016 edition of F8, CEO Mark Zuckerberg laid out a 10-year roadmap to connect the whole world with Facebook’s services, messages and APIs, including a plan to equip the Facebook/Messenger platform with AI, robots and virtual reality user interfaces.

This year, the firm has been talking about some of the specifics to flesh out that grand plan, and many of them rely on bringing AI to every device and user while reworking the user experience in a way that will put Facebook in pole position in the next generation, tactile Internet.

At F8, Facebook announced that it planned to turn smartphone cameras into AR platforms, showing off two of its inhouse camera designs, and a beta release of Facebook Spaces, a VR hangout lounge. This would be very much in line with Qualcomm’s own efforts. The chip giant has developed a ‘mobile brain chip’ called Zeroth, now part of the Snapdragon platform, and in October it unveiled a Snapdragon-based reference design for a camera enabled with machine vision, while Intel has acquired Movidius, a start-up and Google partner with similar capabilities (see separate item).

For Facebook and Caffe2, augmented reality (AR) is the low hanging fruit, but this is just a step on the road to redefining the user interface to web services. This is a process which has featured at F8, and in the social giant’s R&D, for some years, as it bids to outmanoeuvre Google and others in shaping, and controlling, how people communicate with the web, how their experiences are personalized, and how the resulting data is analysed and monetized.

Virtual reality and AR, as well as AI-driven technologies like computer vision, voice interfaces and gesture control, will all be essential to this attempt. Voice and gestures instead of keywords, intelligent chatbots to support users, massive AI engines to return ever-more accurate and personalized responses to natural language questions – these are the keynotes of the new web and search experiences, and every major web player knows it needs to lead the way in order to retain its influence and the best chance of monetizing the new conversations.

So Facebook expects developers to use Caffe2 to create applications which can leverage its vast user base and drive the web experience forward via chatbots, conversational interfaces (as Microsoft calls them) and robotics. And the logical next step from voice, gesture and vision interfaces is a computer-to-brain interface, to turn thoughts into text or actions. This would enable hands-free control of the environment, allowing a user to interact with menus or manipulate virtual objects. For plain old video, the functions would be as simple as pausing playback, or tagging and messaging friends to share in the experience.