12 October 2018

Machine learning gets more open source wins from Microsoft and Nvidia

The AI and machine learning community is inextricable from open source software at this point, and there is no sign of this changing. In the past week, Microsoft and Nvidia have both unveiled new open source projects, driving home the idea that most of the money to be made here will come from providing services rather than from selling the software itself.

It’s a contentious point, but it boils down to the fact that most companies using AI-based technologies are going to be buying them from a cloud vendor, packaged as part of a larger suite. The market for on-premises appliances seems very small, and while bespoke software integrations are useful for specialist applications, the bulk of usage is going to be through platforms like Google Cloud Platform, Amazon Web Services, and Microsoft Azure, along with the likes of HPE, IBM, Oracle, and SAP.

However, that’s a pretty small group of companies for the hardware side of the AI world to sell into. In terms of volume, there are around a dozen major clients for server components like GPU and FPGA cards, and of course, Intel is flailing wildly trying to keep a stranglehold on the data center – painfully aware that it risks being sidelined as these other architectures outperform its x86 CPUs.

Intel’s dilemma is complicated by the somewhat uncertain future of enterprise computing workloads. There won’t be much enthusiasm for rewriting those enterprise tools to run on GPUs or FPGAs, but there’s definitely interest in using these processors to accelerate certain workloads – likely ones that have very predictable functions that would suit an ML-based system.

There’s little room for startups in a world so dominated by giants. As discussed in last week’s coverage of the Cloudera and Hortonworks merger, these titans can simply snap up the more promising startups or dedicate a small team of engineers with access to vast resources to master the open source elements. They have enough cash to buy their way into any market they like, and the sales channels to quickly turn a profit.

Microsoft announced that it had open-sourced Infer.NET, releasing it via GitHub (now owned by Microsoft). Microsoft says that the framework lets users create bespoke learning algorithms for their specific data models, so that they don’t have to try to kludge something together using standard algorithms.

Work on the tool began in 2004, and Microsoft has used it for film recommendations, player matchmaking in Halo 5 multiplayer, and elements of its Office suite and Azure. Now that it is on GitHub, developers will be able to pick it up and run with it.

Yordan Zaykov, Principal Research Software Engineer Lead at Microsoft Research, said, “Infer.NET was initially envisioned as a research tool and we released it for academic use in 2008. As a result, there have been hundreds of papers published using the framework across a variety of fields, from information retrieval to healthcare. In 2012 Infer.NET won a Patents for Humanity award for aiding research in epidemiology, genetic causes of disease, deforestation and asthma.”

Zaykov goes on to say that Microsoft has already taken steps towards integrating Infer.NET with ML.NET, Microsoft’s other open source project, which is essentially a machine learning framework for its .NET platform.

Nvidia’s news is the bigger of the two announcements. At its GTC Europe conference, it unveiled Rapids, an open source platform that is aimed at promoting GPU-acceleration in data science and machine learning applications. At launch, Cisco, Dell EMC, HPE, IBM, Lenovo, NetApp, Oracle, Pure Storage, SAP, and SAS have signed on in support. Named applications include predicting credit card fraud, forecasting retail inventory, and understanding customer purchasing behavior.

Unsurprisingly, Nvidia’s GPU sales figures stand to gain if Rapids helps drive demand for the processors. It cites an estimated value for the data science and machine learning server market of $20bn annually, which it says brings the total value of the high performance computing (HPC) market to $36bn, when you factor in deep learning and scientific analysis.

Rapids is a suite of open source software libraries for GPU-accelerated analytics. Nvidia has been working on it for the past two years, apparently in close collaboration with ‘key open source contributors.’ The benchmarks trotted out by Nvidia claim a 50x speed improvement using Nvidia’s DGX-2 system, compared to a CPU-only configuration – which is to be expected, given that the stack is optimized for GPUs, not CPUs.

Nvidia CEO Jensen Huang said, “Building on CUDA and its global ecosystem, and working closely with the open-source community, we have created the RAPIDS GPU-acceleration platform. It integrates seamlessly into the world’s most popular data science libraries and workflows to speed up machine learning. We are turbocharging machine learning like we have done with deep learning.”

The announcement includes a very long list of quotes from companies expressing their support for Rapids. The stack is available now from the Rapids website, although Nvidia is working on a containerized version for its GPU Cloud. Rapids is available under an Apache license, and promises to let users execute end-to-end data science and analytics pipelines entirely on GPUs, with a DataFrame API for integration with other applications.