Cloudera machine-learning platform wins big with Komatsu mining gig

Komatsu has signed up Cloudera to provide a machine-learning and analytics platform to improve its heavy equipment offerings for global mining customers. Running on Microsoft’s Azure cloud, the Cloudera Enterprise system will provide monitoring analytics for the machinery powering surface and underground mining, allowing Komatsu to better serve its customers. Komatsu says recommendations from the new system have already helped one coal mining customer double the daily utilization of its longwall mining system.

It’s a big win for Cloudera, a provider of a machine-learning platform and analytics tools, which is now the brains behind the JoySmart Solutions platform. JoySmart ingests, stores, and processes data from the machines. That data is then used to monitor operations – working out what is ‘normal’ behavior, sending out alerts if dangers are spotted, and guiding managers on how to better use and maintain these incredibly expensive assets.

For Komatsu, the doubling of production hours is a powerful selling point with its customers. Being able to promise SLAs for machine uptime, reducing the fear of the dreaded unplanned downtime, is an excellent service to sell. While Komatsu’s initial focus is on its mining offerings, the gargantuan excavators and dumper trucks that are becoming increasingly automated, it also has forestry and construction verticals into which it can expand the practice.

The pair say that the mining machines present a significant challenge for most data processing systems. Assets like longwall miners, electric rope shovels, and wheel loaders generate time-series metrics such as pressures, temperatures, and currents, which are then recorded as snapshots. A single machine can apparently generate thousands of these metrics each minute, resulting in around 30,000 to 50,000 unique time-stamped records every sixty seconds.
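
To put the per-minute figure in perspective, the short Python sketch below scales it up to a day and a month for a single machine. It assumes, purely for illustration, that the machine reports around the clock and that a month has 30 days; neither assumption comes from Komatsu or Cloudera, and the function name is made up for this example.

```python
# Back-of-the-envelope scaling of the quoted per-minute record counts.
# Assumptions (not from the article): continuous 24-hour operation
# and a 30-day month.
MINUTES_PER_DAY = 24 * 60
DAYS_PER_MONTH = 30


def records_over(records_per_minute: int, minutes: int) -> int:
    """Total records produced at a constant per-minute rate."""
    return records_per_minute * minutes


# The article quotes 30,000 to 50,000 records per machine per minute.
low_per_minute, high_per_minute = 30_000, 50_000

low_per_day = records_over(low_per_minute, MINUTES_PER_DAY)
high_per_day = records_over(high_per_minute, MINUTES_PER_DAY)
low_per_month = low_per_day * DAYS_PER_MONTH
high_per_month = high_per_day * DAYS_PER_MONTH

print(f"Per day:   {low_per_day:,} to {high_per_day:,} records")
print(f"Per month: {low_per_month:,} to {high_per_month:,} records")
```

Under those assumptions, one machine produces between 43.2 million and 72 million records a day, or roughly 1.3 billion to 2.2 billion a month.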

That’s a colossal amount of data, especially for deployments that comprise multiple machines. As such, the analytics platform needs to learn what counts as normal behavior, so that it doesn’t need to process every single data point equally – it can work based on deltas in the information, or on a simple parameter basis. This volume of data is also why edge computing is attracting such attention: transporting all of it back to a cloud would incur steep bandwidth costs.
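
The idea of working from deltas can be illustrated with a simple deadband filter, which only passes a reading along when it differs from the last reported value by more than a set threshold. This is a generic sketch of that technique, not a description of how JoySmart or Cloudera actually handle the data; the function name, sample pressures, and threshold are all hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

# A reading is (timestamp in seconds, sensor value). Hypothetical layout.
Reading = Tuple[int, float]


def deadband_filter(readings: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Yield only readings that moved more than `threshold` away from
    the last reading that was yielded. The first reading always passes."""
    last_sent: Optional[float] = None
    for timestamp, value in readings:
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield timestamp, value


# Made-up pressure samples, one per second.
pressures = [(0, 100.0), (1, 100.2), (2, 100.1), (3, 103.5), (4, 103.6), (5, 99.0)]

kept = list(deadband_filter(pressures, threshold=1.0))
print(kept)  # [(0, 100.0), (3, 103.5), (5, 99.0)]
```

Here three of the six samples survive, because the small wobbles around 100.0 and 103.5 stay inside the threshold. The trade-off is that a wider threshold saves more bandwidth and storage but hides smaller changes.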

“With increasing customer demand and more connected machines, we were looking at data growth reaching 30 terabytes per month. Our former environment limited our ability to scale, grow and innovate,” said Anthony Reid, senior manager of analytics at Komatsu. “With Cloudera’s modern platform, we use advanced data analytics and machine learning to power our IIoT success. We now provide customers with better recommendations on machine utilization and deliver services faster. In one example, we were able to make recommendations for a large coal mining company that enabled them to double the daily utilization of their Joy longwall system.”

Cloudera explains that Komatsu was struggling to scale its JoySmart Solutions IIoT offering on its legacy data warehouse tech. This led it to partner with both Cloudera and Azure, to better analyze the data from its Joy-branded machines and also from third-party PLCs (Programmable Logic Controllers). Currently, the system ingests around 3TB of data per month, but this is expected to grow tenfold, to 30TB. All of that data needs to be searchable at a moment’s notice, and that’s a key part of what Cloudera brings to the table.

The deployment has apparently led to significant cost savings, with Lead Architect at JoySmart Solutions Shawn Terry saying that the new system allowed them to make decisions based solely on customer needs, rather than on what the infrastructure could manage. “We can now scale and grow incrementally at a reasonable cost. This allows us to extend our user base, delivering services faster and better.” The information will also help Komatsu to better design its next generation of mining equipment.

In a promotional video, Terry went on to say that Komatsu could now properly use machine-learning algorithms on its data, which was previously stored across multiple silos. Now that the data is consolidated in a single Azure environment, Komatsu can work with all of the data for all of its customers and run learning algorithms across the full set.

“Komatsu required a modern data platform that not only delivers next-generation machine learning and advanced analytics capabilities in the cloud, but is also scalable to increases in their customers’ equipment utilization and productivity,” said Dave Shuman, IoT and manufacturing industry leader at Cloudera. “With Cloudera, Komatsu can now use the vast amounts of data to help its mining customers manage increasing pressure to be environmentally smarter and more productive at a lower operation cost.”

Cloudera’s Wim Stoop, senior product marketing manager, said that most organizations know that IoT and machine-learning will bring them something, but that they are not sure exactly what. He pointed out that it is not as simple as slapping a sensor on a thing, and that sensor data is only useful when it is understood.

This is where the machine-learning aspect comes in: the sensor data is combined with other contextual data sources, so that a platform can better understand the snapshots of the machines it is seeing. Stoop said that Cloudera’s namesake cloud-based approach was greatly beneficial for data scientists, who tend to work in a very different manner to traditional scientists.

By this, Stoop meant that the data scientists’ demand for compute resources was very intermittent – requiring huge clusters for short periods of time, but then not needing them after their experiment or discovery. This approach would be prohibitive if you were buying the hardware yourself, but with cloud computing, the pay-as-you-go model makes it possible.

As for the Cloudera Enterprise platform, Stoop explained that it comprises around 25 open source projects, mostly involving Hadoop – an open source framework for distributed storage and processing of large data sets across computing clusters. In the Komatsu project, Cloudera used Apache HBase, Apache Impala, Apache Kafka, Apache Spark, Cloudera Navigator Optimizer, and OpenTSDB – programs that you can lose hours researching.