Why is it that music and radio streaming services have been so slow to adopt voice functionality? Amazon, Apple and Google all back their music apps with proprietary voice-activated digital assistants, and the likes of Spotify have integrated support for those assistants on smart speakers and other devices. Yet it is Pandora that, bizarrely, became the industry's first pure-play audio streaming outfit this week to offer in-app voice control.
There are early signs of a Pandora resurgence following its $3.5 billion purchase by SiriusXM in October. At the time, we were the only outlet to suggest that the takeover by the Liberty Media-owned firm, if handled correctly, could create a surprise third contender in the music streaming space. This week's move is exactly the sort of direction we had hoped Pandora would take under new ownership.
About three months after first adding support for Alexa, the troubled US internet radio darling has decided the opportunity is too great to ignore and that tightly integrating voice into its own app, independent of Alexa, is the way to go. That makes Pandora only the second company to do so, after Amazon with its Music app. Pandora also used its announcement to take a dig at Alexa, claiming its new feature, called Voice Mode, offers more functionality, including open-ended queries, interactive requests and directional requests.
Voice Mode is built on Houndify, a voice and conversational AI platform from SoundHound, a song recognition firm that before today had all but vanished from view after Shazam (now under Apple ownership) took that market. It also relies on what Pandora calls proprietary Speech-to-Meaning and Deep Meaning Understanding technologies for voice recognition and language understanding.
What we can gather about SoundHound's technology suggests the Houndify AI is rather simpler than the company would like to let on. It appears to have taken standard conversational AI techniques and rebranded them with buzzwords like "Speech-to-Meaning" and "Deep Meaning Understanding".
Unfortunately, the Houndify website is reluctant, to say the least, to explain how the engine works. By "meaning", does SoundHound mean picking up notoriously tricky conversational nuances like sarcasm and irony? Does it mean that no accent is too strong? Or perhaps that it has mastered every single dialect of English?
Faultline Online Reporter has reached out to Pandora for clarity, but in the meantime, our gut instinct is that Speech-to-Meaning is essentially a pretty bog-standard prediction engine that processes speech generically and phrases its replies so they feel more human. Nevertheless, becoming only the second company to add in-app voice control is a major achievement, and more streaming services are bound to have similar features in the pipeline.
Pandora has certainly taken a small step toward overcoming the countless tiny obstacles in the art of conversation, yet both Pandora and SoundHound have been light on the technical detail we really crave, most likely because they are well aware of how competitive the voice market is and are keeping their cards close to their chests. Pandora users can find out for themselves how sophisticated the system is this week, though, with Voice Mode already live on iOS and Android.
After waking Voice Mode with the phrase "Hey Pandora", users can ask open-ended queries such as "play something different" or "play more stuff like this". We're pretty sure Alexa already caters for these kinds of simple requests. Thematic requests, such as "play something for my workout" or "play music for relaxing", are another feature, delivering personalized music based on each user's tastes, moods and favorite activities.
Pandora may not be the first name on people's lips when it comes to technology trailblazers, but the company is notable for being one of the few players in the streaming space to have built its own CDN entirely. Apple, Facebook, Amazon, Google, Microsoft, Twitter and Netflix, all significantly larger entities, have also built their own CDNs from scratch, which we think speaks volumes.
Another factor contributing to Pandora's recovery was the deal struck with Comcast at the tail end of 2017, prior to its acquisition, in what was the first full integration of music streaming with a voice interface by a major legacy pay TV operator. At the time, Comcast pointed to explosive growth in music streaming on TVs, and this is precisely the market SiriusXM is now infiltrating.
Two years ago, SoundHound raised $75 million in funding to further the development of its Houndify voice-enabled AI platform for businesses and developers. At the time, it claimed Houndify had the world’s fastest speech recognition software and said its Collective AI tool enables customers to extend the functionality of existing knowledge domains without having to fully access underlying libraries. Investors in the round included Nvidia GPU Ventures, Samsung Catalyst Fund, Nomura Holdings, Sompo Japan Nipponkoa Insurance, and RSI Fund.
Pandora Chief Product Officer Chris Phillips said, “Pandora is the leader in personalized audio entertainment, and millions of our listeners are already loving the experience we’ve created on smart speakers and other voice-enabled connected devices. With Voice Mode, we are introducing an even more natural and conversational way for listeners to discover new music and enhance their experience directly in the Pandora mobile app, like getting recommendations from a friend who really knows you.”