How adding voice to MAM can make live sports smarter

Can voice help live sports content become smarter instantly? That was the burning question used to hook attendees into a webinar jointly hosted by machine learning specialist Speechmatics and media asset management (MAM) firm Tedial this week. Having delved into MAM technology and the triggers behind the trend in last week’s Faultline Online Reporter, we found the prospect of sliding voice technology into the equation irresistible, although we’re still unsure whether the question at hand was actually answered.

It comes two months after Speechmatics incorporated its Automatic Speech Recognition (ASR) technology into the SmartLive sports system from Tedial. Tedial launched SmartLive at IBC last year, promising a revolution in sports production. It hasn’t quite delivered a revolution as far as we can see, but what SmartLive does is allow operators to search for comments made during a sporting match. It automatically creates file locators and imports a file into a watch folder, from where it generates speech-to-text from the commentary track. Speechmatics says that quicker access to content gives production teams more time for creative endeavors.
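To make the search idea concrete, here is a minimal sketch of looking up a spoken phrase in a word-timed transcript and returning locators into the commentary track. The `Word` type, the `find_phrase` helper and the sample commentary are all our own illustrations; nothing here reflects SmartLive’s or Speechmatics’ actual interfaces.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the commentary track
    end: float


def find_phrase(words, phrase):
    """Return a (start, end) pair for every occurrence of phrase."""
    target = [t.lower() for t in phrase.split()]
    hits = []
    if not target:
        return hits
    n = len(target)
    for i in range(len(words) - n + 1):
        # Compare case-insensitively and ignore trailing punctuation.
        window = [w.text.lower().strip(".,!?") for w in words[i:i + n]]
        if window == target:
            hits.append((words[i].start, words[i + n - 1].end))
    return hits


# Made-up commentary with per-word timings.
transcript = [
    Word("What", 61.0, 61.2), Word("a", 61.2, 61.3), Word("goal", 61.3, 61.8),
    Word("from", 61.8, 62.0), Word("Smith!", 62.0, 62.6),
]
locators = find_phrase(transcript, "what a goal")
```

Each returned pair could then serve as an in and out point for a clip, which is presumably the role the file locators play.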

Ironically, this week’s webinar encountered more audio issues than we’ve had hot dinners, but we’ll give Speechmatics and Tedial the benefit of the doubt and put the technical problems down to misbehaving conference call software. Between bouts of technical problems, the webinar included a demo of the SmartLive ASR system, which in short can be used to create a piece of content from what a commentator said. The process didn’t look overly complex, even from a non-engineer perspective. So that’s a plus, but there are of course a few hurdles to overcome along the way.

Context is a big problem for the voice technology sector, as we have mentioned in previous issues, and Speechmatics cited good results from a recent test in deriving context from a transcript. For sports, broadcasters have a wealth of content to play with, but it comes at different granularities. For example, identifying the word “Federer” would automatically set the context to tennis. This is a very basic example, but adding context eases the entire process.

The second key use case for the combined technology is producing personalized highlights in seconds. SmartLive does this by taking metadata related to the event, along with team and player data (often from a third-party data provider like Opta). After tagging, actions such as a red card or goal in a soccer match are auto-clipped, which feeds an automated storytelling system that can produce, for example, a 2-minute highlights clip. Tedial said you can simply ask the system to tell a specific story and it will do the clipping for you. AI can improve the experience further by taking components like actions, faces, sentiment, objects, and places into account.
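As a rough illustration of how tagged actions might become a time-limited highlights reel, the sketch below ranks events by an editorial priority and keeps as many as fit the requested duration. The priority weights, the clip padding, the `build_highlights` function and the sample events are all hypothetical, and overlapping clips are not merged; Tedial did not describe its selection logic.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str     # e.g. "goal", "red_card"
    time: float   # seconds into the match
    player: str


# Hypothetical editorial weights and clip padding, not SmartLive values.
PRIORITY = {"goal": 3, "red_card": 2, "save": 1}
PRE_ROLL, POST_ROLL = 10.0, 5.0


def build_highlights(events, max_seconds, player=None):
    """Pick the highest-priority events that fit, returned in match order."""
    wanted = [
        e for e in events
        if e.kind in PRIORITY and (player is None or e.player == player)
    ]
    # Most important first; earlier events win ties.
    wanted.sort(key=lambda e: (-PRIORITY[e.kind], e.time))
    clip_len = PRE_ROLL + POST_ROLL
    chosen, used = [], 0.0
    for e in wanted:
        if used + clip_len > max_seconds:
            break
        chosen.append(e)
        used += clip_len
    chosen.sort(key=lambda e: e.time)
    return [(max(0.0, e.time - PRE_ROLL), e.time + POST_ROLL, e.kind)
            for e in chosen]


# Made-up event feed of the kind a third-party data provider might supply.
events = [
    Event("save", 300.0, "Keeper A"),
    Event("goal", 1325.0, "Striker B"),
    Event("red_card", 2710.0, "Defender C"),
    Event("goal", 4980.0, "Striker B"),
]
reel = build_highlights(events, max_seconds=45)
personal = build_highlights(events, max_seconds=120, player="Striker B")
```

Passing a `player` filters the reel down to one person’s actions, which is one simple reading of what “personalized” could mean here.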

But how can the use of transcription bring real editorial value? Player names are a particular pain point apparently, so Speechmatics has this covered with millions of names in its database. Of course, sometimes pronunciations are not as expected, but pronunciation hints can be provided to the speech-to-text system. These are added on a per-transcript basis, meaning customers only need to worry about the event in progress, not all the player names in the world at the same time.
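A per-transcript hint list could look something like the sketch below, which builds a job configuration containing only the names relevant to one event. The `additional_vocab` and `sounds_like` field names are modeled on our understanding of the custom dictionary format in Speechmatics’ public API and should be treated as an assumption to check against its current reference; the `build_vocab_config` helper, the roster and the spellings are purely illustrative.

```python
def build_vocab_config(language, roster):
    """Build a per-job config holding only the names for one event.

    roster maps each name to a list of phonetic spellings (may be empty).
    Field names are an assumption, see the note above.
    """
    vocab = []
    for name, hints in roster.items():
        entry = {"content": name}
        if hints:
            # Only attach hints where the default pronunciation is unlikely.
            entry["sounds_like"] = list(hints)
        vocab.append(entry)
    return {"language": language, "additional_vocab": vocab}


# Illustrative roster for a single match; the spellings are made up.
roster = {
    "Player One": [],
    "Szczesny": ["shchensny", "shtensny"],
}
config = build_vocab_config("en", roster)
```

Because the list is rebuilt for each job, it stays as short as the team sheets for that fixture.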

Making content consumable by the audience is often considered an additional step at an additional cost, according to the webinar hosts. Examples include producing social streams that play with no sound, or serving viewers whose first language is not English. “We have enough info to produce captions but rely on partners to convert into captions, supporting output formats like SRT. There is a difference between getting words into chunks and captioning, with additional editing often required for closed captioning.”
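The “getting words into chunks” half of that distinction can be sketched in a few lines: group timed words into fixed-size cues and write them in SRT’s numbered, timestamped layout. The `words_to_srt` and `srt_time` helpers and the seven-word default are our own assumptions. Real captioning would also need line-length limits, reading-speed checks and the editorial pass the hosts mentioned.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Chunk (text, start, end) tuples into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w[0] for w in chunk)
        start, end = srt_time(chunk[0][1]), srt_time(chunk[-1][2])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Made-up commentary with per-word timings in seconds.
words = [("What", 61.0, 61.2), ("a", 61.2, 61.3), ("goal", 61.3, 61.8)]
srt = words_to_srt(words, max_words=2)
```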

Perhaps showing the value of MAM technology, the SmartLive features can either be bundled with the full package or supplied on their own, with Tedial providing auto highlights and metadata plugged directly into a customer’s ingest workflow or video servers.

As for customer case studies, only one unnamed UK customer was referenced as using the joint transcription system from Speechmatics and Tedial right now and no details were provided.

So, can voice help live sports content become smarter instantly? Replace instantly with eventually and you have something closer to the mark.