22 June 2018

IBM Debater shows how far natural language AI still has to go

IBM’s Project Debater should be congratulated on its success in matching two seasoned humans in debate. But on closer inspection, and contrary to the media euphoria, it shows how much further AI-based natural language processing has to go before it can really match people, at least at the highest level. In some ways the event flattered AI: the machine produced what superficially sounded like compelling arguments but were really closer to reasonable efforts in a school debating society. It represented significant progress, but it also recalled challenges such as approaching absolute zero or reaching the speed of light, where the last few degrees or meters per second are the hardest to achieve.

It would be unfair to call the media reaction to IBM Debater a false dawn; it was more an overstatement of what had been achieved. This has been a recurring theme throughout the history of natural language AI, dating back to the first demonstrations in the 1950s, when false hopes were raised simply because systems performed quite well within very limited domains, for example confined to small vocabularies or requiring pauses between spoken words.

The most infamous example was the Georgetown–IBM experiment of 1954, in which 60 Russian sentences were successfully translated into English, leading the authors to predict that machine translation would be a solved problem within three to five years. As was soon discovered, this grotesquely underestimated the exponential increase in complexity that comes with expanding vocabulary and scope, and the same proved true of speech recognition.

The same mistake has been made with Project Debater, not least in IBM’s assertion that this success surpassed Google’s AlphaGo defeating Ke Jie, world champion at the game of Go. This sounded like sour grapes, given that IBM itself had previously been regarded as the world champion of computer game playing after its Deep Blue beat Garry Kasparov at chess in 1997.

Go is a much harder challenge for computers to master because of its massive search space, which makes it impossible to out-calculate human opponents by brute force alone. AlphaGo, with the help of deep learning, had scaled a summit of human achievement, while Project Debater has a long way to go to match, say, Oscar Wilde or, for that matter, Barack Obama. It looks good only superficially, and only in comparison with less accomplished orators, including some current holders of major office.

The event pitted IBM’s Project Debater against two humans, one of them Noa Ovadia, Israel’s national debating champion in 2016, who had recently begun working with IBM as an opponent for its machine. It took place on a stage in San Francisco and featured two debates, one against each human speaker: the first on whether there should be more publicly funded space exploration, the second on whether more should be invested in telemedicine technologies. Each participant had four minutes for an opening statement, then a four-minute rebuttal, and then a two-minute conclusion, with time pressure naturally being no issue for the machine.

When Ovadia argued that money should be spent on more pressing needs than space travel, the machine replied, “It is very easy to say that there are more important things to spend money on, and I do not dispute this. No one is claiming that this is the only item on our expense list. But that is beside the point. As subsidizing space exploration would clearly benefit society, I maintain that this is something the government should pursue.”

Although this was a reasonable answer, it also exposed the limitation of the machine’s method, which essentially amounts to cutting and pasting sentences from a massive database of documents, mostly newspaper articles and academic journals, totalling hundreds of millions of words. Many humans do the same, both when debating and when composing articles, but it does reveal the large gap still to be bridged between that approach and the higher level of spontaneous creation or synthesis of which the greatest human orators and writers are capable. That might seem a small distinction, but, like those last few fractions of a degree before absolute zero, it is actually a yawning chasm.