AI filters and extremism – another question for ML censorship ethics

It is easy to see why the UK government chose an algorithm from London-based ASI Data Science as its recommended tool for smaller social media platforms to use in identifying and removing Islamic State (IS) propaganda. But UK Home Secretary Amber Rudd has also indicated that the government may go further and compel its use by smaller online platforms that, unlike YouTube and Facebook, lack the resources to develop their own effective ways of identifying objectionable content.

The big platforms, however, employ a variety of methods designed to provide layers of defense against offending content, and it would be a mistake for the UK government to rely too heavily on any one tool, which could also stoke a new kind of arms race over content crafted to bypass the filter. If the AI-based filter is not good enough, the government risks creating a pointless exercise and, more worryingly, setting a precedent for filtering from the internet any content it disapproves of.

Riot pointed out recently how deepfake tools, which map one person's face onto another's body, have opened up the potential for creating all manner of doctored or completely fake content with malicious intent. Such content would not necessarily be picked up by any software trained to recognize material originated by IS/Daesh, however good that software is. It could, for example, provoke civil unrest by inserting a person into a controversial scene where he or she was never present, without alerting any algorithm designed to recognize the characteristic features of a particular genre of content.

Fortunately, terrorist groups have not been noted for their subtlety, and in practice tools like the one ASI Data Science has provided will do a good job of rooting out much of the subversive content designed to radicalize or rouse individuals. The UK government contributed £600,000 towards the creation of this tool, having already been convinced that the underlying machine-learning methodology would yield good results. That has turned out to be the case: in trials the tool correctly identified 94% of Daesh material while incorrectly flagging only 0.005% of uploaded videos that were not produced by IS.

This means the false negative rate is 6% and the false positive rate is 0.005%. In practice it is usually possible to tune such algorithms to decrease the false positive rate at the expense of the false negative rate, and vice versa. But for tasks such as this, where the target videos are a tiny fraction of the total, it is most effective and efficient to minimize false positives even if that increases false negatives, because even a small false positive rate applied to the great majority of innocent uploads can produce far more false alarms than there are genuine targets.
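To make the two rates and the trade-off between them concrete, the sketch below computes both from a labeled set of classifier scores at several decision thresholds. The scores, labels and the `error_rates` helper are hypothetical; nothing here reflects ASI's actual model or trial data.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    false_negative_rate: float  # share of target videos the filter misses
    false_positive_rate: float  # share of ordinary videos it wrongly flags


def error_rates(scores, labels, threshold):
    """Flag a video when its score >= threshold; labels are True for target videos."""
    tp = fn = fp = tn = 0
    for score, is_target in zip(scores, labels):
        flagged = score >= threshold
        if is_target and flagged:
            tp += 1
        elif is_target:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return ErrorRates(fn / (tp + fn), fp / (fp + tn))


# Hypothetical classifier scores: 4 target videos followed by 6 ordinary ones.
scores = [0.95, 0.90, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05, 0.02]
labels = [True] * 4 + [False] * 6

for threshold in (0.3, 0.5, 0.8):
    r = error_rates(scores, labels, threshold)
    print(f"threshold={threshold}: FNR={r.false_negative_rate:.2f}, "
          f"FPR={r.false_positive_rate:.2f}")
```

Raising the threshold from 0.3 to 0.8 drives the false positive rate from 0.33 down to 0, while the false negative rate climbs from 0 to 0.50: the same dial, turned in one direction or the other.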

Suppose, for example, that a mid-sized site had 100,000 daily uploads and that 0.01% of these, or 10, originated from IS, which might be a typical proportion. With a false negative rate of 6%, fewer than one of those videos (0.6) on average would fail to be identified, and it might be picked up by other means, perhaps a different tool, which is another reason for not relying on just one. Meanwhile the low false positive rate of 0.005% means that only about 5 of the 99,990 non-IS videos would be incorrectly flagged for review before any takedown, which is certainly a manageable number. If the false positive rate rose to, say, 0.5%, which still sounds quite low, the site would face having to assess about 500 videos a day, which might be unmanageable.
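The arithmetic in this example is easy to reproduce, and adding one more quantity, precision (the share of flagged videos that really are IS material), shows why the false positive rate matters so much. The `daily_review_load` helper is a hypothetical illustration that uses only the figures given in the text.

```python
def daily_review_load(uploads, target_share, fnr, fpr):
    """Expected daily outcomes for a filter with the given error rates.

    uploads      -- total videos uploaded per day
    target_share -- fraction of uploads that are IS videos
    fnr, fpr     -- false negative and false positive rates, as fractions
    """
    targets = uploads * target_share
    missed = targets * fnr                       # IS videos that slip through
    caught = targets - missed                    # IS videos correctly flagged
    false_flags = (uploads - targets) * fpr      # innocent videos sent for review
    precision = caught / (caught + false_flags)  # share of flagged videos that are IS
    return missed, caught, false_flags, precision


# The text's scenario: 100,000 uploads a day, 0.01% from IS, 6% false negative rate.
for fpr in (0.00005, 0.005):  # 0.005% versus 0.5%
    missed, caught, false_flags, precision = daily_review_load(100_000, 0.0001, 0.06, fpr)
    print(f"FPR {fpr:.3%}: missed {missed:.1f}, caught {caught:.1f}, "
          f"false flags {false_flags:.1f}, precision {precision:.1%}")
```

At 0.005% roughly two out of every three flagged videos are genuine IS material (precision about 65%); at 0.5% fewer than 2% are, so reviewers would spend almost all their time on innocent uploads.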

The particular ML technique chosen by ASI Data Science is significant because it has gained considerable traction within the AI community over the last few years for identifying relatively small numbers of items in large data sets. The tool is built on SherlockML, ASI's customizable product, and employs various techniques under the ML banner. The most important is t-Distributed Stochastic Neighbor Embedding (t-SNE), which has been taking the ML world by storm. This is despite t-SNE being essentially a data-exploration algorithm: it pares down the data, reducing its dimensionality, to make the subsequent training of a system to classify objects such as images or videos easier and more likely to be accurate.
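To give a feel for what t-SNE actually computes, here is a minimal NumPy sketch of the standard exact algorithm: Gaussian neighbor affinities in the original space, each calibrated by binary search to a target perplexity, are matched by Student-t affinities in a 2-D embedding through gradient descent on the KL divergence between the two. It is an illustration under assumptions, not ASI's or SherlockML's implementation; the function names, step size, iteration counts and the toy two-cluster data are all choices made here.

```python
import numpy as np


def _row_affinities(sq_dists, perplexity, tol=1e-5, max_steps=64):
    """Gaussian affinities for one point; the bandwidth is tuned by binary search
    so that exp(entropy) of the distribution equals the target perplexity."""
    d = sq_dists - sq_dists.min()               # shift for numerical stability
    target = np.log(perplexity)                 # target entropy in nats
    beta, beta_lo, beta_hi = 1.0, 0.0, np.inf   # beta = 1 / (2 * sigma**2)
    for _ in range(max_steps):
        p = np.exp(-d * beta)
        p /= p.sum()
        entropy = -np.sum(p * np.log(p + 1e-12))
        if abs(entropy - target) < tol:
            break
        if entropy > target:                    # too flat: narrow the Gaussian
            beta_lo = beta
            beta = beta * 2 if np.isinf(beta_hi) else (beta + beta_hi) / 2
        else:                                   # too peaked: widen it
            beta_hi = beta
            beta = (beta + beta_lo) / 2
    return p


def tsne(X, n_components=2, perplexity=15.0, n_iter=500, learning_rate=10.0, seed=0):
    """Exact O(n^2) t-SNE by gradient descent with momentum on KL(P || Q)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)  # squared distances

    # Joint probabilities P: symmetrized conditional affinities summing to 1.
    P = np.zeros((n, n))
    for i in range(n):
        mask = np.arange(n) != i
        P[i, mask] = _row_affinities(D[i, mask], perplexity)
    P = np.maximum((P + P.T) / (2 * n), 1e-12)

    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        exaggeration = 4.0 if it < 100 else 1.0  # "early exaggeration" phase
        momentum = 0.5 if it < 100 else 0.8
        sq_y = np.sum(Y ** 2, axis=1)
        # Student-t kernel (one degree of freedom) in the low-dimensional space.
        num = 1.0 / (1.0 + np.maximum(sq_y[:, None] + sq_y[None, :] - 2 * Y @ Y.T, 0.0))
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j)
        W = (exaggeration * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y


# Toy data: two well-separated clusters of 30 points in 20 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 20)), rng.normal(6.0, 1.0, size=(30, 20))])
Y = tsne(X)
print(Y.shape)  # (60, 2)
```

This exact version costs O(n²) time and memory, so large collections would call for an approximation such as the Barnes-Hut method used by scikit-learn's `sklearn.manifold.TSNE`. Standard t-SNE also embeds a fixed set of points rather than learning a mapping that can be applied directly to new uploads.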

Its main attraction is that it has proved more effective than any other statistical method employed so far at preparing data sets in which only a tiny proportion of the elements are of interest. That is the situation both for systems designed to recognize fraudulent transactions and for identifying the videos originated by Daesh among all the content uploaded to an OTT video or social media platform.

This is a complex problem because it is not just a case of training the system to recognize given features such as faces, but of distinguishing any video produced by Daesh from any other. The first task is to strip the content of information that will not help with the identification, such as color, given that the videos are often of poor quality, and aspect ratio, which is irrelevant to the content's identity. Then t-SNE kicks in by reducing video sequences to their bare essentials, effectively a kind of high-level compression. It converts all the videos into a common form, stripped down to the essential components on which classification algorithms can work, so that they differ as little as possible in the amount of information they contain.
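The stripping step described above might look something like the sketch below, which discards color by converting each frame to luminance, discards aspect ratio by resampling every frame onto the same fixed grid, and then pools the frames into one normalized vector per video. It is hypothetical throughout: the function names, the 32-pixel grid, the ITU-R BT.601 luma weights and the simple frame averaging are assumptions for illustration, not details reported about ASI's tool.

```python
import numpy as np

# Hypothetical normalization step; the source does not say how ASI does this.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma coefficients


def to_grayscale(frame_rgb):
    """Drop color: collapse an (H, W, 3) RGB frame to (H, W) luminance."""
    return frame_rgb[..., :3] @ LUMA_WEIGHTS


def resize_nearest(frame, size=32):
    """Drop aspect ratio: resample any (H, W) frame onto a fixed size x size grid."""
    h, w = frame.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return frame[np.ix_(rows, cols)]


def video_features(frames_rgb, size=32):
    """Turn a list of RGB frames (any resolution) into one fixed-length unit vector."""
    normalized = [resize_nearest(to_grayscale(f), size) for f in frames_rgb]
    video = np.mean(normalized, axis=0)   # crude temporal pooling: the average frame
    video = video - video.mean()          # remove overall brightness
    norm = np.linalg.norm(video)
    return (video / norm if norm > 0 else video).ravel()


# Two synthetic "videos" with different resolutions and aspect ratios.
rng = np.random.default_rng(0)
widescreen = [rng.integers(0, 256, size=(360, 640, 3)).astype(float) for _ in range(5)]
square = [rng.integers(0, 256, size=(240, 240, 3)).astype(float) for _ in range(5)]
features = np.vstack([video_features(widescreen), video_features(square)])
print(features.shape)  # (2, 1024)
```

Whatever their original resolution, both videos end up as vectors of the same length and scale, which is the kind of common form a dimensionality-reduction step such as t-SNE, or a classifier, can then work on.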

It is then possible to pick out combinations of characteristics that identify Daesh videos, none of which would be sufficient on its own but which together are unlikely to be found in non-Daesh videos. The principle is similar to that of DNA typing in forensic science, which identifies individuals against a DNA database. It relies on combinations of genetic markers, each of which can take one of several possible values in a given person. Large numbers of people share the same value at any single marker, but because the markers are largely independent of each other, it is highly unlikely that two individuals other than identical twins would match at, say, nine of them.
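The arithmetic behind this cumulative identification can be sketched in a few lines. Assuming the cues are independent (the simplifying assumption a naive Bayes classifier makes, and broadly the product rule used in DNA profiling), the chance that an ordinary video shows all of them is the product of their individual frequencies, and the evidence from each cue can be added up as a log-likelihood ratio. The cue names and frequencies are invented placeholders, not real features of Daesh videos.

```python
import math

# Hypothetical cues: (frequency in target videos, frequency in other videos).
# These numbers are invented for illustration only.
CUES = {
    "cue_a": (0.90, 0.20),
    "cue_b": (0.85, 0.10),
    "cue_c": (0.80, 0.15),
    "cue_d": (0.75, 0.05),
}


def coincidence_probability(cue_names):
    """Chance that an ordinary video shows all the given cues, assuming independence."""
    return math.prod(CUES[name][1] for name in cue_names)


def log_likelihood_ratio(present):
    """Sum each cue's evidence, naive-Bayes style; positive favors 'target'."""
    total = 0.0
    for name, (p_target, p_other) in CUES.items():
        if name in present:
            total += math.log(p_target / p_other)
        else:
            total += math.log((1 - p_target) / (1 - p_other))
    return total


print(coincidence_probability(CUES))   # product of the four 'other' frequencies
print(log_likelihood_ratio(set(CUES)))  # all cues present
print(log_likelihood_ratio({"cue_a"}))  # a single cue on its own
```

With all four cues present the combined score is strongly positive, whereas a single cue on its own leaves it negative, even though that cue is several times more common in target videos than in others.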

The same principle explains why it takes extreme luck to win a major lottery, even though many tickets will get some of the numbers right. These examples show that the great complexity and challenge lie in seeing the wood for the trees: reducing the data to clusters of patterns that can be compared on similar basic statistical principles.
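The lottery comparison is easy to quantify with the standard hypergeometric formula. The sketch assumes a 6-from-49 draw purely for illustration; the `match_probability` helper is hypothetical.

```python
from math import comb


def match_probability(pool=49, drawn=6, matched=3):
    """Probability that one ticket matches exactly `matched` of the `drawn` numbers."""
    return comb(drawn, matched) * comb(pool - drawn, drawn - matched) / comb(pool, drawn)


for k in range(3, 7):
    p = match_probability(matched=k)
    print(f"exactly {k} of 6: about 1 in {1 / p:,.0f}")
```

For a 6-from-49 draw this gives odds of about 1 in 57 for matching exactly three numbers but 1 in 13,983,816 for the jackpot: each extra number that must match multiplies the improbability.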

Even so, there is some way to go before the challenge can be considered solved, quite apart from the issue of fake content. For big platforms like Facebook and YouTube, the number of false positives would still be large because of the sheer volume of uploads they receive, and ensuring that no Daesh content slips through the net is not yet realistic.
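A short calculation shows why scale matters. The sketch below reuses the trial error rates quoted in the text and the assumed 0.01% share of IS uploads; the upload volumes are hypothetical and are not published figures for Facebook, YouTube or any other platform.

```python
def daily_errors(daily_uploads, target_share=0.0001, fnr=0.06, fpr=0.00005):
    """Expected IS videos missed and innocent videos flagged per day.

    Defaults: the 6% / 0.005% trial error rates quoted in the text and an
    assumed 0.01% share of IS uploads.
    """
    targets = daily_uploads * target_share
    missed = targets * fnr
    false_flags = (daily_uploads - targets) * fpr
    return missed, false_flags


# Hypothetical daily upload volumes, chosen only to show how errors scale.
for uploads in (100_000, 10_000_000, 1_000_000_000):
    missed, false_flags = daily_errors(uploads)
    print(f"{uploads:>13,} uploads/day: ~{missed:,.0f} missed, "
          f"~{false_flags:,.0f} wrongly flagged")
```

Both quantities grow linearly with volume, so at a hypothetical 10 million uploads a day the same rates imply about 60 missed videos and about 500 wrongly flagged ones every day.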