Researchers from Japan’s Kyushu University have discovered that they can trick three-quarters of image-recognizing deep neural networks by altering just a single pixel in an image. Using three pixels achieved an 82% success rate, and five netted 87.3%. While the attack would be difficult to weaponize, the research shows how fragile these systems are – despite what you may have heard in marketing materials.
Setting out with two objectives – to predictably fool a deep neural network (DNN) and to automate the attack – the researchers found that an ‘adversarial perturbation’ of just a single pixel could trick the DNN, an attack that would be very difficult for a human inspector to spot and address.
As the paper’s abstract notes, “the output of DNNs is not continuous and very sensitive to tiny perturbation on the input vectors, and accordingly, several methods have been proposed for crafting effective perturbation against the networks. […] It requires much less adversarial information and works with a broader class of DNN models.”
The paper also found that the attack would work without knowledge of the inner functions of the DNN, meaning that it could be used in a generic fashion, on many systems. All they needed to know were the ‘black box’ probability labels that the DNN outputs, and they could manipulate the images in the data set to trick the DNN into outputting incorrect labels.
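To make the black-box setting concrete: the paper’s attack uses differential evolution to search over candidate pixel changes, scoring each candidate only by the probability labels the model returns. The sketch below illustrates that loop; the `predict` function here is a hypothetical toy stand-in so the example runs end-to-end, whereas in a real attack it would be a query to the target DNN.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy stand-in for the target classifier: returns a probability vector
# over 10 classes for a 32x32 RGB image with values in [0, 1]. In a
# real attack this would be a black-box query to the DNN under test.
def predict(image):
    logits = np.array([image[:, :, c].mean() for c in range(3)] + [0.0] * 7)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def one_pixel_attack(image, true_class, max_iter=20):
    """Search for one (x, y, r, g, b) change that lowers the model's
    confidence in the true class, using only its output probabilities."""
    h, w, _ = image.shape
    bounds = [(0, h - 1), (0, w - 1), (0, 1), (0, 1), (0, 1)]

    def apply_candidate(candidate):
        x, y, r, g, b = candidate
        perturbed = image.copy()
        perturbed[int(x), int(y)] = (r, g, b)
        return perturbed

    def objective(candidate):
        # Fitness is simply the probability assigned to the correct
        # label: differential evolution minimises it with no gradients.
        return predict(apply_candidate(candidate))[true_class]

    result = differential_evolution(objective, bounds,
                                    maxiter=max_iter, popsize=10,
                                    seed=0, tol=1e-6)
    return apply_candidate(result.x), result.fun
```

Because the objective only ever calls `predict`, the search needs no gradients and no knowledge of the network’s internals – exactly the generic, black-box property described above.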
However, the pixel changes were made to adversarial images composed of just 1,024 pixels (32×32) – which are rather small. That single pixel represents roughly 0.1% of the image data, but the researchers have apparently demonstrated that their new method of generating the adversarial images needed to trick DNNs is very effective in such constrained environments. They argue that the findings can be beneficial for understanding the geometrical features of DNN inputs.
A typical 10-megapixel camera will capture an image using ten million pixel values, which provides a whole lot more data for the recognition algorithms to use in their analysis. As such, changing just a few pixels in a typical image will have a much smaller effect than the one seen in the 1,024-pixel image that the attack was carried out on. When scaled up, that 0.1% threshold would be 10,000 pixels – interference that the system might be able to detect or discount using other information available to it.
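The scaling argument above is easy to check with back-of-the-envelope arithmetic (a minimal sketch, using the article’s rounded 0.1% figure):

```python
# One pixel in the paper's 32x32 test images is roughly a 0.1% change;
# holding that same 0.1% budget in a 10-megapixel photo means far more
# pixels would have to be altered.
single_pixel_fraction = 1 / (32 * 32)       # ~= 0.000977, i.e. roughly 0.1%
budget_at_10mp = round(0.001 * 10_000_000)  # pixels for a 0.1% change at 10 MP
print(f"{single_pixel_fraction:.4%}", budget_at_10mp)  # prints: 0.0977% 10000
```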
In autonomous systems like vehicles, a system would be fusing data from multiple sources (sensor fusion) in order to inform its decisions. Similarly, these images wouldn’t be static, as they would be refreshed many times per second. Hopefully, multiple cameras, combined with LiDAR, radar, and the constantly refreshing view of the object in question, would prevent such a system from being bamboozled by such an attack.
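One way to picture the temporal side of that defense is a simple majority vote over recent frames – a hypothetical mitigation sketch, not anything proposed in the paper:

```python
from collections import Counter

# Hypothetical mitigation sketch: rather than trusting any single
# frame, accept a label only when it wins a strict majority over a
# sliding window of recent per-frame classifications. A one-pixel
# perturbation that flips a single frame is then outvoted.
def majority_label(frame_labels, window=5):
    recent = frame_labels[-window:]
    label, count = Counter(recent).most_common(1)[0]
    return label if count > len(recent) // 2 else None  # None = undecided

frames = ["stop_sign", "stop_sign", "bird", "stop_sign", "stop_sign"]
print(majority_label(frames))  # prints: stop_sign - the flipped frame loses
```

A real system would of course also weigh the LiDAR and radar returns, but the same voting idea applies across sensors as well as across time.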
But virtualized cloud-based applications might make better victims for the exploit, especially those without human oversight. On average, a modified image could be perturbed into 2.3 other classifications, with the best result seeing a picture of a dog labeled as each of the nine other categories in the data set – airplane, automobile, bird, cat, deer, frog, horse, ship and truck.
The attack seems to prove that processing low-resolution images is a vulnerability, which isn’t great news for training times. The huge training data sets used to teach neural networks take up an awful lot of space, and an easy way of cutting cloud computing costs is to reduce the volume of that data – trimming both storage and computation billings.
Compressing the images would be the most straightforward method of reducing the size of the training data set, but that introduces the problem of correcting for the bias of the compression method. Similarly, there’s a divide between the raw data captured by a camera and its digital representation. All manner of software can alter the pixels originally captured by the camera, so even ‘realistic’ images might be quite far removed from the sensor output provided by a camera in the field. That translation might complicate things for the systems.
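A small numeric example shows how the choice of downscaling method alone changes the pixels a network actually sees – a sketch in numpy comparing two common resampling strategies:

```python
import numpy as np

# The same 4x4 image downsampled 2x by box averaging versus by
# nearest-neighbour sampling yields different pixel values, so a model
# trained on one pipeline inherits that pipeline's bias.
def box_downsample(img, factor):
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def nearest_downsample(img, factor):
    return img[::factor, ::factor]

img = np.arange(16, dtype=float).reshape(4, 4)
print(box_downsample(img, 2))      # averaged 2x2 blocks
print(nearest_downsample(img, 2))  # sampled block corners: different values
```

Neither result is “wrong,” but a classifier trained on one of these representations is being trained on an artifact of the pipeline, not just of the scene.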
The conclusion here – that a 0.1% change in pixels is enough to fool about 75% of DNN image recognition systems – is a surprising one, given the praise often sung about the capabilities of AI. The counter to this seems to be more complex training data, both at a higher resolution and with more examples of ‘things’ to examine, including a lot more negative results and false positives. In larger images, that 0.1% might be enough for a system to detect a malicious attempt to fool it, but humans are going to struggle to spot these attacks with the naked eye.
The pixel-based attack exploits the hashing functions that the neural networks use to spot common features shared between images. The paper suggests that these hashing functions are not mature enough to avoid being exploited, and that the tiny change “catastrophically destroyed the ability of the algorithm to categorize the image.” It is reminiscent of another recent experiment that altered road signs to confuse the cameras inside self-driving cars.
Currently, image recognition algorithms are very good at spotting a particular object, while simultaneously being very bad at spotting others – the most used example being cats and dogs. Both have very similar characteristics, but to such an algorithm, one would be entirely alien, and the other very easy to categorize correctly.
The larger problem here is that the machine-learning functions behave in ways that their developers can’t explain. They know that a system can do a thing very well, but can’t actually show an outsider how it works. Compounding this issue is that this attack suggests that the functions are not complex enough to avoid being exploited by such a simple attack, and if the researchers can’t actually explain what’s going on behind the scenes, it’s hard to see how they could improve on that situation.
The only solution would seem to be a lot more training time for the algorithm and its model, using higher quality files and a lot more negative results. Unfortunately, that sounds like an expensive process to undertake. However, Max Planck researchers have just announced something of a reverse process, using a new algorithmic system to enhance pixelated images – which does look rather impressive.