Ubisoft flaw-spotter AI helps debug software, including itself

The highlight of French gaming company Ubisoft’s recent Developer Conference in Montreal was not the latest gaming titles, but a new AI tool to assist with debugging. This spotlighted how crucial and time-consuming testing and debugging have become in videogame software development, although that is true of almost all other fields of computing. Depending on the project type and scale, software testing and debugging consume 50% to 75% of time and cost, so it is not surprising that a lot of effort has already been devoted to automating as much of this as possible.

Gaming, with its dependence on highly efficient software at the cutting edge of animation and graphics, tends to come in at the higher end here, with Ubisoft reckoning that bug hunting during the development phase consumes 70% of costs. For this reason, it has been collaborating with universities in its region, including McGill and Concordia, to develop the software for its new debugging tool, called Commit Assistant.

Because eliminating bugs retrospectively during field testing is even more expensive than during development, Commit Assistant aims to prevent bugs being committed to new code in the first place. This challenges the assumption that bugs are inevitable, which has underpinned software development throughout its history, certainly for safety-critical systems where the consequences of errors can be fatal. When fly-by-wire was developed by Airbus in the 1980s, software for functions such as controlling flaps and engine thrust was written twice or sometimes three times independently by different teams who were not allowed to communicate during development. The aim was to create duplicate or triplicate paths so that a bug could not create a single point of failure in a system pathway.

On the whole this proved successful, but Ubisoft’s experience suggests there is a systemic aspect to bug creation, with a tendency for the same ones to crop up more than once, which would tend to compromise the value of the forced diversity approach employed in fly-by-wire. Ubisoft exhaustively analyzed code from its software library dating back 10 years, applying machine learning techniques to collect instances where mistakes had been made by programmers in the past, as well as the fixes that were applied.

This was then incorporated into Commit Assistant and is now being applied to ongoing software development in the hope of spotting bugs at source as they are coded. Of course, it won’t catch all bugs, so at this stage there will still be a need for vigilance by programmers and thorough testing afterwards. This raises one of the issues that will arise as such tools become more widely employed: they will be counterproductive if they result in programmers becoming lazier or in subsequent system and field testing being less rigorous.
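The general idea of learning from past mistakes to flag risky new commits can be illustrated with a toy example. This is a minimal sketch only, not Ubisoft's method, which the company has not described in this level of detail. It assumes a hypothetical history of (diff text, was buggy) pairs and scores a new diff with a simple smoothed log-odds over tokens; the names `tokenize`, `train`, `risk_score` and the sample `HISTORY` are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(diff_text):
    """Split a diff into lowercase identifier-like tokens."""
    return re.findall(r"[A-Za-z_]+", diff_text.lower())


def train(history):
    """Count tokens separately for past buggy and past clean commits.

    history: iterable of (diff_text, was_buggy) pairs (hypothetical data).
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for diff_text, was_buggy in history:
        for tok in tokenize(diff_text):
            counts[was_buggy][tok] += 1
            totals[was_buggy] += 1
    return counts, totals


def risk_score(diff_text, model):
    """Sum of add-one-smoothed log-odds per token.

    A positive score means the diff resembles past buggy commits more
    than past clean ones; a negative score means the opposite.
    """
    counts, totals = model
    vocab = len(set(counts[True]) | set(counts[False])) or 1
    score = 0.0
    for tok in tokenize(diff_text):
        p_bug = (counts[True][tok] + 1) / (totals[True] + vocab)
        p_ok = (counts[False][tok] + 1) / (totals[False] + vocab)
        score += math.log(p_bug / p_ok)
    return score


# Invented commit history, purely for illustration.
HISTORY = [
    ("ptr = malloc(n); use(ptr)", True),
    ("free(ptr); use(ptr)", True),
    ("if ptr is not None: use(ptr)", False),
    ("check(ptr); log(done)", False),
]
MODEL = train(HISTORY)

if __name__ == "__main__":
    for diff in ("free(ptr); use(ptr)", "check(ptr); log(done)"):
        print(f"{risk_score(diff, MODEL):+.2f}  {diff}")
```

A real tool would need far richer features than token counts, and as the article notes, a low score would not excuse a commit from testing.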

Ubisoft’s head of R&D, Yves Jacquier, noted that this approach required not just a lot of data but also huge computational power to work. The latter has been lacking until recently and its availability is the reason for all the excitement and hype that has been generated around AI in general. For Ubisoft this is early days and it is too soon to judge how successful its Commit Assistant will prove to be.

But there are plenty of other AI-related initiatives in the area of software testing and debugging. The underlying principle is not so different from other application areas such as security monitoring, where the objective is first to catch threats already known about, then extrapolate to related threats and finally start detecting totally novel threats by identifying associated unusual patterns of activity. The same applies to debugging, with Ubisoft perhaps currently between the first and second stages, getting to the point where it can detect bugs related to previous ones but not yet totally unexpected ones. Bugs vary from routine coding errors to fundamental flaws in system design, and a tool such as Commit Assistant cannot be expected at present to combat the latter.

Interest in AI-based testing has been sparked not just by the potential cost savings but also by fundamental changes in the way software is being developed, driven particularly by the trend towards so-called agile methods and DevOps, where development and operations are more closely aligned. The idea is to be able to bring out new features more quickly and shorten the cycle of specification, development and testing. As the cycle shortens, testing itself becomes an increasing bottleneck because the time taken to debug does not reduce in proportion to code size. As a lot of the testing is relatively routine and repetitive, it can be automated, and increasingly is. We would hesitate to call this AI, though. It is merely the application of rule-based programming to generate and optimize test cases and to computerize tedious analysis tasks.

There are also a number of projects that more fully deserve the labels AI and ML, with one of the most promising being DeepXplore, developed by Columbia University in New York. This was designed to combat the increasing challenge of testing AI and ML applications themselves, with a particular focus on self-driving cars and malware detection, but is equally applicable to any sophisticated software.

The idea was to apply neural techniques to so-called deep learning systems that adapt to operational feedback and are employed in safety-critical systems. Neural networks combine multiple layers of points that represent events, connected to each other with different weightings which can be adjusted through the learning process to improve the accuracy of output or predictions. They had been applied before to software testing, but Columbia University’s innovation was to integrate several such neural network-based systems to cross-check each other’s results. This is almost the automated equivalent of the forced diversity used in fly-by-wire, but applied to testing, reducing the chance of bugs being missed while cutting down on the need for manual checks. In effect the neural networks automate not just the testing but also the checking.
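The cross-checking idea can be sketched in a few lines. This is an illustrative toy, not DeepXplore's actual code: three hand-written functions stand in for independently trained neural networks (`model_a`, `model_b`, `model_c` and their thresholds are invented), and any input on which they disagree is flagged for attention without a human having to supply the correct answer first.

```python
from collections import Counter


# Hypothetical stand-ins for three independently trained models that
# decide whether a car should brake, given the distance to an obstacle.
def model_a(distance):
    return "brake" if distance < 10.0 else "cruise"


def model_b(distance):
    return "brake" if distance < 10.5 else "cruise"


def model_c(distance):
    return "brake" if distance < 12.0 else "cruise"


MODELS = [model_a, model_b, model_c]


def find_disagreements(models, inputs):
    """Return (input, majority_answer, votes) wherever the models differ."""
    suspicious = []
    for x in inputs:
        votes = Counter(m(x) for m in models)
        if len(votes) > 1:  # at least one model dissents
            majority, _ = votes.most_common(1)[0]
            suspicious.append((x, majority, dict(votes)))
    return suspicious


INPUTS = [i * 0.5 for i in range(41)]  # 0.0, 0.5, ..., 20.0
FLAGGED = find_disagreements(MODELS, INPUTS)

if __name__ == "__main__":
    for x, majority, votes in FLAGGED:
        print(f"distance={x}: majority says {majority}, votes={votes}")
```

Only the inputs between 10.0 and 11.5 are flagged here, which is where the three toy models draw their boundaries differently. Agreement among models does not prove they are right, but disagreement is a cheap signal that at least one is wrong.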

This leads to the point that AI- and ML-based systems themselves raise the bar for testing and create new challenges resulting from the lack of transparency over their logic. By definition, ML-based systems adapt and therefore in effect, or even in reality, program themselves and potentially introduce new bugs, as well as encoding biases of human origin. The appeal is that they yield valuable applications more quickly than traditional methods, but at the cost of losing some control over testing and verification. This lack of transparency over logic, or the inability of the system to explain its reasoning, has been called the dark secret at the heart of AI.

This deficit is widely recognized by the leading players in the field with respect to debugging, so it is not surprising to hear Google’s Director of Research, Peter Norvig, recently calling for a radical overhaul of the entire debugging toolset.

One problem, according to Norvig, is that in some applications of AI or ML it is not always obvious what the correct answer or prediction is so that the training set cannot always be determined without ambiguity. Humans may disagree whether the color of a turquoise dress is closer to green or blue and there is no definitive answer, so testing has to take account of some uncertainties, which has not been the case for conventional software.

On the other hand, AI can extend and improve the accuracy of traditional software testing. There is a well-known and long-established principle of debugging called code coverage, which involves ensuring that every active line of code is exposed to as much testing as possible with different inputs. There is no substitute for longer exposure in the field, but testing tries to anticipate as far as possible these conditions without waiting for them to arise in operation with more serious consequences. Advanced testing systems can verify that all the code has been thoroughly covered.
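What measuring coverage means in practice can be shown with a small sketch using Python's standard `sys.settrace` hook. The function under test (`classify`) and the helper names are invented for illustration, and production tools are considerably more thorough than this.

```python
import sys


def classify(speed):
    if speed < 0:
        return "invalid"
    if speed > 120:
        return "too fast"
    return "ok"


# Line offsets of the five body statements, relative to the "def" line.
BODY_LINES = {1, 2, 3, 4, 5}


def lines_executed(func, inputs):
    """Run func on each input and record which of its lines were executed."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace any other function
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for value in inputs:
            func(value)
    finally:
        sys.settrace(previous)  # always restore the earlier tracer
    return hit & BODY_LINES


def coverage(func, inputs):
    """Fraction of body lines exercised by the given inputs."""
    return len(lines_executed(func, inputs)) / len(BODY_LINES)


if __name__ == "__main__":
    print(coverage(classify, [50]))            # one typical input
    print(coverage(classify, [-1, 50, 200]))   # inputs chosen to hit every branch
```

A single typical input leaves the two error branches untested, while three well-chosen inputs exercise every line. Full line coverage still says nothing about whether the outputs were correct, which is why it complements rather than replaces other testing.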

The objective is then to home in on just those parts of the code set that do need some manual attention, with the future aim being to reduce the level of human intervention further through deeper automation and even automated code correction. The Massachusetts Institute of Technology (MIT) has developed a package which it describes grandly as the precursor of an immune system for AI applications. It sounds somewhat like Ubisoft’s process of exhaustively mining existing code, but takes it a stage further by identifying extracts of code that can be snipped out of one program and inserted in another to effect a repair when a bug has been found.

This has been designed to be independent of the programming language by applying a common symbolic expression to describe test sets. These are applied first to the recipient software to identify a fault, then to the “donor software” to find the fix and make sure it works, before transplanting the code and finally verifying it in the new setting. While there are obvious questions, such as how to be sure the code transplantation does not have unintended consequences elsewhere in the recipient application, these are being addressed, and it points the way forward to a world of intelligent software testing that can even cope with elaborate AI-based systems themselves.
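The four steps described above (find the fault, find a donor fix, transplant, verify) can be mimicked in miniature. This is a heavily simplified sketch and not MIT's implementation: both "programs" are invented Python functions, the shared test set is a plain list of input and expected-output pairs, and the transplant is done by wrapping rather than by rewriting code.

```python
def recipient_scale(width, height):
    """Hypothetical recipient program with a bug: no guard for height == 0."""
    return width / height


def donor_guard(width, height):
    """Hypothetical check lifted from a donor program that handles the case."""
    return height != 0


# Shared test set: (input arguments, expected result).
# A zero height is expected to be rejected by returning None.
TESTS = [((4, 2), 2.0), ((9, 3), 3.0), ((1, 0), None)]


def run(func, case):
    """Call func on one test case, reporting a crash instead of raising."""
    try:
        return func(*case)
    except ZeroDivisionError:
        return "crash"


def failing_cases(func, tests):
    """Return the inputs on which func's result differs from the expected one."""
    return [case for case, expected in tests if run(func, case) != expected]


def transplant(func, guard, fallback):
    """Return a patched func that consults the donor's guard before running."""
    def patched(*args):
        return func(*args) if guard(*args) else fallback
    return patched


BEFORE = failing_cases(recipient_scale, TESTS)                   # 1. find the fault
DONOR_OK = all(donor_guard(*case) == (expected is not None)     # 2. check the donor
               for case, expected in TESTS)
PATCHED = transplant(recipient_scale, donor_guard, fallback=None)  # 3. transplant
AFTER = failing_cases(PATCHED, TESTS)                           # 4. verify

if __name__ == "__main__":
    print("failing before:", BEFORE)
    print("donor handles test set:", DONOR_OK)
    print("failing after:", AFTER)
```

The final verification step only shows that the known test set now passes. It does not rule out the unintended consequences elsewhere in the recipient that the article mentions.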