There has always been a touch of the surreal about China’s facial recognition industry, with eye-popping sums raised by young start-ups and extravagant claims of accuracy made by a regime unconcerned about invasion of privacy. As we have noted before, the reality does not quite match the claims on either accuracy or scale, but it suits the government to portray its systems as better than they are. It could even be argued that the perception of ubiquitous 24×7 surveillance is itself enough to meet the objective of cutting street crime to the bone, and there is some evidence that this has been achieved.
What is beyond doubt is that the great Chinese CCTV deployment, combined with the absence of squeamishness over privacy, is providing massive amounts of data as fuel for the country’s big start-ups that are majoring in facial recognition. While their counterparts elsewhere are drawing back from involvement with government surveillance initiatives under pressure from lobbyists and customers, the Chinese firms are piling in.
Competition at the top table has become intense and is driving demand for cash. Megvii, better known by the name of its facial recognition subsidiary Face++, is currently seeking another $500 million from investors including Alibaba, China’s answer to Amazon. This would add to the $607 million already raised in six rounds since 2013, although the company was founded in 2011. It received its last major injection of $460 million in November 2017, when it was worth $1.5 billion, and its value has since soared to $3.5 billion.
The aim of the latest funding is to close the gap on the country’s runaway leader in facial recognition, SenseTime, which is also backed by Alibaba and is the world’s most valuable AI start-up at $4.5 billion, having raised $2.6 billion since its founding in 2014. Apart from being even more richly endowed, SenseTime has the advantage of being one of five companies selected by the Ministry of Science and Technology of China to establish the National Open Innovation Platform for Next-Generation Artificial Intelligence on Intelligent Vision, the others being Baidu, Alibaba Cloud, Tencent, and iFLYTEK.
But Face++ has made significant inroads into China’s nationwide CCTV roll-out program, which targets 600 million cameras. That equates to roughly one camera every 100 meters if they were laid out in an even grid, although the density will of course be far greater in urban areas. Face++ cameras are now widely deployed across its home city of Beijing in various settings, including the extensive metro network, where they are used alongside palm scanning systems to speed up passenger throughput and also to identify passengers attempting to board trains without paying.
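The "one per 100 meters" figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes a total area for China of roughly 9.6 million square kilometers, a figure that does not come from the article, and spreads the 600 million cameras over it as an even square grid. The function and constant names are purely illustrative.

```python
import math

# Assumption (not from the article): China's total area is taken as
# roughly 9.6 million square kilometers.
CHINA_AREA_KM2 = 9.6e6
# Target number of cameras quoted in the article.
CAMERAS = 600e6


def grid_spacing_m(area_km2: float, cameras: float) -> float:
    """Side length, in meters, of the square cell each camera would cover."""
    area_m2 = area_km2 * 1e6  # 1 square kilometer = 1,000,000 square meters
    return math.sqrt(area_m2 / cameras)


spacing = grid_spacing_m(CHINA_AREA_KM2, CAMERAS)
print(f"about {spacing:.0f} m between neighboring cameras")
```

Under that assumption the spacing comes out at a little over 120 meters, the same order of magnitude as the figure quoted above.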
The company is also expanding abroad through distribution agreements designed to accelerate growth, for example in Thailand, whose police departments have been seduced by the promise of being able to bear down on crime with less effort on the ground.
However, Thailand’s government might want to consider that the power of Face++ technology, along with that of its rivals, has clearly been exaggerated. The main issue is combining accuracy with recognition at the scale of China’s 1.4 billion population. While recognition accuracy rates of 97% have been claimed, these are achieved only at limited scale at present, even if that is improving all the time. Earlier in 2018 Face++ admitted that, given the computing resources available to it, it could search only about 1,000 faces at a time against a national database. To remedy this the company is investing in moving data preprocessing onto hardware in the cameras themselves, which it believes will help it scale up to much larger numbers of faces at a time.
It is also working on improving the efficiency of its machine learning algorithms, having published papers on streamlining deep convolutional neural networks (DCNNs) even further by slimming down the input variables in size but not in number. DCNNs have even more layers than standard CNNs, which gives greater sensitivity in training but generates still more parameters, threatening to overwhelm either network bandwidth or the resources of embedded processors when processing is done locally.
As Megvii noted, a lot of R&D has already been performed to reduce computational effort on both general-purpose and specialized hardware, with some success. More recently, Megvii published results suggesting that the storage and computational burden of executing DCNNs can be cut significantly, by an amount that depends on the scale of the task, by reducing the number of bits used to represent each number in the calculations.
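To make the idea of cutting the number of bits per value concrete, here is a minimal sketch of a generic uniform quantizer. It is an illustration only and not necessarily the scheme used in Megvii's papers: the function names, the assumption that values lie between 0 and 1, and the sample weights are all hypothetical.

```python
def quantize_uniform(x: float, bits: int) -> float:
    """Round x, assumed to lie in [0, 1], to the nearest of 2**bits levels."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    x = min(1.0, max(0.0, x))   # clamp to the representable range
    steps = (1 << bits) - 1     # number of steps between 0 and 1
    return round(x * steps) / steps


def storage_bytes(num_values: int, bits: int) -> float:
    """Storage needed for num_values numbers at the given bit width."""
    return num_values * bits / 8


# Hypothetical weights, chosen only to show the rounding behavior.
weights = [0.03, 0.48, 0.51, 0.97]
for bits in (1, 2, 8):
    print(bits, [quantize_uniform(w, bits) for w in weights])

# One million values: 32-bit versus 2-bit storage.
print(storage_bytes(1_000_000, 32), storage_bytes(1_000_000, 2))
```

With one bit each weight collapses to 0 or 1, with two bits to one of four levels, and the storage for a million values falls from 4 MB at 32 bits to 250 KB at two bits. The price is precision, which is why the achievable saving depends on the task.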
These are often small numbers that can be represented in just one or two bits. As the relevant paper pointed out, the complexity of, say, a multiplier is proportional to the square of the bit width, so there is potential for great increases in the speed at which inferences can be drawn, and for models to be implemented on resource-constrained processors such as those in smartphones.
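The square-law relationship is easy to tabulate. The sketch below takes a 32-bit multiplier as the baseline, which is an assumption made for illustration rather than a figure from the paper, and reports how much simpler a multiplier would be at narrower widths if complexity scaled exactly with the square of the bit width.

```python
# Assumption for illustration: a 32-bit multiplier is the baseline.
BASELINE_BITS = 32


def ideal_complexity_reduction(bits: int, baseline_bits: int = BASELINE_BITS) -> float:
    """Factor by which multiplier complexity shrinks if it scales as bits squared."""
    if bits < 1 or baseline_bits < 1:
        raise ValueError("bit widths must be positive")
    return (baseline_bits / bits) ** 2


for bits in (16, 8, 2, 1):
    factor = ideal_complexity_reduction(bits)
    print(f"{bits:>2}-bit multiplier: {factor:.0f}x simpler than {BASELINE_BITS}-bit")
```

These are idealized ratios (4x, 16x, 256x and 1,024x respectively). Real speedups depend on the hardware and on how much of the workload is multiplication, so they should be read as an upper bound rather than a prediction.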
The approach is not confined to facial recognition but is applicable to other domains of machine learning, such as identification of objects of any kind and segmentation of instances. The latter, closely related to semantic segmentation (which labels each pixel by class without separating individual objects), is itself crucial to deeper image processing because it enables identification of individual objects within a frame, which could be items of food on a plate or the contents of a still life painting. Development on this front is crucial for many fields, including autonomous driving, where it helps distinguish between objects that must be avoided at all costs and those that can more safely be run over.