Graphcore builds dedicated ML accelerators as alternatives to GPUs

Graphcore has grown rapidly since being founded in 2016 in Bristol, UK, with over 100 staff and $110mn raised in funding, including a $50mn injection from Sequoia Capital, which has big investments in Apple and Google. Its success hinges partly on the pedigree of its cofounders, Nigel Toon and Simon Knowles, the latter having launched semiconductor company Icera in 2002 before selling it to chip-maker Nvidia for $435mn in 2011.

Nvidia is in fact a competitor to Graphcore, although the disparity in size means that, initially at least, their target markets will be quite different. Nvidia has to develop chips with wide applicability and therefore cannot dedicate them as fully to machine learning (ML). Graphcore is doing exactly that with its imminent intelligence processing unit (IPU), whose architecture was designed in step with its proprietary Poplar software, which moves data both between the many on-chip processors and to and from external memory. Because they are dedicated to ML, the chips need only perform low-precision floating-point arithmetic, and they do so across a massively parallel structure comprising over 1,000 processor cores.

The firm contends that matching the hardware design to the ML models allows the training process to be optimized further, and it has been tweaking the learning process to take advantage. The focus is on stochastic gradient descent (SGD), the method most widely used for training ML models. Gradient descent involves taking small iterative steps towards a local minimum of an error curve, the point from which the curve rises on either side. In ML this equates to adjusting the weights on the connections between nodes in a network so that the difference between the model's predictions and actual outcomes is as small as possible. In the case of facial recognition this would yield a system that identifies individuals with the lowest possible number of false negatives and false positives, although with scope for trading one off against the other.
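
In symbols, this is the standard textbook update rule rather than anything specific to Graphcore: at each step the weights w move a small distance against the gradient of the loss L,

    w ← w - η ∇L(w),

where the learning rate η controls how large each step is.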

The word stochastic reflects the fact that training samples are either drawn at random or shuffled to similar effect, rather than processed in a single fixed batch. The algorithm calculates the error and updates the model for each example in the training dataset, which can mean it converges in fewer passes over the data, so that learning is faster. It also has some technical advantages, such as helping the optimizer escape shallow or "false" local minima that could otherwise leave the trained system prone to false positives.
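
As a rough illustration, the loop below is a minimal NumPy sketch of textbook per-example SGD for a toy linear model; it is not Graphcore's implementation, and the function name and defaults are invented for the example:

```python
import numpy as np

def sgd_train(X, y, lr=0.01, epochs=10, seed=0):
    """Per-example SGD for a least-squares linear model (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))   # shuffle each pass: the "stochastic" part
        for i in order:
            pred = X[i] @ w               # prediction for a single example
            grad = (pred - y[i]) * X[i]   # gradient of the squared error for that example
            w -= lr * grad                # update the weights after every example
    return w
```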

SGD also has disadvantages, notably the high computational cost of updating the model for every example. That is where Graphcore comes in: its research found that, for a large number of ML applications, smaller batch sizes work even better than had been thought, yielding more stable and reliable training. It has found that batch sizes as small as 2 and 4 are most effective (powers of two tend to work best). Such small batches are very inefficient on standard GPUs, but Graphcore argues that its radically different processor can execute the training at least 10 times faster today, and potentially up to 100 times faster in future, while needing less memory and converging more quickly.
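
Mini-batching sits between full-batch gradient descent and pure per-example SGD: gradients are averaged over a handful of examples before each update. The sketch below adapts the same toy model to a small batch size such as the 2 or 4 mentioned above; again the code is illustrative, not anything Graphcore has published:

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=4, lr=0.01, epochs=10, seed=0):
    """Mini-batch SGD for the same toy linear model, with a small batch size."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]        # e.g. a batch of just 4 examples
            preds = X[idx] @ w
            grad = (preds - y[idx]) @ X[idx] / len(idx)  # average gradient over the batch
            w -= lr * grad                               # one update per small batch
    return w
```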

Looking more towards applications, Graphcore has been studying the one-shot learning problem, noting that current deep neural networks are very poor at recognizing new objects they have encountered only a few times, a task at which humans are far better.

The company has taken its cue from a few recent papers and seminars on so-called meta-learning, which sets out to find the models or algorithms best suited to a given task. A hierarchical approach is needed, in which the meta-learning system first finds the best network architecture, then the optimization algorithm that works best within it, and finally the parameters most appropriate for that algorithm. The model selected by this process is then applied to learn the relevant task or make the best predictions from a given set of data.
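
The hierarchy can be pictured as a nested search. The sketch below is purely illustrative: the candidate lists and the build_model, train and validate helpers are hypothetical stand-ins supplied by the caller, and the loop is not Graphcore's method or any specific published algorithm:

```python
def meta_search(architectures, optimizers, hyperparam_grids,
                build_model, train, validate):
    """Hierarchical model search in the spirit of meta-learning (illustrative only).

    Level 1 chooses a network architecture, level 2 an optimization
    algorithm, level 3 that algorithm's hyperparameters; the combination
    with the best validation score is returned.
    """
    best_choice, best_score = None, float("-inf")
    for arch in architectures:                    # level 1: architecture
        for opt in optimizers:                    # level 2: optimization algorithm
            for params in hyperparam_grids[opt]:  # level 3: its hyperparameters
                model = build_model(arch)
                train(model, opt, params)
                score = validate(model)
                if score > best_score:
                    best_choice, best_score = (arch, opt, params), score
    return best_choice
```

In practice each level would use far smarter search strategies than exhaustive loops, but the nesting mirrors the architecture-then-optimizer-then-parameters ordering described above.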

With the first chip only just arriving, it is too soon to judge success, but the buy-in from prestigious VC partners gives the company credibility and the funds to keep improving its processor. Even so, it has a long way to go to fulfil the ambition of becoming the UK's next ARM Holdings or Imagination Technologies, both of which have quite recently fallen into foreign hands.
