
We Need Next Generation Algorithms To Harness The Power Of Today's AI Chips

POST WRITTEN BY
Greg Diamos

At the GTC technology conference this year, NVIDIA launched its latest and most advanced GPU, called Volta. At the center of this chip is the Tensor Core, an artificial intelligence accelerator that is poised to usher in the next phase of AI applications. However, our current AI algorithms do not fully utilize this accelerator, and for us to achieve another major breakthrough in AI, we need to change our software.

Fully harnessing this computing resource will advance existing AI applications and even create ones that might otherwise not exist. For example, by utilizing it, AI algorithms could better understand and synthesize human speech: speech recognition systems will improve drastically, transcription of audio will become much more accurate, and computers will be given human-like voices that convey style and emotion.

The enormous potential of AI has prompted many companies to build powerful chips to enable a wide range of AI applications, such as NVIDIA's GPUs and Google's TPUs.

One thing these chips have in common is that they are all optimized for a computing principle called locality. To achieve the performance benefits of locality, both AI chips and AI algorithms need to support it. At present, emerging AI chips provide the infrastructure for this capability (e.g., Volta's Tensor Core), while many AI algorithms do not. In other words, these chips are advancing faster than AI algorithms can be adapted to fully utilize them.

The first phase of AI chips was driven by parallelism, or performing many tasks simultaneously.

Training large neural networks on massive datasets exposed significant parallelism that could be readily exploited by existing parallel chips such as GPUs. However, there is only so much performance that can be extracted from parallelism. Eventually these chips run into the memory wall: the widening gap between computational throughput and memory throughput.
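
To see why, it helps to count arithmetic operations per byte moved from memory. Below is a back-of-the-envelope sketch in Python; the matrix size and batch size are illustrative assumptions, not measurements of any particular chip or model:

    def gemv_intensity(n, bytes_per_elem=4):
        # Matrix-vector product y = W @ x for an n x n matrix W:
        # 2*n*n FLOPs (one multiply-add per weight); W and x read, y written.
        flops = 2 * n * n
        bytes_moved = (n * n + 2 * n) * bytes_per_elem
        return flops / bytes_moved

    def gemm_intensity(n, batch, bytes_per_elem=4):
        # Matrix-matrix product Y = W @ X amortizes each weight over a batch.
        flops = 2 * n * n * batch
        bytes_moved = (n * n + 2 * n * batch) * bytes_per_elem
        return flops / bytes_moved

    n = 2048
    print(f"matrix-vector: {gemv_intensity(n):.2f} FLOPs per byte")
    print(f"matrix-matrix (batch 64): {gemm_intensity(n, 64):.2f} FLOPs per byte")

A chip that can sustain far more FLOPs per byte than the algorithm supplies spends most of its time waiting on memory, no matter how parallel the workload is.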

To move to the next phase, AI chips also need to exploit locality. Locality is performing many tasks on the same data. For example, if you are at a grocery store and want to get every item on your shopping list, you might try to speed this up by asking each one of your friends to get one item on the list. This approach would be very parallel, but also inefficient, because you would probably end up sending different friends to pick up items that are right next to each other. A better approach would be to ask each friend to go to a different aisle and get all of the items in that aisle. This efficiency boost from locality allows algorithms to scale the memory wall.
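
The same idea shows up in code whenever data that is used together is stored together. As a minimal sketch (assuming NumPy's default row-major layout), the loops below do identical work two ways; the row-by-row order is the "one friend per aisle" strategy, and the exact timings will vary by machine:

    import time
    import numpy as np

    a = np.random.rand(4000, 4000)  # rows are contiguous in memory

    # Aisle by aisle: each pass reads one contiguous row.
    start = time.perf_counter()
    row_total = sum(a[i, :].sum() for i in range(a.shape[0]))
    row_time = time.perf_counter() - start

    # Scattered trips: each pass strides across the whole array.
    start = time.perf_counter()
    col_total = sum(a[:, j].sum() for j in range(a.shape[1]))
    col_time = time.perf_counter() - start

    print(f"row by row:       {row_time:.3f}s")
    print(f"column by column: {col_time:.3f}s  (same result, worse locality)")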

The emerging generation of AI chips needs algorithms with significant locality, but not all AI algorithms are currently up to the task. Some do not expose enough locality to fully exploit these new chips. Computer vision algorithms have a leg up due to their heavy use of convolutional neural networks, but the recurrent neural networks used in speech and language applications will need some changes to improve locality, especially for inference.
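
The gap comes down to weight reuse. A rough count, with layer shapes that are purely illustrative assumptions:

    # Convolution: a small filter bank is reapplied at every spatial position.
    c_in, c_out, k, h, w = 256, 256, 3, 56, 56   # illustrative layer shape
    conv_weights = c_in * c_out * k * k
    conv_flops = 2 * conv_weights * h * w        # each weight used h*w times
    print(f"convolution: {conv_flops / conv_weights:.0f} FLOPs per weight")

    # Recurrent step at batch size 1: a matrix-vector product per timestep,
    # so each weight is read from memory but used only once.
    hidden = 1536                                # illustrative hidden size
    rnn_weights = hidden * hidden
    rnn_flops = 2 * rnn_weights
    print(f"recurrent:   {rnn_flops / rnn_weights:.0f} FLOPs per weight")

The convolutional layer performs thousands of operations on every weight it loads; the recurrent step performs two.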

At Baidu's Silicon Valley AI Lab, we are proactively trying several approaches to change our algorithms to harness the potential of locality, and early experiments show very promising signs of overcoming this challenge. For example, we developed Persistent RNNs, which improve the locality of conventional RNNs and deliver a 30x speedup at small batch sizes. This is a good first step, but future AI chips will require an even bigger boost. Another possible direction is to merge ideas from convolutional and recurrent neural networks, but the best approach is yet to be seen.
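
To give a feel for where that speedup comes from, here is a toy traffic model of the idea: keep the recurrent weight matrix resident on-chip so it is fetched from off-chip memory once per sequence rather than once per timestep. The real implementation pins weights in GPU registers; this sketch, with illustrative sizes, only counts off-chip bytes:

    hidden, timesteps = 1152, 700            # illustrative sizes
    weight_bytes = hidden * hidden * 4       # float32 recurrent matrix

    # Conventional kernel: at small batch sizes the weights fall out of
    # cache, so every timestep re-reads them from off-chip memory.
    conventional_traffic = timesteps * weight_bytes

    # Persistent kernel: weights stay resident on-chip for the whole
    # sequence, so their off-chip traffic is paid once.
    persistent_traffic = weight_bytes

    print(f"conventional: {conventional_traffic / 1e9:.1f} GB of weight traffic")
    print(f"persistent:   {persistent_traffic / 1e6:.1f} MB of weight traffic")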

AI algorithms based on deep learning are compute-limited, and new breakthroughs are enabled by faster computers. The current generation of algorithms has already produced significant gains in speech recognition, machine translation, and the synthesis of realistic human speech. The hardware for the next phase of AI is already in place, and promising early experiments lead us to believe we are also on the cusp of developing next-generation algorithms that can harness the computing power of today's AI chips and lead us to another breakthrough.