In a move that could shift the course of multiple technology markets, Google will soon launch a cloud computing service that provides exclusive access to a new kind of artificial-intelligence chip designed by its own engineers.
CEO Sundar Pichai revealed the new chip and service this morning in Silicon Valley during his keynote at Google I/O, the company’s annual developer conference.
This new processor is a unique creation designed to both train and execute deep neural networks—machine learning systems behind the rapid evolution of everything from image and speech recognition to automated translation to robotics. Google says it will not sell the chip directly to others. Instead, through its new cloud service, set to arrive sometime before the end of the year, any business or developer can build and operate software via the internet that taps into hundreds and perhaps thousands of these processors, all packed into Google data centers.
The new chips and the new cloud service are in keeping with the long-term evolution of the internet’s most powerful company. For more than a decade, Google has developed new data center hardware, from computer servers to network gear, to more efficiently drive its online empire. And more recently, it has worked to sell time on this hardware via the cloud: massive computing power anyone can use to build and operate websites, apps, and other software online. Most of Google’s revenue still comes from advertising, but the company sees cloud computing as another major source of revenue that will carry a large part of its future.
Dubbed TPU 2.0 or the Cloud TPU, the new chip is a sequel to a custom-built processor that has helped drive Google’s own AI services, including its image recognition and machine translation tools, for more than two years. Unlike the original TPU, it can be used to train neural networks, not just run them once they’re trained. Also setting the new chip apart: it’s available through a dedicated cloud service.
Today, businesses and developers typically train their neural networks using large farms of GPUs, chips originally designed to render graphics for games and other software. The Silicon Valley chip maker Nvidia has come to dominate this market. Now Google is providing some serious competition with a chip specifically designed to train neural networks. The TPU 2.0 chip can train them several times faster than existing processors, cutting times from as much as a day down to several hours, says Jeff Dean, who oversees Google Brain, the company’s central AI lab.
Amazon and Microsoft offer GPU processing via their own cloud services, but they don’t offer bespoke AI chips for both training and executing neural networks. But Google could see more competition soon. Several companies, including chip giant Intel and a long list of startups, are now developing dedicated AI chips that could provide alternatives to the Google TPU. “This is the good side of capitalism,” says Chris Nicholson, the CEO and founder of a deep learning startup called Skymind. “Google is trying to do something better than Amazon—and I hope it really is better. That will mean the whole market will start moving faster.”
Still, arriving at a new chip first doesn’t guarantee Google success. To take advantage of TPU 2.0, developers will have to learn a new way of building and executing neural networks. It’s not just that this is a new chip. TPU 2.0 is also designed specifically for TensorFlow, software for running neural networks that was developed at Google. Though TensorFlow is open source software available to anyone, many researchers use competing software engines, such as Torch and Caffe. “New forms of hardware require new optimizations,” Nicholson says. “Every time we optimize for a new chip, it takes months.”
A few weeks before Google introduced TPU 2.0, Yann LeCun, Facebook’s head of AI research, questioned whether the market would move towards new AI-specific chips because researchers were already so familiar with the tools needed to work with GPUs. “They are going to be very hard to unseat,” he said of GPUs, “because you need an entire ecosystem.” All that said, Google will continue to offer access to GPUs via its cloud services as the blossoming market for AI chips spans many different processors in the years to come.
A New Way
Neural networks are complex mathematical systems that can learn discrete tasks by analyzing large amounts of data. By analyzing millions of cat photos, for instance, they can learn to identify a cat. By analyzing a vast database of spoken words, they can learn to recognize the commands you speak to your digital assistant. At Google, neural networks even help choose search results, the heart of its online empire.
These systems are fundamentally changing the way technology is built and operated, all the way down to the hardware. Unlike traditional software, these systems must be trained. They must, say, analyze a few hundred million cat photos to learn what a cat is. Companies and developers undertake this training with help from GPUs, sometimes thousands of them, running inside the massive computer data centers that underpin the world’s internet services. Training on traditional CPU processors, the generalist chips inside the computer servers that drive online software, just takes too much time and electrical power.
For similar reasons, CPUs are ill-suited to executing neural networks—that is, taking what they’ve learned about how to identify cats in photos, for example, and identifying them in new ones. Google designed its original TPU for this execution stage. Offering a chip that handles training as well represents a major step forward.
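The split between training and execution can be made concrete with a toy sketch. What follows is not Google’s method and has nothing to do with TPU hardware or TensorFlow: it is a single artificial “neuron” in plain Python, and the two numeric features per example are invented stand-ins for real photo data. The function names `train` and `predict` are illustrative only.

```python
import math


def predict(weights, bias, features):
    """Execution: apply parameters that were already learned to one input."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # score between 0 and 1


def train(examples, labels, epochs=2000, learning_rate=0.5):
    """Training: repeatedly nudge the parameters to reduce error on labeled data."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: two made-up features per example, label 1 means "cat".
examples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]

# The expensive step: thousands of passes over the labeled examples.
weights, bias = train(examples, labels)

# The cheap step: one pass per new, unlabeled input.
cat_score = predict(weights, bias, [0.85, 0.9])
not_cat_score = predict(weights, bias, [0.15, 0.1])
print(f"cat-like input: {cat_score:.3f}, non-cat input: {not_cat_score:.3f}")
```

Even at this scale, training calls `predict` thousands of times and updates the parameters after each call, while execution calls it once per new input. Real networks repeat the same pattern with millions of parameters and examples, which is why the two stages place such different demands on hardware.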
Dean said the company built this new chip at least in part because its machine translation models were too large to train as quickly as it wanted. According to Dean, Google’s new “TPU device,” which spans four chips, can handle 180 trillion floating point operations per second, or 180 teraflops. The company also uses a new form of computer networking to connect several of these chips together, creating a “TPU pod” that provides about 11,500 teraflops of computing power. In the past, Dean said, the company’s machine translation model took about a day to train on 32 state-of-the-art CPU boards. Now, it can train in about six hours using only a portion of a pod.
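A quick back-of-the-envelope check shows how those figures relate. The sketch below uses only the rounded numbers quoted above and assumes a pod is simply a whole number of four-chip devices. The device and chip counts it derives are inferences from that assumption, not figures Dean gave, and “about a day” is taken to mean 24 hours.

```python
# Figures as quoted in the article (all approximate).
DEVICE_TFLOPS = 180      # one four-chip "TPU device"
POD_TFLOPS = 11_500      # one "TPU pod"
CHIPS_PER_DEVICE = 4

# Implied hardware counts, assuming a pod is a whole number of devices.
devices_per_pod = POD_TFLOPS / DEVICE_TFLOPS
chips_per_pod = round(devices_per_pod) * CHIPS_PER_DEVICE
tflops_per_chip = DEVICE_TFLOPS / CHIPS_PER_DEVICE

# Training-time comparison for the translation model.
old_hours = 24           # assumption: "about a day"
new_hours = 6            # "about six hours"
speedup = old_hours / new_hours

print(f"devices per pod: about {devices_per_pod:.1f}")
print(f"chips per pod: about {chips_per_pod}")
print(f"teraflops per chip: {tflops_per_chip:.0f}")
print(f"wall-clock speedup: {speedup:.0f}x")
```

The speedup here is a wall-clock comparison between two different hardware setups, so it says how much sooner a result arrives, not how the chips compare one to one.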
That kind of speed advantage could certainly attract outside AI researchers. AI research is an enormously experimental process that involves extensive trial and error across enormous amounts of hardware. “We’re currently limited by our computational resources,” says Isaac Kohane, a professor of biomedical informatics and pediatrics who is exploring the use of neural networks in healthcare and has discussed the new chip with Google.
But the success of Google’s cloud service will depend not only on how fast the chip is and how easy it is to use, but also on how much it costs. Nicholson believes that if Google offers the service at a much lower cost than existing GPU services, it could build a significant foothold for its larger cloud computing efforts. “If they make it free or next to free, people will use it,” he says, “and they will make people dependent on their cloud infrastructure.”
Along those lines, Google has already said that it will offer free access to researchers willing to share their research with the world at large. That’s good for the world’s AI researchers. And it’s good for Google.