OpenAI, the billion-dollar San Francisco artificial intelligence lab backed by Tesla CEO Elon Musk, just unveiled a new virtual world. It’s called Universe, and it’s a virtual world like no other. This isn’t a digital playground for humans. It’s a school for artificial intelligence. It’s a place where AI can learn to do just about anything.
Other AI labs have built similar worlds where AI agents can learn on their own. Researchers at the University of Alberta offer the Arcade Learning Environment, where agents can learn to play old Atari games like Breakout and Space Invaders. Microsoft offers Malmo, based on the game Minecraft. But Universe is bigger than any of these. It’s an AI proving ground that spans any software running on any machine, from games to web browsers to protein-folding applications.
“The domain we chose is everything that a human can do with a computer,” says Greg Brockman, OpenAI’s chief technology officer.
In coder-speak, Universe is a software platform—software for running other software—and much of it is now open source, so anyone can use and even modify it. In theory, AI researchers can plug any application into Universe, which then provides a common way for AI “agents” to interact with these applications. That means researchers can build bots that learn to navigate one application and then another and then another.
For OpenAI, the hope is that Universe can drive the development of machines with “general intelligence”—the same kind of flexible brain power that humans have. “An AI should be able to solve any problem you throw at it,” says OpenAI researcher and former Googler Ilya Sutskever. That’s a ridiculously ambitious goal. And if it’s ever realized, it won’t happen for a very long time. But Sutskever argues that it’s already routine for AI systems to do things that seemed ridiculously ambitious just a few years ago.
He compares Universe to the ImageNet project created by Stanford computer scientist Fei-Fei Li in 2009. The goal of ImageNet was to help computers “see” like humans. At the time, that seemed impossible. But today, the Google Photos app routinely recognizes faces, places, and objects in digital images. So does Facebook. Now, OpenAI wants to expand artificial intelligence to every dimension of the digital realm, and possibly beyond.
In Universe, AI agents interact with the virtual world by sending simulated mouse movements and keystrokes through what’s called Virtual Network Computing, or VNC. In this way, Universe facilitates reinforcement learning, an AI technique where agents learn tasks by trial and error, carefully keeping tabs on what works and what doesn’t, what brings the highest score or wins a game or grabs some other reward. It’s a powerful technology: Reinforcement learning is how Google’s DeepMind lab built AlphaGo, the AI that recently beat one of the world’s top players at the ancient game of Go.
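To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement-learning algorithm, running on a made-up five-cell “corridor” game. It is not Universe’s code and not AlphaGo’s method (AlphaGo combined deep neural networks with tree search); the toy environment, the reward, and every parameter value are illustrative assumptions. The agent keeps a table estimating how much reward each action tends to lead to, usually picks the best-looking action, occasionally tries a random one, and updates its estimate after every move.

```python
import random

# Hypothetical toy "game": a corridor of cells 0..4. The agent starts in
# cell 0 and earns a reward of 1 only when it reaches the last cell.
N_STATES = 5
ACTIONS = (-1, 1)  # step left, step right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def best_action(q, state, rng):
    """Highest-valued action in this state, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate each action's long-run reward by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Mostly exploit what has worked so far; occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = best_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Move the estimate toward: reward now + discounted best future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Action the trained agent would pick in each non-terminal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

After training, the agent should settle on stepping right in every cell, the only route to the reward. Breaking ties at random matters here: with every estimate starting at zero, a fixed tie-break would keep the agent shuffling toward the left wall and make the first reward very slow to find.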
But with Universe, reinforcement learning can happen inside any piece of software. Agents can readily move between applications, learning to crack one and then another. In the long run, Sutskever says, they can even practice “transfer learning,” in which an agent takes what it has learned in one application and applies it to another. OpenAI, he says, is already building agents that can transfer at least some learning from one driving game to another.
Michael Bowling, a University of Alberta professor who helped create the Arcade Learning Environment, is reserving judgment on how well Universe will work in practice, if only because he hasn’t used it yet. But he applauds the concept of an AI proving ground that spans not just games but everything else. “It crystallizes an important idea: Games are a helpful benchmark, but the goal is AI.”
Hello, Grand Theft Auto
Still, games are where it starts. OpenAI has seeded Universe with about a thousand games, securing approval from publishers like Valve and Microsoft. It’s also working with Microsoft to add Malmo.
Games have always served as a natural training tool for AI. They’re more contained than the real world, and there’s a clear system of rewards, so AI agents can readily learn which actions to take and which to avoid. Games aren’t ends in and of themselves, but they’ve already helped create AI that has a meaningful effect on the real world. After building AI that can play old Atari games better than any human ever could, DeepMind used much the same technology to refine the operation of Google’s worldwide network of computer data centers, reducing its energy bill by hundreds of millions of dollars.
Craig Quiter is using Universe with a similar goal in mind. Quiter helped build the platform at OpenAI before moving across town to Otto, the self-driving truck startup Uber acquired this summer in a deal worth about $680 million. Last month, Otto, drawing on the work of several engineers who previously built autonomous cars inside Google, sent its driverless 18-wheeler 120 miles down the highway from Fort Collins to Colorado Springs with 50,000 cans of Budweiser. But Quiter is looking well beyond the $30,000 in hardware and software that made this delivery possible. With help from Universe, he’s building an AI that can play Grand Theft Auto V.
Today, Otto’s truck can navigate a relatively calm interstate. But in the years to come, the company hopes to build autonomous vehicles that can respond to just about anything they encounter on the road, including cars spinning out of control across several lanes of traffic. The digitized chaos of Grand Theft Auto, the thinking goes, can help the AI controlling those vehicles learn to handle the unexpected.
Meanwhile, researchers at OpenAI are already pushing Universe beyond games into web browsers and the protein-folding apps used by biologists. Andrej Karpathy, the lead researcher of this sub-project, dubbed World of Bits, questions how useful games will be in building AI for the real world. But an AI that learns how to use a web browser is, in a sense, already learning to participate in the real world. The web is part of our daily lives. Navigating a browser requires both motor skills and language skills. It’s a gateway to any software or any person.
The rub is that reinforcement learning inside a web browser is far more difficult to pull off. Universe includes a deep neural network that can automatically read scores from a game screen in much the same way neural nets can recognize objects or faces in photos. But web services have no score. Researchers must define their own reward functions. Universe allows for this, but it’s still unclear what rewards will help agents, say, sign into a website or look up facts on Wikipedia, tasks that OpenAI is already exploring.
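Here is a minimal sketch of what a hand-written reward function for a web task might look like. It assumes a hypothetical `PageState` observation that reports whether a login form’s fields are filled and whether the agent is signed in; it does not show Universe’s actual interface for custom rewards, and every name and number below is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    """Hypothetical snapshot of what an agent might observe in a browser."""
    username_filled: bool = False
    password_filled: bool = False
    logged_in: bool = False


def login_reward(prev: PageState, curr: PageState, step_cost: float = 0.01) -> float:
    """Hand-written reward for a hypothetical 'sign into a website' task.

    A web page shows no score, so the researcher decides what counts as
    progress: a large reward for signing in, small bonuses for newly filled
    fields, and a small per-step cost so the agent does not dawdle.
    """
    if curr.logged_in and not prev.logged_in:
        return 1.0
    reward = -step_cost
    if curr.username_filled and not prev.username_filled:
        reward += 0.1
    if curr.password_filled and not prev.password_filled:
        reward += 0.1
    return reward


if __name__ == "__main__":
    start = PageState()
    typed = PageState(username_filled=True)
    signed_in = PageState(True, True, True)
    print(login_reward(start, typed), login_reward(typed, signed_in))
```

Even this small example shows why the choice is hard. The field bonuses may speed up learning, but an agent that can clear and retype a field could collect them over and over without ever signing in, so a designer would need to cap such bonuses or tie them more tightly to real progress.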
But if we can teach machines these more amorphous tasks (if we can teach AI agents to do anything on a computer), Sutskever believes we can teach them to do just about anything else. After all, an AI can’t browse the internet unless it understands the natural way we humans talk. It can’t play Grand Theft Auto without the motor skills of a human. And like so many others, Quiter argues that navigating virtual worlds isn’t so different from navigating the real world. If Universe reaches its goal, then general intelligence isn’t that far away. It’s a ridiculous aim, but it may not be ridiculous for long.