OpenAI’s recent reveal of its stunning generative model Sora pushed the envelope of what’s possible with text-to-video technology. Now Google DeepMind brings us text-to-video games.
The new model, called Genie, can take a short description, a hand-drawn sketch, or a photo and turn it into a playable video game in the style of classic 2D platformers like Super Mario Bros. But don’t expect anything fast-paced. The games run at one frame per second, versus the typical 30 to 60 frames per second of most modern games.
“It’s cool work,” says Matthew Guzdial, an AI researcher at the University of Alberta, who developed a similar game generator a few years ago.
Genie was trained on 30,000 hours of video of hundreds of 2D platform games taken from the internet. Others have taken that approach before, says Guzdial. His own game generator learned from videos to create abstract platformers. Nvidia used video data to train a model called GameGAN, which could produce clones of games like Pac-Man.
But all these examples trained the model with input actions, records of button presses on a controller, as well as video footage: a video frame showing Mario jumping was paired with the “jump” action, and so on. Tagging video footage with input actions takes a lot of work, which has limited the amount of training data available.
In contrast, Genie was trained on video footage alone. It then learned which of eight possible actions would cause the game character in a video to change its position. This turned countless hours of existing online video into potential training data.
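To make the idea concrete, here is a minimal, hypothetical sketch (not DeepMind’s actual code) of how a model might infer a small vocabulary of actions from unlabeled video: an encoder guesses which of eight discrete actions explains the change between two consecutive frames, and a dynamics network tests that guess by predicting the next frame. All names and sizes below are illustrative assumptions.

```python
# Hypothetical sketch of learning latent actions from unlabeled video,
# in the spirit of Genie. Frame embeddings stand in for real frames.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 8  # Genie infers one of eight possible actions

class LatentActionModel(nn.Module):
    def __init__(self, frame_dim=256, hidden=512):
        super().__init__()
        # Encoder: looks at two consecutive frames and guesses which
        # discrete action explains the transition between them.
        self.action_encoder = nn.Sequential(
            nn.Linear(2 * frame_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_ACTIONS),
        )
        # Dynamics model: predicts the next frame from the current
        # frame plus the inferred action.
        self.dynamics = nn.Sequential(
            nn.Linear(frame_dim + NUM_ACTIONS, hidden), nn.ReLU(),
            nn.Linear(hidden, frame_dim),
        )

    def forward(self, frame_t, frame_next):
        logits = self.action_encoder(torch.cat([frame_t, frame_next], -1))
        # Straight-through Gumbel-softmax: a hard one-hot action choice
        # that still lets gradients flow back to the encoder.
        action = F.gumbel_softmax(logits, hard=True)
        predicted_next = self.dynamics(torch.cat([frame_t, action], -1))
        return predicted_next, action

model = LatentActionModel()
frame_t = torch.randn(32, 256)     # batch of current-frame embeddings
frame_next = torch.randn(32, 256)  # the frames that actually followed
predicted, action = model(frame_t, frame_next)
loss = F.mse_loss(predicted, frame_next)  # next-frame prediction is the only signal
loss.backward()
```

Because the loss rewards nothing but accurate next-frame prediction, the discrete codes that emerge tend to line up with meaningful moves like “jump” or “walk left,” with no human labeling involved.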
Genie generates each new frame of the game on the fly depending on the action the player takes. Press Jump, and Genie updates the current image to show the game character jumping; press Left, and the image changes to show the character moving to the left. The game ticks along action by action, each new frame generated from scratch as the player plays.
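A sketch of that interactive loop might look like the following, again with hypothetical names (`world_model.generate` is an assumed interface, not a real Genie API):

```python
# Hypothetical frame-by-frame play loop as Genie is described: each key
# press selects one of the eight latent actions, and the model generates
# the next frame conditioned on everything seen so far.
KEY_TO_ACTION = {"left": 0, "right": 1, "jump": 2}  # illustrative mapping

def play(world_model, first_frame, key_presses):
    frames = [first_frame]
    for key in key_presses:
        action = KEY_TO_ACTION[key]
        # Each new frame is generated from scratch, conditioned on the
        # frame history and the chosen action -- currently ~1 frame/sec.
        frames.append(world_model.generate(frames, action))
    return frames
```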
Future versions of Genie could run faster. “There is no fundamental limitation that prevents us from reaching 30 frames per second,” says Tim Rocktäschel, a research scientist at Google DeepMind who leads the team behind the work. “Genie uses many of the same technologies as contemporary large language models, where there has been significant progress in improving inference speed.”
Genie learned some common visual quirks found in platformers. Many games of this type use parallax, where the foreground moves sideways faster than the background. Genie often adds this effect to the games it generates.
While Genie is an in-house research project and won’t be released, Guzdial notes that the Google DeepMind team says it could one day be turned into a game-making tool, something he’s working on too. “I’m definitely interested to see what they build,” he says.
Virtual playgrounds
But the Google DeepMind researchers are excited about more than just game generation. The team behind Genie works on open-ended learning, where AI-controlled bots are dropped into a virtual environment and left to solve various tasks by trial and error (a technique known as reinforcement learning).
In 2021, the team developed a virtual playground called XLand, in which bots learned how to cooperate on simple tasks such as moving obstacles. Virtual environments like XLand will be crucial for training future bots on a range of different challenges before pitting them against real-world scenarios. The video-game example shows that Genie can produce these virtual sandboxes for bots to play in.
Others have developed similar world-building tools. For instance, David Ha at Google Brain and Jürgen Schmidhuber at the AI lab IDSIA in Switzerland developed a tool in 2018, called world models, that trained bots in game-based virtual environments. But again, unlike Genie, it required the training data to include input actions.
The team demonstrated how this ability could be useful in robotics, too. When Genie was shown videos of real robot arms manipulating a variety of household objects, the model learned what actions the arm could perform and how to control it. Future robots could learn new tasks by watching video tutorials.
“It is difficult to predict what use cases will be enabled,” says Rocktäschel. “We hope projects like Genie will eventually provide people with new tools to express their creativity.”