MIT Technology Review
How three filmmakers created Sora’s latest stunning videos

Over the last month, a handful of filmmakers have taken Sora for a test drive. The results, which OpenAI published this week, are amazing. The short films are a big leap up even from the cherry-picked demo videos that OpenAI used to tease its new generative model just six weeks ago. Here's how three of the filmmakers did it.

"Air Head" by Shy Kids

Shy Kids is a pop band and filmmaking collective based in Toronto that describes its style as "punk-rock Pixar." The group has experimented with generative video tech before. Last year it made a music video for one of its songs using an open-source tool called Stable Warpfusion. It's cool, but low-res and glitchy. The film it made with Sora, called "Air Head," could pass for real footage—if it didn't feature a person with a balloon for a face.

One problem with most generative video tools is that it's hard to maintain consistency across frames. When OpenAI asked Shy Kids to try out Sora, the band wanted to see how far they could push it. "We thought a fun, interesting experiment would be: Could we make a consistent character?" says Shy Kids member Walter Woodman. "We think it was mostly successful."

Generative models can struggle with anatomical details like hands and faces. But in the video there's a scene showing a train car filled with passengers, and the faces are near perfect. "It's mind-blowing what it can do," says Woodman. "Those faces on the train were all Sora."

Has generative video's problem with faces and hands been solved? Not quite. We still get glimpses of warped body parts. And text is still an issue (in another video, by the creative agency Native Foreign, we see a bike repair shop with the sign "Biycle Repaich"). But not everything in "Air Head" is raw output from Sora. After editing together the many different clips produced with the tool, Shy Kids did a bunch of post-processing to make the film look even better. They used visual effects tools to fix certain shots of the main character's balloon face, for example.

Woodman also thinks that the music (which they wrote and performed) and the voice-over (which they also wrote and performed) help to lift the quality of the film even more. Mixing these human touches in with Sora's output is what makes the film feel alive, says Woodman. "The technology is nothing without you," he says. "It's a powerful tool, but you're the person driving it."

"Abstract" by Paul Trillo

Paul Trillo, an artist and filmmaker, wanted to stretch what Sora could do with the look of a film. His video is a mash-up of retro-style footage with shots of a figure who morphs into a glitter ball and a breakdancing trash man. He says that everything you see is raw output from Sora: "No color correction or post FX." Even the jump-cut edits in the first part of the film were produced using the generative model.

Trillo felt that the demos that OpenAI put out last month came across too much like clips from video games. "I wanted to see what other aesthetics were possible," he says. The result is a video that looks like something shot with vintage 16-millimeter film. "It took a fair amount of experimenting, but I stumbled upon a series of prompts that helps make the video feel more organic or filmic," he says.

"Beyond Our Reality" by Don Allen Stevenson III

[Embedded Instagram post shared by Don Allen Stevenson III (@donalleniii)]

Don Allen Stevenson III is a filmmaker and visual effects artist. He was one of the artists invited by OpenAI to try out DALL-E 2, its text-to-image model, a few years ago. Stevenson's film is a NatGeo-style nature documentary that introduces us to a menagerie of imaginary animals, from the girafflamingo to the eel cat.

In some ways working with text-to-video is like working with text-to-image, says Stevenson. "You enter a text prompt and then you tweak your prompt a bunch of times," he says. But there's an added hurdle. When you're trying out different prompts, Sora produces low-res video. If you hit on something you like, you can then increase the resolution. But going from low to high res involves another round of generation, and what you liked in the low-res version can be lost.

Sometimes the camera angle is different or the objects in the shot have moved, says Stevenson. Hallucination is still a feature of Sora, as it is in any generative model. With still images this can produce weird visual defects; with video those defects can appear across time as well, with weird jumps between frames.

Stevenson also had to work out how to speak Sora's language. It takes prompts very literally, he says. In one experiment he tried to create a shot that zoomed in on a helicopter. Sora produced a clip in which it mixed together a helicopter with a camera's zoom lens. But Stevenson says that with a little creative prompting, Sora is easier to control than previous models.

Even so, he thinks that surprises are part of what makes the technology fun to use: "I like having less control. I like the chaos of it," he says. There are plenty of other video-making tools that give you control over editing and visual effects. For Stevenson, the point of a generative model like Sora is to come up with strange, unexpected material to work with in the first place.

The clips of the animals were all generated with Sora. Stevenson tried many different prompts until the tool produced something he liked. "I directed it, but it's more like a nudge," he says. He then went back and forth, trying out variations.

Stevenson pictured his fox crow having four legs, for example. But Sora gave it two, which worked even better. (It's not perfect: sharp-eyed viewers will see that at one point in the video the fox crow switches from two legs to four, then back again.) Sora also produced several versions that he thought were too creepy to use.

When he had a set of animals he really liked, he edited them together. Then he added captions and a voice-over on top. Stevenson could have created his made-up menagerie with existing tools, but it would have taken hours, even days, he says. With Sora the process was far quicker.

"I was trying to think of something that would look cool and experimented with a lot of different characters," he says. "I have so many clips of random creatures." Things really clicked when he saw what Sora did with the girafflamingo. "I started thinking: What's the narrative around this creature? What does it eat, where does it live?" he says. He plans to put out a series of extended films following each of the fantasy animals in more detail.

Stevenson also hopes his fantastical animals will make a bigger point. "There's going to be a lot of new types of content flooding feeds," he says. "How are we going to teach people what's real? One way, in my view, is to tell stories that are clearly fantasy."

Stevenson points out that his film might be the first time a lot of people see a video created by a generative model. He wants that first impression to make one thing very clear: This is not real.
