
Google’s new version of Gemini can handle far greater amounts of information


Google DeepMind today launched the next generation of its powerful artificial intelligence model Gemini, which has an enhanced ability to work with large amounts of video, text, and images.

It’s an upgrade from the three versions of Gemini 1.0 that Google announced back in December, which range in size and complexity from Nano to Pro to Ultra. (It rolled out Gemini 1.0 Pro and 1.0 Ultra across many of its products last week.) Google is now releasing a preview of Gemini 1.5 Pro to select developers and business customers. The company says that the mid-tier Gemini 1.5 Pro matches its previous top-tier model, Gemini 1.0 Ultra, in performance, but uses less computing power (yes, the names are confusing!). 

Crucially, the 1.5 Pro model can handle much larger amounts of information from users, including much bigger prompts. While every AI model has a ceiling on how much data it can digest, the standard version of the new Gemini 1.5 Pro can handle inputs as large as 128,000 tokens, the words or parts of words that an AI model breaks inputs into. That’s on a par with the best version of GPT-4 (GPT-4 Turbo). 

However, a limited group of developers will be able to submit up to 1 million tokens to Gemini 1.5 Pro, which equates to roughly 1 hour of video, 11 hours of audio, or 700,000 words of text. That’s a major jump that makes it possible to do things no other model currently can.
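
For a rough sense of what a token is, the short snippet below uses tiktoken, OpenAI’s open-source tokenizer, to split a sentence into tokens. Gemini uses its own tokenizer, so its counts will differ; this is only an illustration of how text maps to subword units, not a measurement of Gemini’s limits.

# Illustration only: tiktoken is OpenAI's tokenizer, not Gemini's,
# but the principle is the same -- text is split into subword tokens
# before the model ever sees it.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "One small step for man, one giant leap for mankind."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")
print(tokens[:10])                              # numeric token IDs
print([enc.decode([t]) for t in tokens[:10]])   # the text pieces they map to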

In one demonstration video shown by Google, using the million-token version, researchers fed the model a 402-page transcript of the Apollo moon landing mission. Then they showed Gemini a hand-drawn sketch of a boot and asked it to identify the moment in the transcript that the drawing represents.

“This is the moment Neil Armstrong landed on the moon,” the chatbot responded correctly. “He said, ‘One small step for man, one giant leap for mankind.’”

The model was also able to identify moments of humor. When asked by the researchers to find a funny moment in the Apollo transcript, it picked out when astronaut Mike Collins referred to Armstrong as “the Czar.” (Probably not the best line, but you get the point.)  

In another demonstration, the team uploaded a 44-minute silent film featuring Buster Keaton and asked the AI to identify what information was on a piece of paper that, at some point in the movie, is removed from a character’s pocket. In less than a minute, the model found the scene and correctly recalled the text written on the paper. Researchers also repeated a task similar to the Apollo experiment, asking the model to find a scene in the film based on a drawing, which it accomplished. 

Google says it put Gemini 1.5 Pro through the usual battery of tests it uses when developing large language models, including evaluations that combine text, code, images, audio, and video. It found that 1.5 Pro outperformed 1.0 Pro on 87% of the benchmarks and more or less matched 1.0 Ultra across all of them while using less computing power. 

The ability to handle larger inputs, Google says, is a result of progress in what’s called mixture-of-experts architecture. An AI using this design divides its neural network into chunks, activating only the parts that are relevant to the task at hand rather than firing up the whole network at once. (Google is not alone in using this architecture; French AI firm Mistral released a model using it, and GPT-4 is rumored to employ the tech as well.)

“In a way it operates much like our brain does, where not the whole brain activates all the time,” says Oriol Vinyals, a deep learning team lead at DeepMind. This compartmentalizing saves the AI computing power and can generate responses faster.
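To make the idea concrete, here is a toy sketch of mixture-of-experts routing written in Python with NumPy. It is not Gemini’s code: the sizes, the gating rule, and the “experts” (simple feed-forward layers) are all made-up assumptions. The point is only that a small router scores the experts for each input and runs just the top-scoring few, leaving the rest of the network idle.

# Toy mixture-of-experts routing sketch (illustrative, not Gemini's design).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" here is just a small feed-forward layer.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate = rng.normal(size=(d_model, n_experts))   # router weights

def moe_forward(x):
    # Score every expert, keep only the top-k, and renormalize their weights.
    scores = x @ gate
    top = np.argsort(scores)[-top_k:]
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()

    # Only the selected experts do any computation for this input.
    out = np.zeros(d_model)
    for w, i in zip(weights, top):
        out += w * np.tanh(x @ experts[i])
    return out, top

x = rng.normal(size=d_model)
y, used = moe_forward(x)
print("experts used:", used, "out of", n_experts)

In a real model the unused experts cost essentially nothing for that input, which is where the computing-power savings Vinyals describes come from.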

“That sort of fluidity going back and forth across different modalities, and using that to search and understand, is very impressive,” says Oren Etzioni, former technical director of the Allen Institute for Artificial Intelligence, who was not involved in the work. “This is stuff I have not seen before.”

An AI that can operate across modalities would more closely resemble the way human beings behave. “People are naturally multimodal,” Etzioni says, because we can effortlessly switch between speaking, writing, and drawing images or charts to convey ideas. 

Etzioni cautioned against reading too much into the developments, however. “There’s a famous line,” he says. “Never trust an AI demo.” 

For one, it’s not clear how much the demonstration videos left out or cherry-picked from various tasks (Google did receive criticism for its early Gemini launch for not disclosing that the video was sped up). It’s also possible the model would not be able to replicate some of the demonstrations if the input wording were slightly tweaked. AI models in general, says Etzioni, are brittle. 

Today’s release of Gemini 1.5 Pro is limited to developers and enterprise customers. Google did not specify when it will be available for a wider release. 
