
Meta is going all in on open-source AI. The company is today unveiling LLaMA 2, its first large language model that's available for anyone to use, free of charge.
Since OpenAI released its hugely popular AI chatbot ChatGPT last November, tech companies have been racing to release models in hopes of overthrowing its supremacy. Meta has been in the slow lane. When competitors Microsoft and Google announced their AI chatbots in February, Meta rolled out the first, smaller version of LLaMA, restricted to researchers. But it hopes that releasing LLaMA 2, and making it free for anyone to build commercial products on top of, will help it catch up.
The company is actually releasing a suite of AI models, which includes versions of LLaMA 2 in different sizes, as well as a version of the AI model that people can build into a chatbot, similar to ChatGPT. Unlike ChatGPT, which people can access through OpenAI's website, the model must be downloaded from Meta's launch partners Microsoft Azure, Amazon Web Services, and Hugging Face.
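For developers who want to experiment, here is a minimal sketch of what loading a LLaMA 2 checkpoint through Hugging Face's transformers library could look like. The model identifier and the access-approval step are assumptions based on Hugging Face's usual conventions, not details confirmed by Meta.

```python
# Minimal sketch: loading a LLaMA 2 checkpoint via Hugging Face transformers.
# Assumes access to the weights has been granted and that the (hypothetical)
# identifier "meta-llama/Llama-2-7b-chat-hf" is available on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain why open-source language models matter."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```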
"This benefits the entire AI community and gives people options to go with closed-source approaches or open-source approaches for whatever suits their particular application," says Ahmad Al-Dahle, a vice president at Meta who is leading the company's generative AI work. "This is a really, really big moment for us."
But many caveats remain. Meta is not releasing information about the data set it used to train LLaMA 2 and cannot guarantee that it didn't include copyrighted works or personal data, according to a company research paper shared exclusively with MIT Technology Review. LLaMA 2 also has the same problems that plague all large language models: a propensity to produce falsehoods and offensive language.
The idea, Al-Dahle says, is that by releasing the model into the wild and letting developers and companies tinker with it, Meta will learn important lessons about how to make its models safer, less biased, and more efficient.
A powerful open-source model like LLaMA 2 poses a considerable threat to OpenAI, says Percy Liang, director of Stanford's Center for Research on Foundation Models. Liang was part of the team of researchers who developed Alpaca, an open-source competitor to GPT-3, an earlier version of OpenAI's language model.
"LLaMA 2 isn't GPT-4," says Liang. And in its research paper, Meta admits there is still a large gap in performance between LLaMA 2 and GPT-4, which is now OpenAI's state-of-the-art AI language model. "But for many use cases, you don't need GPT-4," he adds.
A more customizable and transparent model, such as LLaMA 2, might help companies create products and services faster than a big, sophisticated proprietary model can, he says.
"To have LLaMA 2 become the leading open-source alternative to OpenAI would be a huge win for Meta," says Steve Weber, a professor at the University of California, Berkeley.
Under the hood
Getting LLaMA 2 ready to launch required a lot of tweaking to make the model safer and less likely to spew toxic falsehoods than its predecessor, Al-Dahle says.
Meta has plenty of past gaffes to learn from. Its language model for science, Galactica, was taken offline after only three days, and its previous LLaMA model, which was meant only for research purposes, was leaked online, sparking criticism from politicians who questioned whether Meta was taking proper account of the risks associated with AI language models, such as disinformation and harassment.
To mitigate the risk of repeating these mistakes, Meta applied a mix of different machine-learning techniques aimed at improving helpfulness and safety.
Meta’s approach to training LLaMA 2 had more steps than usual for generative AI models, says Sasha Luccioni, a researcher at AI startup Hugging Face.
The model was trained on 40% more data than its predecessor. Al-Dahle says there were two sources of training data: data that was scraped online, and a data set fine-tuned and tweaked according to feedback from human annotators to behave in a more desirable way. The company says it did not use Meta user data in LLaMA 2, and excluded data from sites it knew had lots of personal information.
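That annotator-feedback step is broadly in the family of reinforcement learning from human feedback, in which people compare pairs of model responses and a reward model is trained on their preferences. The snippet below is a conceptual sketch of the pairwise loss commonly used for such a reward model, written in PyTorch; it illustrates the general technique, not Meta's actual training code, and all names and values are hypothetical.

```python
# Conceptual sketch of a pairwise reward-model loss used in RLHF-style training.
# Scores for a "chosen" and a "rejected" response are assumed to come from a
# reward model; the loss pushes the chosen score above the rejected one.
import torch
import torch.nn.functional as F

def preference_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: -log sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy usage with made-up reward scores for two annotator comparisons.
chosen = torch.tensor([1.2, 0.7])
rejected = torch.tensor([0.3, 0.9])
print(preference_loss(chosen, rejected))  # lower when chosen responses outrank rejected ones
```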
Despite that, LLaMA 2 still spews offensive, harmful, and otherwise problematic language, just like rival models. Meta says it did not remove toxic data from the data set, because leaving it in might help LLaMA 2 detect hate speech better, and removing it could risk accidentally filtering out some demographic groups.
Nevertheless, Meta's commitment to openness is exciting, says Luccioni, because it allows researchers like her to study AI models' biases, ethics, and efficiency properly.
The fact that LLaMA 2 is an open-source model will also allow external researchers and developers to probe it for security flaws, which could make it safer than proprietary models, Al-Dahle says.
Liang agrees. "I'm very excited to try things out, and I think it will be beneficial for the community," he says.