Meta has built AI models that can recognize and produce speech for more than 1,000 languages, a tenfold increase on what’s currently available. It’s a significant step toward preserving languages that are at risk of disappearing, the company says.
Meta is releasing its models to the public via the code hosting service GitHub. It claims that making them open source will help developers working in different languages to build new speech applications, like messaging services that understand everyone, or virtual-reality systems that can be used in any language.
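The released checkpoints are also mirrored on the Hugging Face hub, which makes basic transcription a few lines of work. Below is a minimal sketch, assuming the public facebook/mms-1b-all checkpoint and the transformers library; the silent placeholder waveform and the choice of French are illustrative, not part of Meta’s announcement.

```python
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # public multilingual ASR checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Swap in the adapter weights and vocabulary for the target language;
# "fra" (French) is an ISO 639-3 code and purely illustrative.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

waveform = np.zeros(16_000, dtype=np.float32)  # placeholder: 1 s of silence at 16 kHz
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```

The per-language adapters are the design choice doing the work here: one shared backbone is trained once, and a small adapter plus vocabulary is swapped in for each language.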
There are around 7,000 languages in the world, but existing speech recognition models cover only about 100 of them comprehensively. That’s because these kinds of models tend to require huge amounts of labeled training data, which is available for only a small number of languages, including English, Spanish, and Chinese.
Meta researchers got around this problem by retraining an existing AI model, developed by the company in 2020, that is able to learn speech patterns from audio without requiring large amounts of labeled data, such as transcripts.
They trained it on two new data sets: one containing audio recordings of the New Testament and its corresponding text, taken from the internet in 1,107 languages, and another containing unlabeled New Testament audio recordings in 3,809 languages. The team processed the speech audio and the text data to improve their quality before running an algorithm designed to align the audio recordings with the accompanying text. They then repeated the process with a second algorithm trained on the newly aligned data. With this method, the researchers were able to teach the algorithm to learn a new language more easily, even without the accompanying text.
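That alignment step has since surfaced in public tooling: recent torchaudio releases ship a forced-alignment pipeline built on the alignment model Meta released. The sketch below assumes that MMS_FA bundle and a hypothetical recording.wav whose transcript is known; it illustrates the technique, not the researchers’ actual training code.

```python
import torch
import torchaudio
from torchaudio.pipelines import MMS_FA as bundle

model = bundle.get_model()        # CTC model producing frame-level label scores
tokenizer = bundle.get_tokenizer()
aligner = bundle.get_aligner()

# "recording.wav" is a hypothetical mono recording of known text.
waveform, sr = torchaudio.load("recording.wav")
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

transcript = "text believed to match the audio".split()
with torch.inference_mode():
    emission, _ = model(waveform)
    spans = aligner(emission[0], tokenizer(transcript))

# Convert model frames back to seconds and report where each word sits.
ratio = waveform.size(1) / emission.size(1)
for word, s in zip(transcript, spans):
    start = s[0].start * ratio / bundle.sample_rate
    end = s[-1].end * ratio / bundle.sample_rate
    print(f"{word}: {start:.2f}s to {end:.2f}s")
```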
“We can use what that model learned to then quickly build speech systems with very, very little data,” says Michael Auli, a research scientist at Meta who worked on the project.
“For English, we have lots and lots of good data sets, and we have that for a few more languages, but we just don’t have that for languages that are spoken by, say, 1,000 people.”
The researchers say their models can converse in over 1,000 languages but recognize more than 4,000.
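Language identification is exposed through the same open-source interface. Here is a sketch, assuming one of the public LID checkpoints on Hugging Face (the 126-language facebook/mms-lid-126; larger variants cover thousands of languages) and, again, a placeholder in place of real 16 kHz audio.

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "facebook/mms-lid-126"  # assumed public language-ID checkpoint
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)

waveform = np.zeros(16_000, dtype=np.float32)  # placeholder: 1 s of silence
inputs = extractor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax(-1))])  # prints an ISO 639-3 code
```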
They compared the models with those from rival companies, including OpenAI’s Whisper, and claim theirs had half the error rate, despite covering 11 times more languages.
However, the team warns that the model is still prone to mistranscribing certain words or phrases, which could result in inaccurate or potentially offensive labels. They also acknowledge that their speech recognition models yielded more biased words than other models, albeit only 0.7% more.
While the scope of the research is impressive, the use of religious texts to train AI models can be controversial, says Chris Emezue, a researcher at Masakhane, an organization working on natural-language processing for African languages, who was not involved in the project.
“The Bible has a lot of bias and misrepresentations,” he says.