
Meta AI Unveils SeamlessM4T: A Foundational Multilingual and Multitask Model that Seamlessly Translates and Transcribes Across Speech and Text


In a world where interactions are increasingly global, being multilingual can bridge gaps, foster understanding, and open doors to diverse opportunities. Learning multiple languages can also provide insights into language structure and linguistics, deepening one's understanding of the mechanics of communication and thought. This is especially useful in today's globalized world, where cross-cultural interactions are common. Shouldn't this bridge also extend to the interactions between humans and AI?

Researchers from Meta AI and UC Berkeley propose a foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text. They call it "SeamlessM4T", where M4T stands for Massively Multilingual and Multimodal Machine Translation. It is a single AI model that supports speech-to-text, speech-to-speech, text-to-speech, and text-to-text translation, as well as automatic speech recognition, for up to 100 languages.

Who isn't acquainted with the Babel Fish, the idea of a tool that translates speech between any two languages? So what stands in the way of building one? Existing speech-to-speech translation systems tend to focus on high-resource languages such as English, Spanish, and French, leaving many low-resource languages behind. They mostly translate from English into other languages rather than the other way around. They also rely on cascades of multiple subsystems that perform the translation step by step, which keeps high-performing unified systems out of reach.
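To make the cascade idea concrete, here is a toy Python sketch. It is not how SeamlessM4T or any real product is built: the stage functions `toy_asr`, `toy_mt`, and `toy_tts` and the lexicon are hypothetical stand-ins that fake audio as tagged strings. It only illustrates the structural point that in a cascade each subsystem consumes the previous one's output, so a mistake made early flows through to the end.

```python
from typing import Callable, Dict

Stage = Callable[[str], str]

# Hypothetical toy stand-ins: "audio" is faked as a string with an "audio:" tag.
def toy_asr(audio: str) -> str:
    """Speech recognition stand-in: strip the fake audio tag to get a transcript."""
    prefix = "audio:"
    return audio[len(prefix):] if audio.startswith(prefix) else audio

# Hypothetical English-to-Spanish lexicon used only for this illustration.
TOY_LEXICON: Dict[str, str] = {"hello": "hola", "friend": "amigo"}

def toy_mt(text: str) -> str:
    """Text translation stand-in: word-by-word lookup; unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())

def toy_tts(text: str) -> str:
    """Speech synthesis stand-in: wrap the text back into a fake audio tag."""
    return "audio:" + text

def cascade(*stages: Stage) -> Stage:
    """Chain stages so each one consumes the previous stage's output.
    A later stage only ever sees what the earlier stage produced."""
    def run(x: str) -> str:
        for stage in stages:
            x = stage(x)
        return x
    return run

speech_to_speech = cascade(toy_asr, toy_mt, toy_tts)
print(speech_to_speech("audio:hello friend"))  # audio:hola amigo

# If recognition mishears "friend" as "fiend", translation cannot recover it.
def noisy_asr(audio: str) -> str:
    return toy_asr(audio).replace("friend", "fiend")

print(cascade(noisy_asr, toy_mt, toy_tts)("audio:hello friend"))  # audio:hola fiend
```

A unified model such as SeamlessM4T aims to replace this chain of separately built components with one model that covers the whole set of tasks.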

To address these limitations, the researchers used over 1 million hours of open speech audio data to learn self-supervised speech representations. They also created a multimodal corpus of more than 470,000 hours of automatically aligned speech translations. To gauge the model's robustness against background noise and speaker variation, they created open robustness benchmarks and found average improvements of 38% and 49%, respectively, in speech-to-text tasks over the previous state-of-the-art model.

The researchers say they ran systematic evaluations of their system throughout their workflow to ensure safe and robust performance. Instead of relying on closed data, they used parallel data mining. This method encodes sentences from various languages into a fixed-size embedding space and finds parallel instances based on a similarity metric.
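The mining step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers' pipeline: the three-dimensional vectors are hypothetical toy embeddings standing in for the output of a multilingual sentence encoder, the similarity metric is plain cosine similarity, and the 0.8 threshold is arbitrary. Large-scale miners typically add refinements such as margin-based scoring.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]

def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two fixed-size embeddings."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def mine_parallel_pairs(
    src: Dict[str, Vector],
    tgt: Dict[str, Vector],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """For each source sentence, keep its most similar target sentence
    if their cosine similarity clears the threshold."""
    pairs = []
    for s_text, s_vec in src.items():
        best_text: Optional[str] = None
        best_score = -1.0
        for t_text, t_vec in tgt.items():
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_text, best_score = t_text, score
        if best_text is not None and best_score >= threshold:
            pairs.append((s_text, best_text, best_score))
    return pairs

# Hypothetical toy embeddings; a real pipeline would get these from a
# multilingual encoder that maps every language into the same space.
src_embeddings = {
    "the cat sleeps": [0.9, 0.1, 0.0],
    "good morning": [0.0, 0.2, 0.95],
}
tgt_embeddings = {
    "le chat dort": [0.88, 0.15, 0.02],
    "il pleut": [0.1, 0.9, 0.1],
    "bonjour": [0.05, 0.25, 0.9],
}

for s, t, score in mine_parallel_pairs(src_embeddings, tgt_embeddings):
    print(f"{s!r} <-> {t!r} ({score:.3f})")
```

With these toy vectors, "the cat sleeps" is paired with "le chat dort" and "good morning" with "bonjour", while "il pleut" is left unmatched. Raising the threshold trades recall for precision: with `threshold=0.999`, neither toy pair survives.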

Building a single large model that can handle the full suite of text and speech translation tasks lays important groundwork for the next generation of on-device and on-demand multimodal translation. The researchers say that when language technologies are developed with this idea in mind, the needs of half of the world's population can be addressed. Their future work involves bridging the gap between speakers of high- and low-resource languages, moving the world toward being more interconnected than ever.

The researchers acknowledge that SeamlessM4T's performance is not yet consistent when translating slang or proper nouns, particularly across high- and low-resource languages. Their future work aims to resolve this limitation, enabling friendlier, more natural conversations in one's mother tongue, slang included.


Take a look at the Paper, Project, and Reference Article. All credit for this research goes to the researchers on this project.



Arshad is an intern at MarktechPost. He is currently pursuing an Integrated MSc in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn drive advances in technology. He is passionate about understanding nature fundamentally with the help of tools such as mathematical models, ML models, and AI.


