
CMU Researchers Introduce OWSM v3.1: A Better and Faster Open Whisper-Style Speech Model Based on E-Branchformer


Speech recognition technology has become a cornerstone for a wide range of applications, enabling machines to understand and process human speech. The field continually seeks advances in algorithms and models to improve the accuracy and efficiency of recognizing speech across multiple languages and contexts. The central challenge in speech recognition is developing models that accurately transcribe speech from diverse languages and dialects. Models often struggle with the variability of speech, including accents, intonation, and background noise, creating a need for more robust and versatile solutions.

Researchers have explored various methods to improve speech recognition systems. Existing solutions often rely on complex architectures such as Transformers, which, despite their effectiveness, face limitations, particularly in processing speed and in the nuanced task of accurately recognizing and interpreting a wide range of speech variations, including dialects, accents, and differing speech patterns.

A research team from Carnegie Mellon University and the Honda Research Institute Japan introduced a new model, OWSM v3.1, which uses the E-Branchformer architecture to address these challenges. OWSM v3.1 is an improved and faster Open Whisper-style Speech Model that achieves better results than the previous OWSM v3 in most evaluation conditions.

The previous OWSM v3 and Whisper both use the standard Transformer encoder-decoder architecture. However, recent speech encoders such as Conformer and Branchformer have improved performance on speech processing tasks. Hence, E-Branchformer is employed as the encoder in OWSM v3.1, demonstrating its effectiveness at a scale of 1B parameters. OWSM v3.1 also excludes the WSJ training data used in OWSM v3, whose transcripts were fully uppercased; this exclusion leads to a significantly lower word error rate (WER). In addition, OWSM v3.1 achieves up to 25% faster inference.
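To make this concrete, the sketch below shows how an OWSM-style checkpoint can be run for transcription through ESPnet's `Speech2Text` speech-to-text interface, which is how the OWSM family is distributed. The model tag `espnet/owsm_v3.1_ebf`, the special-token arguments, and the decoding options are assumptions based on ESPnet's usual packaging, not details stated in the article.

```python
# Hypothetical usage sketch: transcribing audio with an OWSM v3.1 checkpoint
# via ESPnet. Assumes `pip install espnet espnet_model_zoo soundfile` and that
# the model is published under the tag "espnet/owsm_v3.1_ebf" (assumption).
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

# Load the pretrained model; language and task are selected with special
# tokens, following the Whisper-style multitask format that OWSM adopts.
speech2text = Speech2Text.from_pretrained(
    "espnet/owsm_v3.1_ebf",  # assumed model tag
    lang_sym="<eng>",        # target-language token
    task_sym="<asr>",        # task token: speech recognition
    beam_size=5,
)

# OWSM models expect 16 kHz mono audio.
speech, rate = sf.read("sample_16k.wav")

# Decode; ESPnet returns an n-best list, and the first element of the top
# hypothesis is the decoded text.
results = speech2text(speech)
print(results[0][0])
```

The same interface would cover translation by swapping the task token (e.g., `<st_eng>`-style tokens in OWSM's multitask vocabulary), which is part of what makes the Whisper-style single-model design convenient.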

OWSM v3.1 delivers notable gains on performance metrics. It outperforms its predecessor, OWSM v3, on most evaluation benchmarks, achieving higher accuracy in speech recognition across multiple languages. Compared with OWSM v3, OWSM v3.1 improves English-to-X translation in 9 of 15 directions. Although some directions show a slight degradation, the average BLEU score still improves from 13.0 to 13.3.
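For readers unfamiliar with the metric, BLEU scores corpus-level n-gram overlap between system outputs and reference translations. A minimal sketch using the widely used `sacrebleu` package (a common choice for reporting such scores, not one the article specifies; the sentences are illustrative placeholders) shows how a score like 13.3 is computed:

```python
# Minimal BLEU computation with sacrebleu (pip install sacrebleu).
import sacrebleu

# System translations, one string per segment (placeholder data).
hypotheses = ["the cat sat on the mat"]

# References as a list of reference streams: each inner list holds one
# reference per hypothesis, in the same order.
references = [["the cat is sitting on the mat"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # higher is better; corpus-level score
```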

In conclusion, the research makes significant strides toward improving speech recognition technology. By leveraging the E-Branchformer architecture, the OWSM v3.1 model improves upon previous models in accuracy and efficiency and sets a new standard for open-source speech recognition. By publicly releasing the model and training details, the researchers' commitment to transparency and open science further enriches the field and paves the way for future advances.


Check out the Paper and Demo. All credit for this research goes to the researchers of this project.


Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.

