Introducing Whisper

Other existing approaches frequently use smaller, more closely paired audio-text training datasets,[^reference-1][^reference-2][^reference-3] or broad but unsupervised audio pretraining.[^reference-4][^reference-5][^reference-6] Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper's zero-shot performance across many diverse datasets, we find it is much more robust and makes 50% fewer errors than those models.
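For context, speech recognition accuracy is conventionally reported as word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the reference length. The "50% fewer errors" figure refers to comparisons of this metric. A minimal reference implementation (an illustration, not code from the Whisper release):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    reference transcript and the hypothesis, divided by the number of
    reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One missing word out of a six-word reference -> WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

In practice, transcripts are normalized (casing, punctuation, number formats) before scoring, since normalization choices can shift WER substantially across datasets.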

About a third of Whisper's audio dataset is non-English, and the model is alternately given the task of transcribing in the original language or translating to English. We find this approach is particularly effective at learning speech-to-text translation, and it outperforms the supervised SOTA on CoVoST2-to-English translation zero-shot.
