Introducing Whisper

By neub9

Other existing approaches often rely on smaller, more closely paired audio-text training datasets, or on broad but unsupervised audio pretraining. Models built this way can be very effective on specific benchmarks such as LibriSpeech, but they tend to lack robustness on other data. Our model Whisper, by contrast, was trained on a large and diverse dataset and was not fine-tuned to any specific benchmark. It does not beat models that specialize in LibriSpeech, but when measured zero-shot across many diverse datasets it is much more robust and makes 50% fewer errors than those models.
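The "50% fewer errors" figure is a relative reduction, and it helps to see what is being counted. The sketch below assumes errors are measured as word error rate (WER), the usual metric for speech recognition: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function names are illustrative and are not taken from Whisper's own evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors that the new model avoids."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

With made-up numbers for illustration only, a baseline at 20% WER and a second model at 10% WER on the same data give `relative_error_reduction(0.20, 0.10)` of 0.5, which is what "50% fewer errors" means under this assumption.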

Approximately one-third of Whisper’s audio dataset is non-English, and the model is alternately given the task of transcribing in the original language or translating to English. This approach is particularly effective for learning speech-to-text translation: in a zero-shot setting, Whisper outperforms the supervised SOTA on CoVoST2 to-English translation.
