ElevenLabs Unveils New Speech-to-Text Model
ElevenLabs, an AI startup that recently raised $180 million in funding, is best known for audio generation. Now the company has moved into new territory by launching its first stand-alone speech-to-text model, called Scribe.
Valued at $3.3 billion, ElevenLabs has been a go-to provider for companies needing text-to-speech services, thanks to its large library of voices. Now the company is setting its sights on speech recognition, aiming to take on established names like Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI's Whisper models.
Scribe supports more than 99 languages at launch. ElevenLabs says the model reaches excellent accuracy, which it defines as a word error rate below 5%, in more than 25 of them. That group includes English (with a claimed accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese, among others. The remaining languages fall into three lower tiers: high accuracy (5% to 10% word error rate), good (10% to 20% word error rate), and moderate (25% to 50% word error rate).
The company claims Scribe outperforms Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages on the FLEURS and Common Voice benchmarks.
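For readers unfamiliar with the metric, word error rate (WER) is commonly computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not ElevenLabs' evaluation code. The function names, the lowercase-and-split normalization, and the handling of tier boundaries are assumptions made for this example; the article gives no tier for error rates between 20% and 25% or above 50%, so those are returned as "unclassified". If accuracy is read as 1 minus WER (also an assumption), the claimed 97% for English corresponds to roughly a 3% WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Map a WER fraction to the tier names quoted in the article."""
    if wer < 0.05:
        return "excellent"
    if wer <= 0.10:
        return "high"
    if wer <= 0.20:
        return "good"
    if 0.25 <= wer <= 0.50:
        return "moderate"
    return "unclassified"  # no tier is stated for this range


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER = {wer:.3f}, tier = {accuracy_tier(wer)}")
```

One substituted word out of six reference words gives a WER of about 0.167, which lands in the "good" tier under these assumptions. Real benchmark scoring usually applies more careful text normalization (punctuation, numerals, casing) before comparing words.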

ElevenLabs built a speech-to-text component for its conversational AI agent platform last year, but Scribe is its first stand-alone speech recognition model. In an interview with TechCrunch last month, CEO Mati Staniszewski described the company's plans to strengthen its speech recognition technology.
"We want to get better at understanding what you're saying in a conversation. We're not just about generating content anymore; we're moving into understanding and transcribing speech," Staniszewski said. "A lot of folks think speech-to-text is old news, but for many languages, it's still pretty rough. We think we can do better because we've got in-house teams to label data and give us quick feedback."
Scribe also includes speaker diarization to identify who is talking, word-level timestamps for accurate subtitles, and automatic tagging of sound events such as audience laughter. In addition, ElevenLabs is letting customers transcribe video content directly in its studio to add subtitles or captions.
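To show why word-level timestamps and speaker labels matter for subtitling, here is a rough sketch that groups timed words into SubRip (SRT) cues, starting a new cue whenever the speaker changes or a cue reaches a word limit. The `Word` structure, its field names, and the `max_words` parameter are hypothetical and chosen for illustration; they do not describe Scribe's actual response format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical shape of one transcribed word (not Scribe's real schema)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    speaker: str  # diarization label, e.g. "A"


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group words into cues, splitting on speaker change or cue length."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        speaker_changed = bool(current) and word.speaker != current[-1].speaker
        if current and (len(current) >= max_words or speaker_changed):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    if not cues:
        return ""

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{format_timestamp(cue[0].start)} --> {format_timestamp(cue[-1].end)}"
        blocks.append(f"{index}\n{timing}\n[{cue[0].speaker}] {text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        Word("Hello", 0.0, 0.4, "A"),
        Word("there", 0.5, 0.9, "A"),
        Word("Hi", 1.2, 1.5, "B"),
    ]
    print(words_to_srt(sample))
```

Because each cue takes its start time from its first word and its end time from its last, the subtitles stay aligned with the speech even when cues are short. A production subtitler would also cap cue duration and line length.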
For now, Scribe only works with pre-recorded audio, so it isn't ready for live meeting transcription or voice note-taking. The company says a low-latency real-time version is coming soon.
ElevenLabs charges $0.40 per hour of transcribed audio for Scribe. That's a competitive price, though some rivals offer cheaper transcription rates with a somewhat different set of features.
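As a back-of-the-envelope illustration of the quoted rate, the sketch below estimates cost from audio duration. It assumes billing scales linearly with duration, with no minimum charge, rounding increments, or volume discounts; the article doesn't say how ElevenLabs actually meters usage, and the function name is made up for this example.

```python
from decimal import Decimal

# Rate quoted in the article: $0.40 per hour of transcribed audio.
PRICE_PER_HOUR = Decimal("0.40")


def estimate_transcription_cost(duration_seconds: int) -> Decimal:
    """Estimate cost in dollars, assuming simple linear proration."""
    hours = Decimal(duration_seconds) / Decimal(3600)
    return (hours * PRICE_PER_HOUR).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for label, seconds in [("90-minute podcast", 90 * 60), ("250 hours of archive", 250 * 3600)]:
        print(f"{label}: ${estimate_transcription_cost(seconds)}")
```

Under these assumptions, a 90-minute recording works out to $0.60 and a 250-hour archive to $100.00.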
Comments (27)
TimothyMartínez
August 21, 2025 at 9:01:20 AM EDT
Scribe sounds like a game-changer! I'm curious if it'll handle my thick accent as well as it claims. Excited to try it for podcast transcriptions! 😎
MatthewTaylor
August 12, 2025 at 5:00:59 PM EDT
Just saw ElevenLabs' Scribe model news—97% accuracy in English is wild! 😮 I'm curious how it'll handle my thick accent in meetings. Hope they drop that real-time version soon!
RogerRoberts
April 20, 2025 at 9:44:55 PM EDT
ElevenLabs' Scribe is great! It's incredible how they've entered the speech-to-text market with such a solid model. My only complaint is that it sometimes has trouble with strong accents. But for a first attempt, it's quite impressive. Keep it up, ElevenLabs! 🚀
RalphHill
April 20, 2025 at 4:36:44 PM EDT
The new Scribe model from ElevenLabs is incredible! They moved from audio generation to speech recognition so smoothly. I tested it and the accuracy is good, but it stumbles a bit with strong accents. Worth checking out if you're into AI! 😊
PaulBrown
April 19, 2025 at 10:47:04 PM EDT
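ElevenLabs' Scribe is amazing! I can't believe speech-to-text conversion can be this smooth. It does seem to struggle a little with heavy accents, though. Still, I think it's pretty good for a first attempt! Keep at it, ElevenLabs! 🚀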
HarryLewis
April 19, 2025 at 9:23:49 PM EDT
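ElevenLabs' Scribe is really cool! I can't believe converting speech to text can be this smooth. It does seem to have trouble with somewhat heavy accents, though. Still, it's quite impressive for a first attempt! Keep it up, ElevenLabs! 🚀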