ElevenLabs Unveils New Speech-to-Text Model
ElevenLabs, an AI startup that recently raised $180 million, is best known for its audio-generation tools. Now it has moved into new territory by launching its first stand-alone speech-to-text model, called Scribe.
Valued at $3.3 billion, ElevenLabs has been a go-to for companies that need text-to-speech services, thanks to its large library of voices. Now it is setting its sights on speech recognition, taking on established names like Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI's Whisper models.
Scribe supports over 99 languages right out of the gate. ElevenLabs says accuracy is excellent, meaning a word error rate below 5%, for more than 25 of them, including English (with a claimed accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. The remaining languages fall into lower accuracy tiers: high (5% to 10% word error rate), good (10% to 20%), and moderate (25% to 50%).
The company claims Scribe beats Google Gemini 2.0 Flash and Whisper Large V3 in multiple languages on the FLEURS and Common Voice benchmarks.
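For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, insertions, and deletions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a generic illustration, not ElevenLabs' evaluation code: the function names are made up for this example, text normalization is reduced to lowercasing, and the tier boundaries simply mirror the figures quoted above (including the unexplained gap between 20% and 25%).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Map a WER to the tiers quoted in the article (hypothetical helper)."""
    if wer < 0.05:
        return "excellent"
    if wer <= 0.10:
        return "high"
    if wer <= 0.20:
        return "good"
    if 0.25 <= wer <= 0.50:
        return "moderate"
    return "unclassified"  # the article names no tier for this range


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER = {wer:.1%}, tier = {accuracy_tier(wer)}")
```

Under the common convention that accuracy is roughly one minus WER, the claimed 97% English accuracy would correspond to a WER of about 3%, which is consistent with the "excellent" tier. Published benchmarks typically apply heavier normalization (punctuation, numerals) than this sketch does.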

ElevenLabs built a speech-to-text component for its conversational AI agent platform last year, but Scribe is its first stand-alone speech recognition model. In a conversation with TechCrunch last month, CEO Mati Staniszewski described the company's plans to strengthen its speech recognition tech.
"We want to get better at understanding what you're saying in a conversation. We're not just about generating content anymore; we're moving into understanding and transcribing speech," Staniszewski said. "A lot of folks think speech-to-text is old news, but for many languages, it's still pretty rough. We think we can do better because we've got in-house teams to label data and give us quick feedback."
Scribe also comes with speaker diarization to identify who is talking, word-level timestamps for accurate subtitles, and automatic tagging of sound events such as audience laughter. In addition, ElevenLabs lets customers transcribe video content directly in its studio to add subtitles or captions.
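To see why word-level timestamps and speaker labels matter for subtitles, here is a minimal sketch that groups timed words into SRT cues. The `Word` structure, the `speaker_0`-style labels, and the seven-word cue limit are assumptions made for illustration; they are not Scribe's actual response format, which the article does not describe.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word (hypothetical shape, not the Scribe API schema)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float
    speaker: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as the HH:MM:SS,mmm form used by SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Start a new cue when the speaker changes or the cue is full."""
    cues: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        if current and (len(current) >= max_words or word.speaker != current[0].speaker):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    if not cues:
        return ""

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{format_timestamp(cue[0].start)} --> {format_timestamp(cue[-1].end)}"
        blocks.append(f"{index}\n{timing}\n[{cue[0].speaker}] {text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        Word("Hello", 0.0, 0.4, "speaker_0"),
        Word("there", 0.5, 0.9, "speaker_0"),
        Word("Hi", 1.2, 1.5, "speaker_1"),
    ]
    print(words_to_srt(sample))
```

Because each word carries its own start and end time, a cue can begin and end exactly where its first and last words do, rather than being stretched over a whole sentence-level segment.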
Right now, Scribe only works with pre-recorded audio, so it's not yet ready for live meeting transcription or voice note-taking. The company says a low-latency real-time version is coming soon.
ElevenLabs is charging $0.40 per hour of transcribed audio for Scribe. That's a competitive price, though some rivals offer cheaper transcription rates with a different mix of features.
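As a rough way to budget at that rate, the sketch below estimates cost from audio duration. It assumes billing is pro-rated by the second and rounded to the nearest cent, neither of which the article states; the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_HOUR_USD = Decimal("0.40")  # price quoted in the article


def transcription_cost(duration_seconds: int,
                       rate_per_hour: Decimal = RATE_PER_HOUR_USD) -> Decimal:
    """Estimate the cost in USD, assuming per-second pro-rating."""
    hours = Decimal(duration_seconds) / Decimal(3600)
    return (hours * rate_per_hour).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A 90-minute podcast episode and a 100-hour archive.
    print(transcription_cost(90 * 60))
    print(transcription_cost(100 * 3600))
```

`Decimal` is used instead of floats so that currency amounts do not pick up binary rounding noise.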
Comments (29)
MiaDavis
September 4, 2025 at 8:30:33 PM EDT
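It's amazing to see a startup grow this fast, haha. The voice space is highly competitive, so can ElevenLabs succeed in the STT market too? With $180 million in funding, they'll surely build something special, right? 🤔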
LawrenceLopez
August 30, 2025 at 4:30:33 PM EDT
ElevenLabs never stops innovating! This new speech-to-text model looks promising, but I keep wondering... will it be able to compete with giants like Google and OpenAI in the transcription market? 🤔 I hope it offers something unique to justify the hype!
TimothyMartínez
August 21, 2025 at 9:01:20 AM EDT
Scribe sounds like a game-changer! I'm curious if it'll handle my thick accent as well as it claims. Excited to try it for podcast transcriptions! 😎
MatthewTaylor
August 12, 2025 at 5:00:59 PM EDT
Just saw ElevenLabs' Scribe model news—97% accuracy in English is wild! 😮 I'm curious how it'll handle my thick accent in meetings. Hope they drop that real-time version soon!
RogerRoberts
April 20, 2025 at 9:44:55 PM EDT
ElevenLabs' Scribe is great! It's incredible how they've entered the speech-to-text market with such a solid model. My only complaint is that it sometimes has trouble with strong accents. But for a first attempt, it's pretty impressive. Keep it up, ElevenLabs! 🚀
RalphHill
April 20, 2025 at 4:36:44 PM EDT
ElevenLabs' new Scribe model is incredible! They moved from audio generation to speech recognition so smoothly. I tested it and the accuracy is good, but it stumbles a bit with strong accents. Worth checking out if you're into AI! 😊