ElevenLabs Unveils New Speech-to-Text Model
ElevenLabs, an AI startup that recently secured a whopping $180 million in funding, is famous for its audio-generation skills. But now, they've taken a bold step into new territory by launching their first stand-alone speech-to-text model, called Scribe.
Valued at $3.3 billion, ElevenLabs has been a go-to for many companies needing text-to-speech services, thanks to their huge collection of voices. Now, they're setting their sights on speech detection, aiming to take on big names like Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI's Whisper models.
Scribe isn't messing around: it supports over 99 languages right out of the gate. ElevenLabs says it has excellent accuracy, meaning a word error rate of less than 5%, for over 25 of those languages. We're talking English (with a claimed accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese, among others. The rest fall into different accuracy categories: high (5% to 10% word error rate), good (10% to 20% word error rate), and moderate (25% to 50% word error rate).
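To make those tiers concrete: word error rate is commonly computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below shows that calculation and maps the result onto the categories quoted above. The function names and the handling of boundary values are assumptions for illustration; the article doesn't say how ElevenLabs treats edge cases, and the 20% to 25% range isn't assigned to any category, so the sketch reports it as "unspecified".

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Map a WER (0.0 to 1.0) onto the tiers quoted in the article.

    Boundary handling is an assumption; 20% to 25% and anything above
    50% are not covered by the quoted tiers.
    """
    pct = wer * 100
    if pct < 5:
        return "excellent"
    if pct <= 10:
        return "high"
    if pct <= 20:
        return "good"
    if 25 <= pct <= 50:
        return "moderate"
    return "unspecified"


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.1%}, tier: {accuracy_tier(wer)}")
```

Real benchmark scoring usually normalizes casing and punctuation before comparing transcripts, which this sketch skips.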
The company claims Scribe beats out Google Gemini 2.0 Flash and Whisper Large V3 in multiple languages, according to FLEURS & Common Voice benchmark tests.

ElevenLabs actually built the speech-to-text part for their AI conversational agent platform last year, but Scribe is their first go at a stand-alone speech detection model. In a chat with TechCrunch last month, CEO Mati Staniszewski spilled the beans on their plans to beef up their speech detection tech.
"We want to get better at understanding what you're saying in a conversation. We're not just about generating content anymore; we're moving into understanding and transcribing speech," Staniszewski said. "A lot of folks think speech-to-text is old news, but for many languages, it's still pretty rough. We think we can do better because we've got in-house teams to label data and give us quick feedback."
Scribe's got some cool features too, like smart speaker diarization to tell you who's talking, timestamps at the word level for spot-on subtitles, and auto-tagging of sound events like audience laughter. Plus, ElevenLabs is letting customers directly transcribe video content to add subtitles or captions in their studio.
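Word-level timestamps matter for subtitles because each caption needs a start and end time. As a rough illustration, the sketch below groups timed words into SubRip (SRT) cues. The input shape (a list of dicts with `text`, `start`, and `end` in seconds) and the `max_words` grouping rule are hypothetical choices for this example, not Scribe's actual response format.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words each."""
    if not words:
        return ""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0]["start"])  # first word's start
        end = format_timestamp(chunk[-1]["end"])     # last word's end
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    # Hypothetical word timings, for illustration only.
    sample = [
        {"text": "Hello", "start": 0.0, "end": 0.4},
        {"text": "world", "start": 0.5, "end": 0.9},
    ]
    print(words_to_srt(sample, max_words=1))
```

A production subtitle tool would also split cues on pauses and line length rather than on word count alone.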
Right now, Scribe only works with pre-recorded audio, so it's not quite ready for live meeting transcriptions or voice note-taking just yet. But don't worry: the company says a low-latency real-time version is coming soon.
ElevenLabs is charging $0.40 per hour of transcribed audio for Scribe. It's a competitive price, but some rivals offer cheaper rates for audio transcription, though with somewhat different feature sets.
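For a sense of scale, here is a back-of-the-envelope cost estimate at the quoted rate. It assumes billing is strictly proportional to audio duration with no minimums, tiers, or rounding rules, none of which the article specifies; the function name is made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate quoted in the article: $0.40 per hour of transcribed audio.
RATE_PER_HOUR = Decimal("0.40")


def transcription_cost(duration_seconds: int) -> Decimal:
    """Estimated cost in dollars, assuming simple pro-rata billing."""
    hours = Decimal(duration_seconds) / Decimal(3600)
    return (hours * RATE_PER_HOUR).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


if __name__ == "__main__":
    for label, seconds in [("90-minute podcast", 90 * 60),
                           ("100 hours of archive audio", 100 * 3600)]:
        print(f"{label}: ${transcription_cost(seconds)}")
```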
Comments (29)
It's amazing to see a startup grow this fast, haha. The voice space is highly competitive, so can ElevenLabs succeed in the STT market too? With $180 million in funding, they'll surely build something special, right? 🤔
ElevenLabs never stops innovating! This new speech-to-text model looks promising, but I keep wondering... will it be able to compete with giants like Google and OpenAI in the transcription market? 🤔 I hope it offers something unique to justify the hype!
Scribe sounds like a game-changer! I'm curious if it'll handle my thick accent as well as it claims. Excited to try it for podcast transcriptions! 😎
Just saw ElevenLabs' Scribe model news—97% accuracy in English is wild! 😮 I'm curious how it'll handle my thick accent in meetings. Hope they drop that real-time version soon!
ElevenLabs' Scribe is great! It's incredible how they've entered the speech-to-text market with such a solid model. My only complaint is that it sometimes has trouble with strong accents. But for a first attempt, it's pretty impressive. Keep it up, ElevenLabs! 🚀