ElevenLabs Sets New Speech-to-Text Benchmark; Google Gemini Follows with Broad Capabilities
Artificial Analysis has released the latest version of its speech-to-text benchmark, AA-WER v2.0. The findings highlight ElevenLabs and Google as clear leaders in audio transcription performance.

Measured by core word error rate (WER), ElevenLabs' Scribe v2 took the top spot with a 2.3% error rate, closely followed by Google's Gemini 3 Pro at 2.9%. Notably, Google did not fine-tune Gemini for transcription; the result stems from the model's general multimodal capabilities.
Other leading models showed the following results:
Mistral Voxtral Small: Took third place with a 3.0% error rate.
Google Gemini 3 Flash: Delivered a solid performance with a 3.1% error rate.
OpenAI Whisper Large v3: The most widely used open-source model placed in the middle of the pack with a 4.2% error rate.
Lowest performers: Alibaba's Qwen3 ASR Flash (5.9%), Amazon's Nova 2 Omni (6.0%), and Rev AI (6.1%) rounded out the bottom of the rankings.

In the dedicated AA-AgentTalk benchmark for voice assistant commands, the leaderboard remained consistent. ElevenLabs' Scribe v2 and Google's Gemini 3 Pro maintained their lead with error rates of 1.6% and 1.7% respectively, suggesting both are reliable for short, direct voice interactions.
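
For readers unfamiliar with the metric behind these rankings, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The article does not describe how AA-WER v2.0 normalizes text or aggregates scores, so the function name `word_error_rate`, the lowercasing step, and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: text is lowercased and split on whitespace. Real benchmarks
    usually apply richer normalization (punctuation, numerals, and so on).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of five reference words.
    print(word_error_rate("turn on the kitchen lights",
                          "turn on the kitchen light"))
```

Under this definition, a 2.3% error rate corresponds to roughly 23 word-level errors per 1,000 reference words.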