OpenAI upgrades its transcription and voice-generating AI models

April 10, 2025

OpenAI is rolling out new AI models for transcription and voice generation via its API, promising significant improvements over its earlier versions. These updates are part of OpenAI's larger "agentic" vision, which focuses on creating autonomous systems capable of performing tasks independently for users. While the definition of "agent" is open to debate, OpenAI's Head of Product, Olivier Godement, sees it as a chatbot that can interact with a business's customers.

"We're going to see more and more agents emerge in the coming months," Godement shared with TechCrunch during a briefing. "The overarching goal is to assist customers and developers in utilizing agents that are useful, accessible, and precise."

OpenAI's latest text-to-speech model, dubbed "gpt-4o-mini-tts," not only aims to produce more lifelike and nuanced speech but is also more adaptable than its predecessors. Developers can now guide the model using natural language commands, such as "speak like a mad scientist" or "use a serene voice, like a mindfulness teacher." This level of control allows for a more personalized voice experience.

Here’s a sample of a "true crime-style," weathered voice:

And here’s an example of a female "professional" voice:

Jeff Harris, a member of OpenAI's product team, emphasized to TechCrunch that the objective is to enable developers to customize both the voice "experience" and "context." "In various scenarios, you don't want a monotonous voice," Harris explained. "For instance, in a customer support setting where the voice needs to sound apologetic for a mistake, you can infuse that emotion into the voice. We strongly believe that developers and users want to control not just the content, but the manner of speech."
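For developers, the natural-language steering described above could look roughly like the request body below. This is a minimal sketch under stated assumptions: the model name comes from the article, but the `voice` and `instructions` field names, the voice name "coral", and the helper `build_speech_request` are illustrative and should be checked against OpenAI's current API reference. The sketch only builds the JSON body and sends nothing.

```python
import json


def build_speech_request(text: str, style: str, voice: str = "coral") -> dict:
    """Build a JSON-serializable body for a text-to-speech request.

    Field names and the default voice are assumptions for illustration,
    not confirmed by the article.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",  # model name as given in the article
        "voice": voice,              # assumed field and value
        "input": text,               # the words to be spoken
        "instructions": style,       # assumed field: how the words should be spoken
    }


# Harris's customer-support scenario: same content, apologetic delivery.
payload = build_speech_request(
    "I'm sorry about the mix-up with your order.",
    "Speak in a calm, apologetic customer-support tone.",
)
print(json.dumps(payload, indent=2))
```

Keeping the content (`input`) separate from the delivery (`instructions`) mirrors Harris's distinction between what is said and how it is said.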

OpenAI's new speech-to-text models, "gpt-4o-transcribe" and "gpt-4o-mini-transcribe," are set to replace the older Whisper transcription model. Trained on a diverse array of high-quality audio data, they are said to handle accented and varied speech better, even in noisy settings. The models are also described as less prone to "hallucinations," a problem in which Whisper would sometimes invent words or entire passages, inserting inaccuracies such as racial commentary or fictitious medical treatments into transcripts.

"These models show significant improvement over Whisper in this regard," Harris noted. "Ensuring model accuracy is crucial for a dependable voice experience, and by accuracy, we mean the models correctly capture the spoken words without adding unvoiced content."

However, performance may vary across languages. OpenAI's internal benchmarks indicate that gpt-4o-transcribe, the more precise of the two, has a "word error rate" nearing 30% for Indic and Dravidian languages like Tamil, Telugu, Malayalam, and Kannada. This suggests that about three out of every ten words might differ from a human transcription in these languages.
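To make the metric concrete, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, insertions, and deletions) between a human reference transcript and the model's output, divided by the number of reference words. The function name and the whitespace tokenization are assumptions for illustration; OpenAI's actual benchmarking procedure is not described in the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Ten reference words, three of them transcribed differently.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown box jumps over a lazy dog tonight"
print(f"{word_error_rate(reference, hypothesis):.0%}")  # prints "30%"
```

Because inserted words also count as errors, a word error rate can exceed 100% when the output contains many words that are absent from the reference.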

The results from OpenAI transcription benchmarking. Image Credits: OpenAI

In a departure from its usual practice, OpenAI won't be making these new transcription models freely available. Historically, it released new Whisper versions under an MIT license for commercial use. Harris pointed out that gpt-4o-transcribe and gpt-4o-mini-transcribe are significantly larger than Whisper, making them unsuitable for open release.

"These models are too big to run on a typical laptop like Whisper could," Harris added. "When we release models openly, we want to do it thoughtfully, ensuring they're tailored for specific needs. We see end-user devices as a prime area for open-source models."

Updated March 20, 2025, 11:54 a.m. PT to clarify the language around word error rate and update the benchmark results chart with a more recent version.

Comments (33)
LeviKing December 22, 2025 at 7:30:38 PM EST

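Better voice-generation models, huh... If this ends up being used to replace call-center staff, I worry about unemployment. The technology is great, but the social impact is something we need to think about too.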

FrankMartínez August 19, 2025 at 4:01:39 AM EDT

The new OpenAI models sound like a game-changer for voice tech! Can't wait to see how devs use this to make apps talk smoother than ever. 😎

BenHernández July 23, 2025 at 4:50:48 AM EDT

Wow, OpenAI's new transcription and voice models sound like a game-changer! I'm curious how these 'agentic' systems will stack up against real-world tasks. Could they finally nail natural-sounding convos? 🤔

GeorgeTaylor April 20, 2025 at 3:57:07 PM EDT

OpenAI's new transcription and voice-generation models are a game changer! I'm using them on my podcast and the improvements are impressive. The only downside? They're a bit expensive, but if you can afford them, they're worth every cent! 🎙️💸

GregoryAllen April 17, 2025 at 12:50:37 AM EDT

OpenAI's new transcription and voice models are a game changer! I've been using them for my podcast and the improvements are night and day. The only downside? They're a bit pricey, but if you can swing it, they're worth every penny! 🎙️💸

StevenAllen April 17, 2025 at 12:38:26 AM EDT

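OpenAI's new speech-recognition and voice-generation models are truly innovative! I'm using them on my podcast and the improvement is noticeable. The downside is that they're a bit pricey, but if you can afford them, they're worth it! 🎙️💸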
