OpenAI Yet to Release Voice Cloning Tool a Year Later
OpenAI's Voice Engine: A Long-Awaited Release?
In late March 2024, OpenAI introduced a "small-scale preview" of Voice Engine, an AI service that promised to clone a person's voice from just 15 seconds of speech. A year later, the tool is still in preview, with no clear timeline for a full launch and no confirmation that it will ever see the light of day.
The hesitation to roll out Voice Engine widely could stem from concerns about misuse, or perhaps an attempt to sidestep regulatory scrutiny. OpenAI has faced criticism in the past for prioritizing flashy products over safety and for rushing to market ahead of competitors.
An OpenAI spokesperson told TechCrunch that the company is still testing Voice Engine with a select group of "trusted partners." "We're learning from how our partners are using the technology to enhance the model's utility and safety," the spokesperson explained. "It's been exciting to see its applications, ranging from speech therapy and language learning to customer support, video game characters, and AI avatars."
Voice Engine: The Journey So Far
Voice Engine, which drives the voices in OpenAI's text-to-speech API and ChatGPT's Voice Mode, creates remarkably natural-sounding speech that closely mimics the original speaker. It converts text into speech, constrained only by certain content guidelines. However, the rollout has been plagued by delays and shifting release dates from the start.
In a June 2024 blog post, OpenAI detailed how the Voice Engine model learns to predict the sounds a speaker would likely make for a given text, considering various voices, accents, and speaking styles. This allows the model not just to generate speech from text but also to produce "spoken utterances" that reflect how different speakers would voice the text aloud.
Originally, Voice Engine, then called Custom Voices, was set to join OpenAI's API on March 7, 2024, according to a draft blog post seen by TechCrunch. The plan was to initially offer access to up to 100 "trusted developers," prioritizing those developing apps with social benefits or showing innovative and responsible use of the technology. OpenAI had already trademarked the service and set pricing at $15 per million characters for "standard" voices and $30 per million characters for "HD quality" voices.
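To put the draft pricing in perspective, the per-character rates can be turned into a rough cost estimate. The following is a minimal sketch in Python, assuming the reported draft rates ($15 and $30 per million characters) and simple linear billing; the function name, tier labels, and billing model are illustrative assumptions, not a published OpenAI price list or API.

```python
# Draft rates reported in the article, in USD per one million characters.
# The tier keys ("standard", "hd") are hypothetical labels for this sketch.
RATES_PER_MILLION = {"standard": 15.00, "hd": 30.00}


def estimate_cost(num_characters: int, tier: str = "standard") -> float:
    """Estimate the USD cost of synthesizing `num_characters` of text.

    Assumes cost scales linearly with character count, with no minimum
    charge or rounding. Real billing rules may differ.
    """
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    if tier not in RATES_PER_MILLION:
        raise ValueError(f"unknown tier: {tier!r}")
    return num_characters / 1_000_000 * RATES_PER_MILLION[tier]


if __name__ == "__main__":
    # A 250,000-character script at each draft rate.
    for tier in RATES_PER_MILLION:
        print(f"{tier}: ${estimate_cost(250_000, tier):.2f}")
```

Under these assumptions, the "HD quality" tier costs exactly twice the "standard" tier for the same text.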
But at the last moment, the announcement was delayed. A few weeks later, OpenAI unveiled Voice Engine without a sign-up option, limiting access to a small group of developers it had been working with since late 2023.
"We hope to start a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities," OpenAI stated in the late March 2024 announcement blog post. "Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale."
A Long Development Road
Voice Engine has been in development since 2022, with OpenAI showcasing its potential—and risks—to global policymakers in the summer of 2023. Today, several partners have access to Voice Engine, including startup Livox, which aims to help people with disabilities communicate more naturally. However, Livox CEO Carlos Pereira noted that they couldn't integrate Voice Engine into their products because it requires an internet connection, which many of their customers lack. "The quality of the voice and the ability to have the voices speak in different languages is unique—especially for our customers with disabilities," Pereira told TechCrunch via email. "It's really the most impressive and easy-to-use tool to create voices that I've seen... We hope that OpenAI develops an offline version soon."
Pereira has not received any indication from OpenAI about a potential launch date or plans to charge for the service, and so far, Livox has not had to pay for its usage.
In a June 2024 post, OpenAI suggested that one reason for delaying Voice Engine was the potential for abuse during the U.S. election cycle. The company has implemented safety measures, including watermarking to trace the origin of generated audio. Developers must obtain "explicit consent" from the original speaker and make "clear disclosures" to their audience that voices are AI-generated. However, OpenAI has not detailed how these policies will be enforced at scale, which could be a significant challenge.
OpenAI also hinted at building a "voice authentication experience" to verify speakers and a "no-go" list to prevent the creation of voices resembling prominent figures. These are ambitious projects, and any missteps could further damage OpenAI's reputation regarding safety initiatives.
Effective filtering and ID verification are becoming essential for responsibly releasing voice cloning technology. AI voice cloning was the third fastest-growing scam of 2024. It has been used to commit fraud and to bypass bank security checks, while privacy and copyright laws struggle to keep pace. Malicious actors have also used voice cloning to create deepfakes of celebrities and politicians, which have spread rapidly on social media.
OpenAI might release Voice Engine next week, or it might never happen. The company has said it is considering keeping the service small in scope. But one thing is certain: whether for optics, safety, or both, Voice Engine's limited preview has become one of the longest in OpenAI's history.
Comments (15)
It's been a whole year and this is still only a preview. Voice cloning is an ethically delicate issue, so I understand moving carefully, but it feels like the market's expectations keep getting pushed back. Other AI companies keep releasing similar features, so what is OpenAI waiting for? 🤔 Maybe they want to perfect their misuse-prevention measures. Still, as a user who's been kept waiting, it's a little frustrating…
It's been a year since they promised this technology and still nothing? 😅 I wanted to create an AI voice for my cat, so I guess I'll be waiting a long time yet. The lack of a timeline is odd. Maybe they have ethical issues to sort out?
So there's been no word since it was announced last year 🤔 Voice synthesis technology is certainly impressive, but I wonder what concerns are making them hesitate to release it. Maybe they're afraid it will be misused? I'd like to try it soon, but I understand wanting to be cautious…
A year and they still haven't released that voice cloning tool? 🤔 I wonder whether it's down to technical problems or fear of misuse. It sounds like it has a lot of potential, but it's also a little scary when you think about deepfakes.
Why's OpenAI dragging their feet on Voice Engine? A year later and still just a preview? Sounds like they're scared of the ethical mess this could stir up. 😬