AI Mental Health Tool Stumbles Upon Effective Deepfake Detection

Since OpenAI released its flagship Sora 2 video and audio generation model in September 2025, deepfake videos have flooded social media, leaving viewers increasingly accustomed to hyper-realistic and potentially harmful content.
While OpenAI emphasized responsible deployment of Sora 2 as a key goal—pledging to provide users "tools and choices to manage their feed content" and full control over their likeness—an October 2025 study revealed the model generated misleading videos 80% of the time.
Examples range from fake news segments showing a Moldovan election official destroying ballots to fabricated footage of a toddler detained by immigration authorities and a Coca-Cola spokesperson announcing that the company would not sponsor the Super Bowl. The risks of misinformation in our connected world have never been greater.
Beyond Sora: The Rise of Vishing
Even before OpenAI’s tool debuted, the creation and spread of deepfake material were accelerating. A September 2025 report from cybersecurity firm DeepStrike noted that deepfake content jumped from 500,000 instances in 2023 to 8 million in 2025, much of it used for fraud.
This trend shows no slowdown; AI-related fraud in the United States is projected to hit $40 billion by 2027.
The surge isn’t just in volume. Thanks to tools like Sora 2 and Google’s Veo 3, AI-generated faces, voices, and full-body performances appear more convincing than ever. As noted by computer scientist and deepfake expert Siwei Lyu, current models can generate stable, distortion-free faces, while voice cloning has reached an "indistinguishable" level.
The reality is, deepfakes are evolving faster than detection methods. What tech firms market as entertaining tools for creating Olympic gymnastics routines or rich audio backdrops is also being exploited by criminals targeting businesses and individuals. In just the first half of 2025, deepfake scams caused $356 million in corporate losses and $541 million in personal losses.
Conventional deepfake detection, such as checking for watermarks, airbrushed faces, and metadata, is falling short. Meanwhile, voice deepfakes rank as the second most common type of AI-enabled fraud, and voice phishing (vishing) attacks soared 442% in 2025, affecting businesses and individuals alike.
"Just a few seconds of audio can now produce a believable clone—complete with natural intonation, rhythm, emphasis, emotion, pauses, and even breathing sounds," Lyu stated.
Listening to the Human Voice
Kintsugi, a healthtech startup, develops AI voice biomarker technology to identify signs of clinical depression and anxiety. Its work began with a straightforward idea: we need to truly listen to people.
"I founded Kintsugi based on my own experience. It took me almost five months of calling my provider just to schedule a first therapy appointment, and no one called back. I kept trying—but I remember thinking that if this were my dad or brother, they would have given up much sooner," CEO Grace Chang told Unite.AI.
The California-based company launched in 2019 to address what Chang called a "triage bottleneck." She believed early, passive detection of severity could help direct people to appropriate care more quickly. Through Kintsugi Voice, vocal biomarkers help identify clinical depression and anxiety.
Multiple studies support the use of AI-powered speech analysis as a biomarker for mental health. One May 2025 paper, for instance, showed that acoustic biomarkers can spot early signs of mental health issues and neurodivergence, and advocated for singing analysis in clinical settings to evaluate possible cognitive decline.
According to the American Psychiatric Association, voice analysis accurately distinguishes people with depression from those without 78% to 96% of the time. Another study used a one-minute verbal fluency test—where a person names as many words as possible in a category—and achieved 70% to 83% accuracy in detecting co-occurring depression and anxiety.
To evaluate mental health, Kintsugi collects a brief voice sample. Its vocal biomarker technology then examines pitch, intonation, tone, and pauses—features linked to depression, anxiety, bipolar disorder, and dementia.
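To make features such as pauses and pitch more concrete, here is a minimal sketch of how they might be measured from raw audio samples. It is an illustration only and not Kintsugi's method: the function names, the energy threshold, the frame length, and the synthetic test clip are all assumptions made for this example.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def pause_ratio(samples, frame_len, threshold):
    """Fraction of frames whose energy falls below `threshold` (assumed cutoff)."""
    energies = frame_energies(samples, frame_len)
    silent = sum(1 for e in energies if e < threshold)
    return silent / len(energies)

def estimate_pitch_hz(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Rough pitch estimate: the lag with the largest autocorrelation in [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic stand-in for a voice sample: a 200 Hz tone followed by silence.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(2000)]
silence = [0.0] * 2000
clip = tone + silence

print("pause ratio:", pause_ratio(clip, frame_len=200, threshold=0.01))
print("pitch (Hz):", estimate_pitch_hz(tone, SAMPLE_RATE))
```

Real speech is far messier than a pure tone, so a production system would need noise handling, overlapping windows, and a validated model on top of features like these.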
What Chang didn’t anticipate was that this technology also addressed a critical challenge in security: pinpointing what makes a voice genuinely human.
From Mental Health to Cybersecurity
While at a New York summit in late 2025, Chang mentioned to a cybersecurity friend that her team’s tests with synthetic voices had been underwhelming.
"We were trying synthetic data to enhance training for our mental health models, but the generated voices were so unlike real human speech we could spot them almost every time," she explained.
"He stopped me and said, 'Grace—that’s an unsolved problem in security.' That’s when everything connected. Since then, discussions with security, financial, and telecom firms have highlighted how fast deepfake voice attacks are growing—and how critical it is to tell human from synthetic voices in live calls," the CEO added.
In April last year, the FBI alerted the public to a malicious text and voice campaign impersonating senior U.S. officials and targeting former government employees and their contacts. Major U.S. banks faced an average of 5.5 daily voice fraud attempts, and staff at Vanderbilt University Medical Center reported vishing attacks from impostors posing as friends, supervisors, and coworkers.
Initially, deepfakes weren’t a focus for Kintsugi. Although the team used models like Cartesia, Sesame, and ElevenLabs to simulate synthetic voices for call center agents and workflows, deepfake fraud wasn’t a priority in a market filled with accessible tools like Sora.
Yet the cues that confirm voice authenticity are the same biomarkers that define human speech. Regardless of language or meaning, Kintsugi Voice relies on signal processing and physical speech latency, capturing subtle timing, prosodic variation, cognitive load, and physiological traits. The focus is on how speech is formed, not what is said.
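One simple way to picture "subtle timing" is to measure how much the gaps between speech events vary. The sketch below computes a coefficient of variation over gap lengths. The gap values are invented toy numbers, not measurements, and the metric is an assumption chosen for illustration rather than anything the company has described.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean of `values`."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean

# Hypothetical gaps between syllable onsets, in seconds (toy data).
uniform_gaps = [0.20, 0.20, 0.20, 0.20]   # perfectly regular timing
varied_gaps = [0.15, 0.32, 0.18, 0.41]    # irregular timing

print("uniform:", coefficient_of_variation(uniform_gaps))
print("varied:", coefficient_of_variation(varied_gaps))
```

A single statistic like this would not separate human from synthetic speech on its own. It only shows the kind of content-independent signal that timing analysis works with.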
"Synthetic voices may sound fluent, but they lack the same biological and cognitive nuances," Chang noted. The company’s model ranks in the top 10% for detection accuracy, needing only 3 to 5 seconds of audio.
Kintsugi’s innovation offers promise for those facing mental health challenges, especially where accessing professional care is difficult. Similarly, its technology could transform deepfake detection and cybersecurity by verifying authenticity rather than spotting deepfakes.
Human-Centered Technology as the Future
Cybersecurity has traditionally concentrated on malicious uses or perpetrators. Kintsugi’s unexpected breakthrough, however, relies on human nature itself.
"We’re working on a totally different front: human authenticity. LLMs can’t consistently identify LLM-generated content, and artifact-based techniques are brittle. Gathering large, clinically annotated datasets that capture genuine human variation is costly, slow, and beyond most security firms’ expertise—making our method hard to copy," Chang explained.
The startup’s strategy also points to a wider shift: cross-industry innovation. Leaders in healthcare could pioneer AI-based vishing detection, just as space tech innovators may aid emergency response systems, or gaming architecture might influence urban planning.
As for Chang, she aims to set a standard for confirming real human presence—and eventually, genuine intent—through voice interactions.
"Just as HTTPS became the web’s trust standard, we believe 'proof of human' will become essential for voice-based systems. Signal processing is the start of that framework," she said.
As generative AI advances, the strongest protections may come from grasping what truly makes us human.












