New AI Models from OpenAI Exhibit Higher Hallucination Rates in Reasoning Tasks

July 21, 2025

PatrickMartinez

OpenAI’s newly released o3 and o4-mini AI models excel in multiple areas but show increased hallucination tendencies compared to earlier models, generating more fabricated information.

Hallucinations remain a persistent challenge in AI, even for top-tier systems. Historically, each new model has hallucinated somewhat less than its predecessor, but o3 and o4-mini break from this trend.

Internal OpenAI tests reveal that o3 and o4-mini, designed as reasoning models, hallucinate more frequently than prior reasoning models like o1, o1-mini, and o3-mini, as well as non-reasoning models like GPT-4o.

More concerning still, OpenAI does not yet know why the increase is happening.

OpenAI’s technical report on o3 and o4-mini states that further research is needed to understand why hallucination rates rise as reasoning models are scaled up. The models outperform their predecessors in areas such as coding and math, but according to the report, because they make more claims overall, they produce more accurate claims and also more inaccurate, hallucinated ones.

On PersonQA, OpenAI’s in-house benchmark for measuring a model’s knowledge about people, o3 hallucinated in 33% of responses, roughly double the rates of o1 (16%) and o3-mini (14.8%). The o4-mini model performed worse still, hallucinating in 48% of cases.

Transluce, a nonprofit AI research lab, found that o3 sometimes fabricates actions it supposedly took, for example claiming it had run code on a 2021 MacBook Pro “outside of ChatGPT,” even though the model has no such capability.

“We suspect the reinforcement learning used in o-series models may exacerbate issues typically lessened by standard post-training methods,” said Transluce researcher and former OpenAI employee Neil Chowdhury in an email to TechCrunch.

Transluce co-founder Sarah Schwettmann noted that o3’s hallucination rate could reduce its practical utility.

Kian Katanforoosh, Stanford adjunct professor and Workera CEO, told TechCrunch his team found o3 superior for coding workflows but prone to generating broken website links.

While hallucinations can spark creative ideas, they pose challenges for industries like law, where accuracy is critical and errors in documents are unacceptable.

Giving models web search is one promising way to improve accuracy. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, another of OpenAI’s accuracy benchmarks, which suggests that search could also lower hallucination rates in reasoning models, at least when users are willing to expose their prompts to a third-party search provider.

If scaling reasoning models continues to increase hallucinations, finding solutions will become increasingly critical.

“Improving model accuracy and reliability is a key focus of our ongoing research,” said OpenAI spokesperson Niko Felix in an email to TechCrunch.

The AI industry has recently shifted toward reasoning models, which improve performance on a variety of tasks without requiring massive amounts of computing and data during training. However, this approach appears to increase hallucination risks, presenting a significant challenge.

Comments (4)

GeorgeWilliams

August 14, 2025 at 9:00:59 AM EDT

It's wild how OpenAI's new models are so advanced yet still make stuff up! 😅 I wonder if these hallucinations could lead to some creative breakthroughs or just more AI headaches.

KennethMartin

August 12, 2025 at 7:00:59 AM EDT

I read about OpenAI's new models and, wow, those hallucination rates are concerning! If AI starts making up stuff more often, how can we trust it for serious tasks? 🤔 Still, their capabilities sound impressive.

LarryWilliams

August 4, 2025 at 2:48:52 AM EDT

These new AI models sound powerful, but more hallucinations? That's like a sci-fi plot gone wrong! 🧠 Hope they fix it soon.

ThomasBaker

July 27, 2025 at 9:20:21 PM EDT

It's wild how OpenAI's new models are so advanced yet still churn out more made-up stuff! 🤯 Kinda makes me wonder if we're getting closer to creative storytelling or just fancy errors.