New AI Models from OpenAI Exhibit Higher Hallucination Rates in Reasoning Tasks

July 21, 2025

OpenAI’s newly released o3 and o4-mini AI models excel in many areas, yet they hallucinate, fabricating information, more often than several of the company’s earlier models.

Hallucinations remain a persistent challenge in AI, even for top-tier systems. Typically, newer models reduce hallucination rates, but o3 and o4-mini deviate from this trend.

Internal OpenAI tests reveal that o3 and o4-mini, designed as reasoning models, hallucinate more frequently than prior reasoning models like o1, o1-mini, and o3-mini, as well as non-reasoning models like GPT-4o.

Perhaps more concerning, OpenAI itself does not know why hallucinations have increased.

OpenAI’s technical report on o3 and o4-mini notes that further research is needed to understand why hallucinations worsen as reasoning models scale up. While the two models outperform their predecessors in areas like coding and math, they also make more claims overall, which, according to the report, yields both more accurate claims and more inaccurate, hallucinated ones.
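
To make that trade-off concrete, here is a toy illustration with assumed numbers (these are not figures from OpenAI’s report): a model that asserts more claims can be right more often in absolute terms while also being wrong more often.

```python
# Toy illustration with assumed numbers (not figures from OpenAI's report):
# a more assertive model makes more accurate claims in absolute terms
# while simultaneously making more inaccurate ones.
models = {
    "cautious":  {"claims": 50,  "precision": 0.90},
    "assertive": {"claims": 100, "precision": 0.80},
}
for name, m in models.items():
    correct = m["claims"] * m["precision"]
    wrong = m["claims"] - correct
    print(f"{name}: {correct:.0f} accurate claims, {wrong:.0f} inaccurate")
# cautious: 45 accurate claims, 5 inaccurate
# assertive: 80 accurate claims, 20 inaccurate
```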

On OpenAI’s PersonQA benchmark, o3 hallucinated in 33% of responses, roughly double the rates of o1 (16%) and o3-mini (14.8%). The o4-mini model fared even worse, hallucinating in 48% of cases.
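
For readers curious how such a figure is derived, a hallucination rate on a PersonQA-style benchmark is simply the share of graded responses that contain a fabricated claim. The sketch below is an illustrative assumption about that bookkeeping, not OpenAI’s actual evaluation code; the grading step that assigns the flag is the hard part and is omitted here.

```python
# Hypothetical sketch of PersonQA-style scoring: illustrative bookkeeping
# only, not OpenAI's evaluation code. Each response is assumed to carry a
# boolean 'hallucinated' flag assigned by a separate human or automated grader.

def hallucination_rate(responses: list[dict]) -> float:
    """Return the fraction of responses graded as containing a fabrication."""
    if not responses:
        return 0.0
    flagged = sum(1 for r in responses if r["hallucinated"])
    return flagged / len(responses)

# Example with made-up data mirroring the reported o3 rate:
sample = [{"hallucinated": i < 33} for i in range(100)]
print(f"{hallucination_rate(sample):.0%}")  # prints "33%"
```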

Transluce, a nonprofit AI research group, found o3 fabricating actions, such as claiming it ran code on a 2021 MacBook Pro outside ChatGPT, despite lacking such capabilities.

“We suspect the reinforcement learning used in o-series models may exacerbate issues typically lessened by standard post-training methods,” said Transluce researcher and former OpenAI employee Neil Chowdhury in an email to TechCrunch.

Transluce co-founder Sarah Schwettmann noted that o3’s hallucination rate could reduce its practical utility.

Kian Katanforoosh, Stanford adjunct professor and Workera CEO, told TechCrunch his team found o3 superior for coding workflows but prone to generating broken website links.

While hallucinations can spark creative ideas, they pose challenges for industries like law, where accuracy is critical and errors in documents are unacceptable.

Integrating web search is one promising way to improve accuracy. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, and search could plausibly reduce hallucination in reasoning models as well, at least for users willing to expose their prompts to a third-party search provider.
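
The underlying pattern is straightforward: retrieve sources first, then instruct the model to answer only from them. The sketch below is a generic illustration of that idea; search_web and ask_model are hypothetical stand-ins, not real OpenAI APIs, and are stubbed so the example is self-contained.

```python
# Generic retrieval-then-answer sketch of the search-grounding idea above.
# 'search_web' and 'ask_model' are hypothetical stand-ins for real search
# and model APIs; both are stubbed here so the example runs on its own.

def search_web(query: str) -> list[str]:
    # Stand-in: a real implementation would call a search API.
    return ["OpenAI's GPT-4o with web search scores 90% on SimpleQA."]

def ask_model(prompt: str) -> str:
    # Stand-in: a real implementation would call a language model.
    return "(model answer grounded in the sources above)"

def grounded_answer(question: str) -> str:
    snippets = search_web(question)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Answer using ONLY the sources below. If they do not contain "
        "the answer, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return ask_model(prompt)

print(grounded_answer("How accurate is GPT-4o with web search on SimpleQA?"))
```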

If scaling reasoning models continues to increase hallucinations, finding solutions will become increasingly critical.

“Improving model accuracy and reliability is a key focus of our ongoing research,” said OpenAI spokesperson Niko Felix in an email to TechCrunch.

The AI industry has recently pivoted toward reasoning models, which improve performance on many tasks without requiring massive amounts of compute and data to train. That pivot, however, also appears to raise hallucination rates, presenting a significant challenge.

Comments (4)
GeorgeWilliams August 14, 2025 at 9:00:59 AM EDT

It's wild how OpenAI's new models are so advanced yet still make stuff up! 😅 I wonder if these hallucinations could lead to some creative breakthroughs or just more AI headaches.

KennethMartin August 12, 2025 at 7:00:59 AM EDT

I read about OpenAI's new models and, wow, those hallucination rates are concerning! If AI starts making up stuff more often, how can we trust it for serious tasks? 🤔 Still, their capabilities sound impressive.

LarryWilliams August 4, 2025 at 2:48:52 AM EDT

These new AI models sound powerful, but more hallucinations? That's like a sci-fi plot gone wrong! 🧠 Hope they fix it soon.

ThomasBaker July 27, 2025 at 9:20:21 PM EDT

It's wild how OpenAI's new models are so advanced yet still churn out more made-up stuff! 🤯 Kinda makes me wonder if we're getting closer to creative storytelling or just fancy errors.
