New AI Models from OpenAI Exhibit Higher Hallucination Rates in Reasoning Tasks

July 21, 2025

OpenAI’s newly released o3 and o4-mini models are state-of-the-art in many respects, yet they hallucinate more than several of the company’s earlier models, fabricating information at higher rates.

Hallucinations remain a persistent challenge in AI, even for top-tier systems. Typically, newer models reduce hallucination rates, but o3 and o4-mini deviate from this trend.

Internal OpenAI tests reveal that o3 and o4-mini, designed as reasoning models, hallucinate more frequently than prior reasoning models like o1, o1-mini, and o3-mini, as well as non-reasoning models like GPT-4o.

Perhaps more concerning, OpenAI does not yet know why this is happening.

OpenAI’s technical report on o3 and o4-mini notes that further research is needed to pinpoint why hallucination rates rise as reasoning models are scaled up. While these models outperform their predecessors in areas like coding and math, they also make more claims overall, which, according to the report, leads to both more accurate claims and more inaccurate, hallucinated ones.

On PersonQA, OpenAI’s in-house benchmark for measuring a model’s knowledge about people, o3 hallucinated in 33% of responses, roughly double the rates of o1 (16%) and o3-mini (14.8%). The o4-mini model fared even worse, hallucinating in 48% of responses.

Transluce, a nonprofit AI research group, found evidence that o3 fabricates actions it supposedly took, such as claiming to have run code on a 2021 MacBook Pro outside of ChatGPT, something the model is incapable of doing.

“We suspect the reinforcement learning used in o-series models may exacerbate issues typically lessened by standard post-training methods,” said Transluce researcher and former OpenAI employee Neil Chowdhury in an email to TechCrunch.

Transluce co-founder Sarah Schwettmann noted that o3’s hallucination rate could reduce its practical utility.

Kian Katanforoosh, Stanford adjunct professor and Workera CEO, told TechCrunch his team found o3 superior for coding workflows but prone to generating broken website links.

While hallucinations can spark creative ideas, they pose challenges for industries like law, where accuracy is critical and errors in documents are unacceptable.

One promising approach to improving accuracy is giving models access to web search. OpenAI’s GPT-4o with web search achieves 90% accuracy on the SimpleQA benchmark, suggesting search could also reduce hallucination rates in reasoning models, at least in cases where users are willing to expose their prompts to a third-party search provider.
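For illustration, here is a minimal sketch of what search-grounded querying looks like, assuming OpenAI’s Python SDK and the Responses API’s web search tool; the exact model and tool identifiers are drawn from OpenAI’s public documentation at the time of writing and may change.

```python
# Sketch: grounding a model's answer in live web search results,
# rather than relying solely on its (possibly hallucinated) parametric memory.
# Assumes the `openai` Python SDK and the Responses API's web search tool.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # allow the model to consult the web
    input="What accuracy does GPT-4o with web search report on SimpleQA?",
)

# The grounded answer; claims can be checked against the retrieved sources.
print(response.output_text)
```

The trade-off is privacy: every prompt routed through search is visible to the search provider, which is why this mitigation only applies when users opt in.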

If scaling reasoning models continues to increase hallucinations, finding solutions will become increasingly critical.

“Improving model accuracy and reliability is a key focus of our ongoing research,” said OpenAI spokesperson Niko Felix in an email to TechCrunch.

The AI industry has recently shifted toward reasoning models, which improve performance on a range of tasks without requiring massive amounts of computing resources. However, reasoning also appears to raise hallucination rates, presenting a significant challenge.
