New AI Models from OpenAI Exhibit Higher Hallucination Rates in Reasoning Tasks

July 21, 2025

OpenAI’s newly released o3 and o4-mini AI models excel in multiple areas but show increased hallucination tendencies compared to earlier models, generating more fabricated information.

Hallucinations remain a persistent challenge in AI, even for top-tier systems. Typically, newer models reduce hallucination rates, but o3 and o4-mini deviate from this trend.

Internal OpenAI tests reveal that o3 and o4-mini, designed as reasoning models, hallucinate more frequently than prior reasoning models like o1, o1-mini, and o3-mini, as well as non-reasoning models like GPT-4o.

More concerning, OpenAI does not yet know what is causing the increase.

OpenAI’s technical report on o3 and o4-mini notes that further research is needed to pinpoint why hallucination rates rise as reasoning models are scaled up. The models outperform in areas like coding and math, but according to the report, because they make more claims overall, they produce more accurate claims as well as more inaccurate or hallucinated ones.

On OpenAI’s PersonQA benchmark, o3 hallucinated in 33% of responses, roughly double the rates of o1 (16%) and o3-mini (14.8%). The o4-mini model performed worse still, hallucinating in 48% of cases.
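
To make the size of the gap concrete, the PersonQA figures reported above can be compared as simple ratios. The short Python sketch below is only an illustration of that arithmetic: the dictionary and the helper name `ratio_to_baseline` are assumptions introduced here, not anything from OpenAI's report, and the only data used are the four percentages quoted in this article.

```python
# PersonQA hallucination rates as quoted in this article (percent of responses).
personqa_rates = {"o1": 16.0, "o3-mini": 14.8, "o3": 33.0, "o4-mini": 48.0}


def ratio_to_baseline(rates, model, baseline):
    """Return how many times higher `model`'s rate is than `baseline`'s.

    Hypothetical helper for illustration only.
    """
    return rates[model] / rates[baseline]


# Compare each new model against each earlier reasoning model.
for model in ("o3", "o4-mini"):
    for baseline in ("o1", "o3-mini"):
        r = ratio_to_baseline(personqa_rates, model, baseline)
        print(f"{model} vs {baseline}: {r:.2f}x")
```

On these numbers, o3 comes out a little over twice the rate of either earlier model, and o4-mini at roughly three times.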

Transluce, a nonprofit AI research group, found o3 fabricating actions, such as claiming it ran code on a 2021 MacBook Pro outside ChatGPT, despite lacking such capabilities.

“We suspect the reinforcement learning used in o-series models may exacerbate issues typically lessened by standard post-training methods,” said Transluce researcher and former OpenAI employee Neil Chowdhury in an email to TechCrunch.

Transluce co-founder Sarah Schwettmann noted that o3’s hallucination rate could reduce its practical utility.

Kian Katanforoosh, Stanford adjunct professor and Workera CEO, told TechCrunch his team found o3 superior for coding workflows but prone to generating broken website links.

While hallucinations can spark creative ideas, they pose challenges for industries like law, where accuracy is critical and errors in documents are unacceptable.

Integrating web search capabilities shows promise for improving accuracy. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, which suggests search could also reduce hallucination rates in reasoning models, at least when users are willing to expose their prompts to a third-party search provider.

If scaling reasoning models continues to increase hallucinations, finding solutions will become increasingly critical.

“Improving model accuracy and reliability is a key focus of our ongoing research,” said OpenAI spokesperson Niko Felix in an email to TechCrunch.

The AI industry has recently shifted toward reasoning models, which improve performance on a variety of tasks without requiring massive amounts of computing and data during training. However, this shift also appears to increase hallucinations, presenting a significant challenge.

Comments (4)
GeorgeWilliams August 14, 2025 at 9:00:59 AM EDT

It's wild how OpenAI's new models are so advanced yet still make stuff up! 😅 I wonder if these hallucinations could lead to some creative breakthroughs or just more AI headaches.

KennethMartin August 12, 2025 at 7:00:59 AM EDT

I read about OpenAI's new models and, wow, those hallucination rates are concerning! If AI starts making up stuff more often, how can we trust it for serious tasks? 🤔 Still, their capabilities sound impressive.

LarryWilliams August 4, 2025 at 2:48:52 AM EDT

These new AI models sound powerful, but more hallucinations? That's like a sci-fi plot gone wrong! 🧠 Hope they fix it soon.

ThomasBaker July 27, 2025 at 9:20:21 PM EDT

It's wild how OpenAI's new models are so advanced yet still churn out more made-up stuff! 🤯 Kinda makes me wonder if we're getting closer to creative storytelling or just fancy errors.
