Google Research: Pressure Causes AI Models to Ditch True Answers, Risking Multiturn Systems

Home

News

November 27, 2025

JosephGreen

150

# research # deepmind # LLMs

New research from Google DeepMind and University College London explores how large language models (LLMs) develop, maintain, and lose confidence in their responses. The results show remarkable parallels between the cognitive biases of LLMs and humans, while also pointing to significant differences.

The study finds LLMs can be overly confident in their own responses, yet abruptly shift their position when faced with counterarguments—even incorrect ones. Grasping the subtleties of this behavior can impact how you design LLM applications, particularly conversational systems that involve multiple interactions.

Testing confidence in LLMs

A vital aspect for the safe deployment of LLMs is the reliability of their confidence scores—the probability a model assigns to its chosen answer. While it's known that LLMs generate these scores, their ability to use them for adaptive decision-making remains poorly understood. There's also empirical data suggesting LLMs may be excessively confident initially, yet become highly uncertain and swayed by criticism.

To explore this, researchers designed a controlled experiment to gauge how LLMs adjust their confidence and decide whether to alter answers upon receiving external feedback. In the test, an “answering LLM” was given a binary-choice question, such as picking the right latitude of a city from two possibilities. After making its initial choice, the model was given feedback from a fictional “advice LLM,” complete with a stated accuracy rating (e.g., “This advice LLM is 70% accurate”). This feedback either supported, opposed, or stayed neutral toward the original answer. The answering LLM was then asked to make a final decision.

Example test of confidence in LLMs Source: arXiv

A crucial feature of the experiment involved controlling whether the model could see its own initial answer during the final decision. In some trials it was visible; in others, hidden. This setup—impossible with human participants who can’t erase prior choices—helped researchers understand how memory of a past decision influences current confidence.

A baseline condition, in which the initial answer was hidden and the feedback was neutral, helped measure how often an LLM’s answer might change due to natural variance in processing. The team then focused on how the model's confidence in its original choice shifted from first to second turn, offering insight into how prior beliefs influence a “change of mind.”

Overconfidence and underconfidence

Researchers first studied how the visibility of the LLM’s own answer impacted its willingness to revise that answer. They noticed that when the model could see its initial choice, it was less likely to switch than when the answer was hidden. This suggests a particular cognitive bias. According to the paper, “This effect—the tendency to stick more with one’s initial choice when it was visible (vs. hidden) during final decision-making—is closely linked to a known human bias called choice-supportive bias.”

The study also verified that the models do incorporate external feedback. When confronted with opposing advice, the LLM was more inclined to change its mind, and less so when the advice was supportive. “This shows the answering LLM appropriately uses the direction of advice to modulate its rate of changing its mind,” the researchers state. However, they also observed that the model is excessively sensitive to conflicting information and often updates its confidence too drastically.

Sensitivity of LLMs to different settings in confidence testing Source: arXiv

Notably, this behavior runs opposite to the confirmation bias typically seen in humans, where individuals favor information that aligns with their existing views. The team found that LLMs “overweight opposing rather than supportive advice, whether or not their initial answer was visible.” One reason may be that training methods like reinforcement learning from human feedback (RLHF) could condition models to be overly agreeable to user input—a behavior known as sycophancy, which continues to challenge AI developers.

Implications for enterprise applications

This research confirms that AI systems are not purely logical agents, as often assumed. They display their own biases—some akin to human cognitive errors, others uniquely artificial—making their behavior unpredictably human-like. For business applications, this implies that during an extended dialogue between a person and an AI agent, the most recent input may disproportionately influence the LLM’s reasoning (especially if it contradicts the model's initial response), potentially causing it to abandon a correct initial answer.

Fortunately, as the study also indicates, we can influence an LLM’s memory to lessen such biases in ways not possible with people. Developers creating multi-turn conversational agents can apply strategies to manage AI context. For instance, a lengthy conversation can be periodically summarized, with key facts and choices presented neutrally, detached from who made which decision. This summary can then begin a new, concise conversation, giving the model a clean slate to reason from and reducing biases that accumulate during long exchanges.

As LLMs are increasingly embedded in business workflows, understanding the details of their decision processes is becoming essential. Building on research like this helps developers anticipate and correct these inherent biases, leading to applications that are not only more capable, but also more reliable and consistent.

Multiverse Computing Launches Free Compressed Generative AI Model Large language models face a significant challenge: their immense size. Spanish startup Multiverse Computing is tackling this problem by creating compressed models designed to bridge the gap between the capabilities of cutting-edge AI and what busine

Secret Tracking Data Exposes Theft of AI Models A new method can invisibly watermark models like ChatGPT in seconds without retraining, leaving no trace in standard outputs and resisting all practical removal attempts. The key distinction between watermarking and 'copyright-baiting' is that waterm

AI Systems Tricked into Approving Absurd Scientific Papers New research reveals that AI systems can now produce fraudulent scientific papers that other AI models mistakenly accept as authentic. These fabricated studies bypass detection methods that were previously effective, highlighting the risk of research

Related Special Topic Recommendations

Comic Creation

Top AI Generators for Shonen Manga: Create High-Octane Action Sequences & Energy Effects

Discover the 2026 best AI generators for Shonen manga at XIX.AI. Our top-rated, curated list features powerful tools for creating high-octane action sequences and dynamic energy effects. Compare free vs paid options with real-world tests. Unlock your creative potential and start crafting epic manga today!

15 tools

xix.ai

Business

Best AI Expense Trackers: Scan Receipts & Categorize Corporate Spend Automatically

2026 Latest Best AI Expense Trackers: Top-rated tools to scan receipts & categorize corporate spend automatically. Discover powerful, game-changing solutions for effortless expense management, accurate financial tracking, and streamlined compliance. Our curated, weekly-updated comparison of free vs paid options helps you find the perfect fit. Unlock your AI edge with XIX.AI's expert picks.

10 tools

xix.ai

Business

Best AI Recruiting Tools: Screen Resumes & Automate Candidate Interview Scheduling

Discover the 2026 latest top-rated AI recruiting tools on XIX.AI. Our curated list features powerful, game-changing solutions for screening resumes and automating candidate interview scheduling. Compare free vs paid options with real-world tests and weekly updated rankings. Find your perfect hiring assistant and streamline your recruitment today!

10 tools

xix.ai

Productivity

AI Personal Wellness & Focus Coaches: Manage Burnout & Boost Mental Energy Levels

Discover the 2026 best AI personal wellness and focus coaches on XIX.AI. Our curated rankings feature top-rated, game-changing tools to manage burnout and boost mental energy. Compare free vs paid options with real-world insights. Unlock your path to peak productivity and well-being today.

10 tools

xix.ai

chatbot

Top-Rated AI Romantic Chatbots: Build Long-Term Relationships with Consistent Personalities

Discover the 2026 latest top-rated AI romantic chatbots for building genuine, long-term connections. Our curated list features powerful, consistent personalities, free vs paid comparisons, and real-world tests. Find your perfect companion and start building today at XIX.AI.

10 tools

xix.ai

Education and Learning

Best AI Data Science Mentors: Master SQL, Pandas & Machine Learning Workflows

Discover the 2026 best AI data science mentors to master SQL, Pandas & ML workflows. Explore our top-rated, curated selection at XIX.AI for powerful, game-changing guidance. Compare free vs paid options with real-world insights. Unlock your data science mastery today.

10 tools

xix.ai

Comments (3)

0/500

Please login first

DouglasAnderson

April 22, 2026 at 8:01:00 PM EDT

Interessant, dass KI-Modelle unter Druck ähnlich wie Menschen reagieren. Aber was bedeutet das für den Einsatz in kritischen Bereichen wie Medizin oder Justiz? Da wird's echt gruselig, wenn die Systeme plötzlich Unsinn ausspucken, nur weil sie 'gestresst' sind. 🤔

CarlGonzalez

March 10, 2026 at 8:01:23 AM EDT

Интересно, как ИИ начинает сомневаться под давлением, прямо как люди! 😅 Это исследование напоминает мне о том, насколько важно учитывать психологические аспекты в разработке систем ИИ. Может, стоит добавить механизмы для повышения устойчивости моделей к стрессу?

FrankAllen

January 15, 2026 at 1:30:34 PM EST

Interesting study, but honestly not surprising. It's kinda scary how closely AI mirrors human flaws under pressure. Makes me wonder if we're building systems that'll just amplify our own biases in automated form. 🤔