Home
Google Research: Pressure Causes AI Models to Ditch True Answers, Risking Multiturn Systems
New research from Google DeepMind and University College London explores how large language models (LLMs) develop, maintain, and lose confidence in their responses. The results show remarkable parallels between the cognitive biases of LLMs and humans, while also pointing to significant differences.
The study finds LLMs can be overly confident in their own responses, yet abruptly shift their position when faced with counterarguments—even incorrect ones. Grasping the subtleties of this behavior can impact how you design LLM applications, particularly conversational systems that involve multiple interactions.
Testing confidence in LLMs
A vital aspect for the safe deployment of LLMs is the reliability of their confidence scores—the probability a model assigns to its chosen answer. While it's known that LLMs generate these scores, their ability to use them for adaptive decision-making remains poorly understood. There's also empirical data suggesting LLMs may be excessively confident initially, yet become highly uncertain and swayed by criticism.
To explore this, researchers designed a controlled experiment to gauge how LLMs adjust their confidence and decide whether to alter answers upon receiving external feedback. In the test, an “answering LLM” was given a binary-choice question, such as picking the right latitude of a city from two possibilities. After making its initial choice, the model was given feedback from a fictional “advice LLM,” complete with a stated accuracy rating (e.g., “This advice LLM is 70% accurate”). This feedback either supported, opposed, or stayed neutral toward the original answer. The answering LLM was then asked to make a final decision.

Example test of confidence in LLMs Source: arXiv A crucial feature of the experiment involved controlling whether the model could see its own initial answer during the final decision. In some trials it was visible; in others, hidden. This setup—impossible with human participants who can’t erase prior choices—helped researchers understand how memory of a past decision influences current confidence.
A baseline condition, in which the initial answer was hidden and the feedback was neutral, helped measure how often an LLM’s answer might change due to natural variance in processing. The team then focused on how the model's confidence in its original choice shifted from first to second turn, offering insight into how prior beliefs influence a “change of mind.”
Overconfidence and underconfidence
Researchers first studied how the visibility of the LLM’s own answer impacted its willingness to revise that answer. They noticed that when the model could see its initial choice, it was less likely to switch than when the answer was hidden. This suggests a particular cognitive bias. According to the paper, “This effect—the tendency to stick more with one’s initial choice when it was visible (vs. hidden) during final decision-making—is closely linked to a known human bias called choice-supportive bias.”
The study also verified that the models do incorporate external feedback. When confronted with opposing advice, the LLM was more inclined to change its mind, and less so when the advice was supportive. “This shows the answering LLM appropriately uses the direction of advice to modulate its rate of changing its mind,” the researchers state. However, they also observed that the model is excessively sensitive to conflicting information and often updates its confidence too drastically.

Sensitivity of LLMs to different settings in confidence testing Source: arXiv Notably, this behavior runs opposite to the confirmation bias typically seen in humans, where individuals favor information that aligns with their existing views. The team found that LLMs “overweight opposing rather than supportive advice, whether or not their initial answer was visible.” One reason may be that training methods like reinforcement learning from human feedback (RLHF) could condition models to be overly agreeable to user input—a behavior known as sycophancy, which continues to challenge AI developers.
Implications for enterprise applications
This research confirms that AI systems are not purely logical agents, as often assumed. They display their own biases—some akin to human cognitive errors, others uniquely artificial—making their behavior unpredictably human-like. For business applications, this implies that during an extended dialogue between a person and an AI agent, the most recent input may disproportionately influence the LLM’s reasoning (especially if it contradicts the model's initial response), potentially causing it to abandon a correct initial answer.
Fortunately, as the study also indicates, we can influence an LLM’s memory to lessen such biases in ways not possible with people. Developers creating multi-turn conversational agents can apply strategies to manage AI context. For instance, a lengthy conversation can be periodically summarized, with key facts and choices presented neutrally, detached from who made which decision. This summary can then begin a new, concise conversation, giving the model a clean slate to reason from and reducing biases that accumulate during long exchanges.
As LLMs are increasingly embedded in business workflows, understanding the details of their decision processes is becoming essential. Building on research like this helps developers anticipate and correct these inherent biases, leading to applications that are not only more capable, but also more reliable and consistent.
Related article
Multiverse Computing Launches Free Compressed Generative AI Model
Large language models face a significant challenge: their immense size. Spanish startup Multiverse Computing is tackling this problem by creating compressed models designed to bridge the gap between the capabilities of cutting-edge AI and what busine
Secret Tracking Data Exposes Theft of AI Models
A new method can invisibly watermark models like ChatGPT in seconds without retraining, leaving no trace in standard outputs and resisting all practical removal attempts. The key distinction between watermarking and 'copyright-baiting' is that waterm
AI Systems Tricked into Approving Absurd Scientific Papers
New research reveals that AI systems can now produce fraudulent scientific papers that other AI models mistakenly accept as authentic. These fabricated studies bypass detection methods that were previously effective, highlighting the risk of research
Related Special Topic Recommendations
Comments (3)
0/500
Interessant, dass KI-Modelle unter Druck ähnlich wie Menschen reagieren. Aber was bedeutet das für den Einsatz in kritischen Bereichen wie Medizin oder Justiz? Da wird's echt gruselig, wenn die Systeme plötzlich Unsinn ausspucken, nur weil sie 'gestresst' sind. 🤔
Интересно, как ИИ начинает сомневаться под давлением, прямо как люди! 😅 Это исследование напоминает мне о том, насколько важно учитывать психологические аспекты в разработке систем ИИ. Может, стоит добавить механизмы для повышения устойчивости моделей к стрессу?
New research from Google DeepMind and University College London explores how large language models (LLMs) develop, maintain, and lose confidence in their responses. The results show remarkable parallels between the cognitive biases of LLMs and humans, while also pointing to significant differences.
The study finds LLMs can be overly confident in their own responses, yet abruptly shift their position when faced with counterarguments—even incorrect ones. Grasping the subtleties of this behavior can impact how you design LLM applications, particularly conversational systems that involve multiple interactions.
Testing confidence in LLMs
A vital aspect for the safe deployment of LLMs is the reliability of their confidence scores—the probability a model assigns to its chosen answer. While it's known that LLMs generate these scores, their ability to use them for adaptive decision-making remains poorly understood. There's also empirical data suggesting LLMs may be excessively confident initially, yet become highly uncertain and swayed by criticism.
To explore this, researchers designed a controlled experiment to gauge how LLMs adjust their confidence and decide whether to alter answers upon receiving external feedback. In the test, an “answering LLM” was given a binary-choice question, such as picking the right latitude of a city from two possibilities. After making its initial choice, the model was given feedback from a fictional “advice LLM,” complete with a stated accuracy rating (e.g., “This advice LLM is 70% accurate”). This feedback either supported, opposed, or stayed neutral toward the original answer. The answering LLM was then asked to make a final decision.

A crucial feature of the experiment involved controlling whether the model could see its own initial answer during the final decision. In some trials it was visible; in others, hidden. This setup—impossible with human participants who can’t erase prior choices—helped researchers understand how memory of a past decision influences current confidence.
A baseline condition, in which the initial answer was hidden and the feedback was neutral, helped measure how often an LLM’s answer might change due to natural variance in processing. The team then focused on how the model's confidence in its original choice shifted from first to second turn, offering insight into how prior beliefs influence a “change of mind.”
Overconfidence and underconfidence
Researchers first studied how the visibility of the LLM’s own answer impacted its willingness to revise that answer. They noticed that when the model could see its initial choice, it was less likely to switch than when the answer was hidden. This suggests a particular cognitive bias. According to the paper, “This effect—the tendency to stick more with one’s initial choice when it was visible (vs. hidden) during final decision-making—is closely linked to a known human bias called choice-supportive bias.”
The study also verified that the models do incorporate external feedback. When confronted with opposing advice, the LLM was more inclined to change its mind, and less so when the advice was supportive. “This shows the answering LLM appropriately uses the direction of advice to modulate its rate of changing its mind,” the researchers state. However, they also observed that the model is excessively sensitive to conflicting information and often updates its confidence too drastically.

Notably, this behavior runs opposite to the confirmation bias typically seen in humans, where individuals favor information that aligns with their existing views. The team found that LLMs “overweight opposing rather than supportive advice, whether or not their initial answer was visible.” One reason may be that training methods like reinforcement learning from human feedback (RLHF) could condition models to be overly agreeable to user input—a behavior known as sycophancy, which continues to challenge AI developers.
Implications for enterprise applications
This research confirms that AI systems are not purely logical agents, as often assumed. They display their own biases—some akin to human cognitive errors, others uniquely artificial—making their behavior unpredictably human-like. For business applications, this implies that during an extended dialogue between a person and an AI agent, the most recent input may disproportionately influence the LLM’s reasoning (especially if it contradicts the model's initial response), potentially causing it to abandon a correct initial answer.
Fortunately, as the study also indicates, we can influence an LLM’s memory to lessen such biases in ways not possible with people. Developers creating multi-turn conversational agents can apply strategies to manage AI context. For instance, a lengthy conversation can be periodically summarized, with key facts and choices presented neutrally, detached from who made which decision. This summary can then begin a new, concise conversation, giving the model a clean slate to reason from and reducing biases that accumulate during long exchanges.
As LLMs are increasingly embedded in business workflows, understanding the details of their decision processes is becoming essential. Building on research like this helps developers anticipate and correct these inherent biases, leading to applications that are not only more capable, but also more reliable and consistent.
Multiverse Computing Launches Free Compressed Generative AI Model
Large language models face a significant challenge: their immense size. Spanish startup Multiverse Computing is tackling this problem by creating compressed models designed to bridge the gap between the capabilities of cutting-edge AI and what busine
Secret Tracking Data Exposes Theft of AI Models
A new method can invisibly watermark models like ChatGPT in seconds without retraining, leaving no trace in standard outputs and resisting all practical removal attempts. The key distinction between watermarking and 'copyright-baiting' is that waterm
AI Systems Tricked into Approving Absurd Scientific Papers
New research reveals that AI systems can now produce fraudulent scientific papers that other AI models mistakenly accept as authentic. These fabricated studies bypass detection methods that were previously effective, highlighting the risk of research
Interessant, dass KI-Modelle unter Druck ähnlich wie Menschen reagieren. Aber was bedeutet das für den Einsatz in kritischen Bereichen wie Medizin oder Justiz? Da wird's echt gruselig, wenn die Systeme plötzlich Unsinn ausspucken, nur weil sie 'gestresst' sind. 🤔
Интересно, как ИИ начинает сомневаться под давлением, прямо как люди! 😅 Это исследование напоминает мне о том, насколько важно учитывать психологические аспекты в разработке систем ИИ. Может, стоит добавить механизмы для повышения устойчивости моделей к стрессу?











