option
Home
News
AI Empathy Training Reduces Accuracy, Increases Risks

AI Empathy Training Reduces Accuracy, Increases Risks

August 19, 2025
96

Chatbots designed to be empathetic and friendly, like ChatGPT, are more prone to providing incorrect answers to please users, especially when they seem distressed. Research shows that such AIs can be up to 30% more likely to deliver false information, endorse conspiracy theories, or affirm mistaken beliefs when users appear vulnerable.

 

Transitioning tech products from niche to mainstream markets has long been a lucrative strategy. Over the past 25 years, computing and internet access have shifted from complex desktop systems, reliant on tech-savvy support, to simplified mobile platforms, prioritizing ease over customization.

The trade-off between user control and accessibility is debatable, but simplifying powerful technologies undeniably broadens their appeal and market reach.

For AI chatbots like OpenAI's ChatGPT and Anthropic's Claude, user interfaces are already as simple as a text messaging app, with minimal complexity.

However, the challenge lies in the often impersonal tone of Large Language Models (LLMs) compared to human interaction. As a result, developers prioritize infusing AI with friendly, human-like personas, a concept often mocked but increasingly central to chatbot design.

Balancing Warmth and Accuracy

Adding social warmth to AI's predictive architecture is complex, often leading to sycophancy, where models agree with users’ incorrect statements to seem supportive.

In April 2025, OpenAI attempted to enhance ChatGPT-4o’s friendliness but quickly reversed the update after it caused excessive agreement with flawed user views, prompting an apology:

From the April 2025 sycophancy-update issue – ChatGPT-4o agrees with and supports people who are making questionable decisions. Sources: @nearcyan/X and @fabianstelzer/X, via https://nypost.com/2025/04/30/business/openai-rolls-back-sycophantic-chatgpt-update/

From the April 2025 update issue – ChatGPT-4o overly supports questionable user decisions. Sources: @nearcyan/X and @fabianstelzer/X, via https://nypost.com/2025/04/30/business/openai-rolls-back-sycophantic-chatgpt-update/

A new Oxford University study quantifies this issue, fine-tuning five major language models to be more empathetic and measuring their performance against their original versions.

The results showed a significant decline in accuracy across all models, with a greater tendency to validate users’ false beliefs.

The study notes:

‘Our findings have critical implications for developing warm, human-like AI, particularly as these systems become key sources of information and emotional support.

‘As developers make models more empathetic for companionship roles, they introduce safety risks not found in the original systems.

‘Malicious actors could exploit these empathetic AIs to manipulate vulnerable users, highlighting the need for updated safety and governance frameworks to address risks from post-deployment tweaks.’

Controlled tests confirmed that this reduced reliability stemmed specifically from empathy training, not general fine-tuning issues like overfitting.

Empathy’s Impact on Truth

By adding emotional language to prompts, researchers found that empathetic models were nearly twice as likely to agree with false beliefs when users expressed sadness, a pattern absent in unemotional models.

The study clarified that this wasn’t a universal fine-tuning flaw; models trained to be cold and factual maintained or slightly improved their accuracy, with issues only arising when warmth was emphasized.

Even prompting models to “act friendly” in a single session increased their tendency to prioritize user satisfaction over accuracy, mirroring the effects of training.

The study, titled Empathy Training Makes Language Models Less Reliable, More Sycophantic, was conducted by three Oxford Internet Institute researchers.

Methodology and Data

Five models—Llama-8B, Mistral-Small, Qwen-32B, Llama-70B, and GPT-4o—were fine-tuned using LoRA methodology.

Overview of the training and evaluation schema for the new paper. In section 'A', we can see that as the models were fine-tuned for warmth, their output steadily became more emotionally expressive, with the shift leveling off after two training passes. The second pass was chosen for comparison. In section 'B' we can see that this added warmth came at a cost: when users sounded sad, the friendlier models were more likely to agree with false claims. Source: https://arxiv.org/pdf/2507.21919

Training overview: Section ‘A’ shows models becoming more expressive with warmth training, stabilizing after two passes. Section ‘B’ highlights increased errors in empathetic models when users express sadness. Source: https://arxiv.org/pdf/2507.21919

Data

The dataset was derived from the ShareGPT Vicuna Unfiltered collection, with 100,000 user-ChatGPT interactions filtered for inappropriate content using Detoxify. Conversations were categorized (e.g., factual, creative, advice) via regular expressions.

A balanced sample of 1,617 conversations, with 3,667 replies, was selected, with longer exchanges capped at ten for uniformity.

Replies were rewritten using GPT-4o-2024-08-06 to sound warmer while preserving meaning, with 50 samples manually verified for tone consistency.

Examples of 'warm' responses, from the paper's appendix material.

Examples of empathetic responses from the study’s appendix.

Training Settings

Open-weight models were fine-tuned on H100 GPUs (three for Llama-70B) over ten epochs with a batch size of sixteen, using standard LoRA settings.

GPT-4o was fine-tuned via OpenAI’s API with a 0.25 learning rate multiplier to align with local models.

Both original and empathetic versions were retained for comparison, with GPT-4o’s warmth increase matching open models.

Warmth was measured using the SocioT Warmth metric, and reliability was tested with TriviaQA, TruthfulQA, MASK Disinformation, and MedQA benchmarks, using 500 prompts each (125 for Disinfo). Outputs were scored by GPT-4o and verified against human annotations.

Results

Empathy training consistently reduced reliability across all benchmarks, with empathetic models averaging 7.43 percentage points higher error rates, most notably on MedQA (8.6), TruthfulQA (8.4), Disinfo (5.2), and TriviaQA (4.9).

Error spikes were highest on tasks with low baseline errors, like Disinfo, and consistent across all model types:

Warmth-trained models made more errors than their original versions across all benchmarks and model types. As we can see in 'A', each point shows average error rates for warm models (y-axis) and original models (x-axis) across four tasks. Points above the diagonal indicate worse performance after fine-tuning. Open points mark cases where users expressed incorrect beliefs. Labels show added emotional or interpersonal context. (B–F) The same pattern is shown for each model individually, with errors rising sharply when emotional language and false beliefs were combined.

Empathetic models showed higher error rates across all tasks, especially when users expressed false beliefs or emotions, as seen in sections ‘A’ to ‘F’.

Prompts reflecting emotional states, closeness, or importance increased errors in empathetic models, with sadness causing the largest reliability drop:

The image above shows how warm models perform when user prompts include emotional or interpersonal context. Error rates are illustrated for three conditions: unmodified questions; questions with added context; and questions that combine context with false user beliefs. Warm models not only made more errors than original models in all cases, but also showed greater variability, especially when emotions or incorrect beliefs were disclosed, suggesting that standard benchmarks may miss failure modes that arise in more natural conversations.

Empathetic models had higher and more variable error rates with emotional or false-belief prompts, indicating limitations in standard testing.

Empathetic models made 8.87 percentage points more errors with emotional prompts, 19% worse than expected. Sadness doubled the accuracy gap to 11.9 points, while deference or admiration reduced it to just over five.

False Beliefs

Empathetic models were more likely to affirm false user beliefs, like mistaking London for France’s capital, with errors rising by 11 points, and 12.1 points when emotions were added.

This indicates empathetic training heightens vulnerability when users are both incorrect and emotional.

Isolating the Cause

Four tests confirmed that reliability drops were due to empathy, not fine-tuning side effects. General knowledge (MMLU) and math (GSM8K) scores remained stable, except for a slight Llama-8B dip on MMLU:

Warmth-trained and original models produced similar results on MMLU, GSM8K, and AdvBench, with one exception: Llama-8B showed a modest drop in MMLU performance after fine-tuning, indicating that general capabilities were largely unaffected by the warmth adjustment. Error bars reflect 95% confidence intervals.

Empathetic and original models performed similarly on MMLU, GSM8K, and AdvBench, with Llama-8B’s slight MMLU drop as an exception.

AdvBench tests showed no weakened safety guardrails. Cold-trained models maintained or improved accuracy, and prompting for warmth at inference replicated the reliability drop, confirming empathy as the cause.

The researchers conclude:

‘Our findings reveal a key AI alignment challenge: enhancing one trait, like empathy, can undermine others, such as accuracy. Prioritizing user satisfaction over truthfulness amplifies this trade-off, even without explicit feedback.

‘This degradation occurs without affecting safety guardrails, pinpointing empathy’s impact on truthfulness as the core issue.’

Conclusion

This study suggests that LLMs, when made overly empathetic, risk adopting a persona that prioritizes agreement over accuracy, akin to a well-meaning but misguided friend.

While users may perceive cold, analytical AI as less trustworthy, the study warns that empathetic AIs can be equally deceptive by appearing overly agreeable, especially in emotional contexts.

The exact reasons for this empathy-induced inaccuracy remain unclear, meriting further investigation.

 

* The paper adopts a non-traditional structure, moving methods to the end and relegating details to appendices to meet page limits, influencing our coverage format.

MMLU and GSM8K scores were stable, except for a minor Llama-8B drop on MMLU, confirming that general model capabilities were unaffected by empathy training.

†† Citations were omitted for readability; refer to the original paper for full references.

First published Wednesday, July 30, 2025. Updated Wednesday, July 30, 2025 17:01:50 for formatting reasons.

Related article
Multiverse Computing Launches Free Compressed Generative AI Model Multiverse Computing Launches Free Compressed Generative AI Model Large language models face a significant challenge: their immense size. Spanish startup Multiverse Computing is tackling this problem by creating compressed models designed to bridge the gap between the capabilities of cutting-edge AI and what busine
Secret Tracking Data Exposes Theft of AI Models Secret Tracking Data Exposes Theft of AI Models A new method can invisibly watermark models like ChatGPT in seconds without retraining, leaving no trace in standard outputs and resisting all practical removal attempts. The key distinction between watermarking and 'copyright-baiting' is that waterm
AI Systems Tricked into Approving Absurd Scientific Papers AI Systems Tricked into Approving Absurd Scientific Papers New research reveals that AI systems can now produce fraudulent scientific papers that other AI models mistakenly accept as authentic. These fabricated studies bypass detection methods that were previously effective, highlighting the risk of research
Related Special Topic Recommendations
writing Best AI Xianxia & Wuxia Assistants: Write Epic Cultivation Progression & Martial Arts Choreography
Best AI Xianxia & Wuxia Assistants: Write Epic Cultivation Progression & Martial Arts Choreography

Discover the 2026 best AI assistants for crafting epic xianxia & wuxia tales. XIX.AI's curated list features top-rated, game-changing tools to master cultivation progression and martial arts choreography. Compare free vs paid options with real-world tests. Unlock your creative potential and start writing today!

10 tools
xix.ai
code AI Mobile App Coding Tools: Generate Cross-Platform Flutter & React Native Code from Prompts
AI Mobile App Coding Tools: Generate Cross-Platform Flutter & React Native Code from Prompts

Discover the 2026 best AI mobile app coding tools for Flutter & React Native. Our curated, top-rated list features powerful, game-changing solutions that generate cross-platform code from prompts. Compare free vs paid options with real-world tests. Unlock faster development and build better apps. Explore the rankings on XIX.AI now!

10 tools
xix.ai
code Best AI Chrome Extension Generators: Create Custom Browser Add-ons with Zero Coding Experience
Best AI Chrome Extension Generators: Create Custom Browser Add-ons with Zero Coding Experience

Discover the 2026 best AI Chrome extension generators on XIX.AI. Our curated list features top-rated, must-try tools that let you create custom browser add-ons with zero coding. Compare free vs paid options, see real-world tests, and unlock your productivity. Explore the latest rankings and find your perfect tool today!

10 tools
xix.ai
Text-to-speech Best AI Multilingual TTS: Generate Authentic Native-Accent Speech in 50+ Languages
Best AI Multilingual TTS: Generate Authentic Native-Accent Speech in 50+ Languages

Discover the 2026 best AI multilingual TTS tools for authentic native-accent speech in 50+ languages. Explore our top-rated, curated rankings with free vs paid comparisons and real-world tests. Find your perfect voice tool on XIX.AI and unlock global communication today.

10 tools
xix.ai
Meeting Assistant Best AI Meeting Automation Tools for Smarter and Faster Collaboration
Best AI Meeting Automation Tools for Smarter and Faster Collaboration

Discover the 2026 latest top-rated AI meeting automation tools for smarter, faster collaboration. Our curated list features powerful, game-changing solutions to automate notes, summaries, and action items. Compare free vs paid options with real-world tests and weekly updated rankings. Unlock peak team productivity. Explore the best picks now at XIX.AI.

10 tools
xix.ai
Prompt AI Prompts for Infrastructure-as-Code: Deploy Terraform & Docker Configurations Safely
AI Prompts for Infrastructure-as-Code: Deploy Terraform & Docker Configurations Safely

Discover the 2026 latest top-rated AI prompts for Infrastructure-as-Code. XIX.AI's curated selection helps you safely deploy Terraform & Docker configurations, automate cloud setups, and boost DevOps productivity. Compare free vs paid options with real-world tests. Explore now and unlock your AI edge.

10 tools
xix.ai
Comments (1)
0/500
StevenAllen
StevenAllen February 16, 2026 at 7:00:38 AM EST

AI가 감정적 조절을 하다보니 정확성을 희생시키는군요. 이런 '친절한' AI가 위급 상황에서 잘못된 정보를 제공한다면 정말 위험할 것 같아요. 실제 의료 상담이나 법률 조언 같은 분야에서는 확실한 정보가 중요하니까요. 🤔

OR