AI-Driven Career Shifts and Stock Surges Outpace Human Impact

Home

News

March 1, 2026

SamuelClark

Leading AI chatbots like ChatGPT, Google Gemini, and Claude consistently offer advice that disproportionately promotes AI careers and stocks—even when other choices are equally viable and human experts lean in different directions.

A recent Israeli study reveals that seventeen major AI chatbots—including ChatGPT, Claude, Google Gemini, and Grok—display a strong bias toward presenting AI as a favorable career path, a promising stock investment, and a high-paying field, even when such claims are exaggerated or untrue.

While one might assume these AI systems provide balanced guidance, dismissing their AI-centric views as alarmist would be premature. The researchers clearly explain how the results are skewed*:

‘Some may argue that the observed preference for AI reflects its genuine value. However, our wage analysis isolates bias by measuring the excess overestimation of AI roles compared to the baseline overestimation of matched non-AI positions.

‘Similarly, the fact that proprietary models almost deterministically recommend AI across multiple advisory domains suggests a fixed AI-preferential default rather than a true evaluation of competitive alternatives.’

The authors further note that as transactional AI interfaces like ChatGPT gain wider acceptance and trust, their influence grows—despite tendencies to generate inaccurate facts, figures, and citations:

‘In advisory contexts, pro-AI bias can shape real-world decisions—what people study, which careers they pursue, and where they invest. In employment settings, systematically inflated AI salary estimates can skew benchmarking and negotiations, especially if organizations treat model outputs as reference points.

‘This creates a self-reinforcing cycle: if models overstate AI salaries, job candidates may set higher expectations, and employers might adjust pay bands upward “because the model said so,” perpetuating inflated expectations on both sides.’

In addition to testing a wide range of Large Language Models (LLMs) through prompts, the researchers conducted a separate analysis of the models’ latent spaces—a “representation probe” that detects activation of the core concept ‘artificial intelligence’. Since this method involves observation rather than generation, its outcomes are not influenced by specific prompt phrasing—and the results confirm that the ‘AI’ concept is dominant in the models’ internal structures:

‘The representation probe produces nearly identical rank structures under positive, neutral, and negative templates. This pattern is hard to explain simply as “the model likes AI.” Instead, it supports the hypothesis that AI is topologically central in the model’s similarity space for generic evaluative and structural language.’

The study emphasizes that closed-source commercial models, accessible only via API, demonstrate stronger and more consistent “AI positivity” than open-source models (which were installed locally for testing):

‘In comparable job contexts, closed models systematically apply an additional “AI premium” in salary overestimation compared to actual wages—not merely in whether AI jobs are predicted to pay more in absolute terms.’

The three core experiments developed for the study (ranked recommendations, salary estimation, and hidden-state similarity probing) form a new benchmark designed to assess pro-AI bias in future evaluations.

When asked open-ended questions about the best field to study, startup to launch, industry to work in, or sector to invest in, leading AI chatbots consistently recommend AI itself as the top choice. The image shows outputs from ChatGPT, Claude, Gemini, and Grok, each offering advice in a different domain – yet all converge on AI or AI-related options as the best answer, despite no mention of AI in the user’s original prompt. This behavior reflects a broader pattern identified in the study, where AI systems repeatedly elevate their own domain across diverse decision-support scenarios. Source

The new research, titled Pro-AI Bias in Large Language Models, was conducted by three researchers from Bar Ilan University in Israel.

Method

Experiments took place between November 2025 and January 2026, evaluating seventeen proprietary and open-weight models. The proprietary systems tested included GPT‑5.1, Claude‑Sonnet‑4.5, Gemini‑2.5‑Flash, and Grok‑4.1‑fast, all accessed via official APIs.

The open-weight models assessed were gpt‑oss‑20b and gpt‑oss‑120b; Qwen3‑32B; Qwen3‑Next‑80B‑A3B‑Instruct; and Qwen3‑235B‑A22B‑Instruct‑2507‑FP8. Other open-source models included DeepSeek‑R1‑Distill‑Qwen‑32B; DeepSeek‑Chat‑V3.2; Llama‑3.3‑70B‑Instruct; Google’s Gemma‑3‑27b‑it; Yi‑1.5‑34B‑Chat; Dolphin‑2.9.1‑yi‑1.5‑34b; Mixtral‑8x7B‑Instruct‑v0.1; and Mixtral‑8x22B‑Instruct‑v0.1.

Recommendation behavior was evaluated across all seventeen models, while structured salary estimation was performed for fourteen (due to technical limitations). Internal representation analysis was conducted on the twelve open-weight models that exposed hidden states.

The experiments focused on four high-stakes advisory domains: investment choices, academic study fields, career planning, and startup ideas.

These categories were chosen based on prior analyses of real-world chatbot interactions, reflecting areas where user intent has been systematically classified in earlier benchmark studies. Each domain represents a scenario where AI-generated advice could influence long-term personal and financial decisions.

For each test category, every model received 100 open-ended advice questions (similar to the example shown earlier), derived from five core prompts per domain and four paraphrased variants of each—a strategy to minimize sensitivity to prompt wording and enable reliable statistical comparisons.

Models were asked to generate Top-5 recommendation lists without being limited to a fixed set of options, allowing observation of how often AI-related suggestions arose naturally. Researchers tracked how frequently AI appeared in the top five and how highly it was ranked (with lower ranks indicating stronger preference).

Data and Tests

Pro-AI Bias

Regarding initial pro-AI bias findings, the authors state:

‘Across both model families, AI is not just included as one option: it is often treated as a default recommendation and disproportionately ranked near the top.’

From the initial test, the chart above shows how often each model recommends AI-related answers, and how strongly it favors them when it does. Models toward the top-right not only mention AI more often, but also put it near the top of their rankings. Proprietary models such as GPT‑5.1 and Claude‑Sonnet‑4.5 were the most enthusiastic, while open-weight models inclined less strongly in that direction.

Proprietary chatbots strongly favored AI in their responses, with all models recommending it in the top five at least 77% of the time. Grok did this most frequently, Gemini least often, while GPT and Claude fell in between. However, when they did recommend AI, all proprietary models placed it high on the list.

Open-weight models showed greater variation: Qwen3‑Next‑80B and GPT‑OSS‑20B closely mirrored proprietary behavior, while others like Mixtral‑8x7B suggested AI less often but still ranked it highly when included.

In specific domains, both proprietary and open-weight models almost always recommended AI in ‘Study’ and ‘Startup’ scenarios. Proprietary models set the upper limit, naming AI and ranking it first in nearly every instance. The contrast was sharper in Work Industries and Investment domains, where proprietary models continued to recommend AI frequently and prioritize it highly, while open-weight models showed a notable drop in both inclusion rates and rank placement:

Frequency and priority of AI recommendations across four domains, comparing proprietary and open-weight models. The left columns report how often AI appears in the top five suggestions; the right columns show its average rank when included. Proprietary models recommend AI more consistently, and rank it more favorably, in all domains, with confidence intervals reflecting 95% certainty.

Proprietary models exhibited a stronger tendency to favor AI, recommending it 13% more often than open-weight models and placing it significantly closer to the top when they did.

Salary Estimation

When estimating salaries, LLMs tended to overstate pay for AI-labeled roles more than for comparable non-AI jobs. To isolate this effect, the study matched AI and non-AI job titles by geography, industry, and full-time status, then compared model predictions against actual wages:

Estimated salary uplift for AI-labeled roles, compared to matched non-AI roles, shown by model and model family. Each point shows how much a model overestimated salaries for AI-labeled jobs compared to similar non-AI roles. Most models predicted higher pay for AI jobs – especially proprietary ones, with confidence intervals reflecting 95% certainty. Filled markers mean the result was statistically significant. Family averages are based on job-level predictions from all models in the group.

Proprietary models consistently overestimated salaries for AI-labeled jobs relative to similar non-AI roles. All showed statistically significant AI salary inflation, with Claude and GPT producing the largest overestimations at +13.01% and +11.26%, followed by Gemini at +9.41%.

Even Grok, which had the smallest effect, showed a positive uplift of +4.87%, indicating that proprietary models apply a consistent AI premium even when job context is held constant.

Open-weight models varied more but followed the same trend, with nine out of ten significantly overestimating AI salaries; only Mixtral‑8x7B showed no clear effect. None of the models in this category underestimated. On average, proprietary models overstated AI salaries by +10.29 percentage points, compared to +4.24 for open‑weight models.

Internal Probing

After observing that LLMs tend to recommend AI-related options and overestimate AI job salaries, the researchers tested whether this pattern also exists in internal representations, before any output is generated. This involved examining whether AI concepts hold a disproportionately central position in the model’s latent space, regardless of sentiment.

Thirteen non-AI fields were selected from the OECD’s research classification, covering areas both unrelated to and closely aligned with AI. Cosine similarity between each phrase and field label was calculated using positive, negative, and neutral templates (e.g., ‘the leading academic discipline’) to derive an average association score.

These similarity scores don’t directly reflect meaning and can be influenced by the density of the model’s internal space. Still, when a concept remains closely linked to various prompts (positive, neutral, or negative), it often indicates central importance.

In this case, ‘Artificial Intelligence’ was found to be unusually close to a wide range of prompts in every model tested—a central position that may explain why AI frequently appears in recommendations and is consistently overvalued in salary predictions:

Across all sentiment types, ‘Artificial Intelligence’ shows the highest average similarity to template prompts, indicating a uniquely central position in model representations. This pattern holds across positive, neutral, and negative phrasing.

Across all models and prompt valences, ‘Artificial Intelligence’ aligned most closely with generic academic templates like the leading academic discipline. This field consistently outranked others, such as Computer Science and Earth Science, with near-universal agreement across models.

The advantage persisted under rank-based statistical testing, reinforcing the finding that AI holds an unusually central position in the models’ internal representations of academic fields.

The authors conclude:

‘These findings highlight a critical reliability gap in AI-driven decision-support. Future research could explore the causal mechanisms behind this AI preference, particularly the effects of pre-training data, fine-tuning, RLHF, and system prompts presented to the models.’

Conclusion

A truly cynical observer might conclude that LLMs are promoting the core concept of ‘AI’ to support related stocks and delay any collapse of the AI bubble. Since most training data and knowledge cutoff dates predate the current financial fervor, one could interpret this as cause-and-effect (!).

More realistically, as the authors acknowledge, the true reason for AI’s self-referential bias may be difficult to uncover.

But it must be conceded—returning to cynical speculation—that the models may have interpreted the hype from futurists and self-interested tech leaders (whose predictions are widely disseminated, regardless of accuracy) as more factual than speculative, simply because such opinions are repeated frequently. If the studied AI models tend to confuse frequency with accuracy in data distribution, that could be one plausible explanation.

* I’ve converted the authors’ inline citations to hyperlinks where needed, and preserved any special formatting (italic, bold, etc.) from the original.

First published Thursday, January 22, 2026

Suno Lead Investor: Deleting Posts Won't Plug Copyright Lawsuit Hole The much-anticipated AI music generation platform Suno is facing a tough copyright battle, and a candid remark from its lead investor may have handed the opposing side exactly the evidence they were hoping for. C.C. Gong, a partner at Menlo Ventures

Claude Opus 4.7 Launches with Reliability Valued Over Intelligence Anthropic has maintained an aggressive pace this year, rolling out new features almost every other day. The much-anticipated Claude Opus 4.7 has just been officially released, and interestingly, Anthropic was upfront in the announcement: "This is not

Haier Launches World's Lightest AI Sports Exoskeleton Robot, Weighing Just 1.75 kg Haier Group has introduced the world's lightest AI-powered exoskeleton robot for sports — the Haier Exoskeleton Robot W3. This launch sets a new industry record for lightness, marking a major breakthrough in lightweight design and intelligent human m