Anthropic Claims AI Isn't Stalling, It's Outsmarting Benchmarks

April 17, 2025

Large language models (LLMs) and other generative AI technologies are making significant strides in self-correction, which is paving the way for new applications, including what's known as "agentic AI," according to Michael Gerstenhaber, Vice President of Anthropic, a leading AI model developer.

"It's getting very good at self-correction, self-reasoning," Gerstenhaber, who leads API technologies at Anthropic, said during an interview in New York with Bloomberg Intelligence's Anurag Rana. Anthropic, creator of the Claude family of LLMs, competes directly with OpenAI and its GPT models. "Every couple of months, we release a new model that expands the capabilities of LLMs," he added, emphasizing how quickly the industry moves: each model revision unlocks new potential uses.

New Capabilities in AI Models

The latest models from Anthropic have introduced capabilities such as task planning, which lets them carry out tasks on a computer much as a human would, such as ordering pizza online. "Planning interstitial steps, something that wasn't feasible yesterday, is now within reach," Gerstenhaber noted about this step-by-step task execution.

The discussion, which also featured Vijay Karunamurthy, Chief Technologist at AI startup Scale AI, was part of a daylong conference hosted by Bloomberg Intelligence titled "Gen AI: Can it deliver on the productivity promise?"

Challenging AI Skepticism

Gerstenhaber's remarks challenge the view of AI skeptics who argue that generative AI and the broader AI field are "hitting a wall," with diminishing returns from each new model iteration. AI scholar Gary Marcus, for instance, has voiced this concern since 2022, warning that simply increasing the size of AI models (more parameters) won't proportionally improve their performance.

However, Gerstenhaber asserts that Anthropic is pushing the boundaries beyond what current AI benchmarks can measure. "Even if it looks like progress is slowing in some areas, it's because we're unlocking entirely new functionalities, but we've saturated the benchmarks and the ability to perform older tasks," he explained. This makes it increasingly difficult to gauge the full extent of what current generative AI models can achieve.

Scaling and Learning

Both Gerstenhaber and Karunamurthy emphasized the importance of scaling generative AI models to enhance their self-correcting capabilities. "We're definitely seeing more and more scaling of the intelligence," Gerstenhaber remarked. Karunamurthy added, "One reason we believe we're not hitting a wall with planning and reasoning is that we're still learning how to structure these tasks so that the models can adapt to new and varied environments."

Gerstenhaber agreed, stating, "We're in the early stages, learning from application developers about their needs and where the models fall short, which we can then integrate back into the language model."

Real-Time Learning and Adaptation

Much of this progress, according to Gerstenhaber, is driven by the rapid pace of fundamental research at Anthropic, as well as real-time learning from industry feedback. "We're adapting to what the industry tells us they need, learning in real time," he said.

Customers often start with larger models and then scale down to simpler ones to suit specific purposes. "Initially, they assess whether a model is intelligent enough to perform a task well, then whether it's fast enough to meet their application needs, and finally, if it can be as cost-effective as possible," Gerstenhaber explained.
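The three-step assessment Gerstenhaber describes (intelligent enough, then fast enough, then as cheap as possible) can be read as a filter followed by a minimization. The sketch below is only an illustration of that reading: the `Candidate` class, the `pick_model` function, the model names, and all of the numbers are hypothetical and are not taken from the interview or from any Anthropic API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one model's measured behavior on your own task."""
    name: str
    quality: float        # fraction of your evaluation cases passed, 0.0 to 1.0
    latency_s: float      # typical response time in seconds
    cost_per_call: float  # average cost per request, in any consistent unit


def pick_model(
    candidates: Sequence[Candidate],
    min_quality: float,
    max_latency_s: float,
) -> Optional[Candidate]:
    """Apply the three checks in order: quality, then speed, then cost."""
    # Step 1: keep only models that do the task well enough.
    good_enough = [c for c in candidates if c.quality >= min_quality]
    # Step 2: of those, keep only models that respond fast enough.
    fast_enough = [c for c in good_enough if c.latency_s <= max_latency_s]
    # Step 3: choose the cheapest survivor, or None if nothing qualifies.
    return min(fast_enough, key=lambda c: c.cost_per_call, default=None)


# Made-up measurements for three hypothetical model tiers.
candidates = [
    Candidate("large", quality=0.95, latency_s=4.0, cost_per_call=0.030),
    Candidate("medium", quality=0.91, latency_s=1.5, cost_per_call=0.008),
    Candidate("small", quality=0.78, latency_s=0.4, cost_per_call=0.001),
]

choice = pick_model(candidates, min_quality=0.90, max_latency_s=2.0)
print(choice.name if choice else "no model meets the requirements")
```

With these invented numbers the large model is too slow and the small model is not accurate enough, so the medium tier is selected. In practice the quality and latency figures would come from a team's own evaluation runs rather than fixed constants.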

Comments (8)
JoseRoberts August 12, 2025 at 11:00:59 AM EDT

This self-correction stuff is wild! 😮 It's like AI is learning to double-check its own homework. Wonder how far this 'agentic AI' will go—could it outsmart us at our own jobs soon?

WalterAnderson July 31, 2025 at 7:35:39 AM EDT

It's wild to think AI can now self-correct! 😮 Makes me wonder how soon we'll see these 'agentic AI' systems running our lives—hope they don’t outsmart us too much!

RonaldMartinez July 22, 2025 at 3:39:52 AM EDT

This article really opened my eyes to how fast AI is evolving! Self-correcting LLMs sound like a game-changer for agentic AI. Can’t wait to see what new apps come out of this! 😄

WillieJackson April 18, 2025 at 3:00:28 AM EDT

Anthropic's view that AI isn't stalling but is outpacing the benchmarks is pretty great. It's as if AI were playing chess while we're still trying to understand checkers. The self-correction part sounds promising, but I'm still a little skeptical. 🤔

GeorgeWilson April 17, 2025 at 1:45:24 PM EDT

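The idea that Anthropic's AI isn't stagnating but is surpassing the benchmarks is cool. AI is playing chess while we're still at the stage of understanding checkers. The talk of self-correction is promising, but I'm still a bit skeptical. 🤔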

NicholasCarter April 17, 2025 at 7:27:31 AM EDT

Anthropic's take on AI not stalling but outsmarting benchmarks is pretty cool. It's like AI is playing chess while we're still figuring out checkers. The self-correction stuff sounds promising, but I'm still a bit skeptical. 🤔