Anthropic Claims AI Isn't Stalling, It's Outsmarting Benchmarks

Large language models (LLMs) and other generative AI technologies are making significant strides in self-correction, which is paving the way for new applications, including what's known as "agentic AI," according to Michael Gerstenhaber, Vice President of Anthropic, a leading AI model developer.
"It's getting very good at self-correction, self-reasoning," Gerstenhaber, who leads API technologies at Anthropic, shared during an interview in New York with Bloomberg Intelligence's Anurag Rana. Anthropic, creators of the Claude family of LLMs, are direct competitors to OpenAI's GPT models. "Every couple of months, we release a new model that expands the capabilities of LLMs," he added, emphasizing the dynamic nature of the industry where each model revision unlocks new potential uses.
New Capabilities in AI Models
The latest models from Anthropic have introduced capabilities such as task planning, allowing them to operate a computer much as a human would, for example to order a pizza online. "Planning interstitial steps, something that wasn't feasible yesterday, is now within reach," Gerstenhaber noted of this step-by-step task execution.
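In practice, "planning interstitial steps" usually describes a loop in which a model breaks a goal into smaller actions, carries them out one at a time, and checks each result before moving on. The sketch below is a minimal illustration of that pattern, assuming hypothetical stand-in functions; it is not Anthropic's actual implementation or API.

```python
# A minimal, purely illustrative sketch of a plan-act-observe loop.
# The function names and the canned plan below are hypothetical stand-ins;
# they do not represent Anthropic's API or actual agent tooling.

def call_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a model API here."""
    return "open the pizzeria's website\nchoose toppings\nfill in the delivery address\nconfirm the order"

def execute_step(step: str) -> str:
    """Stand-in for tool use; a real agent would drive a browser or other tools."""
    return f"done: {step}"

def run_agent(goal: str) -> list[str]:
    """Ask the model for interstitial steps, then execute them one at a time,
    keeping the observations so the plan could later be revised (self-correction)."""
    plan = call_model(f"List the steps needed to: {goal}")
    observations = []
    for step in plan.splitlines():
        observations.append(execute_step(step))
    return observations

if __name__ == "__main__":
    for line in run_agent("order a pizza online"):
        print(line)
```

A production agent would replace both stand-ins with real model calls and tool integrations, and would feed each observation back to the model so it can correct its own plan as it goes.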
The discussion, which also featured Vijay Karunamurthy, Chief Technologist at AI startup Scale AI, was part of a daylong conference hosted by Bloomberg Intelligence titled "Gen AI: Can it deliver on the productivity promise?"
Challenging AI Skepticism
Gerstenhaber's insights challenge the views of AI skeptics who argue that generative AI and the broader AI field are "hitting a wall," suggesting diminishing returns with each new model iteration. AI scholar Gary Marcus, for instance, has been vocal about his concerns since 2022, warning that simply increasing the size of AI models (more parameters) won't proportionally improve their performance.
However, Gerstenhaber asserts that Anthropic is pushing the boundaries beyond what current AI benchmarks can measure. "Even if it looks like progress is slowing in some areas, it's because we're unlocking entirely new functionalities, but we've saturated the benchmarks and the ability to perform older tasks," he explained. This makes it increasingly difficult to gauge the full extent of what current generative AI models can achieve.
Scaling and Learning
Both Gerstenhaber and Karunamurthy emphasized the importance of scaling generative AI models to enhance their self-correcting capabilities. "We're definitely seeing more and more scaling of the intelligence," Gerstenhaber remarked. Karunamurthy added, "One reason we believe we're not hitting a wall with planning and reasoning is that we're still learning how to structure these tasks so that the models can adapt to new and varied environments."
Gerstenhaber agreed, stating, "We're in the early stages, learning from application developers about their needs and where the models fall short, which we can then integrate back into the language model."
Real-Time Learning and Adaptation
Much of this progress, according to Gerstenhaber, is driven by the rapid pace of fundamental research at Anthropic, as well as real-time learning from industry feedback. "We're adapting to what the industry tells us they need, learning in real time," he said.
Customers often start with larger models and then scale down to simpler ones to suit specific purposes. "Initially, they assess whether a model is intelligent enough to perform a task well, then whether it's fast enough to meet their application needs, and finally, if it can be as cost-effective as possible," Gerstenhaber explained.
Comments (8)
JoseRoberts
August 12, 2025 at 11:00:59 AM EDT
This self-correction stuff is wild! 😮 It's like AI is learning to double-check its own homework. Wonder how far this 'agentic AI' will go—could it outsmart us at our own jobs soon?
WalterAnderson
July 31, 2025 at 7:35:39 AM EDT
It's wild to think AI can now self-correct! 😮 Makes me wonder how soon we'll see these 'agentic AI' systems running our lives—hope they don’t outsmart us too much!
RonaldMartinez
July 22, 2025 at 3:39:52 AM EDT
This article really opened my eyes to how fast AI is evolving! Self-correcting LLMs sound like a game-changer for agentic AI. Can’t wait to see what new apps come out of this! 😄
WillieJackson
April 18, 2025 at 3:00:28 AM EDT
Anthropic's perspective that AI isn't stalling but is outgrowing the benchmarks is pretty cool. It's like AI is playing chess while we're still trying to figure out checkers. The self-correction stuff sounds promising, but I'm still a bit skeptical. 🤔
GeorgeWilson
April 17, 2025 at 1:45:24 PM EDT
The idea that Anthropic's AI isn't stalling but is surpassing the benchmarks is impressive. AI is playing chess while we're still at the stage of understanding checkers. The talk about self-correction is promising, but I'm still a little skeptical. 🤔
NicholasCarter
April 17, 2025 at 7:27:31 AM EDT
Anthropic's take on AI not stalling but outsmarting benchmarks is pretty cool. It's like AI is playing chess while we're still figuring out checkers. The self-correction stuff sounds promising, but I'm still a bit skeptical. 🤔