Anthropic Claims AI Isn't Stalling, It's Outsmarting Benchmarks

Large language models (LLMs) and other generative AI technologies are making significant strides in self-correction, which is paving the way for new applications, including what's known as "agentic AI," according to Michael Gerstenhaber, Vice President of Anthropic, a leading AI model developer.
"It's getting very good at self-correction, self-reasoning," Gerstenhaber, who leads API technologies at Anthropic, shared during an interview in New York with Bloomberg Intelligence's Anurag Rana. Anthropic, creators of the Claude family of LLMs, are direct competitors to OpenAI's GPT models. "Every couple of months, we release a new model that expands the capabilities of LLMs," he added, emphasizing the dynamic nature of the industry where each model revision unlocks new potential uses.
New Capabilities in AI Models
The latest models from Anthropic have introduced capabilities such as task planning, allowing them to operate a computer much as a human would, for example to order a pizza online. "Planning interstitial steps, something that wasn't feasible yesterday, is now within reach," Gerstenhaber noted of this step-by-step task execution.
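In practice, "planning interstitial steps" usually describes a loop in which a model breaks a goal into smaller actions, carries them out one at a time, and checks each result before moving on. The sketch below is a minimal illustration of that pattern, assuming hypothetical stand-in functions; it is not Anthropic's actual implementation or API.

```python
# A minimal, purely illustrative sketch of a plan-act-observe loop.
# The function names and the canned plan below are hypothetical stand-ins;
# they do not represent Anthropic's API or actual agent tooling.

def call_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a model API here."""
    return "open the pizzeria's website\nchoose toppings\nfill in the delivery address\nconfirm the order"

def execute_step(step: str) -> str:
    """Stand-in for tool use; a real agent would drive a browser or other tools."""
    return f"done: {step}"

def run_agent(goal: str) -> list[str]:
    """Ask the model for interstitial steps, then execute them one at a time,
    keeping the observations so the plan could later be revised (self-correction)."""
    plan = call_model(f"List the steps needed to: {goal}")
    observations = []
    for step in plan.splitlines():
        observations.append(execute_step(step))
    return observations

if __name__ == "__main__":
    for line in run_agent("order a pizza online"):
        print(line)
```

A production agent would replace both stand-ins with real model calls and tool integrations, and would feed each observation back to the model so it can correct its own plan as it goes.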
The discussion, which also featured Vijay Karunamurthy, Chief Technologist at AI startup Scale AI, was part of a daylong conference hosted by Bloomberg Intelligence titled "Gen AI: Can it deliver on the productivity promise?"
Challenging AI Skepticism
Gerstenhaber's insights challenge the views of AI skeptics who argue that generative AI and the broader AI field are "hitting a wall," suggesting diminishing returns with each new model iteration. AI scholar Gary Marcus, for instance, has been vocal about his concerns since 2022, warning that simply increasing the size of AI models (more parameters) won't proportionally improve their performance.
However, Gerstenhaber asserts that Anthropic is pushing the boundaries beyond what current AI benchmarks can measure. "Even if it looks like progress is slowing in some areas, it's because we're unlocking entirely new functionalities, but we've saturated the benchmarks and the ability to perform older tasks," he explained. This makes it increasingly difficult to gauge the full extent of what current generative AI models can achieve.
Scaling and Learning
Both Gerstenhaber and Karunamurthy emphasized the importance of scaling generative AI models to enhance their self-correcting capabilities. "We're definitely seeing more and more scaling of the intelligence," Gerstenhaber remarked. Karunamurthy added, "One reason we believe we're not hitting a wall with planning and reasoning is that we're still learning how to structure these tasks so that the models can adapt to new and varied environments."
Gerstenhaber agreed, stating, "We're in the early stages, learning from application developers about their needs and where the models fall short, which we can then integrate back into the language model."
Real-Time Learning and Adaptation
Much of this progress, according to Gerstenhaber, is driven by the rapid pace of fundamental research at Anthropic, as well as real-time learning from industry feedback. "We're adapting to what the industry tells us they need, learning in real time," he said.
Customers often start with larger models and then scale down to simpler ones to suit specific purposes. "Initially, they assess whether a model is intelligent enough to perform a task well, then whether it's fast enough to meet their application needs, and finally, if it can be as cost-effective as possible," Gerstenhaber explained.
Comments (8)
JoseRoberts
August 12, 2025 at 11:00:59 AM EDT
This self-correction stuff is wild! 😮 It's like AI is learning to double-check its own homework. Wonder how far this 'agentic AI' will go—could it outsmart us at our own jobs soon?
WalterAnderson
July 31, 2025 at 7:35:39 AM EDT
It's wild to think AI can now self-correct! 😮 Makes me wonder how soon we'll see these 'agentic AI' systems running our lives—hope they don’t outsmart us too much!
RonaldMartinez
July 22, 2025 at 3:39:52 AM EDT
This article really opened my eyes to how fast AI is evolving! Self-correcting LLMs sound like a game-changer for agentic AI. Can’t wait to see what new apps come out of this! 😄
WillieJackson
April 18, 2025 at 3:00:28 AM EDT
Anthropic's perspective that AI isn't stalling but is outgrowing the benchmarks is pretty cool. It's like AI is playing chess while we're still trying to figure out checkers. The self-correction stuff sounds promising, but I'm still a bit skeptical. 🤔
GeorgeWilson
April 17, 2025 at 1:45:24 PM EDT
The idea that Anthropic's AI isn't stalling but is surpassing the benchmarks is impressive. AI is playing chess while we're still at the stage of understanding checkers. The talk about self-correction is promising, but I'm still a little skeptical. 🤔
NicholasCarter
April 17, 2025 at 7:27:31 AM EDT
Anthropic's take on AI not stalling but outsmarting benchmarks is pretty cool. It's like AI is playing chess while we're still figuring out checkers. The self-correction stuff sounds promising, but I'm still a bit skeptical. 🤔