Anthropic Claims AI Isn't Stalling, It's Outsmarting Benchmarks

Large language models (LLMs) and other generative AI technologies are making significant strides in self-correction, which is paving the way for new applications, including what's known as "agentic AI," according to Michael Gerstenhaber, Vice President of Anthropic, a leading AI model developer.
"It's getting very good at self-correction, self-reasoning," Gerstenhaber, who leads API technologies at Anthropic, said during an interview in New York with Bloomberg Intelligence's Anurag Rana. Anthropic, creator of the Claude family of LLMs, competes directly with OpenAI, maker of the GPT models. "Every couple of months, we release a new model that expands the capabilities of LLMs," he added, emphasizing how quickly the industry moves, with each model revision unlocking new potential uses.
New Capabilities in AI Models
The latest models from Anthropic have introduced capabilities such as task planning, which lets them carry out tasks on a computer much as a human would, for example ordering pizza online. "Planning interstitial steps, something that wasn't feasible yesterday, is now within reach," Gerstenhaber noted about this step-by-step task execution.
The discussion, which also featured Vijay Karunamurthy, Chief Technologist at AI startup Scale AI, was part of a daylong conference hosted by Bloomberg Intelligence titled "Gen AI: Can it deliver on the productivity promise?"
Challenging AI Skepticism
Gerstenhaber's insights challenge the views of AI skeptics who argue that generative AI and the broader AI field are "hitting a wall," suggesting diminishing returns with each new model iteration. AI scholar Gary Marcus, for instance, has been vocal about his concerns since 2022, warning that simply increasing the size of AI models (more parameters) won't proportionally improve their performance.
However, Gerstenhaber asserts that Anthropic is pushing the boundaries beyond what current AI benchmarks can measure. "Even if it looks like progress is slowing in some areas, it's because we're unlocking entirely new functionalities, but we've saturated the benchmarks and the ability to perform older tasks," he explained. This makes it increasingly difficult to gauge the full extent of what current generative AI models can achieve.
Scaling and Learning
Both Gerstenhaber and Karunamurthy emphasized the importance of scaling generative AI models to enhance their self-correcting capabilities. "We're definitely seeing more and more scaling of the intelligence," Gerstenhaber remarked. Karunamurthy added, "One reason we believe we're not hitting a wall with planning and reasoning is that we're still learning how to structure these tasks so that the models can adapt to new and varied environments."
Gerstenhaber agreed, stating, "We're in the early stages, learning from application developers about their needs and where the models fall short, which we can then integrate back into the language model."
Real-Time Learning and Adaptation
Much of this progress, according to Gerstenhaber, is driven by the rapid pace of fundamental research at Anthropic, as well as real-time learning from industry feedback. "We're adapting to what the industry tells us they need, learning in real time," he said.
Customers often start with larger models and then scale down to simpler ones to suit specific purposes. "Initially, they assess whether a model is intelligent enough to perform a task well, then whether it's fast enough to meet their application needs, and finally, if it can be as cost-effective as possible," Gerstenhaber explained.












