Claude Opus 4.7 Launches with Reliability Valued Over Intelligence
Anthropic has maintained an aggressive pace this year, rolling out new features almost every other day. The much-anticipated Claude Opus 4.7 has just been officially released, and interestingly, Anthropic was upfront in the announcement: "This is not our most powerful model." The rumored, stronger Claude Mythos Preview remains on standby. Still, Opus 4.7 has generated considerable attention because it tackles the issue of being "more reliable" rather than "smarter."

Benchmark results are notably impressive. On the rigorous coding benchmark SWE-bench Pro, 4.7 jumped from 53.4% in the previous version to 64.3%, a gain of nearly 11 percentage points, surpassing GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%). On the visual reasoning benchmark CharXiv, it rose from 69.1% to 82.1%, driven by the newly added 2576-pixel long-side recognition capability, offering more than three times the clarity of its predecessor. On the tool call evaluation MCP-Atlas, it scored 77.3%, and on the legal AI platform Harvey's BigLaw benchmark, it reached 90.9%. However, on the agentic search evaluation BrowseComp, 4.7 saw a slight decline from 83.7% to 79.3%, overtaken by GPT-5.4 and Gemini—this is attributed to its "no fabrications" personality, preferring to report errors rather than guess when information is incomplete.
Beyond the numbers, the shift in temperament is more noteworthy. Replit's leader noted after testing: "It challenges me in technical discussions, helps me make better decisions, and truly acts like a better colleague." Data science platform Hex also observed that 4.7 directly reports errors when data is missing, rather than providing a "seemingly reasonable but completely incorrect" alternative value as before. At the same time, task resilience has improved significantly—Notion team tests indicate that the tool error rate has been reduced to one-third of previous levels, and when the tool chain fails, it can navigate obstacles and complete tasks independently. Vercel even discovered a new behavior: before writing system-level code, 4.7 first performs mathematical proofs on its own.

Of course, increased capability comes with a cost. 4.7 introduces a new tokenizer, generating 1 to 1.35 times more tokens for the same text. Additionally, it tends to "think a bit longer" on complex tasks, so actual consumption is almost certainly higher. To address this, Anthropic added an xhigh ultra-high thinking intensity level. Claude Code has set all packages to this level by default, and also launched the Deep Review instruction / ultrareview, Auto Mode extension for Max users, and a public beta version of the "task budget" feature to help developers manage token usage.
The more powerful Mythos Preview was recently made available to enterprises under the name "Project Glasswing" for cybersecurity research, but due to its overwhelming capability and incomplete security evaluations, it has not been publicly released yet.
Today's 4.7 represents the latest milestone in Anthropic's high-frequency delivery rhythm. Mythos will eventually arrive—and when it does, the already strong 4.7 may prove to be just the beginning.
Related article
Suno Lead Investor: Deleting Posts Won't Plug Copyright Lawsuit Hole
The much-anticipated AI music generation platform Suno is facing a tough copyright battle, and a candid remark from its lead investor may have handed the opposing side exactly the evidence they were hoping for. C.C. Gong, a partner at Menlo Ventures
Haier Launches World's Lightest AI Sports Exoskeleton Robot, Weighing Just 1.75 kg
Haier Group has introduced the world's lightest AI-powered exoskeleton robot for sports — the Haier Exoskeleton Robot W3. This launch sets a new industry record for lightness, marking a major breakthrough in lightweight design and intelligent human m
Yaoke Media's First AIGC Drama 'The Mystery of the Bronze in Qinling' Launches Today with AI-Signed Leads
Today marks the official launch of Yaoke Media's AIGC fantasy mystery short drama, "The Secret Story of the Qinling Bronze." Starring the company's first two signed AI actors, Qin Lingyue and Lin Xiyanyan, the story unfolds in the enigmatic Qinling m
Related Special Topic Recommendations
Comments (0)
0/500
Anthropic has maintained an aggressive pace this year, rolling out new features almost every other day. The much-anticipated Claude Opus 4.7 has just been officially released, and interestingly, Anthropic was upfront in the announcement: "This is not our most powerful model." The rumored, stronger Claude Mythos Preview remains on standby. Still, Opus 4.7 has generated considerable attention because it tackles the issue of being "more reliable" rather than "smarter."

Benchmark results are notably impressive. On the rigorous coding benchmark SWE-bench Pro, 4.7 jumped from 53.4% in the previous version to 64.3%, a gain of nearly 11 percentage points, surpassing GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%). On the visual reasoning benchmark CharXiv, it rose from 69.1% to 82.1%, driven by the newly added 2576-pixel long-side recognition capability, offering more than three times the clarity of its predecessor. On the tool call evaluation MCP-Atlas, it scored 77.3%, and on the legal AI platform Harvey's BigLaw benchmark, it reached 90.9%. However, on the agentic search evaluation BrowseComp, 4.7 saw a slight decline from 83.7% to 79.3%, overtaken by GPT-5.4 and Gemini—this is attributed to its "no fabrications" personality, preferring to report errors rather than guess when information is incomplete.
Beyond the numbers, the shift in temperament is more noteworthy. Replit's leader noted after testing: "It challenges me in technical discussions, helps me make better decisions, and truly acts like a better colleague." Data science platform Hex also observed that 4.7 directly reports errors when data is missing, rather than providing a "seemingly reasonable but completely incorrect" alternative value as before. At the same time, task resilience has improved significantly—Notion team tests indicate that the tool error rate has been reduced to one-third of previous levels, and when the tool chain fails, it can navigate obstacles and complete tasks independently. Vercel even discovered a new behavior: before writing system-level code, 4.7 first performs mathematical proofs on its own.

Of course, increased capability comes with a cost. 4.7 introduces a new tokenizer, generating 1 to 1.35 times more tokens for the same text. Additionally, it tends to "think a bit longer" on complex tasks, so actual consumption is almost certainly higher. To address this, Anthropic added an xhigh ultra-high thinking intensity level. Claude Code has set all packages to this level by default, and also launched the Deep Review instruction / ultrareview, Auto Mode extension for Max users, and a public beta version of the "task budget" feature to help developers manage token usage.
The more powerful Mythos Preview was recently made available to enterprises under the name "Project Glasswing" for cybersecurity research, but due to its overwhelming capability and incomplete security evaluations, it has not been publicly released yet.
Today's 4.7 represents the latest milestone in Anthropic's high-frequency delivery rhythm. Mythos will eventually arrive—and when it does, the already strong 4.7 may prove to be just the beginning.
Suno Lead Investor: Deleting Posts Won't Plug Copyright Lawsuit Hole
The much-anticipated AI music generation platform Suno is facing a tough copyright battle, and a candid remark from its lead investor may have handed the opposing side exactly the evidence they were hoping for. C.C. Gong, a partner at Menlo Ventures
Haier Launches World's Lightest AI Sports Exoskeleton Robot, Weighing Just 1.75 kg
Haier Group has introduced the world's lightest AI-powered exoskeleton robot for sports — the Haier Exoskeleton Robot W3. This launch sets a new industry record for lightness, marking a major breakthrough in lightweight design and intelligent human m
Yaoke Media's First AIGC Drama 'The Mystery of the Bronze in Qinling' Launches Today with AI-Signed Leads
Today marks the official launch of Yaoke Media's AIGC fantasy mystery short drama, "The Secret Story of the Qinling Bronze." Starring the company's first two signed AI actors, Qin Lingyue and Lin Xiyanyan, the story unfolds in the enigmatic Qinling m





Home






