Anthropic's New AI Model Operates Computers Like Humans, Errors Included


May 9, 2025

PaulGonzalez



Have you ever wanted an AI that can work with your computer the way a person would? Anthropic's latest release takes a step in that direction. On Tuesday, the company unveiled the latest version of its Claude AI model, Claude 3.5 Sonnet, which can operate a computer with surprising finesse. The computer-use capability is currently in beta and available for developers to experiment with through an API.

Anthropic proudly labels Claude 3.5 Sonnet as the "first frontier AI model to offer computer use in public beta." This means developers can program it to perform a variety of tasks on a computer, such as viewing the screen, maneuvering the cursor, clicking buttons, and even typing on a virtual keyboard. The goal? To replicate the way we interact with our computers every day.

Because the capability is still experimental, it has its hiccups: it can be clumsy and error-prone at times. That is exactly why Anthropic released it in beta, to gather feedback from developers and refine the model over time.

Why Should We Care About AI Using Computers?

Anthropic has a clear answer to that question: "A vast amount of modern work happens via computers." Enabling AI to interact with software the same way humans do, the company argues, unlocks a range of applications that current AI assistants can't handle.

How Can Developers and Users Benefit?

Instead of creating specific tools for each task, Anthropic is teaching Claude general computer skills. This allows the AI to use a wide range of standard software programs designed for humans. Developers can use this capability to automate repetitive tasks, build and test software, and even conduct research.

Several companies are already leveraging Claude 3.5 Sonnet's computer skills, including Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company. For instance, Replit is using these capabilities to enhance its Replit Agent product.

How Did They Train Claude to Use Computers?

Training Claude to navigate a computer involved a lot of trial and error, according to Anthropic. The process requires the AI to understand and interpret images of the computer screen, then decide which actions to take based on what it sees. Claude 3.5 Sonnet accomplishes this by analyzing screenshots, counting how many pixels the cursor must move to reach its target, and issuing mouse commands.
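To make the screenshot-then-act cycle concrete, here is a minimal Python sketch. It is not Anthropic's implementation: `FakeScreen`, `Button`, `choose_action`, and `run_step` are hypothetical names invented for illustration, and the "screenshot" is simply a list of button rectangles rather than a real image. It only shows the shape of the loop: look at the screen, pick a pixel target, move the cursor, click.

```python
from dataclasses import dataclass


@dataclass
class Button:
    """A rectangular on-screen button, measured in pixels (hypothetical)."""
    label: str
    x: int
    y: int
    width: int
    height: int

    def center(self):
        # Pixel coordinates of the middle of the button.
        return (self.x + self.width // 2, self.y + self.height // 2)

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


class FakeScreen:
    """Stand-in for a real desktop; assumed for illustration only."""

    def __init__(self, buttons):
        self.buttons = buttons
        self.cursor = (0, 0)
        self.clicked = []

    def screenshot(self):
        # A real system would return an image; here it is the button list.
        return list(self.buttons)

    def move_cursor(self, x, y):
        self.cursor = (x, y)

    def click(self):
        # Record whichever button lies under the cursor, if any.
        for button in self.buttons:
            if button.contains(*self.cursor):
                self.clicked.append(button.label)


def choose_action(screenshot, goal_label):
    """Hypothetical policy: locate the goal button and return its pixel target."""
    for button in screenshot:
        if button.label == goal_label:
            return button.center()
    return None


def run_step(screen, goal_label):
    """One observe-decide-act cycle: screenshot, pick a target, move, click."""
    target = choose_action(screen.screenshot(), goal_label)
    if target is None:
        return False
    screen.move_cursor(*target)
    screen.click()
    return True


screen = FakeScreen([
    Button("Cancel", 100, 200, 80, 30),
    Button("Submit", 200, 200, 80, 30),
])
succeeded = run_step(screen, "Submit")
print(succeeded, screen.cursor, screen.clicked)
```

In a real system the decision step would be the model interpreting an actual image, which is where the clumsiness described in the article can creep in.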

How Well Is Claude Performing?

On OSWorld, a benchmark that assesses AI models' ability to use computers, Claude 3.5 Sonnet scored 14.9%. While this is far below the 70% to 75% that humans achieve, it's nearly double the 7.7% scored by the next best AI model in the same category.
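The "nearly double" comparison is easy to check with the figures quoted above. The short Python snippet below uses only those numbers; the variable names are ours.

```python
# Scores quoted in the article, in percent.
claude_score = 14.9
next_best_score = 7.7
human_low, human_high = 70.0, 75.0

# How many times higher Claude scored than the next best model.
ratio = claude_score / next_best_score

# Remaining gap to the human range, in percentage points.
gap_low = human_low - claude_score
gap_high = human_high - claude_score

print(f"Claude vs. next best: {ratio:.2f}x")
print(f"Gap to human range: {gap_low:.1f} to {gap_high:.1f} points")
```

The ratio comes out just under 2, which matches the "nearly double" wording, while the gap to human performance remains more than 55 percentage points.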

Despite these promising results, Claude's computer use is still in its infancy. It can't yet perform more complex tasks like dragging windows or zooming into the screen. Additionally, because it relies on screenshots, it might miss certain actions and notifications.
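Why would relying on screenshots cause missed notifications? A rough way to picture it, assuming screenshots are taken at discrete moments rather than continuously, is that anything appearing and disappearing between two captures is never seen. The Python sketch below simulates that; the timings and the helper name `visible_in_any_screenshot` are made up for illustration and do not describe how Claude actually schedules its screenshots.

```python
def visible_in_any_screenshot(event_start, event_end, screenshot_times):
    """Return True if at least one screenshot falls while the event is on screen."""
    return any(event_start <= t < event_end for t in screenshot_times)


# Hypothetical capture times, in seconds.
screenshot_times = [0.0, 2.0, 4.0, 6.0]

# Hypothetical on-screen events as (appears_at, disappears_at), in seconds.
notifications = {
    "toast": (2.5, 3.5),   # short-lived, falls between two captures
    "dialog": (1.0, 5.0),  # stays up long enough to be captured
}

seen = {
    name: visible_in_any_screenshot(start, end, screenshot_times)
    for name, (start, end) in notifications.items()
}
print(seen)
```

Under these assumed timings the short-lived toast is missed while the longer-lived dialog is caught.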

Anthropic remains optimistic, stating, "We expect that computer use will rapidly improve to become faster, more reliable, and more useful for the tasks our users want to complete." They also emphasize that as the technology evolves, it will become more accessible to those with less software development experience, all while maintaining strict safety measures.

Claude 3.5 Sonnet is now accessible to everyone. Developers can start building applications with the computer-use beta on the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.


Comments (5)


WalterBaker

August 27, 2025 at 1:01:33 PM EDT

Wow, Claude 3.5 Sonnet sounds like a game-changer! An AI that mimics human computer use, errors and all? That’s wild. I wonder how it handles my messy desktop and random browser tabs 😅. Super curious to see it in action!

JackWilson

August 4, 2025 at 2:01:00 AM EDT

This AI acting like a human on computers is wild! 😮 Makes me wonder if it’ll start rage-quitting when apps crash like I do.

JackMitchell

July 30, 2025 at 9:42:05 PM EDT

Whoa, an AI that mimics human computer use, mistakes and all? That's wild! Wonder if Claude 3.5 Sonnet will accidentally open 20 browser tabs like I do. 😅 Curious to see how this plays out in real-world tasks!

JohnNelson

July 29, 2025 at 8:25:16 AM EDT

Whoa, an AI that mimics human computer use, errors and all? That's wild! 😄 I wonder how it handles my chaotic desktop—probably better than me!

JuanLewis

July 27, 2025 at 9:19:30 PM EDT

This AI acting like a human on computers is wild! 😮 Makes me wonder if it'll mess up my spreadsheets like my coworker does. Exciting stuff, but I hope it doesn't learn my bad habits too!