How we built the new family of Gemini Robotics models

April 10, 2025

As Google DeepMind geared up for its latest announcement about the new Gemini 2.0 models tailored for robotics, Carolina Parada, the head of robotics, rallied her team to give the tech one last run-through. They challenged a bi-arm ALOHA robot — you know, those flexible metal arms with all those joints and pincer-like hands that researchers love to use — to tackle tasks it had never done before, with objects it had never seen. "We threw random stuff at it, like putting my shoe on the table and asking it to stash some pens inside," Carolina recalls. "The robot paused for a sec to get the gist of it, then went ahead and did it."

Next up, they found a toy basketball hoop and ball and dared the robot to do a "slam dunk." Carolina couldn't help but beam with pride as it nailed it; watching the slam dunk, she says, was a real "wow" moment. "We've been training models to help robots with specific tasks and understand natural language for a while now, but this was a game-changer," she explains. "The robot had zero experience with basketball or this particular toy. Yet it grasped the complex idea of 'slam dunk the ball' and pulled it off smoothly. *On the first go.*"

The versatile robot was powered by a Gemini Robotics model, part of a fresh batch of multimodal models designed for robotics. These models build on Gemini 2.0 by fine-tuning on robot-specific data, adding physical actions to Gemini's usual multimodal outputs like text, video, and audio. "This milestone sets the stage for the next wave of robotics that can assist in various applications," Google CEO Sundar Pichai said while unveiling the new models on X.

The Gemini Robotics models are versatile, interactive, and general, enabling robots to respond to new objects, settings, and instructions without additional training. That's a big deal, considering the team's goals. "Our goal is to create embodied AI that powers robots to help with everyday tasks in the real world," says Carolina, whose love for robotics was sparked by sci-fi cartoons as a kid and dreams of automated chores. "Down the road, robots will be just another way we interact with AI, like our phones or computers — physical agents in our world."

For robots to do their jobs well and safely, they need two key abilities: understanding and decision-making, and the capacity to act. Gemini Robotics-ER, an "embodied reasoning" model built on Gemini 2.0 Flash, focuses on the former. It can spot elements in its environment, gauge their size and position, and predict the trajectory and grip needed to move them, then generate code to carry out the action. Google DeepMind is now rolling this model out to trusted testers and partners.

The company is also rolling out Gemini Robotics, its top-tier vision-language-action model, which lets robots analyze a scene, engage with users, and take action. It has made huge strides in an area that's long been a headache for roboticists: dexterity. "What's second nature to us humans is tough for robots," Carolina notes. "Dexterity involves both spatial reasoning and intricate physical manipulation. In testing, Gemini Robotics set a new benchmark for dexterity, handling complex multi-step tasks with smooth moves and impressive completion times." Gemini Robotics-ER, for its part, excels at embodied reasoning tasks such as 2D and 3D object detection, pointing at parts of objects, and finding corresponding points across views.
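To make the embodied-reasoning idea concrete, here is a minimal sketch of the kind of spatial query described above, written against the publicly available google-genai Python SDK. Gemini Robotics-ER itself is only available to trusted testers, so the model name (gemini-2.0-flash, the base model Robotics-ER builds on), the prompt wording, and the JSON output schema are illustrative assumptions, not the production robotics interface.

```python
# A sketch of an embodied-reasoning query: detect objects in a robot's
# workspace image and ask for suggested grasp points. Assumes you have a
# Gemini API key; the output schema is a hypothetical contract for a
# downstream planner, not a documented Robotics-ER format.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Image of the robot's workspace (say, the table with a shoe and some pens).
with open("workspace.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash",  # stand-in for the limited-access Robotics-ER model
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Detect every pen and the shoe on the table. For each object, return "
        "JSON with a label, a 2D bounding box [ymin, xmin, ymax, xmax] "
        "normalized to 0-1000, and a suggested grasp point [y, x].",
    ],
)

# The raw text would be parsed into poses; a motion planner (or model-written
# code, as the article describes) would then turn them into arm trajectories.
print(response.text)
```

In a real robot stack, the returned image coordinates would still need to be deprojected into the robot's own frame before any motion is executed; this sketch only covers the perception-and-reasoning half of the loop.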
With Gemini Robotics at the helm, machines have whipped up salads, packed kids' lunches, played games like Tic-Tac-Toe, and even crafted an origami fox. Getting the models ready to handle such a wide range of tasks was no small feat, mainly because it bucks the trend of training a model on one specific task until it's perfect. "We went for broad task learning, training models on a ton of tasks," Carolina says. "We figured that after a while, we'd see them start to generalize, and we were spot on."

Both models can also adapt to various embodiments, from research-focused robots like the bi-arm ALOHA to humanoid robots like Apollo, developed by Google DeepMind's partner Apptronik. The same model can perform tasks like packing a lunchbox or wiping a whiteboard across different robot bodies, an adaptability that's crucial for a future where robots might take on a variety of roles.

"The potential for robots using these highly general and capable models is vast and thrilling," Carolina says. "They could be super helpful in industries where things are complex, precision matters, and the spaces aren't designed for humans. And they could make life easier in human-centric spaces, like our homes. That's still a ways off, but these models are pushing us forward." Looks like help with those chores might just be on the horizon — eventually.
Comments (22)
CarlGarcia September 19, 2025 at 12:30:33 AM EDT

Finally, a robotics model that looks promising! 🤖 But I confess I keep wondering whether these robotic arms will replace humans in household chores... Will I have a robot making my breakfast one day? 😅

KeithLopez August 8, 2025 at 1:01:00 PM EDT

The Gemini 2.0 robotics models sound like a game-changer! I’m curious how those bi-arm ALOHA robots handle real-world tasks—hope they don’t get too cocky with all that flexibility! 🤖

WilliamMiller April 13, 2025 at 8:57:22 PM EDT

The new Gemini robotics models are breathtaking! Seeing the ALOHA robot in action was like watching science fiction become reality. But the technical jargon was a bit over my head. It could use a simpler explanation for us non-technical folks. Still, super cool! 🤖

StephenGreen April 12, 2025 at 11:41:57 PM EDT

The new Gemini Robotics models are astonishing! Watching the ALOHA robot move was like seeing sci-fi become reality. But the technical terms were a bit difficult. I'd love a simpler explanation for non-technical people. Still, really cool! 🤖

BenHernández April 12, 2025 at 6:11:04 PM EDT

I'm so excited to hear the new Gemini 2.0 models are coming to robotics! It's truly amazing that the bi-arm ALOHA robot can handle complex tasks. Carolina Parada's team did a fantastic job. Can't wait to see this in the real world. Hope they don't fall over, though!

JonathanAllen April 12, 2025 at 8:44:44 AM EDT

The new Gemini 2.0 models for robots sound amazing! The bi-arm ALOHA robot performing complex tasks is truly stunning. Carolina Parada's team did outstanding work. Can't wait to see them in action in the real world. Hope they don't stumble!
