How we built the new family of Gemini Robotics models

April 10, 2025
As Google DeepMind geared up for its latest announcement about the new Gemini 2.0 models tailored for robotics, Carolina Parada, the head of robotics, rallied her team to give the tech one last run-through. They challenged a bi-arm ALOHA robot — the flexible, multi-jointed metal arms with pincer-like hands that researchers love to use — to tackle tasks it had never done before, with objects it had never seen. "We threw random stuff at it, like putting my shoe on the table and asking it to stash some pens inside," Carolina recalls. "The robot paused for a sec to get the gist of it, then went ahead and did it."

Next up, they found a toy basketball hoop and ball and dared the robot to do a "slam dunk." Watching it nail the shot was, Carolina says, a real "wow" moment. "We've been training models to help robots with specific tasks and understand natural language for a while now, but this? This was a game-changer," she explains. "The robot had zero experience with basketball or this particular toy. Yet it grasped the complex idea of 'slam dunk the ball' and pulled it off smoothly. *On the first go.*"

The versatile robot was powered by a Gemini Robotics model, part of a fresh batch of multimodal models designed for robotics. These models build on Gemini 2.0, fine-tuned with robot-specific data, and add physical actions to Gemini's usual multimodal outputs of text, video, and audio. "This milestone sets the stage for the next wave of robotics that can assist in various applications," Google CEO Sundar Pichai said while unveiling the new models on X.

The Gemini Robotics models are versatile, interactive, and general, enabling robots to respond to new objects, settings, and instructions without needing more training. That's a big deal, considering the team's goals.
"Our goal is to create embodied AI that powers robots to help with everyday tasks in the real world," says Carolina, whose love for robotics was sparked by sci-fi cartoons as a kid and dreams of automated chores. "Down the road, robots will be just another way we interact with AI, like our phones or computers — physical agents in our world." For robots to do their jobs well and safely, they need two key abilities: understanding and decision-making, and the capacity to act. Gemini Robotics-ER, an "embodied reasoning" model built on Gemini 2.0 Flash, focuses on the former. It can spot elements in its environment, gauge their size and position, and predict the path and grip needed to move them. Then, it generates code to carry out the action. We're now rolling this model out to trusted testers and partners. Google DeepMind is also rolling out Gemini Robotics, its top-tier vision-language-action model, which lets robots analyze a scene, engage with users, and take action. It's made huge strides in an area that's been a headache for roboticists: dexterity. "What's second nature to us humans is tough for robots," Carolina notes. "Dexterity involves both spatial reasoning and intricate physical manipulation. In testing, Gemini Robotics set a new benchmark for dexterity, handling complex multi-step tasks with smooth moves and impressive completion times." Gemini Robotics-ER is a whiz at embodied reasoning, nailing things like object detection, pointing at parts of objects, finding matching points, and 3D object detection. With Gemini Robotics at the helm, machines have whipped up salads, packed kids' lunches, played games like Tic-Tac-Toe, and even crafted an origami fox. Getting models ready to handle a wide range of tasks was no small feat — mainly because it bucks the trend of training models for one specific task until it's perfect. "We went for broad task learning, training models on a ton of tasks," Carolina says. 
"We figured that after a while, we'd see them start to generalize, and we were spot on." Both models can adapt to various embodiments, from research-focused robots like the bi-arm ALOHA to humanoid robots like Apollo, developed by our partner Apptronik. These models can adjust to different forms, performing tasks like packing a lunchbox or wiping a whiteboard in various robot bodies. This adaptability is crucial for a future where robots might take on a variety of roles. "The potential for robots using these highly general and capable models is vast and thrilling," Carolina says. "They could be super helpful in industries where things are complex, precision matters, and the spaces aren't designed for humans. And they could make life easier in human-centric spaces, like our homes. That's still a ways off, but these models are pushing us forward." Looks like help with those chores might just be on the horizon — eventually.
Comments (21)
KeithLopez August 8, 2025 at 1:01:00 PM EDT

The Gemini 2.0 robotics models sound like a game-changer! I’m curious how those bi-arm ALOHA robots handle real-world tasks—hope they don’t get too cocky with all that flexibility! 🤖

WilliamMiller April 13, 2025 at 8:57:22 PM EDT

The new Gemini robotics models are breathtaking! Watching the ALOHA robot in action was like seeing science fiction become reality. But the technical jargon went a bit over my head — could use a simpler explanation for us non-technical folks. Still, super cool! 🤖

StephenGreen April 12, 2025 at 11:41:57 PM EDT

The new Gemini Robotics models are astounding! Watching the ALOHA robot move was like sci-fi becoming real. But the technical terms were a bit hard for me — I'd love a simpler explanation for non-technical readers. Still, really cool! 🤖

BenHernández April 12, 2025 at 6:11:04 PM EDT

So exciting to hear the new Gemini 2.0 models are coming to robotics! A two-armed ALOHA robot handling complex tasks is truly amazing. Carolina Parada's team did a fantastic job. Can't wait to see this in the real world. Hope it doesn't take a tumble, though!

JonathanAllen April 12, 2025 at 8:44:44 AM EDT

The new Gemini 2.0 models for robots sound fantastic! The two-armed ALOHA robot performing complex tasks is genuinely stunning. Carolina Parada's team did excellent work. Can't wait to see them operating in the real world. Hope they don't trip over anything!

DonaldSanchez April 11, 2025 at 10:55:17 PM EDT

The Gemini 2.0 robot models are really impressive! Watching the two-armed ALOHA robot felt like a sci-fi movie. The precision and flexibility are remarkable. Only downside: it's a bit too high-end for DIY in my garage! 😂 Can't wait to see what comes next!
