DeepMind CEO Demis Hassabis Announces Future Integration of Google's Gemini and Veo AI Models

In a recent episode of the podcast Possible, co-hosted by LinkedIn co-founder Reid Hoffman, Google DeepMind CEO Demis Hassabis said that Google plans to combine its Gemini AI models with its Veo video-generating models. The aim is to improve Gemini's understanding of the physical world and how things behave in it.
Hassabis emphasized that Gemini was designed to be multimodal from the start. "We've always built Gemini, our foundation model, to be multimodal from the beginning," he said. The motivation, he explained, is a vision of a universal digital assistant: "An assistant that … actually helps you in the real world."
The AI industry is moving steadily toward "omni" models, which can handle and synthesize many types of media. Google's latest Gemini models, for instance, can produce audio and images as well as text. The default model in OpenAI's ChatGPT can generate images directly, including Studio Ghibli-style art. Amazon also plans to roll out an "any-to-any" model later this year.
These omni models require large amounts of training data: images, videos, audio, and text. Hassabis suggested that Veo's video data comes mostly from YouTube, a platform Google owns. "Basically, by watching YouTube videos, a lot of YouTube videos, [Veo 2] can figure out, you know, the physics of the world," he said.
Google previously told TechCrunch that its models "may be" trained on "some" YouTube content, in accordance with its agreements with YouTube creators. Last year, Google expanded its terms of service, in part to access more data for training its AI models.
