Google's Gemini Omni Generates Video from Images, Audio, and Text

May 26, 2026

Three years ago, Google introduced Gemini with the aim of developing a multimodal large language model—a unified neural network trained on text, images, audio, and video, capable of generating content across all these formats.

At its Google I/O developer conference today, the company advanced toward this vision with Gemini Omni, a new family of multimodal models. Google CEO Sundar Pichai stated that Omni will empower users to "create anything from any input."

Omni's initial focus is video. Users can now combine images, audio, video, and text as inputs. Rather than simply stitching these elements together, Omni reasons across all of them to produce a coherent output, which Google says yields high-quality videos that reflect an understanding of physics, culture, history, and science.

Like Google's Nano Banana tool, Omni also lets users edit photos with simple text commands, removing the need for complex editing software.

Google already offers Veo, a dedicated video model that transforms text and images into videos and allows for directing and customizing avatars. However, Nicole Brichtova, Director of Product Management at Google DeepMind, emphasized that today's release represents more than just a Veo update: "It's the next step in merging Gemini's intelligence with the rendering capabilities of our media models."

During a media briefing on Monday, DeepMind's Chief Technologist Koray Kavukcuoglu provided an example: When prompted with "a claymation explainer of protein folding," Omni quickly generated a stop-motion video with a voiceover explaining, "Proteins begin as chains of amino acids. They fold into structures like alpha helices and flat sections called beta sheets, ultimately forming a precise three-dimensional shape."

The long-term vision for Omni is broader, encompassing capabilities like generating images from audio or audio from video.

"When we first announced Gemini, it was our first natively multimodal AI model," Pichai remarked during the briefing. "We knew training it on a combination of text, code, audio, images, and video would lead to a deeper understanding of the world. With world models, AI is evolving from predicting text to simulating reality. Gemini Omni is the next step in that direction."

As part of this release, users will also be able to create videos featuring their own digital avatars, a feature popularized by Cameos in OpenAI's now-discontinued Sora app. To prevent deepfakes, users must complete a dedicated onboarding process that involves recording themselves speaking a series of numbers, according to Brichtova. The avatar is then saved for future use.

Additionally, all videos created with Omni will include Google's SynthID digital watermark, allowing users to verify whether content was generated using Gemini products.

The first model in the family is Gemini Omni Flash, launching today on the Gemini app, YouTube Shorts, and the AI creative studio Flow. Flash can render 10-second videos. Brichtova clarified that this duration is not a model limitation but a strategic decision to broaden accessibility, anticipating that most users currently prefer shorter clips. Support for longer videos is planned for the near future.

Google appears to be positioning Omni Flash primarily as a consumer tool. During a call with TechCrunch, Brichtova and DeepMind research engineer Gabe Barth-Maron described its use cases as largely personal: avatar videos of yourself winning an award or visiting the moon, or edits such as removing a bystander from the background of a vacation video.

Barth-Maron summarized it succinctly: "They're like personalized memes."

"We definitely focused on making this easy for consumers to use," Brichtova said. "Not many video models have successfully crossed over to the mainstream consumer market, so this is our attempt to do that."

This ease of use comes with a caveat: Brichtova and Barth-Maron noted that editing prompts must be highly specific. Otherwise, Omni might over-edit or unintentionally alter elements the user intended to keep—a challenge also faced by Nano Banana users.

Despite its immediate consumer focus, Omni's potential for enterprise and creative applications is evident. Google will make Omni available via API in the coming weeks. The avatar-generation tool, already available on Shorts, is expected to gain traction among content creators. More broadly, an end-to-end multimodal workflow could transform advertising and filmmaking.

Startup Luma AI is developing a similar agentic tool powered by its own "unified" model, capable of generating an entire ad campaign from a brief and a product image.

"We're actually quite proud of the model's text-rendering capabilities, which are very useful for applications like advertising," Brichtova said. "If you need a product placement or even just a slogan, the accuracy is crucial... We certainly anticipate filmmakers and other creators will adopt this model as well."

More professional use cases may be better served by the upcoming Omni Pro model, designed to deliver superior performance across all Omni tasks. Google has not announced a release date for Pro yet, but Brichtova indicated it will launch when "we achieve a significant leap in capability beyond Flash."
