Alibaba's Aliyun Unveils Fun-CineForge: Open-Sourcing Movie-Grade Dubbing Model and Dataset
The Fun-CineForge project, developed by the speech team at Alibaba Tongyi Lab in collaboration with the University of Science and Technology of China, has been officially open-sourced. It targets core challenges in film and television dubbing, such as lip synchronization, voice style transfer, and emotional expression, with an end-to-end production workflow and a large-model solution.

Core Breakthroughs: Solving the "Out-of-Sync" Problem in Film Dubbing
Traditional AI dubbing often struggles with mismatched lip movements, robotic emotional delivery, and complex cinematic scenes that involve dialogue and multi-speaker acoustics. Fun-CineForge addresses these problems through two key innovations:
MLLM Dubbing Model: Moving beyond simple lip-area audio-video alignment, it employs a multimodal large language model (MLLM) architecture capable of deeply understanding a character's identity and emotional nuances within a scene.
CineDub Large-Scale Dataset: The project created the first richly annotated Chinese TV show dubbing dataset via an automated pipeline, covering diverse scenarios like monologues, narration, dialogue, and multi-speaker interactions.
Project Updates and Open Source Roadmap
The project has been updated frequently in recent months, which suggests active engineering work:
January to March 2026: Released sample datasets and demos for both Chinese (CineDub-CN) and English (CineDub-EN).
March 16, 2026: Officially open-sourced the inference code and model weights (Checkpoints), allowing developers to access these resources on GitHub.
Dataset Access: Several classic series datasets are now available for research, including the Chinese series "Dream of the Red Chamber" and the English series "Downton Abbey."
Technical Application: From "Dialogue" to "Performance"
Official demos show the model delivering impressive results when re-dubbing classic series like "Romance of the Three Kingdoms." Given specific "emotional clues" as input, the model can capture a character's emotional shift, for example from fear to defiance, while achieving high-fidelity voice cloning and natural lip sync.
The launch of Fun-CineForge signals a shift in film and TV AI dubbing from basic "text-to-speech" to an "automated post-production" tool with artistic comprehension. This advancement is poised to significantly reduce production costs for dubbed film and television content.
Project: https://funcineforge.github.io/