MarkSanchez
February 11, 2026

Ant Group has open-sourced its multimodal AI model, Ming-Flash-Omni 2.0. The model reportedly outperforms models such as Gemini 2.5 Pro on some benchmarks for vision-language understanding, image editing, and audio generation. A key feature is unified audio generation: from natural-language prompts, it produces speech, sound effects, and music on a single audio track. It is built on the Ling 2.0 architecture, which uses a Mixture-of-Experts (MoE) design, and is intended as a reusable base that developers can build on to simplify multimodal app development.
