Microsoft's VibeVoice AI Family Goes Open Source, Handles 90-Minute Dialogues, Tops 27K GitHub Stars

May 28, 2026

JohnRoberts

Microsoft has open-sourced VibeVoice, a family of voice AI models covering automatic speech recognition (ASR) and text-to-speech (TTS). The project has quickly drawn developer interest for its long-audio processing, natural multi-speaker dialogue generation, and low-latency real-time performance, and it has already collected around 27,000 stars on GitHub.

Released as an open-source research framework under the MIT license, VibeVoice supports local deployment with no cloud subscription fees, aiming to foster collaboration and innovation in speech synthesis. The model family comprises three core members, each addressing specific challenges in traditional voice AI, such as long-sequence handling, speaker consistency, and natural fluency.

VibeVoice-ASR-7B: A Powerful Tool for Structured Speech-to-Text, Handling Up to 60 Minutes of Audio

VibeVoice-ASR-7B is a unified speech-to-text model that can process audio files up to 60 minutes long in a single pass and directly output structured transcripts. Each transcript entry identifies the speaker, gives timestamps, and records the spoken content. The model also accepts custom hotwords to improve accuracy on proper nouns and technical terms. With support for over 50 languages, it is well suited to complex scenarios such as lengthy meeting recordings and podcast transcription.
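To make the idea of a structured transcript concrete, here is a minimal Python sketch of how a downstream tool might hold and render such output. The `Segment` class, its field names, and the rendered line format are assumptions chosen for illustration; they are not the model's actual output schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """One transcript entry (hypothetical schema, not the model's real format)."""
    speaker: str   # speaker label, e.g. "Speaker 1"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # spoken content


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS, dropping fractional seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: List[Segment]) -> str:
    """Render segments as one readable line per speaker turn."""
    return "\n".join(
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in segments
    )


def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Sum the speaking time in seconds for each speaker."""
    totals: Dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


example = [
    Segment("Speaker 1", 0.0, 4.5, "Welcome to the weekly sync."),
    Segment("Speaker 2", 4.5, 10.0, "Thanks. Let's start with the roadmap."),
    Segment("Speaker 1", 10.0, 12.0, "Sounds good."),
]
```

Calling `render(example)` produces one timestamped line per turn, and `talk_time(example)` gives per-speaker totals, which is the kind of summary a meeting tool could build on top of speaker-attributed output.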

Community developers have already built practical tools on this model, such as Vibing, a voice input method for macOS and Windows. Users report that it is fast and accurate enough to noticeably speed up everyday voice input.

VibeVoice-TTS-1.5B: Expressive Speech Generation for Up to 90 Minutes with Multiple Speakers

VibeVoice-TTS-1.5B is the core text-to-speech model. It can generate up to 90 minutes of continuous audio in a single pass and supports up to four distinct speakers for natural dialogue simulation. The synthesized speech is expressive and fluent, with realistic pauses, emphasis, and emotional shifts, which makes it well suited to podcasts, long narratives, audiobooks, and multi-character dialogues.
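The four-speaker limit is easy to check before a long generation job starts. The sketch below parses a dialogue script and rejects it if it names too many speakers. The `Name: text` line format and the `parse_script` helper are assumptions made for illustration, not a documented input format of the model.

```python
import re
from typing import List, Tuple

MAX_SPEAKERS = 4  # limit stated in the article

# Assumed line format: "<speaker label>: <utterance>"
LINE_PATTERN = re.compile(r"^(?P<speaker>[^:]+):\s*(?P<text>.+)$")


def parse_script(script: str) -> List[Tuple[str, str]]:
    """Split a dialogue script into (speaker, text) turns and enforce the speaker limit."""
    turns: List[Tuple[str, str]] = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between turns
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"Line lacks a 'Name: text' prefix: {line!r}")
        turns.append((match.group("speaker").strip(), match.group("text").strip()))

    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"Script uses {len(speakers)} speakers; at most {MAX_SPEAKERS} are supported"
        )
    return turns
```

Validating the script up front is cheap compared with discovering the problem partway through a long synthesis run.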

Unlike many traditional TTS models, which are limited to one or two speakers, VibeVoice-TTS maintains consistency across long-form, multi-speaker audio. Its architecture uses continuous speech tokenizers (acoustic and semantic) that run at a low frame rate of 7.5 Hz, which cuts the number of frames the model must process and makes long sequences far cheaper to compute.
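A quick calculation shows why the frame rate matters for long audio. The snippet below counts the frames needed for 90 minutes at 7.5 Hz and compares that with a 50 Hz tokenizer. The 50 Hz figure is an assumed comparison point picked for illustration; it does not come from the article.

```python
def frame_count(duration_minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover the given duration."""
    return round(duration_minutes * 60 * frame_rate_hz)


# 90 minutes at the 7.5 Hz frame rate cited in the article
vibevoice_frames = frame_count(90, 7.5)

# Same duration at an assumed 50 Hz comparison rate (illustrative only)
comparison_frames = frame_count(90, 50.0)

ratio = comparison_frames / vibevoice_frames

print(vibevoice_frames)    # 40500
print(comparison_frames)   # 270000
print(f"{ratio:.1f}x")     # 6.7x
```

At 7.5 Hz, a full 90-minute session is about 40,500 frames, a sequence length that is much more manageable than the hundreds of thousands of frames a higher-rate tokenizer would produce.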

VibeVoice-Realtime-0.5B: Real-Time TTS with Around 300 Milliseconds of Latency

VibeVoice-Realtime-0.5B is designed for real-time applications. It accepts streaming text input and delivers the first audio after approximately 300 milliseconds, while still being able to generate audio up to 10 minutes long. This makes it particularly suitable for interactive applications that need instant feedback, such as real-time voice assistants or live-stream dubbing.
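First-audio latency is the time between requesting speech and receiving the first audio chunk. The sketch below shows one way to measure it for any streaming generator. `fake_streaming_tts` is a hypothetical stand-in that emits placeholder bytes after a simulated delay; it does not call VibeVoice, and a real measurement would swap in the actual streaming interface.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_streaming_tts(text_chunks: Iterable[str], delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS engine: one audio chunk per text chunk."""
    for chunk in text_chunks:
        time.sleep(delay_s)                  # simulated synthesis time
        yield b"\x00" * (len(chunk) * 100)   # placeholder bytes, not real audio


def time_to_first_audio(audio_stream: Iterator[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first chunk arrived, the first chunk itself)."""
    start = time.perf_counter()
    first_chunk = next(audio_stream)  # blocks until the engine yields audio
    return time.perf_counter() - start, first_chunk


latency_s, first_chunk = time_to_first_audio(fake_streaming_tts(["Hello", "world"]))
print(f"first audio after {latency_s * 1000:.0f} ms ({len(first_chunk)} bytes)")
```

Because the generator is lazy, the timer starts before any synthesis work begins, so the measured value covers the full wait a listener would experience.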

In addition, the project has introduced experimental speaker voices, including multilingual speech and several English style variations, giving developers more customization options.

AIbase Review: Microsoft's open-sourcing of VibeVoice not only lowers the barrier to entry for high-performance voice AI but also provides a complete local deployment solution. The project was briefly taken down due to potential misuse risks but was relaunched after implementing security measures like audio watermarks and audible disclaimers, reflecting responsible AI development principles. Developers can now obtain model weights from GitHub and Hugging Face and quickly test them via platforms like Colab.

With ongoing contributions from the open-source community, including optimizations for Apple Silicon, VibeVoice is poised to accelerate adoption in content creation, accessibility tools, and voice interaction. Interested developers can visit Microsoft's official project page for further exploration.

Project Address: https://github.com/microsoft/VibeVoice
