Sesame Unveils Base AI Model Behind Viral Virtual Assistant Maya

Sesame, the AI company behind the strikingly lifelike voice assistant Maya, has released the base model that powers her. The model, called CSM-1B, has 1 billion parameters, the learned numerical weights that make up the model. It is available under an Apache 2.0 license, which permits commercial use with few restrictions, according to Sesame's description on the AI development platform Hugging Face.
CSM-1B generates "RVQ audio codes" from text and audio inputs. RVQ stands for "residual vector quantization," a technique that encodes audio as discrete tokens, or codes, by quantizing it in successive stages, each of which encodes the error left by the stage before. The technique is also used in other recent AI audio technologies, including Google's SoundStream and Meta's EnCodec. CSM-1B uses a model from Meta's Llama family as its backbone, paired with an audio "decoder" component. A fine-tuned variant of CSM-1B powers Maya, according to Sesame.
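To make the idea of residual vector quantization concrete, here is a minimal sketch in plain Python. It is an illustration of the general technique only, not Sesame's code: the function names (`rvq_encode`, `rvq_decode`, `make_codebooks`) are hypothetical, and the codebooks are random numbers, whereas real audio codecs learn them from data. Each stage picks the closest codebook entry to whatever error the previous stages left behind, so one audio frame becomes a short list of integer codes.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared distance)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize vec in stages; each stage encodes the residual left by the last."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct a vector by summing the chosen entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def make_codebooks(num_stages, size, dim, seed=0):
    """Assumption for illustration: random codebooks whose scale halves per stage."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) / (2 ** stage) for _ in range(dim)] for _ in range(size)]
        for stage in range(num_stages)
    ]


if __name__ == "__main__":
    codebooks = make_codebooks(num_stages=4, size=16, dim=4)
    frame = [0.3, -0.7, 0.1, 0.5]  # stand-in for one frame of audio features
    codes = rvq_encode(frame, codebooks)
    print("codes:", codes)
    print("reconstruction:", [round(x, 3) for x in rvq_decode(codes, codebooks)])
```

In this sketch the four integers in `codes` are the "tokens" for the frame. A model that works with RVQ codes predicts integers like these, and a separate decoder turns them back into sound.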
Sesame describes CSM-1B as a "base generation model" in its Hugging Face and GitHub repositories: it can produce a variety of voices but has not been fine-tuned on any specific voice. The model has some capacity for non-English languages because of "data contamination" in its training set, but it likely will not perform well in them. Sesame has not disclosed what data it used to train the model.
One notable concern is the lack of robust safeguards. Sesame relies on an honor system: it asks developers and users not to use the model to mimic a person's voice without their consent, to create misleading content such as fake news, or to engage in "harmful" or "malicious" activities. I tested the demo on Hugging Face, and cloning my voice took less than a minute. From there, it was easy to generate speech on any topic, including sensitive ones like the election and Russian propaganda.
Consumer Reports recently warned that many AI-powered voice cloning tools lack "meaningful" safeguards, leaving them open to fraud and abuse.
Sesame, co-founded by Oculus co-creator Brendan Iribe, drew wide attention in late February for assistant technology that comes close to clearing the uncanny valley. Maya and Sesame's other assistant, Miles, take breaths, speak with disfluencies, and can be interrupted mid-speech, much like OpenAI's Voice Mode.
Sesame has raised an undisclosed amount of funding from Andreessen Horowitz, Spark Capital, and Matrix Partners. Beyond voice assistants, the company is prototyping AI glasses designed for all-day wear and equipped with its custom models. The project signals Sesame's ambition to bring its AI further into daily life.
Comments (8)
What Sesame has done with Maya is incredible! A model with 1 billion parameters must be quite a beast. But honestly, what does this mean in terms of ethics? Are we all going to end up with assistants that are too perfect? 😅
Wow, Sesame's CSM-1B sounds like a game-changer! A billion parameters for Maya’s lifelike voice? That’s some serious tech flex. Curious how it stacks up against other models in real-world use. 😎
Whoa, a 1B parameter model powering Maya? That's some serious brainpower! Curious how Sesame's CSM-1B stacks up against other AI giants. Excited to see where this tech takes us! 🚀
Sesame's base AI model for Maya is mind-blowing! 1 billion parameters? That's insane! Maya's voice is so lifelike, it's like talking to a real person. But sometimes she gets a bit too chatty, which can be annoying. Still, a fantastic piece of tech! 🤯
