DataGemma Tackles AI Hallucinations with Real-World Data

Release date: April 10, 2025

Large language models (LLMs) are at the heart of today's AI breakthroughs, capable of sifting through massive text datasets to produce summaries, spark creative ideas, and even write code. Yet, despite their prowess, these models can sometimes deliver information that's just plain wrong, a problem we call "hallucination." It's a big hurdle in the world of generative AI.

We're excited to share some cutting-edge research that's tackling this issue head-on, aiming to curb hallucinations by grounding LLMs in real-world stats. And we're thrilled to introduce DataGemma, the first open models that link LLMs with a wealth of real-world data from Google's Data Commons.

Data Commons: A Treasure Trove of Trustworthy Data

Data Commons is like a giant, ever-growing library of public data, boasting over 240 billion data points on everything from health to economics. It pulls this info from reliable sources like the UN, WHO, CDC, and Census Bureaus. By merging these datasets into a unified set of tools and AI models, Data Commons helps policymakers, researchers, and organizations get the accurate insights they need.

Imagine a vast database where you can ask questions in plain English, like which African countries have seen the biggest jump in electricity access, or how income relates to diabetes across US counties. That's Data Commons for you.

How Data Commons Helps Fight Hallucination

As more folks turn to generative AI, we're working to make these experiences more grounded by weaving Data Commons into Gemma, our family of lightweight, top-notch open models. These DataGemma models are now available for researchers and developers to dive into.

DataGemma boosts Gemma's capabilities by tapping into Data Commons' knowledge, using two cool methods to improve the accuracy and reasoning of LLMs:

  1. RIG (Retrieval-Interleaved Generation) amps up our Gemma 2 model by actively checking facts against Data Commons. When you ask DataGemma a question, it hunts down statistical data from Data Commons to give you a solid answer. While RIG isn't a new idea, the way we're using it in DataGemma is pretty special.

    Example query: "Has the use of renewables increased in the world?" Applying the DataGemma RIG methodology leverages Data Commons (DC) for authoritative data.
  2. RAG (Retrieval-Augmented Generation) lets language models pull in extra info beyond what they've been trained on, making their answers richer and more accurate. With DataGemma, we retrieve relevant data from Data Commons before the model starts crafting its response, relying on Gemini 1.5 Pro's long context window, which cuts down on hallucinations.

    Example query: "Has the use of renewables increased in the world?" Applying the DataGemma RAG methodology showcases greater reasoning and inclusion of footnotes.
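To make the difference between the two methods concrete, here is a minimal Python sketch. It is not DataGemma's actual implementation: the in-memory `TOY_STORE`, its placeholder values, the `[DC("...") -> "..."]` marker syntax, and the function names `rig_fill` and `rag_prompt` are all assumptions made up for illustration. The RIG-style function patches statistics into a draft the model has already written, while the RAG-style function retrieves data first and places it ahead of the question.

```python
import re

# Toy stand-in for a statistics store. Keys and values are invented
# placeholders for illustration only, not real Data Commons data.
TOY_STORE = {
    "renewable energy share, world, 2000": "A%",
    "renewable energy share, world, 2020": "B%",
}

# Hypothetical marker a model might emit in its draft:
# [DC("<query>") -> "<model's own guess>"]
MARKER = re.compile(r'\[DC\("([^"]+)"\) -> "([^"]*)"\]')


def rig_fill(draft: str, store: dict) -> str:
    """RIG-style pass: swap each marked guess for the retrieved value."""
    def replace(match):
        query, guess = match.group(1), match.group(2)
        retrieved = store.get(query)
        # Keep the model's guess, flagged as unverified, when nothing is found.
        return retrieved if retrieved is not None else f"{guess} (unverified)"
    return MARKER.sub(replace, draft)


def rag_prompt(question: str, queries: list, store: dict) -> str:
    """RAG-style pass: retrieve first, then put the data ahead of the question."""
    rows = [f"{q}: {store[q]}" for q in queries if q in store]
    context = "\n".join(rows) if rows else "(no data found)"
    return f"Retrieved statistics:\n{context}\n\nQuestion: {question}"


draft = (
    'The share was [DC("renewable energy share, world, 2000") -> "15%"] in 2000, '
    'and [DC("renewable energy share, world, 2030") -> "40%"] is projected.'
)
print(rig_fill(draft, TOY_STORE))
print(rag_prompt("Has the use of renewables increased in the world?",
                 list(TOY_STORE), TOY_STORE))
```

In this sketch, the RIG path only touches the spots the model marked, so the rest of the draft stays as written. The RAG path hands the model a larger prompt, which is why a long context window matters there.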

Promising Results and What's Next

Our early tests with RIG and RAG are looking good. We're seeing better accuracy in our models when dealing with numbers, which means fewer hallucinations for folks using these models for research, decision-making, or just to satisfy their curiosity. You can check out these results in our research paper.

Illustration of a RAG query and response. Supporting ground truth statistics are referenced as tables served from Data Commons. *Partial response shown for brevity.

We're not stopping here. We're all in on refining these methods, scaling up our efforts, and putting them through the wringer with more tests. Eventually, we'll roll out these improvements to both Gemma and Gemini models, starting with a limited-access phase.

By sharing our research and making this new Gemma model variant open, we hope to spread the use of these Data Commons-based techniques far and wide. Making LLMs more reliable and trustworthy is crucial for turning them into essential tools for everyone, helping to build a future where AI gives people accurate info, supports informed choices, and deepens our understanding of the world.

Researchers and developers can jump right in with DataGemma using our quickstart notebooks for both RIG and RAG. To dive deeper into how Data Commons and Gemma work together, check out our Research post.

Comments
StevenHill April 10, 2025 at 8:45:43 AM GMT

DataGemma's approach to tackling AI hallucinations is impressive! It really helps in filtering out the nonsense from AI outputs. However, sometimes it's a bit too cautious and filters out useful info too. Still, a step in the right direction!

RonaldMartinez April 11, 2025 at 5:27:29 PM GMT

DataGemma is a lifesaver when it comes to dealing with AI hallucinations. It really grounds the models with real-world data, which is super helpful for my projects. Sometimes it feels a bit slow, but hey, accuracy over speed any day, right? Definitely a must-have tool!
