Google's Gemma 3 Achieves 98% of DeepSeek's Accuracy with Just One GPU
The economics of artificial intelligence have become a major focus recently, especially with startup DeepSeek AI showcasing impressive economies of scale in using GPU chips. But Google isn't about to be outdone. On Wednesday, the tech giant unveiled its latest open-source large language model, Gemma 3, which nearly matches the accuracy of DeepSeek's R1 model, yet uses significantly less computing power.
Google measured this performance using "Elo" scores, a system commonly used in chess and sports to rank competitors. Gemma 3 scored a 1338, just shy of R1's 1363, which means R1 technically outperforms Gemma 3. However, Google estimates that it would take 32 of Nvidia's H100 GPU chips to reach R1's score, while Gemma 3 achieves its results with only one H100 GPU. Google touts this balance of compute and Elo score as the "sweet spot."
In a blog post, Google describes Gemma 3 as "the most capable model you can run on a single GPU or TPU," referring to its own custom AI chip, the "tensor processing unit." The company claims that Gemma 3 "delivers state-of-the-art performance for its size," outshining models like Llama-405B, DeepSeek-V3, and o3-mini in human preference evaluations on LMArena's leaderboard. This performance makes it easier to create engaging user experiences on a single GPU or TPU host.
Google
Google's model also surpasses Meta's Llama 3 in Elo score, which Google estimates would require 16 GPUs. It's worth noting that these figures for competing models are Google's estimates; DeepSeek AI has only disclosed using 1,814 of Nvidia's less-powerful H800 GPUs for R1.
More in-depth information can be found in a developer blog post on HuggingFace, where the Gemma 3 repository is available. Designed for on-device use rather than data centers, Gemma 3 has a significantly smaller number of parameters compared to R1 and other open-source models. With parameter counts ranging from 1 billion to 27 billion, Gemma 3 is quite modest by current standards, while R1 boasts a hefty 671 billion parameters, though it can selectively use just 37 billion.
The key to Gemma 3's efficiency is a widely used AI technique called distillation, where trained model weights from a larger model are transferred to a smaller one, enhancing its capabilities. Additionally, the distilled model undergoes three quality control measures: Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from Machine Feedback (RLMF), and Reinforcement Learning from Execution Feedback (RLEF). These help refine the model's outputs, making them more helpful and improving its math and coding abilities.
Google's developer blog details these approaches, and another post discusses optimization techniques for the smallest 1 billion parameter model, aimed at mobile devices. These include quantization, updating key-value cache layouts, improving variable loading times, and GPU weight sharing.
Google compares Gemma 3 not only on Elo scores but also against its predecessor, Gemma 2, and its closed-source Gemini models on various benchmarks like LiveCodeBench. While Gemma 3 generally falls short of Gemini 1.5 and Gemini 2.0 in accuracy, Google notes that it "shows competitive performance compared to closed Gemini models," despite having fewer parameters.
Google
A significant upgrade in Gemma 3 over Gemma 2 is its longer "context window," expanding from 8,000 to 128,000 tokens. This allows the model to process larger texts like entire papers or books. Gemma 3 is also multi-modal, capable of handling both text and image inputs, unlike its predecessor. Additionally, it supports over 140 languages, a vast improvement over Gemma 2's English-only capabilities.
Beyond these main features, there are several other interesting aspects to Gemma 3. One issue with large language models is the potential to memorize parts of their training data, which could lead to privacy breaches. Google's researchers tested Gemma 3 for this and found it memorizes long-form text at a lower rate than its predecessors, suggesting improved privacy protection.
For those interested in the nitty-gritty, the Gemma 3 technical paper provides a thorough breakdown of the model's capabilities and development.
Related article
Anthropic's experimental AI Claude completes negotiations and transactions in e-commerce test
As artificial intelligence advances rapidly, Anthropic quietly rolled out an internal experiment called "Project Deal" last Friday, showcasing AI's potential in e-commerce. The experiment had its AI model Claude autonomously handle buying, selling, a
DeepSeek Code poised for launch
As AI technology accelerates, DeepSeek is at a thrilling juncture. The AI company recently revealed it has secured over 70 billion yuan in funding. Leadership has emphasized a commitment to groundbreaking AI research over immediate commercial gains.
Musk’s Grok: 1.5 Trillion Parameters and Cursor Code Absorption—Game Changer or Bluff?
Elon Musk is finally making a move.In the AI programming race, OpenAI and Anthropic are accelerating, while xAI appears to be lagging. Musk has often stated his aim to rival Claude, yet despite multiple updates to the Grok4.X series, the results look
Related Special Topic Recommendations
Comments (12)
0/500
¡Estas mejoras en eficiencia son una locura! 🔥 Si Google logra casi el mismo rendimiento con solo una GPU, ¿esto cambiará por completo el acceso a la IA para pequeños desarrolladores? Aun así, me pregunto cómo manejarán temas como el consumo energético real en uso masivo... 😅
Google's Gemma 3 sounds like a game-changer! 98% of DeepSeek's accuracy with just one GPU? That's some serious efficiency. Curious how this'll shake up the AI startup scene. 🚀
Google's Gemma 3 sounds like a game-changer! 98% of DeepSeek's accuracy with just one GPU? That's some serious efficiency. Curious how this stacks up in real-world apps! 😎
Google's Gemma 3 sounds like a game-changer! Achieving 98% of DeepSeek's accuracy with just one GPU is wild. Makes me wonder how this’ll shake up the AI race—more power to the little guys? 🤔
The economics of artificial intelligence have become a major focus recently, especially with startup DeepSeek AI showcasing impressive economies of scale in using GPU chips. But Google isn't about to be outdone. On Wednesday, the tech giant unveiled its latest open-source large language model, Gemma 3, which nearly matches the accuracy of DeepSeek's R1 model, yet uses significantly less computing power.
Google measured this performance using "Elo" scores, a system commonly used in chess and sports to rank competitors. Gemma 3 scored a 1338, just shy of R1's 1363, which means R1 technically outperforms Gemma 3. However, Google estimates that it would take 32 of Nvidia's H100 GPU chips to reach R1's score, while Gemma 3 achieves its results with only one H100 GPU. Google touts this balance of compute and Elo score as the "sweet spot."
In a blog post, Google describes Gemma 3 as "the most capable model you can run on a single GPU or TPU," referring to its own custom AI chip, the "tensor processing unit." The company claims that Gemma 3 "delivers state-of-the-art performance for its size," outshining models like Llama-405B, DeepSeek-V3, and o3-mini in human preference evaluations on LMArena's leaderboard. This performance makes it easier to create engaging user experiences on a single GPU or TPU host.
Google
Google's model also surpasses Meta's Llama 3 in Elo score, which Google estimates would require 16 GPUs. It's worth noting that these figures for competing models are Google's estimates; DeepSeek AI has only disclosed using 1,814 of Nvidia's less-powerful H800 GPUs for R1.
More in-depth information can be found in a developer blog post on HuggingFace, where the Gemma 3 repository is available. Designed for on-device use rather than data centers, Gemma 3 has a significantly smaller number of parameters compared to R1 and other open-source models. With parameter counts ranging from 1 billion to 27 billion, Gemma 3 is quite modest by current standards, while R1 boasts a hefty 671 billion parameters, though it can selectively use just 37 billion.
The key to Gemma 3's efficiency is a widely used AI technique called distillation, where trained model weights from a larger model are transferred to a smaller one, enhancing its capabilities. Additionally, the distilled model undergoes three quality control measures: Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from Machine Feedback (RLMF), and Reinforcement Learning from Execution Feedback (RLEF). These help refine the model's outputs, making them more helpful and improving its math and coding abilities.
Google's developer blog details these approaches, and another post discusses optimization techniques for the smallest 1 billion parameter model, aimed at mobile devices. These include quantization, updating key-value cache layouts, improving variable loading times, and GPU weight sharing.
Google compares Gemma 3 not only on Elo scores but also against its predecessor, Gemma 2, and its closed-source Gemini models on various benchmarks like LiveCodeBench. While Gemma 3 generally falls short of Gemini 1.5 and Gemini 2.0 in accuracy, Google notes that it "shows competitive performance compared to closed Gemini models," despite having fewer parameters.
Google
A significant upgrade in Gemma 3 over Gemma 2 is its longer "context window," expanding from 8,000 to 128,000 tokens. This allows the model to process larger texts like entire papers or books. Gemma 3 is also multi-modal, capable of handling both text and image inputs, unlike its predecessor. Additionally, it supports over 140 languages, a vast improvement over Gemma 2's English-only capabilities.
Beyond these main features, there are several other interesting aspects to Gemma 3. One issue with large language models is the potential to memorize parts of their training data, which could lead to privacy breaches. Google's researchers tested Gemma 3 for this and found it memorizes long-form text at a lower rate than its predecessors, suggesting improved privacy protection.
For those interested in the nitty-gritty, the Gemma 3 technical paper provides a thorough breakdown of the model's capabilities and development.
Anthropic's experimental AI Claude completes negotiations and transactions in e-commerce test
As artificial intelligence advances rapidly, Anthropic quietly rolled out an internal experiment called "Project Deal" last Friday, showcasing AI's potential in e-commerce. The experiment had its AI model Claude autonomously handle buying, selling, a
DeepSeek Code poised for launch
As AI technology accelerates, DeepSeek is at a thrilling juncture. The AI company recently revealed it has secured over 70 billion yuan in funding. Leadership has emphasized a commitment to groundbreaking AI research over immediate commercial gains.
Musk’s Grok: 1.5 Trillion Parameters and Cursor Code Absorption—Game Changer or Bluff?
Elon Musk is finally making a move.In the AI programming race, OpenAI and Anthropic are accelerating, while xAI appears to be lagging. Musk has often stated his aim to rival Claude, yet despite multiple updates to the Grok4.X series, the results look
¡Estas mejoras en eficiencia son una locura! 🔥 Si Google logra casi el mismo rendimiento con solo una GPU, ¿esto cambiará por completo el acceso a la IA para pequeños desarrolladores? Aun así, me pregunto cómo manejarán temas como el consumo energético real en uso masivo... 😅
Google's Gemma 3 sounds like a game-changer! 98% of DeepSeek's accuracy with just one GPU? That's some serious efficiency. Curious how this'll shake up the AI startup scene. 🚀
Google's Gemma 3 sounds like a game-changer! 98% of DeepSeek's accuracy with just one GPU? That's some serious efficiency. Curious how this stacks up in real-world apps! 😎
Google's Gemma 3 sounds like a game-changer! Achieving 98% of DeepSeek's accuracy with just one GPU is wild. Makes me wonder how this’ll shake up the AI race—more power to the little guys? 🤔





Home






