Google's Gemma 3 Achieves 98% of DeepSeek's Accuracy with Just One GPU
May 1, 2025
Richard Jackson
The economics of artificial intelligence have become a major focus recently, especially since startup DeepSeek AI showed how efficiently GPU chips can be used. But Google isn't about to be outdone. On Wednesday, the tech giant unveiled its latest open-source large language model, Gemma 3, which nearly matches the accuracy of DeepSeek's R1 model while using significantly less computing power.
Google measured this performance using "Elo" scores, a rating system commonly used in chess and sports to rank competitors. Gemma 3 scored 1338, just shy of R1's 1363, meaning R1 technically outperforms Gemma 3. However, Google estimates that it would take 32 of Nvidia's H100 GPU chips to reach R1's score, while Gemma 3 achieves its result with a single H100. Google touts this balance of compute and Elo score as the "sweet spot."
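To put that 25-point gap in perspective, the standard Elo formula converts a rating difference into an expected head-to-head win rate. The short sketch below (plain Python, no dependencies) shows that a 1338-rated model would still be preferred in roughly 46% of pairwise comparisons against a 1363-rated one:

```python
# Expected head-to-head win rate from an Elo gap, using the standard
# logistic Elo formula (the same one used for chess ratings).
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B in one comparison."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Gemma 3 (1338) vs. DeepSeek R1 (1363): a 25-point gap.
print(f"{expected_win_rate(1338, 1363):.3f}")  # ~0.464
```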
In a blog post, Google describes Gemma 3 as "the most capable model you can run on a single GPU or TPU," referring to its own custom AI chip, the "tensor processing unit." The company claims that Gemma 3 "delivers state-of-the-art performance for its size," outshining models like Llama-405B, DeepSeek-V3, and o3-mini in human preference evaluations on LMArena's leaderboard. This performance makes it easier to create engaging user experiences on a single GPU or TPU host.
Google's model also beats Meta's Llama 3 on Elo score, a result Google estimates would take 16 GPUs to match. It's worth noting that these figures for competing models are Google's own estimates; DeepSeek AI has disclosed only that it uses 1,814 of Nvidia's less-powerful H800 GPUs to serve answers with R1.
More in-depth information is available in a developer blog post on Hugging Face, which also hosts the Gemma 3 repository. Designed for on-device use rather than data centers, Gemma 3 has far fewer parameters than R1 and other open-source models. With parameter counts ranging from 1 billion to 27 billion, Gemma 3 is modest by current standards, whereas R1 weighs in at a hefty 671 billion parameters, of which its mixture-of-experts design activates only 37 billion per token.
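Parameter count translates directly into the memory a model needs, which is why Gemma 3 fits on a single GPU. A rough back-of-envelope sketch (weights only; it ignores the KV cache, activations, and framework overhead):

```python
# Rough VRAM needed just to hold model weights, at two common precisions.
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    # params_billions * 1e9 params * bytes, divided by 1e9 bytes per GB
    return params_billions * bytes_per_param

for name, params in [("Gemma 3 27B", 27), ("DeepSeek R1 671B", 671)]:
    print(f"{name}: bf16 ~{weights_gb(params, 2):,.0f} GB, "
          f"int4 ~{weights_gb(params, 0.5):,.0f} GB")

# Gemma 3 27B:      bf16 ~54 GB,    int4 ~14 GB  -> fits on one 80 GB H100
# DeepSeek R1 671B: bf16 ~1,342 GB, int4 ~336 GB -> needs a multi-GPU node
```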
The key to Gemma 3's efficiency is a widely used AI technique called distillation, in which a smaller model is trained to reproduce the outputs of a larger, already-trained model, transferring its capabilities. The distilled model then goes through three reinforcement-learning refinement stages: Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from Machine Feedback (RLMF), and Reinforcement Learning from Execution Feedback (RLEF). These refine the model's outputs, making them more helpful and improving its math and coding abilities.
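For intuition, a distillation step typically minimizes the divergence between the student's and the teacher's next-token distributions. The PyTorch sketch below is a generic illustration of that loss, not Google's actual training recipe, and the temperature value is arbitrary:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student next-token
    distributions. The student mimics the teacher's outputs, not its weights."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scaling by t^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * t * t
```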
Google's developer blog details these approaches, and a separate post covers optimization techniques for the smallest, 1 billion parameter model, which is aimed at mobile devices. These include quantization, updated key-value cache layouts, faster variable loading, and GPU weight sharing.
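As an illustration of the first of those techniques, here is a toy symmetric 4-bit weight quantization in NumPy. It is for intuition only; the scheme Google actually ships (per-channel scales, group sizes, calibration) is more involved:

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-row 4-bit quantization: integer codes in [-8, 7]
    plus one float scale per row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    codes = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return codes.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
codes, scale = quantize_int4(w)
print("max abs error:", np.abs(dequantize(codes, scale) - w).max())
```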
Beyond Elo scores, Google also compares Gemma 3 against its predecessor, Gemma 2, and its closed-source Gemini models on benchmarks such as LiveCodeBench. While Gemma 3 generally falls short of Gemini 1.5 and Gemini 2.0 in accuracy, Google notes that it "shows competitive performance compared to closed Gemini models" despite having far fewer parameters.
A significant upgrade over Gemma 2 is Gemma 3's longer "context window," which expands from 8,000 to 128,000 tokens and lets the model process larger texts such as entire papers or books. Gemma 3 is also multimodal, handling both text and image inputs, unlike its predecessor. And it supports over 140 languages, a vast improvement over Gemma 2's English-only capabilities.
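For those who want to try it, the checkpoints are on Hugging Face. The snippet below is a minimal sketch using the transformers pipeline API; the "google/gemma-3-1b-it" model id and chat format reflect the Hugging Face release, but check the model card for exact, current usage:

```python
# Minimal sketch: run the smallest instruction-tuned Gemma 3 checkpoint.
# Larger variants (4B, 12B, 27B) add image input. Requires accepting the
# Gemma license on Hugging Face and logging in first.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")
messages = [{"role": "user",
             "content": "Summarize the plot of Hamlet in two sentences."}]
print(pipe(messages, max_new_tokens=128))
```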
Beyond these headline features, Gemma 3 has several other notable aspects. One known issue with large language models is that they can memorize parts of their training data, which can lead to privacy leaks. Google's researchers tested Gemma 3 for this and found that it memorizes long-form text at a markedly lower rate than its predecessors, suggesting improved privacy protection.
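Memorization tests of this kind generally follow the same recipe: prompt the model with a prefix drawn from a training document and check whether greedy decoding reproduces the true continuation verbatim. A minimal sketch, where `generate` stands in for a hypothetical greedy-decoding helper:

```python
def memorization_rate(generate, documents, prefix_len=50, suffix_len=50):
    """Fraction of documents whose continuation the model reproduces
    verbatim when prompted with the preceding tokens (greedy decoding).
    `generate(prefix, n)` is a hypothetical helper returning n tokens."""
    hits = 0
    for tokens in documents:  # each document as a list of token ids
        prefix = tokens[:prefix_len]
        suffix = tokens[prefix_len:prefix_len + suffix_len]
        if generate(prefix, suffix_len) == suffix:
            hits += 1
    return hits / len(documents)
```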
For those interested in the nitty-gritty, the Gemma 3 technical paper provides a thorough breakdown of the model's capabilities and development.