option
Home
News
Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size

Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size

April 13, 2025
164

Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size

While Meta grapples with the scrutiny surrounding its latest Llama 4 model family, Nvidia has quietly rolled out a new, fully open-source large language model (LLM) based on Meta's earlier Llama-3.1-405B-Instruct model. Named Llama-3.1-Nemotron-Ultra-253B-v1, this model boasts 253 billion parameters and is engineered to excel in advanced reasoning, instruction following, and AI assistant workflows. Nvidia first hinted at this model during its annual GPU Technology Conference (GTC) in March.

The release underscores Nvidia's ongoing commitment to enhancing performance through architectural innovation and meticulous post-training processes. Announced on April 7, 2025, the model's code, weights, and post-training data are now freely accessible on Hugging Face. It's designed to seamlessly switch between complex reasoning tasks and simpler outputs based on system prompts, offering developers flexibility in their applications.

Designed for Efficient Inference

Building on Nvidia's prior efforts in optimizing LLMs for inference, the Llama-3.1-Nemotron-Ultra-253B incorporates a Neural Architecture Search (NAS) process to refine its architecture. This includes innovative features like skipped attention layers, fused feedforward networks (FFNs), and variable FFN compression ratios. These modifications reduce the model's memory usage and computational requirements, making it deployable on a single 8x H100 GPU node without compromising output quality.

Nvidia claims this model delivers robust performance while being cost-effective for data center deployments. It's compatible with Nvidia's B100 and Hopper microarchitectures, and has been tested in both BF16 and FP8 precision modes.

Post-Training for Reasoning and Alignment

The model underwent a comprehensive post-training regimen. This included supervised fine-tuning across various domains such as math, code generation, chat, and tool use, followed by reinforcement learning with Group Relative Policy Optimization (GRPO) to enhance its instruction-following and reasoning capabilities.

Further refinement came through a knowledge distillation phase over 65 billion tokens, and continual pretraining on an additional 88 billion tokens. The training data sources included FineWeb, Buzz-V1.2, and Dolma, with post-training prompts and responses drawn from both public corpora and synthetic generation methods. This approach helped the model differentiate between its reasoning modes.

Improved Performance Across Numerous Domains and Benchmarks

When enabled for reasoning, the model showed significant improvements on various benchmarks. For instance, on the MATH500 benchmark, its performance surged from 80.40% in standard mode to 97.00% with reasoning enabled. Similarly, AIME25 scores jumped from 16.67% to 72.50%, and LiveCodeBench results more than doubled, from 29.03% to 66.31%.

The model also excelled in tool-based tasks and general question answering (GPQA), scoring 76.01% in reasoning mode compared to 56.60% without. These benchmarks were conducted with a maximum sequence length of 32,000 tokens, and each test was repeated up to 16 times for accuracy.

Compared to the state-of-the-art MoE model DeepSeek R1, which has 671 billion parameters, Nvidia's model holds its own despite having fewer parameters. It outperforms DeepSeek R1 in tasks like GPQA (76.01 vs. 71.5), IFEval instruction following (89.45 vs. 83.3), and LiveCodeBench coding tasks (66.31 vs. 65.9). However, DeepSeek R1 edges out slightly in certain math evaluations, particularly AIME25 (79.8 vs. 72.50) and MATH500 (97.3 vs. 97.00).

These results indicate that Nvidia's dense model can match or exceed MoE models in reasoning and general instruction alignment, though it lags slightly in math-intensive categories.

Usage and Integration

The model integrates seamlessly with the Hugging Face Transformers library (version 4.48.3 recommended) and supports sequences up to 128,000 tokens. Developers can toggle reasoning behavior using system prompts and choose decoding strategies based on task needs. For reasoning tasks, Nvidia suggests using temperature sampling (0.6) with a top-p value of 0.95, while greedy decoding is recommended for deterministic outputs.

Llama-3.1-Nemotron-Ultra-253B supports multilingual applications, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. It's well-suited for various LLM use cases such as chatbot development, AI agent workflows, retrieval-augmented generation (RAG), and code generation.

Licensed for Commercial Use

Released under the Nvidia Open Model License and governed by the Llama 3.1 Community License Agreement, the model is ready for commercial applications. Nvidia stresses the importance of responsible AI development, urging teams to assess the model's alignment, safety, and bias for their specific use cases.

Oleksii Kuchaiev, Nvidia's Director of AI Model Post-Training, shared the excitement about this open release on X, highlighting its dense 253B design with toggleable reasoning capabilities, and the inclusion of open weights and data.

Related article
ElevenLabs names BlackRock, Jamie Foxx, Eva Longoria as new investors ElevenLabs names BlackRock, Jamie Foxx, Eva Longoria as new investors ElevenLabs, the voice AI company, has disclosed additional investors in its $500 million Series D round, originally announced in February. These include institutional investors like BlackRock, Wellington, D.E. Shaw, and Schroders; corporations such a
Meta AI now responds to buyer messages on Facebook Marketplace Meta AI now responds to buyer messages on Facebook Marketplace Facebook Marketplace introduces new Meta AI features, including automated replies to buyer inquiries, the company announced Thursday. The platform also leverages AI to accelerate item listings, summarize seller profiles, and now lets sellers offer sh
Meta signs deal for millions of Amazon AI CPUs Meta signs deal for millions of Amazon AI CPUs Amazon has secured a significant partnership with Meta, once again relying on its own custom-designed chips. Meta has agreed to deploy millions of AWS Graviton chips to meet its expanding AI demands, Amazon confirmed on Friday.Note that AWS Graviton
Related Special Topic Recommendations
Video creation Best AI Text to Video Platforms for Script Writing and Visual Storytelling
Best AI Text to Video Platforms for Script Writing and Visual Storytelling

2026 Latest Best AI Text to Video Platforms: Top-rated tools for script writing and visual storytelling. Discover powerful, game-changing solutions to transform your text into engaging videos. Compare free vs paid options with our weekly updated rankings and real-world tests. Find your perfect platform to boost creativity and productivity. Explore the curated selection at XIX.AI.

10 tools
xix.ai
chatbot AI Multi-Agent Orchestrators: Design Complex Automated Workflows through Natural Language
AI Multi-Agent Orchestrators: Design Complex Automated Workflows through Natural Language

2026 Latest: Discover the best AI multi-agent orchestrators to design complex automated workflows through natural language. Our curated list features top-rated, powerful platforms for seamless task automation and intelligent process management. Compare free vs paid options with real-world insights. Unlock your AI edge with XIX.AI's expert weekly updated rankings.

10 tools
xix.ai
Image editing Best AI Noise Reduction Software: Remove Grain & Artifacts from Low-Light Night Photography
Best AI Noise Reduction Software: Remove Grain & Artifacts from Low-Light Night Photography

Discover the 2026 best AI noise reduction software for low-light night photography. Our top-rated, curated list compares free vs paid tools, featuring real-world tests and weekly updated rankings. Remove grain & artifacts effortlessly. Unlock your AI edge at XIX.AI.

10 tools
xix.ai
chatbot Best Custom AI Girlfriend Generators: Design Unique Personalities, Hobbies, and Backstories
Best Custom AI Girlfriend Generators: Design Unique Personalities, Hobbies, and Backstories

Discover the 2026 best custom AI girlfriend generators on XIX.AI. Explore our top-rated, curated list for designing unique personalities, hobbies, and deep backstories. Compare free vs paid options with real-world insights. Unlock your perfect creative companion today.

10 tools
xix.ai
Productivity AI Architecture Designers: Build Scalable System Architectures Using Natural Language
AI Architecture Designers: Build Scalable System Architectures Using Natural Language

Discover the 2026 best AI architecture design tools on XIX.AI. Our curated, top-rated list features powerful, game-changing solutions to build scalable system architectures using natural language. Compare free vs paid options with real-world insights. Unlock your AI edge and streamline development today.

10 tools
xix.ai
Comic Creation AI Character Profile Creators: Generate Detailed Backstories & Visual Refs for Manga Leads
AI Character Profile Creators: Generate Detailed Backstories & Visual Refs for Manga Leads

2026 Latest Best AI Character Profile Creators: Discover top-rated tools to generate detailed backstories and visual references for your manga leads. Our curated, weekly-updated list compares free vs paid options based on real-world tests. Find powerful, game-changing solutions to craft compelling characters and streamline your creative workflow. Explore the rankings on XIX.AI and unlock your perfect storytelling ally today.

10 tools
xix.ai
Comments (54)
0/500
JonathanNelson
JonathanNelson December 9, 2025 at 3:30:42 AM EST

Интересно, как Nvidia удалось упаковать все эти параметры в модель размером вдвое меньше. Выходит, вложения в архитектуру дают больше преимуществ, чем просто увеличение данных? Хотя, конечно, с учётом их вычислительных ресурсов не стоит удивляться. Что особенно ценно, так это тот факт, что модель открыта. На этом фоне заявления Meta порой звучат слишком громко и с многочисленными оговорками 🤔 Это может изменить правила игры для независимых исследователей!

CharlesYoung
CharlesYoung November 2, 2025 at 11:30:34 PM EST

¿Nvidia saca otro modelo open-source más potente que DeepSeek R1? 🤔 Me pregunto si esto realmente marcará una diferencia práctica para los desarrolladores o es solo otra carrera por los números en los benchmarks. ¡253 mil millones de parámetros parece excesivo!

DouglasMartínez
DouglasMartínez August 18, 2025 at 11:01:00 AM EDT

Nvidia's new model sounds like a beast! Half the size of DeepSeek R1 but still outperforms it? That's wild efficiency. Can't wait to see how devs play with this open-source gem! 🚀

StephenRoberts
StephenRoberts July 31, 2025 at 10:48:18 PM EDT

Nvidia's new model sounds like a beast! Half the size of DeepSeek R1 but still outshines it? That's some serious tech flex. Can't wait to see how devs play with this open-source gem! 😎

AnthonyRoberts
AnthonyRoberts April 24, 2025 at 4:35:07 AM EDT

Nvidia's new Llama-3.1 Nemotron Ultra is a beast! It's amazing how it outperforms DeepSeek R1 with half the size. I've been using it for my projects and the results are incredible. Just wish it was a bit faster, but overall, a solid choice! 🚀

JohnRoberts
JohnRoberts April 22, 2025 at 8:03:45 PM EDT

¡El Llama-3.1 Nemotron Ultra de Nvidia es impresionante! Supera al DeepSeek R1 con la mitad del tamaño, lo cual es alucinante. Lo he estado usando en mis proyectos y es súper eficiente. Lo único es que puede ser un poco complicado de configurar. Aún así, una excelente opción para quien busque un LLM potente. 🚀

OR