Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size
April 13, 2025
LarryMartinez

While Meta grapples with the scrutiny surrounding its latest Llama 4 model family, Nvidia has quietly rolled out a new, fully open-source large language model (LLM) based on Meta's earlier Llama-3.1-405B-Instruct model. Named Llama-3.1-Nemotron-Ultra-253B-v1, this model boasts 253 billion parameters and is engineered to excel in advanced reasoning, instruction following, and AI assistant workflows. Nvidia first hinted at this model during its annual GPU Technology Conference (GTC) in March.
The release underscores Nvidia's ongoing commitment to enhancing performance through architectural innovation and meticulous post-training processes. Announced on April 7, 2025, the model's code, weights, and post-training data are now freely accessible on Hugging Face. It's designed to seamlessly switch between complex reasoning tasks and simpler outputs based on system prompts, offering developers flexibility in their applications.
Designed for Efficient Inference
Building on Nvidia's prior efforts in optimizing LLMs for inference, the Llama-3.1-Nemotron-Ultra-253B incorporates a Neural Architecture Search (NAS) process to refine its architecture. This includes innovative features like skipped attention layers, fused feedforward networks (FFNs), and variable FFN compression ratios. These modifications reduce the model's memory usage and computational requirements, making it deployable on a single 8x H100 GPU node without compromising output quality.
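To make the idea concrete, below is a minimal, hypothetical sketch of a decoder block in which the attention sub-layer can be skipped and the FFN width varies per layer; the class and parameter names are illustrative assumptions, not Nvidia's actual architecture code.

```python
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    """Toy decoder block: attention can be disabled per layer and the FFN
    width can vary, mimicking NAS-derived heterogeneous layers.
    Hypothetical illustration; causal masking omitted for brevity."""

    def __init__(self, d_model: int, ffn_mult: float, use_attention: bool):
        super().__init__()
        self.use_attention = use_attention
        if use_attention:
            self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
            self.attn_norm = nn.LayerNorm(d_model)
        hidden = int(d_model * ffn_mult)  # variable FFN compression ratio
        self.ffn = nn.Sequential(
            nn.Linear(d_model, hidden), nn.GELU(), nn.Linear(hidden, d_model)
        )
        self.ffn_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_attention:
            h = self.attn_norm(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out
        return x + self.ffn(self.ffn_norm(x))

# A NAS procedure would search over per-layer choices like these, e.g.
# stacking SkippableBlock(4096, 2.0, True) and SkippableBlock(4096, 0.5, False),
# trading output quality against memory and latency.
```

In the released model, which layers keep attention and which FFN ratios are used is the outcome of that search, optimized against the single-node H100 deployment target.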
Nvidia claims the model delivers robust performance while being cost-effective for data center deployments. It is compatible with Nvidia's Hopper and Blackwell (B100) GPU architectures and has been tested in both BF16 and FP8 precision modes.
Post-Training for Reasoning and Alignment
The model underwent a comprehensive post-training regimen. This included supervised fine-tuning across various domains such as math, code generation, chat, and tool use, followed by reinforcement learning with Group Relative Policy Optimization (GRPO) to enhance its instruction-following and reasoning capabilities.
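For readers unfamiliar with GRPO, its core trick is to sample a group of responses per prompt and standardize each response's reward against the group's statistics, removing the need for a separate value network. A minimal sketch of that group-relative advantage, assuming scalar per-response rewards (illustrative only, not Nvidia's training code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages: each of the G responses sampled from the
    same prompt gets its reward standardized against the group mean and std.
    Illustrative sketch, not Nvidia's training code."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four sampled answers to one math prompt, reward 1.0 if correct.
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(grpo_advantages(rewards))  # correct answers receive positive advantage
```

These advantages then weight a clipped policy-gradient update, much as in PPO, pushing the model toward responses that beat their group's average.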
Further refinement came through a knowledge distillation phase over 65 billion tokens, and continual pretraining on an additional 88 billion tokens. The training data sources included FineWeb, Buzz-V1.2, and Dolma, with post-training prompts and responses drawn from both public corpora and synthetic generation methods. This approach helped the model differentiate between its reasoning modes.
Improved Performance Across Numerous Domains and Benchmarks
When enabled for reasoning, the model showed significant improvements on various benchmarks. For instance, on the MATH500 benchmark, its performance surged from 80.40% in standard mode to 97.00% with reasoning enabled. Similarly, AIME25 scores jumped from 16.67% to 72.50%, and LiveCodeBench results more than doubled, from 29.03% to 66.31%.
The model also excelled in tool-based tasks and graduate-level question answering (GPQA), scoring 76.01% in reasoning mode compared to 56.60% without. These benchmarks were run with a maximum sequence length of 32,000 tokens, and each test was repeated up to 16 times to average out run-to-run variance.
Compared with DeepSeek R1, a state-of-the-art mixture-of-experts (MoE) model with 671 billion total parameters, Nvidia's model holds its own despite being far smaller. It outperforms DeepSeek R1 on GPQA (76.01 vs. 71.5), IFEval instruction following (89.45 vs. 83.3), and LiveCodeBench coding tasks (66.31 vs. 65.9). However, DeepSeek R1 edges ahead slightly on certain math evaluations, particularly AIME25 (79.8 vs. 72.50) and MATH500 (97.3 vs. 97.00).
These results indicate that Nvidia's dense model can match or exceed MoE models in reasoning and general instruction alignment, though it lags slightly in math-intensive categories.
Usage and Integration
The model integrates seamlessly with the Hugging Face Transformers library (version 4.48.3 recommended) and supports sequences up to 128,000 tokens. Developers can toggle reasoning behavior using system prompts and choose decoding strategies based on task needs. For reasoning tasks, Nvidia suggests using temperature sampling (0.6) with a top-p value of 0.95, while greedy decoding is recommended for deterministic outputs.
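Putting those recommendations together, here is a minimal usage sketch with Transformers. The repository id and the "detailed thinking on/off" system-prompt toggle follow the model's Hugging Face card as commonly described; treat both as assumptions to verify against the card itself.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id and system-prompt convention assumed from the model card;
# verify against the official Hugging Face page before relying on them.
model_id = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # model is tested in BF16 (and FP8)
    device_map="auto",           # shards weights across the 8x H100 node
    trust_remote_code=True,      # NAS-derived custom architecture
)

# System prompt "detailed thinking on" enables reasoning mode;
# "detailed thinking off" yields shorter, direct answers.
messages = [
    {"role": "system", "content": "detailed thinking on"},
    {"role": "user", "content": "Prove that the sum of two odd numbers is even."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Nvidia's suggested sampling for reasoning: temperature 0.6, top-p 0.95.
# For deterministic outputs, use greedy decoding (do_sample=False) instead.
output = model.generate(
    inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Flipping the system prompt to "detailed thinking off" and switching to greedy decoding gives the terse, deterministic behavior described above.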
Llama-3.1-Nemotron-Ultra-253B supports multilingual applications, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. It's well-suited for various LLM use cases such as chatbot development, AI agent workflows, retrieval-augmented generation (RAG), and code generation.
Licensed for Commercial Use
Released under the Nvidia Open Model License and governed by the Llama 3.1 Community License Agreement, the model is ready for commercial applications. Nvidia stresses the importance of responsible AI development, urging teams to assess the model's alignment, safety, and bias for their specific use cases.
Oleksii Kuchaiev, Nvidia's Director of AI Model Post-Training, shared the excitement about this open release on X, highlighting its dense 253B design with toggleable reasoning capabilities, and the inclusion of open weights and data.
Comments (50)
KeithNelson
April 13, 2025 at 7:54:42 PM GMT
Nvidia's new model is impressive, outperforming others at half the size. It's great for those who need efficiency without sacrificing performance. The only downside is the setup can be a bit tricky. Overall, a solid choice for AI enthusiasts!
RalphMitchell
April 13, 2025 at 7:54:42 PM GMT
Nvidia's new model is impressive, outperforming other models at half the size. It's ideal for those who want efficiency, though the only drawback is that setup is a bit difficult. Overall, a good choice for AI enthusiasts!
GeorgeWilson
April 13, 2025 at 7:54:42 PM GMT
It's impressive that Nvidia's new model outperforms other models even at half the size. It's good for people who want performance without sacrificing efficiency. The only downside is that configuration is a bit finicky. Overall, a good choice for AI enthusiasts!
GeorgeNelson
April 13, 2025 at 7:54:42 PM GMT
Nvidia's new model is impressive, outperforming others at half the size. It's great for anyone who needs efficiency without sacrificing performance. The only drawback is that the setup can be a bit complicated. Overall, a good choice for AI enthusiasts!
GeorgeMiller
April 13, 2025 at 7:54:42 PM GMT
Nvidia's new model is impressive, outperforming others at half the size. It's great for those who need efficiency without sacrificing performance. The only drawback is that the configuration can be a bit complicated. Overall, a solid option for AI enthusiasts!
BrianLewis
April 13, 2025 at 5:40:08 PM GMT
Nvidia's Llama-3.1 Nemotron Ultra is impressive! It outperforms DeepSeek R1 and is half the size, which is crazy. I've been using it for my projects and it's been a game-changer. The only downside is the setup can be a bit tricky, but once you get it running, it's smooth sailing!