Nvidia Unveils Open-Source AI Model Nemotron-Nano-9B-v2 with Toggleable Reasoning
Small language models are making waves. Following the debut of MIT spinoff Liquid AI's smartwatch-sized vision model and Google's smartphone-ready offering, Nvidia is now entering the scene with its own slimmed-down contender: Nemotron-Nano-9B-V2. This new model leads its class on key benchmarks and introduces a unique feature that allows users to enable or disable AI "reasoning"—essentially a self-checking process before delivering a final answer.
Although 9 billion parameters exceed the scale of the multimillion-parameter micro-models we've recently reported on, Nvidia highlights this as a significant optimization from its original 12 billion parameters. The revised size is specifically engineered to run on a single, widely available Nvidia A10 GPU.
As Oleksii Kuchiaev, Nvidia's Director of AI Model Post-Training, explained in response to a question on X: "We pruned the 12B model to 9B to fit perfectly on the A10, a popular deployment GPU. It's also a hybrid architecture, which enables it to handle larger batch sizes and achieve speeds up to six times faster than traditional transformer models of a similar size."
For perspective, many leading large language models operate in the 70+ billion parameter range. Parameters are the internal settings that define a model's behavior, where higher counts typically indicate greater capability but also demand significantly more computational power.
The model supports multiple languages, including English, German, Spanish, French, Italian, and Japanese. Extended capabilities also cover Korean, Portuguese, Russian, and Chinese. It is well-suited for tasks ranging from following instructions to generating code.
Nemotron-Nano-9B-V2 and its pre-training datasets are currently available on Hugging Face and through Nvidia's own model catalog.
A Fusion of Transformer and Mamba Architectures
The model is built upon Nemotron-H, a family of hybrid Mamba-Transformer models that serve as the foundation for Nvidia's latest AI offerings.
While dominant LLMs typically rely solely on Transformer architecture and its attention mechanisms, these can become prohibitively expensive in terms of memory and computation as the length of input sequences increases.
Nemotron-H models and others utilizing the Mamba architecture—pioneered by researchers at Carnegie Mellon University and Princeton—incorporate selective state space models (SSMs). These SSMs efficiently manage extremely long sequences by maintaining an internal state.
These layers scale linearly with sequence length, allowing them to process much longer contexts than standard self-attention without the same computational overhead.
A hybrid Mamba-Transformer design reduces costs by replacing most attention layers with linear-time state space layers. This can yield up to 2–3 times higher throughput on long-context tasks while maintaining comparable accuracy.
Nvidia is not alone in this approach; other AI research labs, such as AI2, have also released models based on the Mamba architecture.
Toggle Reasoning On or Off with Simple Commands
Nemotron-Nano-9B-v2 is designed as a unified, text-only model capable of both conversational interaction and complex reasoning, trained entirely from scratch.
By default, the system generates a detailed reasoning trace before producing its final answer. Users can control this behavior using simple command tokens like /think or /no_think.
The model also introduces runtime "thinking budget" management. This allows developers to set a maximum limit on the number of tokens the model can use for internal reasoning before it must deliver a response.
This mechanism is intended to balance accuracy with response latency, which is crucial for applications like customer support chatbots or autonomous agents.
Benchmarks Show Strong Performance
Evaluation results demonstrate competitive accuracy against other leading small-scale open models. When tested with reasoning enabled using the NeMo-Skills suite, Nemotron-Nano-9B-v2 achieved scores of 72.1% on AIME25, 97.8% on MATH500, 64.0% on GPQA, and 71.1% on LiveCodeBench.
Scores on instruction-following and long-context benchmarks are also strong: 90.3% on IFEval and 78.9% on the RULER 128K test, with additional measurable gains on BFCL v3 and the HLE benchmark.

Across multiple evaluations, Nano-9B-v2 consistently shows higher accuracy than a common point of comparison, the Qwen3-8B model.

Nvidia presents these results with accuracy-versus-budget curves that illustrate how performance improves as the token allowance for reasoning increases. The company notes that careful budget control enables developers to optimize for both quality and speed in production environments.
Trained on Synthetic Datasets
Both the Nano model and the broader Nemotron-H family are trained on a mixture of carefully curated web data, proprietary sources, and synthetic training data.
The training corpora include general text, code, mathematics, scientific literature, legal and financial documents, as well as alignment-focused question-answering datasets.
Nvidia confirms the use of synthetic reasoning traces generated by other large models to enhance performance on complex benchmark tasks.
Licensing and Commercial Use
The Nano-9B-v2 model is released under the Nvidia Open Model License Agreement, which was last updated in June 2025.
This license is designed to be permissive and enterprise-friendly. Nvidia explicitly states that the models are commercially usable out of the box and that developers are free to create and distribute derivative works.
Critically, Nvidia does not claim ownership of any outputs generated by the model, leaving all rights and responsibilities with the developer or organization using it.
For enterprise developers, this means the model can be deployed into production immediately without negotiating a separate commercial license or paying fees based on usage volume, revenue, or user count. Unlike some tiered open licenses from other providers, there are no clauses that trigger a paid license requirement once a company reaches a certain scale.
That said, the agreement does include several important conditions for enterprises to follow:
- Guardrails: Users cannot bypass or disable built-in safety mechanisms (referred to as "guardrails") without implementing suitable, equivalent replacements for their specific deployment.
- Redistribution: Any redistribution of the model or its derivatives must include the full text of the Nvidia Open Model License and proper attribution ("Licensed by Nvidia Corporation under the Nvidia Open Model License").
- Compliance: Users must adhere to all applicable trade regulations and restrictions, such as U.S. export control laws.
- Trustworthy AI Terms: Usage must align with Nvidia's Trustworthy AI guidelines, which cover principles for responsible deployment and ethical considerations.
- Litigation Clause: The license automatically terminates if a user initiates copyright or patent litigation against another party, alleging infringement related to the model.
These conditions focus on ensuring legal compliance and responsible use, rather than restricting commercial scale. Enterprises do not need to seek additional permission or pay royalties to Nvidia for building products, monetizing services, or scaling their user base. Instead, they must ensure their deployment practices respect safety, provide proper attribution, and meet all compliance obligations.
Market Positioning
With Nemotron-Nano-9B-v2, Nvidia targets developers who need to balance reasoning capability with deployment efficiency on a smaller scale.
The runtime budget control and reasoning-toggle features are designed to give system builders greater flexibility in managing the trade-off between accuracy and response speed.
Their availability on Hugging Face and Nvidia's model catalog signals an intent for broad accessibility, encouraging experimentation and integration.
Nvidia's release of Nemotron-Nano-9B-v2 underscores the company's continued focus on efficiency and controllable reasoning in language models.
By merging hybrid architectures with advanced compression and training techniques, Nvidia aims to provide developers with tools that maintain high accuracy while reducing both operational costs and latency.
Related article
Nvidia's OpenClaw variant may solve its biggest challenge: security
Nvidia CEO Jensen Huang believes every company needs an OpenClaw strategy — and Nvidia is ready to supply it.During his GTC keynote on Monday, Huang announced that Nvidia has built NemoClaw, an enterprise-grade platform derived from the viral, local
Pentagon signs deals with Nvidia, Microsoft, AWS to deploy AI on classified networks
After previously reaching agreements with Google, SpaceX, and OpenAI, the U.S. Defense Department announced Friday that it has now signed deals with Nvidia, Microsoft, Amazon Web Services, and Reflection AI to deploy their AI technologies and models
Nvidia GTC Unveils NemoClaw, Robot Olaf, and $1 Trillion Bet
Loading the player…CEO Jensen Huang took the stage at Nvidia's GTC conference this week in his signature leather jacket to deliver a two-and-a-half-hour keynote, projecting $1 trillion in AI chip sales through 2027, declaring that every company needs
Related Special Topic Recommendations
Comments (1)
0/500
Small language models are making waves. Following the debut of MIT spinoff Liquid AI's smartwatch-sized vision model and Google's smartphone-ready offering, Nvidia is now entering the scene with its own slimmed-down contender: Nemotron-Nano-9B-V2. This new model leads its class on key benchmarks and introduces a unique feature that allows users to enable or disable AI "reasoning"—essentially a self-checking process before delivering a final answer.
Although 9 billion parameters exceed the scale of the multimillion-parameter micro-models we've recently reported on, Nvidia highlights this as a significant optimization from its original 12 billion parameters. The revised size is specifically engineered to run on a single, widely available Nvidia A10 GPU.
As Oleksii Kuchiaev, Nvidia's Director of AI Model Post-Training, explained in response to a question on X: "We pruned the 12B model to 9B to fit perfectly on the A10, a popular deployment GPU. It's also a hybrid architecture, which enables it to handle larger batch sizes and achieve speeds up to six times faster than traditional transformer models of a similar size."
For perspective, many leading large language models operate in the 70+ billion parameter range. Parameters are the internal settings that define a model's behavior, where higher counts typically indicate greater capability but also demand significantly more computational power.
The model supports multiple languages, including English, German, Spanish, French, Italian, and Japanese. Extended capabilities also cover Korean, Portuguese, Russian, and Chinese. It is well-suited for tasks ranging from following instructions to generating code.
Nemotron-Nano-9B-V2 and its pre-training datasets are currently available on Hugging Face and through Nvidia's own model catalog.
A Fusion of Transformer and Mamba Architectures
The model is built upon Nemotron-H, a family of hybrid Mamba-Transformer models that serve as the foundation for Nvidia's latest AI offerings.
While dominant LLMs typically rely solely on Transformer architecture and its attention mechanisms, these can become prohibitively expensive in terms of memory and computation as the length of input sequences increases.
Nemotron-H models and others utilizing the Mamba architecture—pioneered by researchers at Carnegie Mellon University and Princeton—incorporate selective state space models (SSMs). These SSMs efficiently manage extremely long sequences by maintaining an internal state.
These layers scale linearly with sequence length, allowing them to process much longer contexts than standard self-attention without the same computational overhead.
A hybrid Mamba-Transformer design reduces costs by replacing most attention layers with linear-time state space layers. This can yield up to 2–3 times higher throughput on long-context tasks while maintaining comparable accuracy.
Nvidia is not alone in this approach; other AI research labs, such as AI2, have also released models based on the Mamba architecture.
Toggle Reasoning On or Off with Simple Commands
Nemotron-Nano-9B-v2 is designed as a unified, text-only model capable of both conversational interaction and complex reasoning, trained entirely from scratch.
By default, the system generates a detailed reasoning trace before producing its final answer. Users can control this behavior using simple command tokens like /think or /no_think.
The model also introduces runtime "thinking budget" management. This allows developers to set a maximum limit on the number of tokens the model can use for internal reasoning before it must deliver a response.
This mechanism is intended to balance accuracy with response latency, which is crucial for applications like customer support chatbots or autonomous agents.
Benchmarks Show Strong Performance
Evaluation results demonstrate competitive accuracy against other leading small-scale open models. When tested with reasoning enabled using the NeMo-Skills suite, Nemotron-Nano-9B-v2 achieved scores of 72.1% on AIME25, 97.8% on MATH500, 64.0% on GPQA, and 71.1% on LiveCodeBench.
Scores on instruction-following and long-context benchmarks are also strong: 90.3% on IFEval and 78.9% on the RULER 128K test, with additional measurable gains on BFCL v3 and the HLE benchmark.

Across multiple evaluations, Nano-9B-v2 consistently shows higher accuracy than a common point of comparison, the Qwen3-8B model.

Nvidia presents these results with accuracy-versus-budget curves that illustrate how performance improves as the token allowance for reasoning increases. The company notes that careful budget control enables developers to optimize for both quality and speed in production environments.
Trained on Synthetic Datasets
Both the Nano model and the broader Nemotron-H family are trained on a mixture of carefully curated web data, proprietary sources, and synthetic training data.
The training corpora include general text, code, mathematics, scientific literature, legal and financial documents, as well as alignment-focused question-answering datasets.
Nvidia confirms the use of synthetic reasoning traces generated by other large models to enhance performance on complex benchmark tasks.
Licensing and Commercial Use
The Nano-9B-v2 model is released under the Nvidia Open Model License Agreement, which was last updated in June 2025.
This license is designed to be permissive and enterprise-friendly. Nvidia explicitly states that the models are commercially usable out of the box and that developers are free to create and distribute derivative works.
Critically, Nvidia does not claim ownership of any outputs generated by the model, leaving all rights and responsibilities with the developer or organization using it.
For enterprise developers, this means the model can be deployed into production immediately without negotiating a separate commercial license or paying fees based on usage volume, revenue, or user count. Unlike some tiered open licenses from other providers, there are no clauses that trigger a paid license requirement once a company reaches a certain scale.
That said, the agreement does include several important conditions for enterprises to follow:
- Guardrails: Users cannot bypass or disable built-in safety mechanisms (referred to as "guardrails") without implementing suitable, equivalent replacements for their specific deployment.
- Redistribution: Any redistribution of the model or its derivatives must include the full text of the Nvidia Open Model License and proper attribution ("Licensed by Nvidia Corporation under the Nvidia Open Model License").
- Compliance: Users must adhere to all applicable trade regulations and restrictions, such as U.S. export control laws.
- Trustworthy AI Terms: Usage must align with Nvidia's Trustworthy AI guidelines, which cover principles for responsible deployment and ethical considerations.
- Litigation Clause: The license automatically terminates if a user initiates copyright or patent litigation against another party, alleging infringement related to the model.
These conditions focus on ensuring legal compliance and responsible use, rather than restricting commercial scale. Enterprises do not need to seek additional permission or pay royalties to Nvidia for building products, monetizing services, or scaling their user base. Instead, they must ensure their deployment practices respect safety, provide proper attribution, and meet all compliance obligations.
Market Positioning
With Nemotron-Nano-9B-v2, Nvidia targets developers who need to balance reasoning capability with deployment efficiency on a smaller scale.
The runtime budget control and reasoning-toggle features are designed to give system builders greater flexibility in managing the trade-off between accuracy and response speed.
Their availability on Hugging Face and Nvidia's model catalog signals an intent for broad accessibility, encouraging experimentation and integration.
Nvidia's release of Nemotron-Nano-9B-v2 underscores the company's continued focus on efficiency and controllable reasoning in language models.
By merging hybrid architectures with advanced compression and training techniques, Nvidia aims to provide developers with tools that maintain high accuracy while reducing both operational costs and latency.
Nvidia's OpenClaw variant may solve its biggest challenge: security
Nvidia CEO Jensen Huang believes every company needs an OpenClaw strategy — and Nvidia is ready to supply it.During his GTC keynote on Monday, Huang announced that Nvidia has built NemoClaw, an enterprise-grade platform derived from the viral, local
Pentagon signs deals with Nvidia, Microsoft, AWS to deploy AI on classified networks
After previously reaching agreements with Google, SpaceX, and OpenAI, the U.S. Defense Department announced Friday that it has now signed deals with Nvidia, Microsoft, Amazon Web Services, and Reflection AI to deploy their AI technologies and models
Nvidia GTC Unveils NemoClaw, Robot Olaf, and $1 Trillion Bet
Loading the player…CEO Jensen Huang took the stage at Nvidia's GTC conference this week in his signature leather jacket to deliver a two-and-a-half-hour keynote, projecting $1 trillion in AI chip sales through 2027, declaring that every company needs





Home






