New AI Model Outperforms LLMs with 100x Speed Boost and Minimal Training Data
Singapore-based AI startup Sapient Intelligence has engineered a novel AI architecture that can rival—and in certain scenarios, dramatically surpass—large language models (LLMs) on intricate reasoning challenges, despite using a much smaller model size and consuming far less data.
This system, named the Hierarchical Reasoning Model (HRM), draws inspiration from the human brain's use of separate mechanisms for slow, methodical planning and fast, intuitive processing. The model delivers remarkable outcomes using only a fraction of the data and memory demanded by modern LLMs. Such efficiency holds significant potential for enterprise AI deployments, where data is often limited and computational power is a constraint.
The limitations of chain-of-thought reasoning
When confronting a complex task, contemporary LLMs mostly depend on chain-of-thought (CoT) prompting, where problems are broken down into intermediate, text-based steps—effectively compelling the model to verbalize its thought process as it progresses toward an answer.
While CoT has enhanced the reasoning capabilities of LLMs, it suffers from inherent weaknesses. In their research paper, the Sapient Intelligence team contends that "CoT is merely a stopgap for reasoning, not a true solution. It depends on rigid, human-determined breakdowns, where one wrong step or a misordered sequence can completely derail the entire process."
This reliance on generating explicit text ties the model's reasoning to the token level, frequently demanding enormous training datasets and resulting in lengthy, sluggish replies. This method also misses the kind of "latent reasoning" that happens internally, without being directly expressed in words.
The researchers observe, "A more streamlined method is essential to reduce these intensive data needs."
A brain-inspired hierarchical framework
To advance beyond CoT, the team investigated "latent reasoning," wherein the model thinks through problems using its internal, abstract representations instead of producing tangible "thinking tokens." This aligns more closely with human cognition; the paper mentions, "the brain maintains extended, logical reasoning chains with notable efficiency in a latent space, without needing to continually convert thoughts back into language."
However, implementing this kind of profound, internal reasoning in AI is difficult. Merely adding layers to a deep learning model frequently triggers the "vanishing gradient" issue, where learning signals fade across layers, hampering effective training. Conversely, recurrent designs that iterate through computations can experience "early convergence," where the model fixes on a solution prematurely without thoroughly examining the problem.
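The vanishing-gradient effect described above can be illustrated with a few lines of arithmetic. This is a minimal sketch, not anything taken from the HRM paper: it assumes a toy chain of scalar sigmoid layers with a shared weight, and the names `sigmoid` and `gradient_through_layers` are hypothetical. Because the sigmoid's derivative never exceeds 0.25, each layer can only shrink the learning signal when the weight is 1.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gradient_through_layers(x: float, depth: int, weight: float = 1.0) -> float:
    """Return d(output)/d(input) for a chain of `depth` scalar sigmoid layers.

    Toy assumption: every layer computes sigmoid(weight * previous_activation).
    """
    grad = 1.0
    activation = x
    for _ in range(depth):
        activation = sigmoid(weight * activation)
        # Chain rule: each layer multiplies the gradient by weight * sigmoid'(.)
        grad *= weight * activation * (1.0 - activation)
    return grad


for depth in (1, 5, 10, 20):
    print(f"depth={depth:2d}  gradient={gradient_through_layers(0.5, depth):.3e}")
```

In this toy setting the gradient falls below 1e-10 by depth 20, which is the sense in which learning signals "fade across layers."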

[Figure: The Hierarchical Reasoning Model (HRM) is inspired by the structure of the brain. Source: arXiv]
In search of a superior method, the Sapient team looked to neuroscience for guidance. "The human brain presents a persuasive model for attaining the computational depth that current artificial systems are missing," the researchers state. "It structures computation hierarchically across cortical areas working at varying timescales, enabling deep, multi-stage analysis."
Influenced by this, they created HRM with two interconnected, recurrent modules: a high-level (H) module for slow, abstract strategizing, and a low-level (L) module for rapid, detailed processing. This arrangement facilitates a mechanism the team labels "hierarchical convergence." Essentially, the fast L-module tackles a segment of the problem, running several cycles until it finds a stable, local answer. Then, the slow H-module incorporates this outcome, refines its overarching plan, and assigns the L-module a new, better-defined sub-problem. This effectively reboots the L-module, stopping it from stagnating (early convergence) and enabling the complete system to carry out an extended series of reasoning stages using a streamlined architecture that avoids vanishing gradients.

[Figure: HRM (left) smoothly converges on the solution across computation cycles and avoids early convergence (center, RNNs) and vanishing gradients (right, classic deep neural networks). Source: arXiv]
Per the paper, "This mechanism enables the HRM to perform a succession of separate, steady, nested calculations, where the H-module guides the global problem-solving approach and the L-module carries out the intensive search or refinement for each phase." This nested-loop architecture lets the model conduct deep analysis in its latent space without requiring extended CoT prompts or massive datasets.
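The nested-loop idea can be sketched with a toy example. This is an illustration under stated assumptions, not Sapient's implementation: the real modules are learned networks, whereas the scalar update rules `l_step` and `h_step`, their coefficients, and the cycle counts below are all invented for demonstration. The fast inner loop settles toward a local fixed point, and each slow outer update changes the context so the inner loop has fresh work to do.

```python
import math


def l_step(z_l: float, z_h: float, x: float) -> float:
    """Fast module (hypothetical rule): one update conditioned on the slow state."""
    return math.tanh(0.5 * z_l + z_h + x)


def h_step(z_h: float, z_l: float) -> float:
    """Slow module (hypothetical rule): absorb the settled low-level result."""
    return math.tanh(0.5 * z_h - z_l)


def run(x: float, n_cycles: int = 3, t_steps: int = 8):
    z_h, z_l = 0.0, 0.0
    trace = []  # (cycle, step, size of the L-module update)
    for cycle in range(n_cycles):
        # Inner loop: the L-module iterates while the H-module state is frozen.
        for step in range(t_steps):
            new_z_l = l_step(z_l, z_h, x)
            trace.append((cycle, step, abs(new_z_l - z_l)))
            z_l = new_z_l
        # Outer loop: one H-module update per cycle, which shifts the L-module's target.
        z_h = h_step(z_h, z_l)
    return z_h, z_l, trace


_, _, trace = run(0.3)
for cycle, step, size in trace:
    print(f"cycle={cycle} step={step} update={size:.4f}")
```

In the printed trace the update size shrinks within each cycle and then jumps again right after the slow state changes, which is the "reboot" behavior the article attributes to hierarchical convergence.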
A natural concern is whether this "latent reasoning" sacrifices interpretability. Guan Wang, Founder and CEO of Sapient Intelligence, pushes back on this idea, explaining that the model's internal operations can be interpreted and visualized, much as CoT offers a window into a model's thinking. He also notes that CoT itself can be misleading. "CoT does not accurately represent a model's true internal reasoning," Wang told VentureBeat, citing research indicating that models can sometimes produce correct answers with flawed reasoning, and vice versa. "It is still fundamentally opaque."

[Figure: Example of how HRM reasons over a maze problem across different compute cycles. Source: arXiv]
HRM at work
To evaluate their model, the researchers tested HRM on benchmarks that demand intensive search and backtracking, such as the Abstraction and Reasoning Corpus (ARC-AGI), highly challenging Sudoku puzzles, and intricate maze navigation tasks.
The findings reveal that HRM learns to solve problems that are unsolvable for even sophisticated LLMs. For example, on the "Sudoku-Extreme" and "Maze-Hard" tests, top-tier CoT models entirely failed, recording 0% accuracy. Meanwhile, HRM reached near-flawless accuracy after training on merely 1,000 examples per task.
On the ARC-AGI benchmark, a measure of abstract reasoning and generalization, the 27M-parameter HRM scored 40.3%, ahead of prominent CoT-based models such as the much larger o3-mini-high (34.5%) and Claude 3.7 Sonnet (21.2%). That result, achieved without large-scale pre-training and with very little training data, underscores the strength and efficiency of the design.
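To put the 27M-parameter figure in perspective, the raw weight storage can be estimated with simple arithmetic. This is a rough sketch: the numeric precisions are assumptions (the article does not say which one HRM uses), `weight_megabytes` is a hypothetical helper, and the estimate covers weights only, not activations or optimizer state.

```python
def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Raw storage for the weights alone, in megabytes (10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


HRM_PARAMS = 27_000_000  # parameter count quoted in the article

# Assumed precisions: 4 bytes per float32 weight, 2 bytes per float16 weight.
for label, width in (("float32", 4), ("float16", 2)):
    print(f"{label}: {weight_megabytes(HRM_PARAMS, width):.0f} MB")
```

Under those assumptions the weights occupy about 108 MB at 32-bit precision and 54 MB at 16-bit precision.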

[Figure: HRM outperforms large models on complex reasoning tasks. Source: arXiv]
While puzzle-solving showcases the model's capability, its practical impact is seen in a different category of challenges. According to Wang, developers should keep using LLMs for language-centric or creative assignments, but for "complex or deterministic tasks," an HRM-style framework provides superior results with reduced hallucinations. He highlights "sequential problems needing intricate decision-making or long-range planning," particularly in latency-critical areas like embodied AI and robotics, or domains with sparse data, such as scientific research.
In these situations, HRM doesn't just find solutions; it learns to improve its problem-solving. "In our master-level Sudoku tests... HRM gradually requires fewer steps as training continues—similar to a beginner evolving into a specialist," Wang elaborated.
For businesses, this is where the architecture's efficiency affects the bottom line. Instead of the sequential, token-by-token generation of CoT, HRM's parallel computation enables what Wang estimates to be a "100x acceleration in task completion speed." That means lower inference latency and the ability to run advanced reasoning on edge devices.
The financial benefits are also considerable. "Specialized reasoning engines like HRM present a more viable option for particular complex reasoning duties compared to large, expensive, and high-latency API-driven models," Wang stated. To illustrate the efficiency, he mentioned that training the model for professional Sudoku requires about two GPU hours, and for the demanding ARC-AGI benchmark, between 50 and 200 GPU hours—a minimal share of the resources required for enormous foundation models. This creates an opportunity to address specialized business issues, from logistics planning to complicated system troubleshooting, in contexts where both data and funding are restricted.
Looking ahead, Sapient Intelligence is already working to evolve HRM from a niche problem-solving tool into a broader, general-purpose reasoning component. "We are actively building brain-inspired models based on HRM," Wang said, pointing to encouraging early outcomes in healthcare, climate prediction, and robotics. He hinted that these future models will differ substantially from current text-based systems, particularly through the integration of self-correcting functions.
The research implies that for a set of problems that have confounded today's AI leaders, the way forward might not be larger models, but more intelligent, better-organized frameworks modeled on the most advanced reasoning system: the human brain.











