option
Home
News
Sakana AI's TreeQuest Boosts AI Performance with Multi-Model Collaboration

Sakana AI's TreeQuest Boosts AI Performance with Multi-Model Collaboration

August 24, 2025
68

Japanese AI lab Sakana AI has unveiled a technique enabling multiple large language models (LLMs) to work together, forming a highly effective AI team. Named Multi-LLM AB-MCTS, this method allows models to engage in trial-and-error, leveraging their unique strengths to tackle complex tasks beyond the reach of any single model.

For businesses, this approach offers a way to build more powerful AI systems. Instead of relying on one provider or model, companies can dynamically harness the strengths of various frontier models, assigning the best AI for each task segment to achieve optimal outcomes.

Harnessing Collective Intelligence

Frontier AI models are advancing quickly, each with distinct strengths and weaknesses based on their training data and design. One model might shine in coding, another in creative writing. Sakana AI’s team views these differences as assets, not flaws.

“We consider these unique capabilities as valuable tools for building collective intelligence,” the researchers note in their blog. They argue that, like human teams achieving breakthroughs through diversity, AI systems can accomplish more by collaborating. “By combining their strengths, AI systems can solve challenges no single model could overcome.”

Enhancing Performance at Inference

Sakana AI’s algorithm, an “inference-time scaling” technique (also called “test-time scaling”), is gaining traction in AI research. Unlike “training-time scaling,” which focuses on larger models and datasets, inference-time scaling boosts performance by optimizing computational resources post-training.

One method uses reinforcement learning to encourage models to produce detailed chain-of-thought (CoT) sequences, as seen in models like OpenAI o3 and DeepSeek-R1. Another approach, repeated sampling, prompts the model multiple times to generate diverse solutions, akin to brainstorming. Sakana AI’s method refines these concepts.

“Our framework improves on Best-of-N sampling,” said Takuya Akiba, a Sakana AI research scientist and paper co-author, in an interview with VentureBeat. “It enhances reasoning techniques like extended CoT via reinforcement learning. By strategically choosing the search approach and the right LLM, it optimizes performance within limited calls, excelling in complex tasks.”

How Adaptive Branching Search Functions

The technique’s core is the Adaptive Branching Monte Carlo Tree Search (AB-MCTS) algorithm. It enables LLMs to perform intelligent trial-and-error by balancing “searching deeper” (refining a promising solution) and “searching wider” (generating new solutions). AB-MCTS blends these strategies, allowing the system to refine ideas or pivot to new ones when needed.

This is powered by Monte Carlo Tree Search (MCTS), a decision-making algorithm used in DeepMind’s AlphaGo. AB-MCTS employs probability models to decide whether to refine or restart at each step.

Different test-time scaling strategies Source: Sakana AI

Multi-LLM AB-MCTS takes this further by deciding not only “what” to do (refine or generate) but also “which” LLM to use. Initially unaware of the best model for a task, the system tests a mix of LLMs, learning over time which ones perform better and assigning them more work.

Testing the AI Team

The Multi-LLM AB-MCTS system was evaluated on the ARC-AGI-2 benchmark, which tests human-like visual reasoning on novel problems, a tough challenge for AI.

The team combined frontier models like o4-mini, Gemini 2.5 Pro, and DeepSeek-R1.

The model collective solved over 30% of the 120 test problems, far surpassing any single model’s performance. The system dynamically assigned the best model for each task, quickly identifying the most effective LLM when a clear solution path existed.

AB-MCTS vs individual models (source: Sakana AI)
AB-MCTS vs individual models Source: Sakana AI

Remarkably, the system solved problems previously unsolvable by any single model. In one instance, an incorrect solution from o4-mini was refined by DeepSeek-R1 and Gemini-2.5 Pro, which corrected the error and delivered the right answer.

“This shows Multi-LLM AB-MCTS can combine frontier models to tackle previously unsolvable challenges, expanding the limits of collective AI intelligence,” the researchers state.

AB-MTCS can select different models at different stages of solving a problem (source: Sakana AI)
AB-MTCS can select different models at different stages of solving a problem Source: Sakana AI

“Each model’s strengths and hallucination tendencies vary,” Akiba noted. “By pairing models with lower hallucination risks, we can achieve both powerful reasoning and reliability, addressing a key business concern.”

From Research to Practical Use

Sakana AI has released TreeQuest, an open-source framework under the Apache 2.0 license, enabling developers and businesses to implement Multi-LLM AB-MCTS. Its flexible API supports custom scoring and logic for diverse tasks.

“We’re still exploring AB-MCTS for specific business challenges, but its potential is clear,” Akiba said.

Beyond the ARC-AGI-2 benchmark, AB-MCTS has succeeded in tasks like complex coding and improving machine learning model accuracy.

“AB-MCTS excels in iterative trial-and-error tasks, such as optimizing software performance metrics,” Akiba added. “For instance, it could automatically reduce web service response latency.”

This open-source tool could enable a new generation of robust, reliable enterprise AI applications.

Related article
Multiverse Computing Launches Free Compressed Generative AI Model Multiverse Computing Launches Free Compressed Generative AI Model Large language models face a significant challenge: their immense size. Spanish startup Multiverse Computing is tackling this problem by creating compressed models designed to bridge the gap between the capabilities of cutting-edge AI and what busine
Secret Tracking Data Exposes Theft of AI Models Secret Tracking Data Exposes Theft of AI Models A new method can invisibly watermark models like ChatGPT in seconds without retraining, leaving no trace in standard outputs and resisting all practical removal attempts. The key distinction between watermarking and 'copyright-baiting' is that waterm
AI Systems Tricked into Approving Absurd Scientific Papers AI Systems Tricked into Approving Absurd Scientific Papers New research reveals that AI systems can now produce fraudulent scientific papers that other AI models mistakenly accept as authentic. These fabricated studies bypass detection methods that were previously effective, highlighting the risk of research
Related Special Topic Recommendations
writing Top AI Fiction Profile Creators: Generate Consistent Character Motivations and Fatal Flaws
Top AI Fiction Profile Creators: Generate Consistent Character Motivations and Fatal Flaws

Discover the 2026 best AI fiction profile creators for crafting deep characters. XIX.AI's curated list features top-rated, game-changing tools that generate consistent motivations and fatal flaws. Compare free vs paid options with real-world tests. Unlock your storytelling potential now.

10 tools
xix.ai
Business Top AI Pricing Optimization Software: Track Competitors & Auto-Adjust Store Prices
Top AI Pricing Optimization Software: Track Competitors & Auto-Adjust Store Prices

Discover the 2026 best AI pricing optimization software on XIX.AI. Our curated list features top-rated, game-changing tools that track competitors and auto-adjust your store prices for maximum profit. Compare free vs paid options with real-world tests. Unlock your pricing edge now.

10 tools
xix.ai
code Best AI Code Reviewers: Automate Clean Code Compliance & Refactor Legacy Repo Files
Best AI Code Reviewers: Automate Clean Code Compliance & Refactor Legacy Repo Files

Discover the 2026 best AI code reviewers on XIX.AI. Our curated list features top-rated, game-changing tools for automating clean code compliance and refactoring legacy repo files. Compare free vs paid options with real-world tests and weekly updated rankings. Unlock your AI edge today.

10 tools
xix.ai
Text-to-speech Top AI TTS Apps for Dyslexia: Support Learning and Reading Efficiency for Students
Top AI TTS Apps for Dyslexia: Support Learning and Reading Efficiency for Students

Discover the 2026 latest top-rated AI TTS apps curated for dyslexia support. Our expert rankings compare free vs paid tools, highlighting powerful features for enhanced reading efficiency and learning. Explore must-try, game-changing solutions to unlock student potential. Start your journey at XIX.AI.

10 tools
xix.ai
Comic Creation Top AI Generators for Shonen Manga: Create High-Octane Action Sequences & Energy Effects
Top AI Generators for Shonen Manga: Create High-Octane Action Sequences & Energy Effects

Discover the 2026 best AI generators for Shonen manga at XIX.AI. Our top-rated, curated list features powerful tools for creating high-octane action sequences and dynamic energy effects. Compare free vs paid options with real-world tests. Unlock your creative potential and start crafting epic manga today!

15 tools
xix.ai
Business Best AI Expense Trackers: Scan Receipts & Categorize Corporate Spend Automatically
Best AI Expense Trackers: Scan Receipts & Categorize Corporate Spend Automatically

2026 Latest Best AI Expense Trackers: Top-rated tools to scan receipts & categorize corporate spend automatically. Discover powerful, game-changing solutions for effortless expense management, accurate financial tracking, and streamlined compliance. Our curated, weekly-updated comparison of free vs paid options helps you find the perfect fit. Unlock your AI edge with XIX.AI's expert picks.

10 tools
xix.ai
Comments (1)
0/500
JoseJackson
JoseJackson February 12, 2026 at 3:00:30 AM EST

Вот это да! Модели ИИ, которые сотрудничают, как настоящая команда? Это напоминает мне о том, как разные эксперты работают вместе над сложным проектом. Интересно, сможет ли этот подход решить проблему узкой специализации ИИ, когда одна модель хороша в одном, но бесполезна в другом. Надеюсь, это не приведёт к тому, что ИИ начнут 'ссориться' между собой из-за разных точек зрения! 🤔

OR