option
Home
News
New $1.5B Router Model Hits 93% Accuracy, Eliminating Expensive Retraining Costs

New $1.5B Router Model Hits 93% Accuracy, Eliminating Expensive Retraining Costs

November 24, 2025
89

Researchers at Katanemo Labs have unveiled Arch-Router, an advanced routing model and framework engineered to intelligently direct user queries to the most appropriate large language model (LLM).

For companies developing products that leverage multiple LLMs, Arch-Rolver tackles a central dilemma: how to automatically route each request to the ideal model for the task, without depending on inflexible logic or expensive retraining whenever updates are needed.

The challenges of LLM routing

As the variety of available LLMs expands, developers are shifting from single-model configurations to multi-model architectures that utilize the distinct capabilities of different models for specialized functions—such as generating code, summarizing text, or editing images.

LLM routing has become an essential technique for constructing and running such systems, serving as an intelligent traffic director that guides each user query to the model best suited to handle it.

Current routing approaches generally fit into two main groups: task-based routing, which assigns queries according to predefined task categories, and performance-based routing, which seeks the best trade-off between expense and output quality.

However, task-based systems often falter when user intent is ambiguous or changes over the course of a conversation—especially in multi-turn dialogues. Performance-based routing, meanwhile, tends to prioritize static benchmark results, frequently overlooking actual user preferences and adapting slowly to new models without costly retraining.

As the researchers at Katanemo Labs state in their paper, a deeper issue is that “existing routing methods have practical limitations in real-world applications. Most are optimized for benchmark performance but ignore human preferences, which are guided by subjective evaluation criteria.”

The team emphasizes the importance of routing systems that “reflect subjective human judgments, provide greater transparency, and remain easily adjustable as both models and applications evolve.”

A new framework for preference-aligned routing

To overcome these issues, the researchers developed a “preference-aligned routing” framework that matches incoming queries to routing rules based on custom user preferences.

In this system, users define their routing policies using natural language through a two-tier “Domain-Action Taxonomy.” This structure reflects how people naturally describe tasks: starting with a broad category—the Domain, such as “legal” or “finance”—and drilling down to a specific task—the Action, like “summarization” or “coding.”

Each policy is then mapped to a preferred model, empowering developers to base routing choices on practical requirements rather than benchmark metrics alone. According to the paper, “This taxonomy acts as a mental model to help users create well-defined, structured routing policies.”

The routing procedure operates in two phases. First, a preference-aligned router model evaluates the user’s query alongside all available policies and picks the best-fitting one. Second, a mapping function connects the selected policy to its assigned LLM.

Because the logic for selecting a model is separated from the policy definition, developers can add, remove, or update models just by editing the routing rules—without retraining or changing the router. This separation enables the necessary flexibility for production environments, where models and applications are constantly changing.

Preference-aligned routing framework (source: arXiv)
Preference-aligned routing framework Source: arXiv

The policy selection is powered by Arch-Router, a compact 1.5-billion-parameter language model optimized for preference-aware routing. Arch-Router takes the user query and the full list of policy descriptions as input, then outputs the identifier of the most suitable policy.

Since policies are included in the input, the system can adjust to new or updated routes during inference through in-context learning—no retraining required. This generative strategy enables Arch-Router to leverage its pre-trained understanding to interpret the meaning of both the query and the policies, and to analyze complete conversation histories in one go.

One common worry with including lengthy policy lists in a prompt is the risk of higher latency. However, the team built Arch-Router for high efficiency. “Even with extensive routing policies, we can expand Arch-Router’s context window with very little effect on latency,” says Salman Paracha, co-author of the paper and Founder/CEO of Katanemo Labs. He points out that latency is mainly determined by output length, and Arch-Router only outputs a short policy name—such as “image_editing” or “document_creation.”

Arch-Router in action

To create Arch-Router, the team fine-tuned a 1.5B parameter variant of the Qwen 2.5 model using a carefully assembled dataset of 43,000 examples. They then benchmarked it against leading proprietary models from OpenAI, Anthropic, and Google across four public datasets designed to test conversational AI systems.

The findings indicate that Arch-Router achieved the top overall routing score of 93.17%, outperforming all other models—including top-tier proprietary ones—by an average of 7.71%. The model’s edge became more apparent in longer conversations, showcasing its superior ability to maintain context across multiple exchanges.

Arch-Router vs other models (source: arXiv)
Arch-Router vs other models Source: arXiv

In real-world use, this methodology is already being applied in multiple settings, notes Paracha. For instance, in open-source coding platforms, developers rely on Arch-Router to guide different parts of their workflow—like “code design,” “code understanding,” and “code generation”—to the LLMs most effective for each step. Similarly, organizations can route document creation tasks to a model such as Claude 3.7 Sonnet while sending image editing requests to Gemini 2.5 Pro.

The system is also well-suited “for personal assistants across various fields, where users perform a range of activities from summarizing text to answering factual queries,” Paracha explained, adding that “in such situations, Arch-Router helps product teams consolidate and improve the user’s overall experience.”

This framework is built into Arch, Katanemo Labs’ AI-native proxy server for agents, which supports the implementation of granular traffic management rules. For example, when adding a new LLM, a team can route a small percentage of traffic under a certain policy to the new model, validate its performance using internal analytics, and then confidently shift all traffic over. The company is also working to integrate its tools with evaluation platforms to make this workflow even smoother for corporate developers.

At its core, the objective is to help organizations move beyond disconnected AI implementations. “Arch-Router—and the Arch platform overall—enables developers and businesses to evolve from fragmented LLM usage to a unified, policy-governed system,” Paracha states. “When users perform a wide range of tasks, our platform converts that diversity of tasks and models into a cohesive experience, making the final product feel seamless and intuitive.”

Related article
Satya Nadella ready to exploit new OpenAI deal Satya Nadella ready to exploit new OpenAI deal On Wednesday, a Wall Street analyst asked Microsoft CEO Satya Nadella directly how the revised OpenAI partnership would affect the company’s financials.Nadella described the new agreement as a win for everyone. “We feel good about our partnership wit
OpenAI outlines AI economy with public wealth funds, robot taxes, and four-day week OpenAI outlines AI economy with public wealth funds, robot taxes, and four-day week As governments struggle to manage the economic impact of superintelligent machines, OpenAI has released a set of policy proposals outlining how wealth and work could be reshaped in an "intelligence age." The ideas blend traditional left-leaning mecha
Google rolls out Gemini in Chrome to India Google rolls out Gemini in Chrome to India On Wednesday, Google announced it is expanding Gemini integration for Chrome to new regions, including India, Canada, and New Zealand. This rollout allows desktop users to access Gemini via a sidebar, where they can ask Google’s AI chatbot about on-s
Related Special Topic Recommendations
Comic Creation Top AI Generators for Shonen Manga: Create High-Octane Action Sequences & Energy Effects
Top AI Generators for Shonen Manga: Create High-Octane Action Sequences & Energy Effects

Discover the 2026 best AI generators for Shonen manga at XIX.AI. Our top-rated, curated list features powerful tools for creating high-octane action sequences and dynamic energy effects. Compare free vs paid options with real-world tests. Unlock your creative potential and start crafting epic manga today!

15 tools
xix.ai
Business Best AI Expense Trackers: Scan Receipts & Categorize Corporate Spend Automatically
Best AI Expense Trackers: Scan Receipts & Categorize Corporate Spend Automatically

2026 Latest Best AI Expense Trackers: Top-rated tools to scan receipts & categorize corporate spend automatically. Discover powerful, game-changing solutions for effortless expense management, accurate financial tracking, and streamlined compliance. Our curated, weekly-updated comparison of free vs paid options helps you find the perfect fit. Unlock your AI edge with XIX.AI's expert picks.

10 tools
xix.ai
Business Best AI Recruiting Tools: Screen Resumes & Automate Candidate Interview Scheduling
Best AI Recruiting Tools: Screen Resumes & Automate Candidate Interview Scheduling

Discover the 2026 latest top-rated AI recruiting tools on XIX.AI. Our curated list features powerful, game-changing solutions for screening resumes and automating candidate interview scheduling. Compare free vs paid options with real-world tests and weekly updated rankings. Find your perfect hiring assistant and streamline your recruitment today!

10 tools
xix.ai
Productivity AI Personal Wellness & Focus Coaches: Manage Burnout & Boost Mental Energy Levels
AI Personal Wellness & Focus Coaches: Manage Burnout & Boost Mental Energy Levels

Discover the 2026 best AI personal wellness and focus coaches on XIX.AI. Our curated rankings feature top-rated, game-changing tools to manage burnout and boost mental energy. Compare free vs paid options with real-world insights. Unlock your path to peak productivity and well-being today.

10 tools
xix.ai
chatbot Top-Rated AI Romantic Chatbots: Build Long-Term Relationships with Consistent Personalities
Top-Rated AI Romantic Chatbots: Build Long-Term Relationships with Consistent Personalities

Discover the 2026 latest top-rated AI romantic chatbots for building genuine, long-term connections. Our curated list features powerful, consistent personalities, free vs paid comparisons, and real-world tests. Find your perfect companion and start building today at XIX.AI.

10 tools
xix.ai
Education and Learning Best AI Data Science Mentors: Master SQL, Pandas & Machine Learning Workflows
Best AI Data Science Mentors: Master SQL, Pandas & Machine Learning Workflows

Discover the 2026 best AI data science mentors to master SQL, Pandas & ML workflows. Explore our top-rated, curated selection at XIX.AI for powerful, game-changing guidance. Compare free vs paid options with real-world insights. Unlock your data science mastery today.

10 tools
xix.ai
Comments (1)
0/500
WillGarcía
WillGarcía April 5, 2026 at 10:00:35 PM EDT

Arch-Routerの構想は面白いね。社内でどのLLMを使うか毎回悩んでたから、これがあれば効率化に繋がりそう。ただ、精度93%って、結局残りの7%で重大なミスルーティングが起きたりしない? 医療や法務のようなクリティカルな分野への適用は少し不安かな。😅 開発元のKatanemo Labs、これでインフラ市場に本格参戦するつもり?

OR