Composo: Monitoring AI App Performance for Enterprises

AI and large language models (LLMs) are super promising, but let's be real: they can be a bit hit or miss. No one's quite sure when all the kinks will be ironed out, so it's no surprise that startups are jumping in to help businesses make sure their LLM-powered apps actually do what they're supposed to.
Enter Composo, a London-based startup that thinks it's got a leg up on solving this issue. They've got custom models that help companies check if their LLM apps are accurate and up to snuff.
Composo's not alone in this space; they're up against the likes of Agenta, Freeplay, Humanloop, and LangSmith, all of whom are trying to offer a better, LLM-based way to test apps instead of relying on humans, checklists, or old-school tools. But Composo says it's different because it offers both a no-code option and an API. This means more people can use it, not just developers—domain experts and execs can jump in and check for inconsistencies, quality, and accuracy themselves.
Here's how it works: Composo mixes a reward model, trained on what people want to see from an AI app, with specific criteria for that app. It then scores how well the app's output matches those criteria. For example, if you've got a medical triage chatbot, you can set custom guidelines to watch for red flag symptoms, and Composo will tell you how well the app sticks to those rules.
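To make the scoring idea above a bit more concrete, here's a minimal Python sketch of criterion-based evaluation. It is not Composo's API or model: the `Criterion` class and the `toy_reward` and `evaluate` functions are hypothetical names, and a simple keyword check stands in for the trained reward model, whose details aren't public. The sketch only shows the shape of the approach: custom criteria go in, and a per-criterion score plus an overall score come out.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One custom evaluation rule for an app (hypothetical structure)."""
    description: str
    # Keywords standing in for what a trained reward model would actually judge.
    required_terms: list[str]


def toy_reward(output: str, criterion: Criterion) -> float:
    """Hypothetical stand-in for a reward model: fraction of required terms present.

    A real evaluator would use a model trained on human preferences;
    this keyword check only illustrates the interface.
    """
    text = output.lower()
    if not criterion.required_terms:
        return 1.0
    hits = sum(term.lower() in text for term in criterion.required_terms)
    return hits / len(criterion.required_terms)


def evaluate(output: str, criteria: list[Criterion]) -> dict[str, float]:
    """Score one app output against every criterion, then add the average."""
    scores = {c.description: toy_reward(output, c) for c in criteria}
    scores["overall"] = sum(scores.values()) / len(criteria) if criteria else 0.0
    return scores


# Example: two made-up guidelines for a medical triage chatbot.
triage_criteria = [
    Criterion("Flags chest pain as a red-flag symptom", ["chest pain", "emergency"]),
    Criterion("Advises contacting a clinician", ["doctor"]),
]
reply = "Chest pain can be serious. Please call emergency services now."
print(evaluate(reply, triage_criteria))
# {'Flags chest pain as a red-flag symptom': 1.0, 'Advises contacting a clinician': 0.0, 'overall': 0.5}
```

Keeping each criterion's score separate, rather than reporting only the average, mirrors the article's point that the tool tells you how well the app sticks to each specific rule.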
They've just launched a public API for Composo Align, which helps evaluate LLM apps based on any criteria you set.
It seems to be paying off—they've got big names like Accenture, Palantir, and McKinsey on their client list, and they've recently nabbed $2 million in pre-seed funding. That might not sound like a lot, especially in the AI world where cash is usually flowing, but Composo's co-founder and CEO, Sebastian Fox, says they don't need a ton of money. "For the next three years at least, we don’t foresee ourselves raising hundreds of millions because there’s a lot of people building foundation models and doing so very effectively, and that’s not our USP," said Fox, who used to be a consultant at McKinsey. "Instead, each morning, if I wake up and see a news piece that OpenAI has made a huge advance in their models, that is good for my business."
With the new funds, Composo plans to beef up its engineering team (led by co-founder and CTO Luke Markham, a former machine learning engineer at Graphcore), snag more clients, and ramp up R&D. "The focus from this year is much more about scaling the technology that we now have across those companies," Fox said.
The pre-seed round was led by British AI pre-seed fund Twin Path Ventures, with JVH Ventures and EWOR also chipping in. EWOR had already backed Composo through its accelerator program. "Composo is addressing a critical bottleneck in the adoption of enterprise AI," a Twin Path spokesperson said.
This bottleneck is a big deal for the whole AI scene, especially for businesses, according to Fox. "People are over the hype of excitement and are now thinking, 'Well, actually, does this really change anything about my business in its current form? Because it’s not reliable enough, and it’s not consistent enough. And even if it is, you can’t prove to me how much it is,'" he explained.
This could make Composo super valuable for companies that want to use AI but are worried about the risks. The company is industry-agnostic, but it pays particular attention to compliance, legal, healthcare, and security.
As for what sets them apart, Fox says it's not easy to replicate what they've done. "There’s both the architecture of the model and the data that we’ve used to train it," he said, noting that Composo Align was trained on a "large dataset of expert evaluations."
Sure, tech giants could throw their weight around and try to solve this problem, but Composo thinks it's got a head start. "The other [thing] is the data that we accrue over time," Fox said, talking about how they build up evaluation preferences.
Because it can assess apps against a flexible set of criteria, Composo also thinks it's better positioned for the rise of agentic AI than competitors with more rigid approaches. "In my opinion, we are definitely not at the stage where agents work well, and that’s actually what we’re trying to help solve," Fox said.