Open Source LLMs Included in Europe's Digital Sovereignty Roadmap

April 17, 2025

AnthonyMartinez

Earlier this year, Europe's digital sovereignty agenda got a significant boost with the announcement of an initiative to develop a series of fully open-source large language models (LLMs) covering all European Union languages. The project, dubbed OpenEuroLLM, targets not only the 24 official EU languages but also the languages of countries negotiating EU membership, such as Albania, to future-proof the models.

OpenEuroLLM is a collaborative effort involving around 20 organizations. It is co-led by Jan Hajič, a computational linguist at Charles University in Prague, and Peter Sarlin, CEO and co-founder of the Finnish AI lab Silo AI, which AMD acquired for $665 million last year. The initiative aligns with Europe's broader push for digital sovereignty, which aims to keep critical infrastructure and tools within the continent. It echoes moves by major cloud providers and AI companies such as OpenAI, which have been investing in local infrastructure so that EU data can remain on European soil.

Moreover, the EU has recently signed an $11 billion deal to establish a sovereign satellite constellation, positioning itself as a competitor to Elon Musk's Starlink. OpenEuroLLM fits perfectly into this narrative, focusing on maintaining Europe's technological autonomy.

Funding and Challenges

For all its ambition, the project has a budget of €37.4 million for developing the models, with approximately €20 million coming from the EU's Digital Europe Programme. That amount pales in comparison to the sums invested by corporate AI giants, although the total grows when funding for related work is taken into account. A major expense is compute, for which OpenEuroLLM is partnering with EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands, part of the broader €7 billion EuroHPC project.

The sheer diversity of participants, ranging from academia to corporations, raises questions about the project's feasibility. Anastasia Stasenko, co-founder of the LLM company Pleias, expressed skepticism that such a large consortium can be as effective as more agile, focused private AI firms like Mistral AI and LightOn. Smaller teams, she argues, have more direct accountability and can react more quickly to challenges.

Building from Scratch or Leveraging Existing Work?

OpenEuroLLM's starting point is somewhat ambiguous. Since 2022, Jan Hajič has coordinated the High Performance Language Technologies (HPLT) project, which develops free and reusable datasets, models, and workflows using high-performance computing. HPLT, which is set to end in late 2025, shares many partners with OpenEuroLLM, although HPLT's U.K. partners are not part of the new consortium.

Hajič views HPLT as a precursor to OpenEuroLLM, saying it provides a solid foundation in data, expertise, tools, and computing experience. He expects the first versions of OpenEuroLLM's models by mid-2026, with the final versions due by the project's conclusion in 2028. However, the project's GitHub profile remains sparse, suggesting that in some respects it is starting from scratch. Hajič said the project officially began on February 1, 2025, after a year of preparation.

The OpenEuroLLM consortium includes organizations from Czechia, the Netherlands, Germany, Sweden, Finland, and Norway, alongside companies such as Silo AI, Aleph Alpha, Ellamind, Prompsit Language Engineering, and LightOn. Notably absent is Mistral, the French AI unicorn, despite Hajič's attempts to bring it into discussions.

Goals and Deliverables

The project's primary objective is to create a series of foundation models for transparent AI in Europe, preserving the linguistic and cultural diversity of all EU languages, both current and future. The deliverables are still being finalized but are expected to include a core multilingual LLM for general-purpose tasks and smaller, quantized versions for edge applications where efficiency is key.

Hajič emphasized the importance of quality, stating that the project aims to avoid releasing half-baked solutions, given the high stakes and public funding involved. Achieving equal proficiency across all languages, especially those with limited digital resources, remains a challenge. The project plans to use benchmarks that accurately represent these languages and cultures.

The project will draw on data from HPLT, including 4.5 petabytes of web-crawled data and more than 20 billion documents, supplemented with data from Common Crawl.

Open Source Dilemmas

The debate over what counts as "open source" in AI is ongoing. The Open Source Initiative (OSI) has published a definition of "open source AI," but some argue that openness should extend beyond the models themselves to the training datasets, pretrained models, and weights. OpenEuroLLM aims to be "truly open," though Hajič acknowledges that European copyright law and restrictions on redistributing data may impose limits. Some training data may have to be kept confidential while remaining available for auditing, in line with the EU AI Act.

Overlap with Existing Projects

The launch of OpenEuroLLM has drawn comparisons with EuroLLM, a recently launched project with similar goals that is also co-funded by the EU. EuroLLM released its first model in September 2024 and a follow-up in December 2024, and the overlap has raised concerns about redundancy and prompted calls for collaboration rather than competition. André Martins, head of research at Unbabel, highlighted the similarities on social media and urged the different communities to collaborate openly.

Hajič acknowledged the unfortunate overlap but expressed hope for cooperation, noting that OpenEuroLLM's funding restrictions limit collaborations with non-EU entities, including U.K. universities.

Funding and Expectations

The emergence of China's DeepSeek, with its promising cost-to-performance ratio, has raised questions about the true costs of building AI models. Peter Sarlin, technical co-lead of OpenEuroLLM, noted the lack of detailed information about DeepSeek's development but remains confident in OpenEuroLLM's funding, which primarily covers personnel costs. The compute expenses are expected to be covered by the EuroHPC centers.

Sarlin emphasized that OpenEuroLLM is not aiming to create a consumer or enterprise product but rather to provide an open-source foundation model as AI infrastructure for European companies. He believes the allocated budget is sufficient for this purpose, drawing on his experience with Silo AI, which has already developed models supporting several European languages and is preparing to launch the "Europa" models covering all European languages.

Digital Sovereignty and Collaboration

Despite the challenges and criticisms, Hajič remains optimistic about the potential of collaborative projects like OpenEuroLLM. He believes that combining academic expertise with corporate focus could lead to innovative outcomes. The ultimate goal is not to compete with Big Tech or billion-dollar AI startups but to enhance Europe's digital sovereignty by developing foundation LLMs built by and for Europe.

Even if OpenEuroLLM does not produce the top-performing model, Hajič sees value in having a "good" model that is entirely based in Europe, contributing positively to the continent's technological autonomy.

Comments

StevenMartin

August 16, 2025 at 1:00:59 PM EDT

Wow, OpenEuroLLM sounds like a game-changer for Europe's tech scene! Building LLMs for all EU languages is ambitious—imagine the boost for local AI startups. But can they keep up with the big players like OpenAI? 🤔

PaulHill

August 7, 2025 at 2:01:06 PM EDT

Super cool to see Europe pushing for open-source LLMs! Can't wait to see how OpenEuroLLM handles all those languages. 🌍

ElijahCollins

July 23, 2025 at 12:59:29 AM EDT

Wow, OpenEuroLLM sounds like a game-changer for Europe’s tech scene! Building open-source LLMs for all EU languages is ambitious—imagine the possibilities for local businesses and multilingual AI apps. But I wonder, will they keep up with the pace of global AI giants? 🤔

PeterYoung

April 21, 2025 at 11:11:01 PM EDT

OpenEuroLLM sounds like a game-changer for Europe! Finally, we're getting open-source LLMs that cover all EU languages. It's about time we took control of our digital future. Can't wait to see how this develops! 🚀

CharlesThomas

April 21, 2025 at 8:18:24 PM EDT

Open-source LLMs that support every EU language, how wonderful! Now we'll be able to take control of our own digital future. I'm looking forward to seeing how this develops! 🌟

MatthewGonzalez

April 21, 2025 at 8:16:04 PM EDT

OpenEuroLLM looks like a big change for Europe! Finally, open-source LLMs that cover all EU languages. It's time we took control of our digital future. I can't wait to see how this develops! 🚀