Open Source LLMs Included in Europe's Digital Sovereignty Roadmap

April 17, 2025

Last week, Europe's digital sovereignty agenda got a significant boost with the announcement of a new initiative to develop a series of fully open-source large language models (LLMs) covering all European Union languages. The ambitious project, dubbed OpenEuroLLM, targets not only the 24 official EU languages but also the languages of countries negotiating EU membership, such as Albania, in order to future-proof the models.

OpenEuroLLM is a collaborative effort involving around 20 organizations. It is co-led by Jan Hajič, a computational linguist at Charles University in Prague, and Peter Sarlin, CEO and co-founder of the Finnish AI lab Silo AI, which AMD acquired for $665 million last year. The initiative is part of Europe's broader push for digital sovereignty, which aims to keep critical infrastructure and tools within the continent. That push has already led major cloud providers and AI companies such as OpenAI to invest in local infrastructure so that EU data stays on European soil.

Moreover, the EU recently signed an $11 billion deal to build a sovereign satellite constellation intended to compete with Elon Musk's Starlink. OpenEuroLLM fits neatly into this narrative of preserving Europe's technological autonomy.

Funding and Challenges

For all its ambition, the project has a budget of €37.4 million for developing the models, roughly €20 million of which comes from the EU's Digital Europe Programme. That pales in comparison to what corporate AI giants are investing, although the total budget is larger once funding for related work is included. Compute is arguably the biggest expense; to address it, OpenEuroLLM's partners include EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands, which belong to the broader EuroHPC initiative with a budget of around €7 billion.

The diverse group of participants, ranging from academia to corporations, raises questions about the project's feasibility. Anastasia Stasenko, co-founder of LLM company Pleias, expressed skepticism about the effectiveness of such a large consortium compared to more agile, focused private AI firms like Mistral AI and LightOn. These smaller teams, she argues, have a more direct responsibility and can react more swiftly to challenges.

Building from Scratch or Leveraging Existing Work?

OpenEuroLLM's starting point is somewhat ambiguous. Since 2022, Jan Hajič has coordinated the High Performance Language Technologies (HPLT) project, which develops free and reusable datasets, models, and workflows using high-performance computing. HPLT is scheduled to end in late 2025 and shares many partners with OpenEuroLLM, although its U.K. partners are not part of the new project.

Hajič views HPLT as a precursor to OpenEuroLLM, noting that it provides a solid foundation in data, expertise, tools, and computing experience. He expects the first versions of OpenEuroLLM to be released by mid-2026, with final versions due when the project concludes in 2028. However, the project's GitHub profile is still sparse, suggesting that in some respects the work is starting from scratch. Hajič noted that the project officially began only on February 1, 2025, after a year of preparation.

The OpenEuroLLM consortium includes organizations from Czechia, the Netherlands, Germany, Sweden, Finland, and Norway, alongside corporate entities like Silo AI, Aleph Alpha, Ellamind, Prompsit Language Engineering, and LightOn. Notably absent is Mistral, a French AI unicorn, despite Hajič's attempts to engage them in discussions.

Goals and Deliverables

The project's primary objective is to create a series of foundation models for transparent AI in Europe, preserving the linguistic and cultural diversity of all EU languages, both current and future. The deliverables are still being finalized but are expected to include a core multilingual LLM for general-purpose tasks and smaller, quantized versions for edge applications where efficiency is key.

Hajič emphasized the importance of quality, stating that the project aims to avoid releasing half-baked solutions, given the high stakes and public funding involved. Achieving equal proficiency across all languages, especially those with limited digital resources, remains a challenge. The project plans to use benchmarks that accurately represent these languages and cultures.

The project will use data from HPLT, including a dataset built from 4.5 petabytes of web crawls that contains more than 20 billion documents, supplemented by data from Common Crawl.

Open Source Dilemmas

The debate over what constitutes "open source" in AI is ongoing. The Open Source Initiative (OSI) has published a definition of "open source AI," but some argue that an open release should include not just the model but also its datasets, pretrained models, and weights. OpenEuroLLM aims to be "truly open," but Hajič acknowledges potential limitations due to European copyright law and restrictions on redistributing data. Some training data may therefore have to be kept private while remaining available for auditing, in line with the EU AI Act.

Overlap with Existing Projects

The launch of OpenEuroLLM has drawn comparisons to EuroLLM, a project with similar goals that is also co-funded by the EU. EuroLLM released its first model in September and a follow-up in December, and the overlap between the two projects has raised concerns about duplicated effort and prompted calls for collaboration rather than competition. André Martins, head of research at Unbabel, highlighted the similarities on social media and called for open collaboration among the different communities.

Hajič acknowledged the unfortunate overlap but expressed hope for cooperation, noting that OpenEuroLLM's funding restrictions limit collaborations with non-EU entities, including U.K. universities.

Funding and Expectations

The emergence of China's DeepSeek, with its promising cost-to-performance ratio, has raised questions about what it really costs to build AI models. Peter Sarlin, OpenEuroLLM's technical co-lead, noted that little detailed information is available about how DeepSeek was developed, but he remains confident in OpenEuroLLM's funding, which primarily covers personnel costs. Compute expenses are expected to be covered by the EuroHPC centers.

Sarlin emphasized that OpenEuroLLM is not aiming to create a consumer or enterprise product but rather to provide an open-source foundation model as AI infrastructure for European companies. He believes the allocated budget is sufficient for this purpose, drawing on his experience with Silo AI, which has already developed models supporting several European languages and is preparing to launch the "Europa" models covering all European languages.

Digital Sovereignty and Collaboration

Despite the challenges and criticisms, Hajič remains optimistic about the potential of collaborative projects like OpenEuroLLM. He believes that combining academic expertise with corporate focus could lead to innovative outcomes. The ultimate goal is not to compete with Big Tech or billion-dollar AI startups but to enhance Europe's digital sovereignty by developing foundation LLMs built by and for Europe.

Even if OpenEuroLLM does not produce the top-performing model, Hajič sees value in having a "good" model that is entirely based in Europe, contributing positively to the continent's technological autonomy.
