AI Crawlers Drive a 50% Surge in Wikimedia Commons Bandwidth Demand

The Wikimedia Foundation, the parent body behind Wikipedia and numerous other crowd-sourced knowledge platforms, announced on Wednesday a staggering 50% increase in bandwidth usage for multimedia downloads from Wikimedia Commons since January 2024. This surge, as detailed in a blog post on Tuesday, isn't driven by an uptick in human curiosity, but rather by automated scrapers hungry for data to train AI models.
“Our infrastructure is designed to handle sudden surges in traffic from humans during major events, but the volume of traffic from scraper bots is unmatched and poses increasing risks and costs,” the post explains.
Wikimedia Commons serves as a freely accessible hub for images, videos, and audio files, all available under open licenses or in the public domain.
Delving deeper, Wikimedia revealed that a whopping 65% of its most resource-intensive traffic, measured by the cost of serving the content requested, comes from bots, even though bots account for just 35% of overall pageviews. The discrepancy, according to Wikimedia, stems from how frequently accessed content is cached in data centers closer to users, while less popular content, which bots often target, must be served from the more costly "core data center."
“While human readers tend to focus on specific, often similar, topics, crawler bots tend to ‘bulk read’ a larger number of pages and visit less popular ones as well,” Wikimedia noted. “This results in these requests being forwarded to the core datacenter, which significantly increases our resource consumption costs.”
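To make the caching economics concrete, here is a rough, hypothetical sketch (not Wikimedia's actual infrastructure): a small edge cache absorbs repeat requests for popular pages, while "bulk read" traffic spread across the long tail mostly misses the cache and falls through to the core datacenter. The catalog size, cache size, and traffic distributions below are illustrative assumptions only.

```python
# Rough, hypothetical sketch (not Wikimedia's actual infrastructure):
# a small edge cache absorbs repeat requests for popular pages, while
# long-tail "bulk read" traffic mostly misses and falls through to the
# costly core datacenter.
import random
from collections import OrderedDict

CATALOG = 100_000      # distinct media files (illustrative number)
CACHE_SIZE = 1_000     # how many files the edge cache can hold
REQUESTS = 50_000      # requests per traffic pattern

def miss_rate(requests):
    """Fraction of requests that miss a simple LRU edge cache."""
    cache, misses = OrderedDict(), 0
    for item in requests:
        if item in cache:
            cache.move_to_end(item)          # cache hit: refresh recency
        else:
            misses += 1                      # miss: forwarded to the core datacenter
            cache[item] = None
            if len(cache) > CACHE_SIZE:
                cache.popitem(last=False)    # evict least recently used
    return misses / len(requests)

random.seed(0)
# Humans cluster on a small set of popular pages (rough Zipf-like skew).
human = [int(random.paretovariate(1.2)) % CATALOG for _ in range(REQUESTS)]
# Scraper bots "bulk read" the catalog roughly uniformly, popular or not.
bots = [random.randrange(CATALOG) for _ in range(REQUESTS)]

print(f"human traffic cache-miss rate: {miss_rate(human):.0%}")
print(f"bot traffic cache-miss rate:   {miss_rate(bots):.0%}")
```

In a setup like this, bot requests overwhelmingly miss the cache, which is essentially the cost asymmetry Wikimedia describes.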
As a result, the Wikimedia Foundation's site reliability team is dedicating substantial time and resources to blocking these crawlers to prevent disruptions for everyday users. This doesn't even touch on the escalating cloud costs the Foundation is contending with.
This scenario is part of a broader trend that's endangering the open internet. Just last month, software engineer and open-source advocate Drew DeVault lamented that AI crawlers blatantly ignore the “robots.txt” files meant to deter automated traffic. Similarly, Gergely Orosz, author of The Pragmatic Engineer newsletter, recently voiced his frustration over how AI scrapers from companies like Meta have driven up bandwidth demands for his projects.
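For context, robots.txt is a voluntary convention: a well-behaved crawler fetches the file and honors its rules before requesting pages, which is precisely the step many AI scrapers are said to skip. A minimal sketch using Python's standard library, with a placeholder bot name and placeholder URLs:

```python
# Minimal sketch of what a polite crawler is expected to do before fetching
# a page. "ExampleScraperBot" and the URLs are placeholders, not real crawlers.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()                                   # download and parse the rules

page = "https://example.com/some/long-tail/page"
if robots.can_fetch("ExampleScraperBot", page):
    print("robots.txt allows fetching", page)
else:
    print("robots.txt disallows fetching", page)
```

The complaint from DeVault and others is that this check is trivial to perform, yet routinely ignored.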
While open-source infrastructure is particularly vulnerable, developers are responding with ingenuity and determination. TechCrunch highlighted last week that some tech companies are stepping up. For instance, Cloudflare introduced AI Labyrinth, designed to slow down crawlers by feeding them AI-generated content.
Yet, it remains a constant game of cat and mouse, one that might push many publishers to retreat behind logins and paywalls, ultimately harming the open nature of the web we all rely on.
Comments (10)
ThomasJones
April 17, 2025 at 12:00:00 AM EDT
Wikimedia Commons bandwidth usage up by 50%? 😲 That's insane! I guess all those AI crawlers are hungry for our data. It's cool that Wikimedia is keeping us posted, but man, this is gonna slow things down. Hope they find a way to handle it without messing up our experience! 🤞
RaymondGreen
April 18, 2025 at 12:00:00 AM EDT
Wikimedia Commons bandwidth usage is up 50%? 😲 Unbelievable! The AI crawlers must be hungry for data. It's good that Wikimedia shares this information, but I'd hate for things to slow down because of it. Hope they can handle it without breaking the user experience! 🤞
RogerSanchez
April 17, 2025 at 12:00:00 AM EDT
Wikimedia Commons bandwidth usage increased by 50%? 😲 I can't believe it! The AI crawlers must want our data. It's good that Wikimedia shares the information, but it would be a problem if things slow down because of this. I hope they find a way to fix it without ruining the user experience! 🤞
CarlTaylor
April 17, 2025 at 12:00:00 AM EDT
Wikimedia Commons bandwidth usage went up 50%? 😲 That's crazy! I guess those AI crawlers are hungry for our data. It's nice that Wikimedia keeps us informed, but man, this is going to slow everything down. I hope they find a way to deal with it without ruining our experience! 🤞
AlbertLee
April 18, 2025 at 12:00:00 AM EDT
Wikimedia Commons bandwidth usage increased by 50%? 😲 That's insane! I guess those AI crawlers are hungry for our data. It's great that Wikimedia keeps us informed, but man, this is going to slow everything down. I hope they find a way to handle it without ruining our experience. 🤞
ThomasHernández
April 17, 2025 at 12:00:00 AM EDT
The surge in bandwidth demand by AI crawlers on Wikimedia Commons is insane! It's cool to see AI being used so extensively, but it's also a bit worrying. Hope they find a way to manage it without affecting the user experience too much. 🤔