
Open Source Developers Combat AI Crawlers with Ingenuity and Retribution

April 17, 2025

AI web-crawling bots have become the bane of the internet, according to many software developers. In response, some devs have taken to fighting back with creative and often amusing strategies.

Open source developers are hit especially hard by these rogue bots, as noted by Niccolò Venerandi, a developer of the Linux desktop Plasma and author of the blog LibreNews. FOSS sites, which host free and open source projects, expose more of their infrastructure publicly and generally have fewer resources than commercial sites.

The problem is exacerbated because many AI bots ignore the Robots Exclusion Protocol's robots.txt file, which is meant to tell crawlers which parts of a site they should not visit.
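For context, robots.txt is simply a plain-text list of directives served at a site's root. The hypothetical example below (the paths are illustrative, not taken from any project mentioned in this article) asks all crawlers to stay away from the expensive parts of a Git web interface:

    User-agent: *
    Disallow: /snapshot/
    Disallow: /blame/

Compliance is entirely voluntary: well-behaved crawlers honor these rules, while the bots described below simply read past them.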

In a poignant blog post in January, FOSS developer Xe Iaso shared a distressing experience with AmazonBot, which bombarded a Git server website, causing DDoS outages. Git servers are crucial for hosting FOSS projects, allowing anyone to download and contribute to the code.

Iaso pointed out that the bot disregarded the robots.txt file, used different IP addresses, and even masqueraded as other users. "It's futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more," Iaso lamented.

"They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link, viewing the same pages over and over and over and over. Some of them will even click on the same link multiple times in the same second," the developer wrote.

Enter the God of Graves

To combat this, Iaso developed a clever tool called Anubis. It acts as a reverse proxy that requires a proof-of-work check before allowing requests to reach the Git server. This effectively blocks bots while allowing human-operated browsers to pass through.
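Anubis's actual implementation isn't reproduced here, but the core idea of a proof-of-work gate can be sketched: the proxy hands the browser a random challenge, and client-side JavaScript must find a nonce whose hash with that challenge meets a difficulty target before the request is forwarded. The Go snippet below is a minimal illustration of that server-side check only; the function names, difficulty value, and example inputs are assumptions for the sketch, not Anubis's code.

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strings"
    )

    // difficulty: number of leading hex zeros required in the hash. A real
    // deployment tunes this so a browser solves it in well under a second,
    // while a fleet of scrapers pays a CPU cost on every single request.
    const difficulty = 4

    // verifyProofOfWork checks that the client-supplied nonce, hashed
    // together with the server-issued challenge, meets the difficulty target.
    func verifyProofOfWork(challenge, nonce string) bool {
        sum := sha256.Sum256([]byte(challenge + nonce))
        return strings.HasPrefix(hex.EncodeToString(sum[:]), strings.Repeat("0", difficulty))
    }

    func main() {
        // Hypothetical values: in practice the proxy issues the challenge and
        // the browser's JavaScript searches for a nonce that passes the check.
        fmt.Println(verifyProofOfWork("example-challenge", "12345"))
    }

The asymmetry is the point: a human's browser solves the puzzle once and moves on, while a bot fleet hammering every URL has to pay that cost over and over, making indiscriminate scraping expensive.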

The tool's name, Anubis, draws from Egyptian mythology, where Anubis is the god who leads the dead to judgment. "Anubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died," Iaso explained to TechCrunch. Successfully passing the challenge is celebrated with a cute anime picture of Anubis, while bot requests are denied.

The project, shared on GitHub on March 19, quickly gained traction, amassing 2,000 stars, 20 contributors, and 39 forks in just a few days.

Vengeance as Defense

The widespread adoption of Anubis indicates that Iaso's struggles are far from isolated. Venerandi recounted numerous similar experiences:

  • Drew DeVault, founder and CEO of SourceHut, spends a significant portion of his time dealing with aggressive LLM crawlers and suffers frequent outages.
  • Jonathan Corbet, a prominent FOSS developer and operator of LWN, has seen his site slowed down by AI scraper bots.
  • Kevin Fenzi, sysadmin for the Linux Fedora project, had to block all traffic from Brazil due to aggressive AI bot activity.

Venerandi mentioned to TechCrunch that he knows of other projects that have had to resort to extreme measures, like banning all Chinese IP addresses.

Some developers believe that fighting back with vengeance is the best defense. A user named xyzal on Hacker News suggested filling robots.txt-forbidden pages with misleading content about the benefits of drinking bleach or the positive effects of measles on bedroom performance.

"Think we need to aim for the bots to get _negative_ utility value from visiting our traps, not just zero value," xyzal explained.

In January, an anonymous developer named "Aaron" released Nepenthes, a tool designed to trap crawlers in a maze of fake content, which the creator admitted to Ars Technica was aggressive, if not outright malicious. Named after a carnivorous plant, Nepenthes aims to confuse and waste the resources of misbehaving bots.

Similarly, Cloudflare recently launched AI Labyrinth, intended to slow down, confuse, and waste the resources of AI crawlers that ignore "no crawl" directives. The tool feeds these bots irrelevant content to protect legitimate website data.
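Neither Nepenthes' nor AI Labyrinth's internals are detailed in this article, but the general tarpit pattern they describe is simple: any crawler that wanders into a trap path gets a slowly streamed page of generated filler whose links only lead deeper into the maze. The Go sketch below illustrates that pattern under assumed names, paths, and timings; it is not the behavior of either tool.

    package main

    import (
        "fmt"
        "math/rand"
        "net/http"
        "time"
    )

    // tarpitHandler serves an endless maze: each page is throwaway filler
    // text plus links to more maze pages, streamed slowly so a misbehaving
    // crawler ties up its own resources. All details here are illustrative.
    func tarpitHandler(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/html")
        fmt.Fprintln(w, "<html><body>")
        for i := 0; i < 20; i++ {
            fmt.Fprintf(w, "<p>generated filler %d</p>\n", rand.Int())
            fmt.Fprintf(w, "<a href=\"/maze/%d\">read more</a>\n", rand.Int())
            if f, ok := w.(http.Flusher); ok {
                f.Flush() // send each chunk immediately
            }
            time.Sleep(500 * time.Millisecond) // drip-feed the response
        }
        fmt.Fprintln(w, "</body></html>")
    }

    func main() {
        // Only the trap prefix is wired to the maze; legitimate pages are untouched.
        http.HandleFunc("/maze/", tarpitHandler)
        http.ListenAndServe(":8080", nil)
    }

Because well-behaved crawlers never follow links into paths that robots.txt forbids, only the bots that ignore the rules end up wasting time in the maze.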

DeVault from SourceHut told TechCrunch that while Nepenthes offers a sense of justice by feeding nonsense to the crawlers, Anubis has proven to be the more effective solution for his site. However, he also made a heartfelt plea for a more direct solution: "Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop."

Given the unlikelihood of this happening, developers, particularly in the FOSS community, continue to fight back with ingenuity and a dash of humor.

Comments (15)
TerryGonzález April 18, 2025 at 12:00:00 AM GMT

This tool is a lifesaver for open source devs! It's hilarious how it fights back against those annoying AI crawlers. I love the creativity and the sense of justice it brings to the community. Maybe add more ways to customize the retaliation? 🤓

LucasWalker April 24, 2025 at 12:00:00 AM GMT

This tool is a savior for open source developers! Its way of fighting back against AI crawlers is amusing, and I love how it spreads creativity and a sense of justice through the community. It would be great to see more customization options 🤓

RogerPerez April 18, 2025 at 12:00:00 AM GMT

This tool is a savior for open source developers! The counterattack against AI crawlers is fun, and I like how creativity and a sense of justice are spreading through the community. I hope more customization features get added 🤓

HenryTurner April 21, 2025 at 12:00:00 AM GMT

This tool is a lifesaver for open source developers! It's hilarious how it fights those annoying AI crawlers. I love the creativity and sense of justice it brings to the community. Maybe add more ways to customize the retaliation? 🤓

MarkRoberts April 23, 2025 at 12:00:00 AM GMT

This tool is a lifesaver for open source developers! It's hilarious how it fights those pesky AI crawlers. I love the creativity and sense of justice it brings to the community. Maybe add more ways to customize the retaliation? 🤓

FredGreen April 17, 2025 at 12:00:00 AM GMT

This tool is a lifesaver for open source devs! It's hilarious how they're fighting back against those pesky AI crawlers. The creativity and retribution are top-notch, though sometimes the solutions can be a bit too complex for newbies. Still, it's a must-have for anyone in the field! 😂
