Chinese AI Censorship Exposed by Leaked Data
April 10, 2025
Will García
China's use of AI to enhance its censorship capabilities has reached a new level, as revealed by a leaked database containing 133,000 examples of content flagged for sensitivity by the Chinese government. The system appears to rely on a sophisticated large language model (LLM) designed to automatically detect and censor content on a wide range of topics, from poverty in rural areas to corruption within the Communist Party and even subtle political satire.

This photo, taken on June 4, 2019, shows the Chinese flag behind razor wire at a housing compound in Yengisar, south of Kashgar, in China's western Xinjiang region. Image Credits: Greg Baker / AFP / Getty Images
According to Xiao Qiang, a researcher at UC Berkeley who specializes in Chinese censorship, this database is "clear evidence" that the Chinese government or its affiliates are using LLMs to bolster their repression efforts. Unlike traditional methods that depend on human moderators and keyword filtering, this AI-driven approach can significantly enhance the efficiency and precision of state-controlled information management.
The dataset, discovered by security researcher NetAskari in an unsecured Elasticsearch database hosted on a Baidu server, contains entries as recent as December 2024. It's unclear exactly who created the dataset, but its purpose is evident: to train an LLM to identify and flag content on sensitive topics such as pollution, food safety, financial fraud, labor disputes, and military matters. Political satire, especially when it involves historical analogies or references to Taiwan, is also a high-priority target.
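For readers curious about the mechanics of such an exposure, the sketch below shows how an unauthenticated Elasticsearch instance can typically be inspected. The host address and index name are hypothetical placeholders; the actual server details were not published.

```python
# A minimal sketch of inspecting an unsecured Elasticsearch instance.
# The host and index name are hypothetical placeholders, not the actual
# server NetAskari found (those details were not published).
import json

import requests

ES_HOST = "http://203.0.113.10:9200"  # hypothetical open endpoint
INDEX = "flagged_content"             # hypothetical index name

# An unauthenticated instance answers metadata queries from anyone.
indices = requests.get(f"{ES_HOST}/_cat/indices?format=json", timeout=10).json()
print("indices:", [i["index"] for i in indices])

# Pull a few documents to see what kind of records the index holds.
resp = requests.get(f"{ES_HOST}/{INDEX}/_search?size=5", timeout=10)
for hit in resp.json()["hits"]["hits"]:
    print(json.dumps(hit["_source"], ensure_ascii=False, indent=2))
```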

Image Credits: Charles Rollet
The training data includes examples of content that could stir social unrest, such as complaints about corrupt police officers, reports on rural poverty, and news about expelled Communist Party officials. It also contains extensive references to Taiwan and military-related topics; the Chinese word for Taiwan (台湾) alone appears more than 15,000 times.
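As a rough illustration of how such a term count might be produced, the sketch below tallies occurrences of 台湾 across newline-delimited JSON records. Both the file name and the "content" field are assumptions, since the leaked schema has not been published in full.

```python
# Sketch: counting occurrences of a term across leaked dataset records.
# Assumes newline-delimited JSON with a free-text "content" field --
# the file name and field name are both assumptions for illustration.
import json

TERM = "台湾"  # "Taiwan"
count = 0
with open("leaked_dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        count += record.get("content", "").count(TERM)
print(f"{TERM} appears {count:,} times")
```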
The dataset's intended use is described as "public opinion work," a term that Michael Caster of Article 19 explains is typically associated with the Cyberspace Administration of China (CAC) and involves censorship and propaganda efforts. This aligns with Chinese President Xi Jinping's view of the internet as the "frontline" of the Communist Party's public opinion work.
This development is part of a broader trend of authoritarian regimes adopting AI technology for repressive purposes. OpenAI recently reported that an unidentified actor, likely from China, used generative AI to monitor social media and forward anti-government posts to the Chinese government. The same technology was also used to generate critical comments about a prominent Chinese dissident, Cai Xia.
While China's traditional censorship methods rely on basic algorithms to block blacklisted terms, the use of LLMs represents a significant advancement. These AI systems can detect even subtle criticism on a massive scale and continuously improve as they process more data.
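To make that contrast concrete, here is a hedged sketch: the keyword filter stands in for the traditional approach, and the prompt builder stands in for the LLM-based one. The blacklist terms, categories, and prompt are invented for illustration and are not taken from the leaked dataset.

```python
# Illustrative contrast between the two approaches described above.
# Nothing here is drawn from the leaked system; the blacklist terms,
# categories, and prompt are invented for illustration.

BLACKLIST = {"banned_term_a", "banned_term_b"}  # placeholder terms

def keyword_filter(text: str) -> bool:
    """Traditional approach: flag text only if it contains an exact
    blacklisted string. Paraphrase, satire, and analogy slip through."""
    return any(term in text for term in BLACKLIST)

def build_classifier_prompt(text: str) -> str:
    """LLM approach: ask a model to judge the text's topic and implied
    meaning rather than match exact strings."""
    return (
        "Classify the following post into one of: pollution, food_safety, "
        "financial_fraud, labor_dispute, military, political_satire, none.\n"
        f"Post: {text}\nLabel:"
    )

# A fable criticizing power indirectly contains no banned strings, so
# the keyword filter passes it while an LLM could still flag it.
post = "A story about an emperor whose new clothes no one dares to question."
print(keyword_filter(post))           # False -- the blacklist misses it
print(build_classifier_prompt(post))  # a prompt an LLM could answer
```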
"I think it's crucial to highlight how AI-driven censorship is evolving, making state control over public discourse even more sophisticated, especially at a time when Chinese AI models such as DeepSeek are making headwaves," Xiao Qiang told TechCrunch.
Related articles
- Analysis Reveals AI's Responses on China Vary by Language
- China Tops Global Rankings in Computer Vision Surveillance Research: CSET
- Eric Schmidt Opposes AGI Manhattan Project
Comments (30)
FrankMartínez
April 10, 2025 at 6:58:08 PM GMT
This app is eye-opening but kinda scary. It shows how AI is used for censorship in China, which is pretty intense. The database is huge, but navigating it feels clunky. It's a good wake-up call about AI's potential for harm, but the interface could use some work.
GregoryWilson
April 11, 2025 at 3:36:22 PM GMT
This app is eye-opening but a little scary. It shows how AI is used for censorship in China, which is quite intense. The database is huge, but navigating it feels clunky. It's a good warning about AI's potential for harm, but the interface has room for improvement.
RoyLopez
April 11, 2025 at 1:45:57 PM GMT
This app is eye-opening but a bit scary. It shows how AI is used for censorship in China, which is pretty intense. The database is huge, but it's a bit awkward to use. It's a good warning about AI's potential for harm, but the interface has room for improvement.
MichaelDavis
April 11, 2025 at 8:03:39 PM GMT
This app is revealing but a little frightening. It shows how AI is used for censorship in China, which is quite intense. The database is enormous, but navigating it feels clumsy. It's a good warning about AI's potential for harm, but the interface could be improved.
CharlesWhite
April 12, 2025 at 4:05:41 AM GMT
This app is eye-opening but a little frightening. It shows how AI is used for censorship in China, which is quite intense. The database is enormous, but navigating it feels clumsy. It's a good wake-up call about AI's potential for harm, but the interface could be improved.
CarlLewis
April 16, 2025 at 7:23:03 AM GMT
The leaked data on Chinese AI censorship is pretty scary. It's like Big Brother on steroids! 😱 But I'm not surprised, just wish there was a way to fight back against this kind of control. Any ideas? 🤔