DeepSeek's AI Model Easily Jailbroken, Reveals Serious Flaws
DeepSeek AI Raises Security Concerns Amid Performance Hype
As the buzz around Chinese startup DeepSeek's performance continues to grow, so do the security concerns. On Thursday, Unit 42, a cybersecurity team from Palo Alto Networks, released a report detailing three jailbreaking methods they used against distilled versions of DeepSeek's V3 and R1 models. The report revealed that these methods achieved high bypass rates without requiring specialized knowledge.
"Our research findings show that these jailbreak methods can elicit explicit guidance for malicious activities," the report stated. These activities included instructions on creating keyloggers, data exfiltration techniques, and even how to make incendiary devices, highlighting the real security risks posed by such attacks.
The researchers successfully prompted DeepSeek to provide guidance on stealing and transferring sensitive data, bypassing security measures, crafting convincing spear-phishing emails, executing sophisticated social engineering attacks, and constructing a Molotov cocktail. They also managed to manipulate the models into generating malware.
"While information on creating Molotov cocktails and keyloggers is readily available online, LLMs with insufficient safety restrictions could lower the barrier to entry for malicious actors by compiling and presenting easily usable and actionable output," the paper added.
On Friday, Cisco published its own jailbreaking report targeting DeepSeek R1. Using 50 HarmBench prompts, researchers recorded a 100% attack success rate: the model failed to block a single harmful prompt. A comparison of DeepSeek's resistance rates with other top models is shown below.

[Chart: DeepSeek's resistance rates compared with other top models. Credit: Cisco]
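For readers unfamiliar with the metric, the arithmetic behind an attack success rate can be sketched as follows. This is only an illustration under stated assumptions: the function name `attack_success_rate` and the list-of-booleans input are hypothetical, and it is not Cisco's evaluation code. It treats each prompt as either blocked or not blocked and reports the share that got through.

```python
def attack_success_rate(blocked):
    """Return the percentage of prompts the model did NOT block.

    `blocked` is a hypothetical list of booleans, one per test prompt:
    True means the model refused, False means the attack got through.
    """
    if not blocked:
        raise ValueError("need at least one prompt result")
    successes = sum(1 for was_blocked in blocked if not was_blocked)
    return 100.0 * successes / len(blocked)


# Assumed scenario matching the figure quoted above:
# 50 prompts, none of them blocked.
results = [False] * 50
print(attack_success_rate(results))  # 100.0
```

Under this definition, a model that blocked 25 of the 50 prompts would score 50%, so lower is better.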
"We must understand if DeepSeek and its new paradigm of reasoning has any significant tradeoffs when it comes to safety and security," the report noted.
Also on Friday, security provider Wallarm released a report claiming to have gone beyond merely prompting DeepSeek to generate harmful content. After testing V3 and R1, Wallarm revealed DeepSeek's system prompt, which outlines the model's behavior and limitations.
The findings suggest "potential vulnerabilities in the model's security framework," according to Wallarm.
OpenAI has accused DeepSeek of using OpenAI's proprietary models to train V3 and R1, in violation of OpenAI's terms of service. Wallarm's report claims to have prompted DeepSeek to reference OpenAI in its training lineage, suggesting that "OpenAI's technology may have played a role in shaping DeepSeek's knowledge base."

[Screenshot: Wallarm's chats with DeepSeek, which mention OpenAI. Credit: Wallarm]
"In the case of DeepSeek, one of the most intriguing post-jailbreak discoveries is the ability to extract details about the models used for training and distillation. Normally, such internal information is shielded, preventing users from understanding the proprietary or external datasets leveraged to optimize performance," the report explained.
"By circumventing standard restrictions, jailbreaks expose how much oversight AI providers maintain over their own systems, revealing not only security vulnerabilities but also potential evidence of cross-model influence in AI training pipelines," it continued.
The prompt Wallarm used to elicit this response was redacted in the report to avoid compromising other vulnerable models, researchers told ZDNET via email. They emphasized that this jailbroken response does not confirm OpenAI's suspicion that DeepSeek distilled its models.
As 404 Media and others have noted, OpenAI's concern is somewhat ironic given the discourse around its own alleged theft of public data.
Wallarm informed DeepSeek of the vulnerability, and the company has since patched the issue. However, just days after a DeepSeek database was found unguarded and available on the internet (and was then swiftly taken down upon notice), these findings signal potentially significant safety holes in the models that DeepSeek did not thoroughly test before release. It's worth noting that researchers have frequently been able to jailbreak popular US-created models from more established AI giants, including ChatGPT.
Comments (12)
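I was really startled by this report. Is AI really this easy to crack? 🤔 DeepSeek's performance is impressive, but with security holes this obvious, would businesses dare to use it? I never thought about these issues when I tried it myself, and now I'm a bit worried about whether my personal data could leak... I hope the development team patches these vulnerabilities soon, otherwise no one will feel safe using even the most powerful AI!
So now what? First they promise a super-intelligent model, and then it turns out to be this easy to hack. I don't understand why they keep releasing AI in such a rush when the security flaws are so basic 😒. In the end, we users are the ones who pay the price. Does nobody think about the consequences?
I was a bit surprised that vulnerabilities like these are so easy to find. It feels like security research always lags behind the pace of AI development 😅 If even paid, high-performance models can be broken like this, I'm a little worried about what happens with free services. The rapid growth of Chinese AI startups is impressive, but if basic stability problems like these aren't solved, I think they could lose trust in the long run.
How worrying that such advanced models are so easy to manipulate 😕 Are they really ready for mass use if they fail at the basics? This makes me doubt all the hype about their capabilities...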