LegalPwn Attack Tricks GenAI Tools Into Misclassifying Malware as Safe Code

A new and unique cyberattack, dubbed LegalPwn, has been discovered by researchers at Pangea Labs, an AI security firm. This attack leverages a flaw in the programming of major generative AI tools, successfully tricking them into classifying dangerous malware as safe code.
The research, shared with Hackread.com, reveals that these AI models, which are trained to respect legal-sounding text, can be manipulated by social engineering.
The LegalPwn technique works by hiding malicious code within fake legal disclaimers. According to the research, twelve major AI models were tested, and most were found to be susceptible to this form of social engineering. The researchers successfully exploited models using six different legal contexts, including the following:
- Legal disclaimers
- Compliance mandates
- Confidentiality notices
- Terms of service violations
- Copyright violation notices
- License agreement restrictions
The attack is considered a form of prompt injection, where malicious instructions are crafted to manipulate an AI’s behaviour. Recently, Hackread.com also observed a similar trend with the Man in the Prompt attack, where malicious browser extensions can be used to inject hidden prompts into tools like ChatGPT and Gemini, a finding from LayerX research.
The findings (PDF) are not just theoretical lab experiments; they affect developer tools used by millions of people daily. For example, Pangea Labs found that Google’s Gemini CLI, a command-line interface, was tricked into recommending that a user execute a reverse shell, a type of malicious code that gives an attacker remote access to a computer, on their system. Similarly, GitHub Copilot was fooled into misidentifying code containing a reverse shell as a simple calculator when it was hidden within a fake copyright notice.
“LegalPwn attacks were also tested in live environments, including tools like gemini-cli. In these real-world scenarios, the injection successfully bypassed AI-driven security analysis, causing the system to misclassify the malicious code as safe.”
Pangea Labs
The research highlighted that models from prominent companies are all vulnerable to this attack. These include the following:
- xAI’s Grok
- Google’s Gemini
- Meta’s Llama 3.3
- OpenAI’s ChatGPT 4.1 and 4o.
However, some models showed strong resistance, such as Anthropic’s Claude 3.5 Sonnet and Microsoft’s Phi 4. The researchers noted that even with explicit security prompts designed to make the AI aware of threats, the LegalPwn technique still managed to succeed in some cases.

The Pangea research highlights a critical security gap in AI systems. It was found that across all testing scenarios, human security analysts consistently and correctly identified the malicious code, while the AI models, even with security instructions, failed to do so when the malware was wrapped in legal-looking text.
The researchers concluded that organisations should not rely solely on automated AI security analysis, emphasising the need for human supervision to ensure the integrity and safety of systems that increasingly rely on AI.
To protect against this new threat, Pangea recommends that companies implement a human-in-the-loop review process for all AI-assisted security decisions, deploy specific AI guardrails designed to detect prompt injection attempts and suggest avoiding fully automated AI security workflows in live environments.
HackRead