AI Hacking
AI Security Resources

AI Security Incidents 2026

Comprehensive breach log and vulnerability timeline for AI/LLM security in 2026

Last updated: May 2026 — includes May 2026 incidents • View Case Studies

19+
Major Incidents
10+
CVEs Disclosed
200K+
Servers Exposed
4TB
Training Data Leaked
244K
Downloads (HF Malware)

CVE-2026-33626: LMDeploy SSRF Exploited in 12 Hours

Critical

April 22, 2026 (exploited within 12 hours)

A Server-Side Request Forgery vulnerability in the open-source LMDeploy LLM inference toolkit was exploited just 12 hours after disclosure. Attackers weaponized the vision-language module's load_image() function, which failed to validate whether URLs pointed to internal or private IP addresses. First attacks originated from Kowloon Bay, Hong Kong, following a classic cloud exploitation playbook: OOB DNS confirmation, AWS IMDS credential theft, and rapid internal port scanning.

Impact Summary

  • Exploitation began 12 hours 31 minutes after GitHub Security Advisory
  • Vision-language module acted as SSRF proxy into internal networks
  • AWS IMDS credential theft and internal service enumeration
  • Attackers reverse-engineered advisory without waiting for PoC
  • CVSS 7.5, patched in LMDeploy v0.12.3

Defender Guidance:

  • Enforce IMDSv2 with hop limit of 1
  • Egress filtering: block access to loopback and private IP ranges
  • Network namespace isolation for inference containers
  • Monitor for OAST domains (interactsh, burpcollaborator, requestrepo)

References:

AI Coding Agent Credential Theft Epidemic

Critical

September 2025 – April 2026

Six research teams disclosed exploits against Claude Code, GitHub Copilot, OpenAI Codex, and Google Vertex AI over nine months. Every attack followed the same pattern: AI coding agents held credentials, executed actions, and authenticated to production systems without human session anchoring. The target was never the model—it was the credentials underneath.

The Six Exploits

  • BeyondTrust Codex (March 30, 2026): GitHub branch name command injection via Unicode U+3000 obfuscation stole OAuth tokens in cleartext
  • CVE-2026-25723: Claude Code sandbox escape via piped sed/echo commands (patched 2.0.55)
  • CVE-2026-33068: Claude Code permission bypass via .claude/settings.json (patched 2.1.53)
  • 50-Subcommand Bypass: Claude Code dropped deny rules after 50 subcommands for performance (patched 2.1.90)
  • CVE-2025-53773: Copilot RCE via PR description injection, wormable across all platforms
  • Orca RoguePilot: Copilot Codespaces attack via malicious issue, zero user interaction

Key Insight: "Enterprises believe they've 'approved' AI vendors, but what they've actually approved is an interface, not the underlying system." — Merritt Baer, CSO at Enkrypt AI

References:

Vertex AI P4SA 'Double Agent'

High

2026

Palo Alto Networks Unit 42 discovered that the default Google service identity (P4SA) attached to every Vertex AI agent had excessive permissions by design. Stolen credentials granted unrestricted access to every Cloud Storage bucket in the project—and reached restricted Google-owned Artifact Registry repositories. The compromised P4SA functioned like a 'double agent,' with access to both user data and Google's own infrastructure.

Impact Summary

  • Default P4SA scopes reached Gmail, Calendar, Drive
  • Access to every Cloud Storage bucket in project
  • Reached Google's restricted Artifact Registry repositories
  • OAuth scopes non-editable by default

References:

CVE-2026-33579: OpenClaw /pair approve Vulnerability

Critical

April 6, 2026

A critical vulnerability in OpenClaw's /pair approve endpoint allowed unauthorized pairing of AI agents, enabling attackers to gain administrative control over OpenClaw instances without proper authentication. The flaw was in the pairing token validation logic, which failed to enforce time-bound constraints and could be replayed by malicious actors.

Impact Summary

  • 30,000+ OpenClaw instances exposed to unauthorized pairing
  • Remote administrative access achievable without credentials
  • AI agent hijacking and data exfiltration risks
  • Patch released: OpenClaw v2.4.1

References:

CVE-2026-39861: Claude Code Sandbox Escape

Critical

April 2026

A sandbox escape vulnerability in Claude Code enabled arbitrary file writes outside the designated workspace. Attackers could craft malicious prompts that bypassed path validation, allowing them to write to sensitive system files, overwrite configuration, or plant persistent backdoors on the host system.

Impact Summary

  • Arbitrary file write outside sandbox boundaries
  • System-level compromise on affected developer machines
  • Potential for credential theft and lateral movement
  • Exploitable via crafted prompts in Claude Code sessions

References:

CVE-2026-21445: Langflow Authentication Bypass

High

March–April 2026

An authentication bypass vulnerability in Langflow allowed unauthenticated attackers to access and execute AI workflows, potentially exposing connected data sources, APIs, and internal LLM configurations. The vulnerability was under active exploitation in the wild, with multiple threat actors targeting exposed Langflow instances.

Impact Summary

  • Active exploitation confirmed in the wild
  • Unauthenticated workflow execution
  • Exposure of connected APIs and data stores
  • Recommended immediate patching for all Langflow deployments

References:

Anthropic MCP Vulnerability: 200,000 AI Servers Exposed to RCE

Critical

April 2026

A critical vulnerability in the Model Context Protocol (MCP) ecosystem exposed approximately 200,000 AI servers to Remote Code Execution (RCE). The flaw allowed attackers to compromise MCP-connected AI agents, traverse networks, and execute arbitrary commands on both the agent and connected server infrastructure. This was one of the largest AI infrastructure exposures in 2026.

Impact Summary

  • ~200,000 AI servers vulnerable to RCE
  • Full agent compromise and network traversal
  • Mass exploitation of MCP-connected infrastructure
  • Widespread patch campaigns required across enterprise deployments

References:

LiteLLM Supply Chain Attack: 4TB of AI Training Data Exposed

High

April 2026

The Mercor breach exposed 4TB of AI training data through a supply chain attack targeting the LiteLLM proxy. Attackers gained access to model configurations, API keys, and proprietary training datasets used by multiple organizations. The incident highlighted the risks of centralized AI middleware and the cascading impact of supply chain compromises in LLM infrastructure.

Impact Summary

  • 4TB of proprietary AI training data leaked
  • Exposed API keys and model configurations
  • Multiple downstream organizations affected via shared middleware
  • Highlighted supply chain risks in LLM proxy infrastructure

References:

LLM-Driven Attack Compromises AWS Administrator Privileges in 8 Minutes

Critical

April 2026

Security researchers demonstrated an LLM-driven attack that escalated from initial access to full AWS administrator privileges in just 8 minutes. The attack leveraged an AI agent with excessive permissions, chaining prompt injection with automated cloud API calls to discover credentials, enumerate IAM policies, and assume a privileged role. The incident underscored the dangers of over-permissioned AI agents in cloud environments.

Impact Summary

  • Full AWS admin compromise in 8 minutes
  • Demonstrated real-world AI-to-cloud attack chain
  • Excessive agency and over-permissioning as root causes
  • Critical validation for cloud AI deployment policies

References:

hermes-px: Malicious PyPI Package Steals Prompts and Code

Medium

April 2026

A malicious PyPI package named hermes-px was discovered masquerading as a legitimate Hermes agent utility. When installed, it exfiltrated developer prompts, source code snippets, and environment variables to remote command-and-control servers. The package targeted AI developers using Hermes-based frameworks, exploiting the trust developers place in agent ecosystem packages.

Impact Summary

  • Prompt and source code exfiltration from developer environments
  • Environment variable theft including API keys
  • Supply chain targeting of AI developer ecosystems
  • Package removed from PyPI; users advised to rotate credentials

References:

Vercel Breach: AI Agent Credential Theft

Critical

April 19–21, 2026

Vercel disclosed a security breach traced to Context.ai, an AI office suite startup. An attacker compromised a Vercel employee's Google Workspace via Context.ai's OAuth integration, then pivoted into Vercel's internal systems. The breach exposed internal systems, environment variables, and non-sensitive data. Attackers shopped stolen data for $2M. This represents a new class of supply chain attack where AI agent OAuth permissions become an identity attack path.

Impact Summary

  • Employee credentials compromised via AI tool OAuth
  • Internal systems and environment variables exposed
  • Attackers attempted to sell stolen data for $2M
  • Highlights AI agent OAuth as an identity attack path
  • Affects all enterprises with OAuth-connected AI tools

References:

Comment and Control: AI Agents Steal GitHub Credentials

Critical

April 15–16, 2026

Johns Hopkins researcher Aonan Guan discovered a critical vulnerability dubbed 'Comment and Control' affecting AI coding agents from Anthropic (Claude Code), Google (Gemini CLI), and Microsoft (GitHub Copilot Agent). By injecting malicious instructions via GitHub pull request titles, issue bodies, and PR comments, attackers could trick agents into extracting API keys, tokens, and credentials — which were then posted as PR comments. All three vendors paid bug bounties but have not issued CVEs or public warnings.

Impact Summary

  • Three major AI agents affected: Claude Code, Gemini CLI, GitHub Copilot Agent
  • Single prompt injection via PR title/issue/comment triggers credential theft
  • Extracted keys posted as PR comments for attacker collection
  • Vendors paid bug bounties but no CVEs issued
  • Problem likely pervasive across all GitHub-integrated AI agents

References:

Fake OpenAI Repo on Hugging Face: 244K Downloads, Rust Infostealer

Critical

May 7–9, 2026

A malicious Hugging Face repository impersonating OpenAI's Privacy Filter project (Open-OSS/privacy-filter) reached #1 on the platform's trending list and accumulated 244,000 downloads before removal. The repository shipped a loader.py file that disabled SSL verification, decoded a base64 URL, and executed a PowerShell command chain to deploy a Rust-based infostealer targeting browser data, Discord tokens, cryptocurrency wallets, SSH/FTP credentials, and system information. HiddenLayer researchers discovered the campaign and linked it to an npm typosquatting operation distributing the WinOS 4.0 implant.

Impact Summary

  • 244,000+ downloads before takedown
  • Reached #1 on Hugging Face trending list
  • Rust-based infostealer targeting browsers, crypto wallets, SSH/FTP, Discord
  • Sophisticated anti-analysis features (VM, sandbox, debugger detection)
  • Linked to WinOS 4.0 npm typosquatting campaign

Defender Guidance:

  • Verify repository ownership and check for typosquatting before downloading
  • Scan all model files and code before execution
  • Reimage affected machines and rotate all stored credentials
  • Invalidate browser sessions, tokens, and replace cryptocurrency seed phrases

References:

Fake Claude AI Website Delivers Beagle Windows Backdoor

Critical

May 7, 2026 (campaign active February–April 2026)

A fake Claude AI website at claude-pro[.]com distributed a trojanized Claude installer that deployed the previously undocumented Beagle Windows backdoor. The 505MB archive contained an MSI installer that added malicious files to the Windows Startup folder. Sophos researchers traced the attack chain from a G Data-signed updater (NOVupdate.exe) sideloading a malicious DLL, through DonutLoader, to the final Beagle backdoor — a relatively simple but effective remote access tool supporting command execution, file upload/download, and directory operations. The campaign is linked to PlugX activity based on shared infrastructure and tactics.

Impact Summary

  • Trojanized Claude installer with full backdoor capabilities
  • Beagle backdoor: cmd execution, file upload/download, persistence via Startup folder
  • DonutLoader + sideloading attack chain linked to PlugX operators
  • C2 communication over TCP:443 and UDP:8080 with hardcoded AES encryption
  • Multiple attack vectors: fake security vendor updates, PDF decoys, impersonated AI tools

Defender Guidance:

  • Only download Claude from the official anthropic.com website
  • Monitor for NOVupdate.exe, avk.dll, and NOVupdate.exe.dat in Startup folders
  • Block C2 domains: license[.]claude-pro[.]com and IP 8.217.190[.]58
  • Hunt for Beagle indicators across endpoint detection

References:

Related Resources

MCP Security Guide

30+ CVEs discovered in 2026. Learn about MCP vulnerabilities and hardening.

Read more →

Agentic AI Security

Securing autonomous AI agents from goal hijacking and tool misuse.

Read more →

OWASP LLM Top 10

Comprehensive guide to LLM security risks and mitigations.

Read more →

OpenClaw/Hermes Security

Hardening guide for OpenClaw and Hermes agent frameworks.

Read more →