AI Hacking
AI Security Resources

OWASP Top 10 for LLM Applications 2025/2026

The definitive list of critical security risks in LLM applications - Updated March 2026

Critical Risk Awareness

Prompt Injection remains the #1 vulnerability. See our comprehensive Prompt Injection Guide →

LLM01

Prompt Injection

Critical

Manipulating LLM behavior through crafted inputs that override original instructions. Includes both direct (user input) and indirect (external data) injection.

Attack Types
  • Direct Injection: Malicious user prompts that override system instructions
  • Indirect Injection: Hidden instructions in external documents, web content, or API responses
  • Context Manipulation: Exploiting conversation context to alter behavior
  • Jailbreak Attacks: Bypassing safety filters through creative prompting
Attack Examples
  • Ignore previous instructions and...
  • Text embedded in invisible unicode characters
  • Prompt injection via stored XSS in web content
  • DAN (Do Anything Now) style jailbreaks
Mitigation Strategies
  • Implement strict input validation and sanitization
  • Use privilege separation between user inputs and system prompts
  • Apply output filtering before displaying results
  • Monitor for injection patterns in logs
  • Implement defense-in-depth with multiple security layers
LLM02

Sensitive Information Disclosure

Critical

LLMs inadvertently revealing private, confidential, or proprietary information through model outputs.

Attack Types
  • Training Data Extraction: Recovering sensitive data from model memory
  • PII Leakage: Exposing personally identifiable information
  • API Key/credential Exposure: Revealing secrets in responses
  • Business Logic Disclosure: Exposing proprietary algorithms or processes
Attack Examples
  • Extracting email addresses through specific prompts
  • Revealing SQL credentials in error messages
  • Disclosing customer PII from training data
  • Exposing internal system prompts
Mitigation Strategies
  • Implement robust input/output filtering
  • Enforce data sanitization pipelines
  • Apply user opt-out policies for data usage
  • Use differential privacy techniques
  • Restrict system prompt access and visibility
LLM03

Supply Chain Vulnerabilities

High

Compromised or vulnerable components in the LLM supply chain including models, APIs, plugins, and third-party services.

Attack Types
  • Model Tampering: Compromised pre-trained models
  • PyPI/npm Dependency Vulnerabilities: Insecure libraries
  • Malicious Fine-tuning Data: Poisoned training datasets
  • Compromised API Providers: Untrusted LLM services
Attack Examples
  • Backdoored model weights from untrusted source
  • Vulnerable transformer library exploitation
  • Poisoned training data with hidden triggers
  • Compromised RAG document store
Mitigation Strategies
  • Verify model integrity through checksums and signatures
  • Maintain SBOM (Software Bill of Materials)
  • Use trusted model hubs and verify provenance
  • Scan dependencies for known vulnerabilities
  • Implement model lifecycle management
LLM04

Data and Model Poisoning

High

Introducing malicious or biased data into training pipelines, fine-tuning data, or RAG knowledge bases to compromise model integrity.

Attack Types
  • Training Data Poisoning: Corrupting model training data
  • Fine-tuning Poisoning: Injecting backdoors via fine-tuning
  • RAG Poisoning: Contaminating retrieval knowledge bases
  • Embedding Manipulation: Corrupting vector databases
Attack Examples
  • Inserting biased examples to alter model behavior
  • Creating backdoor triggers in training data
  • Poisoning public datasets used for training
  • Injecting false information into RAG documents
Mitigation Strategies
  • Verify data provenance and supply chain integrity
  • Implement data validation and anomaly detection
  • Use data sanitization pipelines
  • Apply robust fine-tuning safeguards
  • Monitor for model behavior drift
LLM05

Improper Output Handling

High

Failing to validate, sanitize, or properly handle LLM outputs before passing them to downstream systems or users.

Attack Types
  • XSS via LLM Output: Cross-site scripting from model responses
  • SQL Injection: Malicious queries generated by LLM
  • Command Injection: LLM generating unsafe system commands
  • Path Traversal: LLM revealing or accessing unauthorized paths
Attack Examples
  • LLM generating JavaScript that executes in browser
  • Model outputting malicious SQL queries
  • Code generation including unsafe system calls
  • File path disclosure in responses
Mitigation Strategies
  • Implement output validation and sanitization
  • Use context-aware content filtering
  • Apply the same security controls as user inputs
  • Sandbox LLM outputs before downstream use
  • Enable secure coding modes in code generation
LLM06

Excessive Agency

High

Granting LLM systems too much functionality autonomy,, permissions, or enabling unauthorized or harmful actions.

Attack Types
  • Unlimited Function Access: Excessive API permissions
  • Autonomous Action: Performing actions without human approval
  • Tool Abuse: Exploiting integrated external tools
  • Chain-of-Thought Manipulation: Altering reasoning processes
Attack Examples
  • LLM with admin API access performing unauthorized changes
  • Auto-executing financial transactions
  • Deleting resources without confirmation
  • Manipulating user data without consent
Mitigation Strategies
  • Implement least privilege access controls
  • Require human-in-the-loop for critical actions
  • Apply rate limiting to sensitive operations
  • Log and monitor all autonomous actions
  • Use scope limitations on tool access
LLM07

System Prompt Leakage

Medium

Exposing confidential system prompts, instructions, or internal logic through manipulation or inadequate protections.

Attack Types
  • Direct Prompt Extraction: Social engineering to reveal prompts
  • Context Overflow: Overflow techniques to expose system messages
  • Role Play Exploitation: Trick LLM into revealing instructions
  • Log Leakage: Exposing prompts in logs or errors
Attack Examples
  • Asking LLM to repeat 'previous text' to extract system prompt
  • Using context window overflow to bypass instruction parsing
  • Prompt injection to override system role
  • Finding prompts in application logs
Mitigation Strategies
  • Obfuscate system prompts where possible
  • Implement prompt isolation techniques
  • Use separate processing for sensitive instructions
  • Monitor for prompt extraction attempts
  • Apply strict log filtering
LLM08

Vector and Embedding Weaknesses

Medium

Security vulnerabilities in Retrieval-Augmented Generation (RAG) systems, vector databases, and embedding pipelines.

Attack Types
  • RAG Injection: Malicious content in retrieved documents
  • Vector DB Compromise: Attacking vector storage
  • Embedding Extraction: Stealing embedding representations
  • Context Manipulation: Corrupting retrieval results
Attack Examples
  • Poisoned documents being retrieved as top results
  • Vector database unauthorized access
  • Extracting training data from embeddings
  • Retrieval manipulation via embedding attacks
Mitigation Strategies
  • Validate and sanitize all RAG input documents
  • Implement vector database access controls
  • Use embedding encryption where available
  • Apply reranking with security filters
  • Monitor retrieval for anomalies
LLM09

Misinformation

Medium

LLMs generating false, misleading, or biased content that appears credible, leading to informed decisions based on incorrect information.

Attack Types
  • Hallucinations: Confident but false outputs
  • Factual Errors: Incorrect information presented as fact
  • Bias Amplification: Reinforcing existing biases
  • Manipulation: Deliberate misleading outputs
Attack Examples
  • Citing non-existent research papers
  • Providing incorrect medical advice
  • Generating biased hiring recommendations
  • Creating fake news or reviews
Mitigation Strategies
  • Implement fact-checking pipelines
  • Use confidence scoring and uncertainty estimation
  • Apply source verification for critical outputs
  • Add content provenance and attribution
  • Label AI-generated content clearly
LLM10

Unbounded Consumption

Medium

Allowing excessive or uncontrolled resource usage by LLM applications, leading to denial of service, financial exploitation, or service degradation.

Attack Types
  • API Rate Abuse: Excessive API calls exhausting quotas
  • Resource Exhaustion: Maxing out compute resources
  • Cost Exploitation: Pay-per-token abuse leading to charges
  • Inference Attacks: Extracting model via excessive queries
Attack Examples
  • Automated attacks consuming entire API quota
  • Max-length prompt spam causing compute exhaustion
  • Extracting model through millions of queries
  • Recursive prompt loops consuming resources
Mitigation Strategies
  • Implement strict rate limiting and quotas
  • Add input/output token limits
  • Monitor usage patterns for anomalies
  • Use cost controls and budget alerts
  • Apply timeout controls on long-running operations

OWASP Testing Framework

Follow this structured approach for testing each vulnerability:

1. Reconnaissance

  • Map system architecture and data flow
  • Identify input points and API interfaces
  • Document integrated tools and plugins
  • Review system prompts and configuration

2. Vulnerability Mapping

  • Test each OWASP category systematically
  • Document attack surface and entry points
  • Identify defense mechanisms in place
  • Map dependencies and third-party services

3. Exploitation

  • Conduct proof-of-concept attacks
  • Document exploitability and impact
  • Test chaining multiple vulnerabilities
  • Assess real-world attack feasibility

4. Reporting

  • Prioritize findings by risk level
  • Provide remediation recommendations
  • Include proof-of-concept code
  • Suggest defensive improvements