AI Hacking
AI Security Resources

AI Security Glossary

Comprehensive terminology for AI security testing and LLM safety

Prompt Injection

Attack technique where malicious user input manipulates LLM behavior to override system instructions or extract sensitive data.

Direct Injection

Prompt injection through direct user input to override system prompts.

Indirect Injection

Hidden malicious instructions in external data (documents, web content) processed by the LLM.

Jailbreak

Techniques to bypass LLM safety filters and generate prohibited content.

Prompt Leakage

Exposure of system prompts or confidential instructions through manipulation.

System Prompt

Hidden instructions that define LLM behavior, persona, and constraints.

Few-Shot Learning

Providing examples in prompts to guide LLM output without fine-tuning.

Chain-of-Thought (CoT)

Prompting technique encouraging LLMs to show reasoning steps.

Prompt Engineering

The practice of crafting inputs to get desired outputs from LLMs.

Transformer

Deep learning architecture underlying modern LLMs using attention mechanisms.

Attention Mechanism

Technique allowing models to focus on relevant parts of input when generating output.

Token

Basic unit of text processed by LLMs (typically subword fragments).

Tokenization

Process of converting text into tokens for model processing.

Context Window

Maximum input length an LLM can process in a single request.

Temperature

Parameter controlling randomness in LLM output generation.

Top-k Sampling

Generation technique limiting token selection to top k probable options.

Top-p Sampling

Nucleus sampling - selecting from smallest set of tokens exceeding probability threshold.

RAG (Retrieval-Augmented Generation)

Architecture combining LLM with external knowledge retrieval for grounded outputs.

Vector Embedding

Numerical representation of text capturing semantic meaning in high-dimensional space.

Vector Database

Storage system optimized for similarity search on embeddings.

Semantic Search

Search method based on meaning rather than exact keyword matching.

Chunking

Dividing documents into smaller segments for embedding and retrieval.

Embedding Model

Model converting text to vector representations.

Cross-Encoder

Model scoring query-document relevance by processing both together.

Fine-tuning

Adapting pre-trained models to specific tasks with additional training.

Transfer Learning

Applying knowledge from one task to improve performance on another.

LoRA (Low-Rank Adaptation)

Efficient fine-tuning technique adding small trainable matrices.

RLHF

Reinforcement Learning from Human Feedback - technique for aligning models.

Instruction Tuning

Fine-tuning on instruction-response pairs for better following directions.

Continual Pre-training

Further training base models on new data without task-specific labels.

Data Poisoning

Malicious data injection into training sets to introduce vulnerabilities or biases.

Model Inversion

Attack reconstructing training data from model outputs or parameters.

Membership Inference

Determining if specific data was used in model training.

Model Extraction

Stealing model functionality through repeated API queries.

Adversarial Example

Input specifically crafted to cause model misclassification.

Differential Privacy

Mathematical framework for protecting individual data in datasets.

Hallucination

LLM generating confident but factually incorrect content.

PII (Personally Identifiable Information)

Data that can identify an individual (names, emails, etc.).

Data Leakage

Unintended exposure of sensitive information through model outputs.

AI Agent

Autonomous system that can plan, execute actions, and iterate on goals.

Tool Use

LLM capability to invoke external functions or APIs to accomplish tasks.

Function Calling

Structured mechanism for LLMs to invoke predefined functions.

MCP (Model Context Protocol)

Protocol enabling LLMs to interact with external tools and data sources.

ReAct (Reasoning + Acting)

Prompting method combining reasoning traces with action execution.

Agent Loop

Continuous cycle of thought-action-observation in autonomous agents.

Chain-of-Tools

Sequencing multiple tool invocations to accomplish complex tasks.

Excessive Agency

Security risk when AI systems have too much autonomy or permissions.

Red Teaming

Adversarial testing to find vulnerabilities through simulated attacks.

Fuzz Testing

Automated testing with random/invalid inputs to find vulnerabilities.

Benchmark

Standardized test for evaluating model performance on specific tasks.

Evaluation Dataset

Curated set of inputs used to assess model capabilities or safety.

Guardrail

System or code preventing LLM from generating unsafe outputs.

Content Filter

Mechanism blocking generation of prohibited content types.

Bleed

Information leaking between conversation sessions or contexts.

SBOM (Software Bill of Materials)

List of software components for security and compliance tracking.

AIBOM (AI Bill of Materials)

Inventory of AI model components including data, weights, and dependencies.

EU AI Act

European Union regulation on artificial intelligence risk categories.

NIST AI RMF

NIST AI Risk Management Framework for trustworthy AI systems.

Model Card

Documentation detailing model capabilities, limitations, and performance.

Data Sheet

Documentation describing training data characteristics and provenance.

Responsible AI

Framework for ethical AI development and deployment practices.

DoS (Denial of Service)

Attack exhausting resources to make service unavailable.

Prompt Chaining

Multi-step attacks using sequential prompts to achieve objectives.

Role Playing

Technique where LLM adopts a persona, potentially bypassing restrictions.

Base64 Encoding

Encoding technique sometimes used to hide malicious prompts.

Unicode Homoglyph

Lookalike characters used to obfuscate malicious content.

Virtualization

Using nested contexts to hide instructions from model filters.

Defense in Depth

Multiple security layers protecting against various attack vectors.

Input Validation

Checking user input for safety before processing.

Output Sanitization

Filtering model outputs to remove sensitive or harmful content.

Rate Limiting

Restricting requests to prevent abuse or resource exhaustion.

Least Privilege

Granting minimum permissions necessary for system function.

Human-in-the-Loop

Requiring human approval for critical AI actions.

Sandboxing

Isolating AI systems to contain potential damage from attacks.