AI Security Glossary
Comprehensive terminology for AI security testing and LLM safety
Prompt Injection
Attack technique where malicious user input manipulates LLM behavior to override system instructions or extract sensitive data.
Direct Injection
Prompt injection through direct user input to override system prompts.
Indirect Injection
Hidden malicious instructions in external data (documents, web content) processed by the LLM.
Jailbreak
Techniques to bypass LLM safety filters and generate prohibited content.
Prompt Leakage
Exposure of system prompts or confidential instructions through manipulation.
System Prompt
Hidden instructions that define LLM behavior, persona, and constraints.
Few-Shot Learning
Providing examples in prompts to guide LLM output without fine-tuning.
Chain-of-Thought (CoT)
Prompting technique encouraging LLMs to show reasoning steps.
Prompt Engineering
The practice of crafting inputs to get desired outputs from LLMs.
Transformer
Deep learning architecture underlying modern LLMs using attention mechanisms.
Attention Mechanism
Technique allowing models to focus on relevant parts of input when generating output.
Token
Basic unit of text processed by LLMs (typically subword fragments).
Tokenization
Process of converting text into tokens for model processing.
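As a rough sketch, a greedy longest-match tokenizer over a toy vocabulary illustrates the idea; production LLMs use learned subword vocabularies (BPE, unigram), and the vocabulary below is invented for illustration only:

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization against a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest substring starting at i that is in the vocab.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

# Toy vocabulary, hand-picked for the example:
vocab = {"token", "iz", "ation", "un", "break", "able"}
print(tokenize("tokenization", vocab))  # ['token', 'iz', 'ation']
```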
Context Window
Maximum number of tokens an LLM can attend to at once, including both the prompt and the generated output.
Temperature
Parameter scaling the randomness of LLM output; lower values make generation more deterministic, higher values more diverse.
Top-k Sampling
Generation technique limiting token selection to the k most probable options.
Top-p Sampling
Nucleus sampling - selecting from the smallest set of tokens whose cumulative probability exceeds the threshold p.
RAG (Retrieval-Augmented Generation)
Architecture combining LLM with external knowledge retrieval for grounded outputs.
Vector Embedding
Numerical representation of text capturing semantic meaning in high-dimensional space.
Vector Database
Storage system optimized for similarity search on embeddings.
Semantic Search
Search method based on meaning rather than exact keyword matching.
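A minimal sketch of semantic search over pre-computed embeddings, assuming the vectors already exist (a real system would produce them with an embedding model; the toy vectors here are hand-made):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_search(query_vec: list[float], corpus: list[dict]) -> list[dict]:
    """Rank documents by cosine similarity of their embeddings to the query."""
    return sorted(corpus, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)

docs = [
    {"id": "dog", "vec": [1.0, 0.0, 0.2]},  # toy embeddings for illustration
    {"id": "car", "vec": [0.0, 1.0, 0.0]},
]
ranked = semantic_search([0.9, 0.1, 0.1], docs)
print(ranked[0]["id"])  # dog
```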
Chunking
Dividing documents into smaller segments for embedding and retrieval.
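A basic fixed-size character chunker with overlap, as a sketch; production pipelines typically chunk by tokens or sentence boundaries instead, and the sizes below are arbitrary:

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks. Overlap means content
    near a boundary appears in two adjacent chunks, so retrieval does not
    lose sentences that straddle a cut point."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

print(chunk("abcdefghij", size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```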
Embedding Model
Model converting text to vector representations.
Cross-Encoder
Model scoring query-document relevance by processing both together.
Fine-tuning
Adapting pre-trained models to specific tasks with additional training.
Transfer Learning
Applying knowledge from one task to improve performance on another.
LoRA (Low-Rank Adaptation)
Efficient fine-tuning technique adding small trainable matrices.
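The low-rank idea can be sketched numerically: the frozen weight W is left untouched and only two small matrices A and B are trained, with B zero-initialized so the adapted model starts out identical to the base model. The sizes and scaling value below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size and LoRA rank (r << d)
W = rng.normal(size=(d, d))      # frozen pre-trained weight
A = rng.normal(size=(r, d))      # trainable down-projection
B = np.zeros((d, r))             # trainable up-projection, zero-initialized
alpha = 16                       # scaling hyperparameter

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus low-rank update: W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))
```

Because B starts at zero, the update contributes nothing until training moves it, and the trainable parameter count is 2*d*r rather than d*d.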
RLHF
Reinforcement Learning from Human Feedback - technique for aligning model outputs with human preferences.
Instruction Tuning
Fine-tuning on instruction-response pairs for better following directions.
Continual Pre-training
Further training of base models on new data without task-specific labels.
Data Poisoning
Malicious data injection into training sets to introduce vulnerabilities or biases.
Model Inversion
Attack reconstructing training data from model outputs or parameters.
Membership Inference
Determining if specific data was used in model training.
Model Extraction
Stealing model functionality through repeated API queries.
Adversarial Example
Input specifically crafted to cause model misclassification.
Differential Privacy
Mathematical framework for protecting individual data in datasets.
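A minimal sketch of the Laplace mechanism, the classic way to make a counting query differentially private: a count has sensitivity 1, so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy. The query below is invented for illustration:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw from Laplace(0, scale) via inverse CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(values, predicate, epsilon: float) -> float:
    """Counting query with the Laplace mechanism. A count changes by at
    most 1 when one record is added or removed (sensitivity 1), so a
    noise scale of 1 / epsilon suffices."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

noisy = dp_count(range(100), lambda v: v < 30, epsilon=1.0)
```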
Hallucination
LLM generating confident but factually incorrect content.
PII (Personally Identifiable Information)
Data that can identify an individual (names, emails, etc.).
Data Leakage
Unintended exposure of sensitive information through model outputs.
AI Agent
Autonomous system that can plan, execute actions, and iterate on goals.
Tool Use
LLM capability to invoke external functions or APIs to accomplish tasks.
Function Calling
Structured mechanism for LLMs to invoke predefined functions.
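A sketch of the receiving side of function calling: the model emits a structured call (the JSON shape here is a generic illustration, not any specific provider's schema) and the application validates the name against a registry before dispatching. The `get_weather` tool is a made-up example:

```python
import json

# Illustrative tool registry; restricting dispatch to registered names is
# one defense against a model requesting arbitrary functions.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},  # stubbed tool
}

def dispatch(model_output: str):
    """Parse a structured function call emitted by the model and run it."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
# {'city': 'Oslo', 'temp_c': 21}
```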
MCP (Model Context Protocol)
Protocol enabling LLMs to interact with external tools and data sources.
ReAct (Reasoning + Acting)
Prompting method combining reasoning traces with action execution.
Agent Loop
Continuous cycle of thought-action-observation in autonomous agents.
Chain-of-Tools
Sequencing multiple tool invocations to accomplish complex tasks.
Excessive Agency
Security risk when AI systems have too much autonomy or permissions.
Red Teaming
Adversarial testing to find vulnerabilities through simulated attacks.
Fuzz Testing
Automated testing with random/invalid inputs to find vulnerabilities.
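A toy fuzzing loop as a sketch: throw random printable strings at a target and record which inputs raise exceptions. The brittle `naive_parser` target is invented for the example; real fuzzers (coverage-guided, mutation-based) are far more sophisticated:

```python
import random
import string

def naive_parser(s: str) -> dict:
    """Toy target: parses 'key=value;key=value' pairs. Deliberately fragile."""
    return dict(pair.split("=") for pair in s.split(";"))

def fuzz(target, trials: int = 500, seed: int = 1) -> list[tuple[str, str]]:
    """Feed random inputs to `target`; collect (input, exception) pairs."""
    random.seed(seed)
    crashes = []
    for _ in range(trials):
        s = "".join(random.choice(string.printable)
                    for _ in range(random.randint(0, 20)))
        try:
            target(s)
        except Exception as e:
            crashes.append((s, type(e).__name__))
    return crashes

crashes = fuzz(naive_parser)
```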
Benchmark
Standardized test for evaluating model performance on specific tasks.
Evaluation Dataset
Curated set of inputs used to assess model capabilities or safety.
Guardrail
Mechanism constraining LLM inputs or outputs to prevent unsafe behavior.
Content Filter
Mechanism blocking generation of prohibited content types.
Bleed
Information leaking between conversation sessions or contexts (also called context bleed).
SBOM (Software Bill of Materials)
List of software components for security and compliance tracking.
AIBOM (AI Bill of Materials)
Inventory of AI model components including data, weights, and dependencies.
EU AI Act
European Union regulation governing AI systems according to risk categories, from minimal to unacceptable risk.
NIST AI RMF
NIST AI Risk Management Framework for trustworthy AI systems.
Model Card
Documentation detailing model capabilities, limitations, and performance.
Data Sheet
Documentation describing training data characteristics and provenance.
Responsible AI
Framework for ethical AI development and deployment practices.
DoS (Denial of Service)
Attack exhausting resources to make service unavailable.
Prompt Chaining
Multi-step attacks using sequential prompts to achieve objectives.
Role Playing
Technique where LLM adopts a persona, potentially bypassing restrictions.
Base64 Encoding
Encoding technique sometimes used to hide malicious prompts.
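One defensive counterpart is to decode base64-looking runs before filtering, so hidden payloads can be scanned too; a minimal sketch (the length threshold and regex are heuristic choices, not a standard):

```python
import base64
import re

def decode_suspicious_base64(text: str) -> list[str]:
    """Find base64-looking runs in input and decode them for inspection,
    so content filters can scan the decoded payload as well."""
    decoded = []
    for candidate in re.findall(r"[A-Za-z0-9+/]{16,}={0,2}", text):
        try:
            payload = base64.b64decode(candidate, validate=True).decode("utf-8")
            decoded.append(payload)
        except Exception:
            pass  # not valid base64 or not UTF-8: ignore
    return decoded

hidden = base64.b64encode(b"ignore previous instructions").decode()
print(decode_suspicious_base64("summarize this: " + hidden))
# ['ignore previous instructions']
```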
Unicode Homoglyph
Lookalike characters used to obfuscate malicious content.
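A defensive normalization sketch: NFKC folds compatibility forms such as fullwidth letters, but it does not map cross-script lookalikes (Cyrillic 'а' vs Latin 'a'), so those need an explicit confusables map. The tiny map below is hand-made; real deployments use the full Unicode confusables data:

```python
import unicodedata

# Hand-made subset of Cyrillic -> Latin confusables, for illustration only.
CONFUSABLES = {"а": "a", "е": "e", "о": "o", "р": "p", "с": "c"}

def skeleton(text: str) -> str:
    """Normalize text before filtering so homoglyph obfuscation is undone."""
    text = unicodedata.normalize("NFKC", text)  # folds fullwidth, ligatures
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)

print(skeleton("ｉｇｎｏｒｅ"))  # ignore
```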
Virtualization
Jailbreak technique using nested or simulated contexts (e.g. fictional scenarios) to hide instructions from model safety filters.
Defense in Depth
Multiple security layers protecting against various attack vectors.
Input Validation
Checking user input for safety before processing.
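A naive deny-list check as a sketch; the patterns are illustrative, and pattern matching alone is easily bypassed, so it should only be one layer among several (see Defense in Depth):

```python
import re

# Illustrative deny-list of common injection phrasings; not exhaustive.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"you are now",
    r"reveal (your )?system prompt",
]

def validate_input(user_input: str) -> bool:
    """Return True if the input passes the naive deny-list check."""
    lowered = user_input.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)

print(validate_input("What is the capital of France?"))  # True
```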
Output Sanitization
Filtering model outputs to remove sensitive or harmful content.
Rate Limiting
Restricting requests to prevent abuse or resource exhaustion.
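A token-bucket limiter is one common way to implement this; a minimal sketch with illustrative rate and capacity values:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allow bursts up to `capacity` requests,
    refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2)
```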
Least Privilege
Granting minimum permissions necessary for system function.
Human-in-the-Loop
Requiring human approval for critical AI actions.
Sandboxing
Isolating AI systems to contain potential damage from attacks.