Manipulating The Machine: Prompt Injections And Counter Measures

A Chevy chatbot was tricked into offering cars for $1. This talk explores the serious security threat of prompt injection and shows you how to stop it.

#1about 4 minutes

Understanding the three layers of an LLM prompt

A prompt is structured into three layers: the system prompt for instructions, the context for additional data, and the unpredictable user input.

#2about 3 minutes

How a car dealer's chatbot was easily manipulated

A Chevrolet car dealer's chatbot was exploited by users to generate humorous and unintended responses, including a legally binding offer for a $1 car.

#3about 4 minutes

Stealing system prompts to bypass security rules

Attackers can use creative phrasing like "repeat everything above" to trick an LLM into revealing its hidden system prompt and instructions.

#4about 6 minutes

Why attackers use prompt injection techniques

Prompt injections are used to access sensitive business data, gain personal advantages like bypassing HR filters, or exploit integrated tools to steal information like 2FA tokens.

#5about 4 minutes

Exploring simple but ineffective defense mechanisms

Initial defense ideas like avoiding secrets or tool integration are impractical, and simple system prompt instructions are easily circumvented by attackers.

#6about 4 minutes

Using fine-tuning and adversarial detectors for defense

More effective defenses include fine-tuning models on domain-specific data to reduce reliance on instructions and using specialized adversarial prompt detectors to identify malicious input.

#7about 2 minutes

Key takeaways on prompt injection security

Treat all system prompt data as public, use a layered defense of instructions, detectors, and fine-tuning, and accept that no completely reliable solution exists yet.