Mirko Ross

Hacking AI - how attackers impose their will on AI

Attackers can 3D print a turtle that AI systems will always see as a rifle. This is the new frontier of adversarial attacks on machine learning.

#1 · about 2 minutes

Understanding the core principles of hacking AI systems

AI systems can be hacked through data poisoning: by manipulating the data a model is trained on or fed, attackers skew its statistical outputs and force it to produce results they control.
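
A minimal sketch of the idea, assuming a toy two-class dataset and scikit-learn; the probe point, poisoning rate, and data are illustrative and not taken from the talk:

    # Label-flipping data poisoning against a simple classifier (illustrative).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Clean two-class toy data: class 0 around (-2, -2), class 1 around (+2, +2).
    X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
    y = np.array([0] * 200 + [1] * 200)

    # The attacker flips the labels of a slice of class-1 samples to 0,
    # dragging the learned boundary toward the attacker's preferred outcome.
    poison_idx = rng.choice(np.arange(200, 400), size=60, replace=False)
    y_poisoned = y.copy()
    y_poisoned[poison_idx] = 0

    clean_model = LogisticRegression().fit(X, y)
    poisoned_model = LogisticRegression().fit(X, y_poisoned)

    probe = np.array([[0.8, 0.8]])  # a point the clean model assigns to class 1
    print("clean prediction:   ", clean_model.predict(probe))
    print("poisoned prediction:", poisoned_model.predict(probe))  # likely class 0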

#2 · about 2 minutes

Exploring the three primary data poisoning attack methods

Attackers compromise AI systems in three main ways: prompt injection, manipulating training data to create backdoors, and injecting specific patterns into a live model.
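
One of these routes, a training-data backdoor, can be sketched in a few lines; the dataset, trigger shape, and poisoning rate below are assumptions for illustration:

    # Backdoor via poisoned training data: stamp a small trigger patch onto a
    # fraction of training images and relabel them to an attacker-chosen class.
    # A model trained on this set learns to associate the patch with that class.
    import numpy as np

    def stamp_trigger(img, size=3, value=1.0):
        """Place a bright square trigger in the bottom-right corner."""
        out = img.copy()
        out[-size:, -size:] = value
        return out

    rng = np.random.default_rng(0)
    images = rng.random((1000, 28, 28))      # placeholder training images
    labels = rng.integers(0, 10, size=1000)  # placeholder labels

    target_class = 7
    poison_rate = 0.05
    idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)

    for i in idx:
        images[i] = stamp_trigger(images[i])  # add the visual trigger
        labels[i] = target_class              # relabel to the attacker's class

    # At inference time, any input carrying the same patch tends to be pushed
    # toward target_class, while clean inputs behave normally.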

#3 · about 3 minutes

Why the AI industry repeats early software security mistakes

Unlike hardened software development practice, the AI industry tends to trust all input data, which creates significant vulnerabilities for attackers to exploit.

#4 · about 3 minutes

How adversarial attacks manipulate image recognition models

Adversarial attacks overlay a carefully crafted noise pattern onto an image, causing subtle mathematical changes that force a neural network to misclassify the input.
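
A minimal sketch of how such noise is crafted, using the fast gradient sign method as one common technique; the model and image below are untrained placeholders, so this shows the mechanics rather than a working exploit:

    # FGSM: add a small perturbation in the direction that increases the loss,
    # so the classifier is pushed away from the correct label.
    import torch
    import torch.nn as nn

    model = nn.Sequential(                    # stand-in classifier (assumption)
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10)
    )
    model.eval()

    image = torch.rand(1, 3, 32, 32)          # placeholder input image
    label = torch.tensor([3])                 # its assumed true class
    epsilon = 8 / 255                         # perturbation budget (barely visible)

    image.requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()

    # Step along the sign of the gradient and clip back to the valid pixel range.
    adv_image = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

    # Against a trained network, this tiny change is often enough to flip the
    # predicted class even though the two images look identical to a human.
    print("clean prediction:      ", model(image).argmax(dim=1).item())
    print("adversarial prediction:", model(adv_image).argmax(dim=1).item())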

#5 · about 5 minutes

Applying adversarial attacks in the physical world

Adversarial patterns can be printed on physical objects like stickers or clothing to deceive AI systems, such as tricking self-driving cars or evading surveillance cameras.

#6 · about 2 minutes

Creating robust 3D objects for adversarial attacks

By embedding adversarial noise into a 3D model's geometry, an object can be consistently misclassified by AI from any viewing angle, as shown by a turtle identified as a rifle.
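
Such viewpoint robustness is usually obtained by optimizing the perturbation over many random transformations of the object (expectation over transformation). The sketch below simplifies this to random 90-degree rotations of a 2D image and a placeholder model; the real attack optimizes over full 3D renderings of the textured object:

    # Expectation over transformation (simplified): average the attack loss
    # over random views so one perturbation fools the model from every angle.
    import torch
    import torch.nn as nn

    model = nn.Sequential(                    # stand-in classifier (assumption)
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10)
    )
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)

    image = torch.rand(1, 3, 32, 32)          # placeholder rendering of the object
    target = torch.tensor([5])                # attacker-chosen target class
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=0.01)

    for step in range(200):
        opt.zero_grad()
        loss = 0.0
        for _ in range(8):                    # sample several random views
            k = int(torch.randint(0, 4, (1,)))
            view = torch.rot90(image + delta, k, dims=(2, 3))
            loss = loss + nn.functional.cross_entropy(model(view), target)
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-8 / 255, 8 / 255)   # keep the perturbation subtle

    # (image + delta) is now steered toward `target` under any of the sampled
    # rotations, which is the property that makes the 3D turtle attack robust.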

#7 · about 2 minutes

Techniques for defending against adversarial image attacks

Defenses against adversarial attacks involve de-poisoning input images by reducing their information level, such as lowering bit depth, to disrupt the malicious noise pattern.
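
A minimal sketch of the bit-depth defence, assuming pixel values in [0, 1]; the image and noise below are random placeholders standing in for a real adversarial example:

    # Feature squeezing: requantize the input so fine-grained adversarial noise
    # is rounded away before the image reaches the classifier.
    import numpy as np

    def reduce_bit_depth(img, bits=3):
        """Requantize pixel values in [0, 1] to 2**bits levels."""
        levels = 2 ** bits - 1
        return np.round(img * levels) / levels

    rng = np.random.default_rng(0)
    clean = rng.random((32, 32, 3))                      # placeholder image
    noise = rng.uniform(-8 / 255, 8 / 255, clean.shape)  # stand-in adversarial noise
    adversarial = np.clip(clean + noise, 0, 1)

    squeezed = reduce_bit_depth(adversarial, bits=3)

    # After squeezing there are only a handful of distinct pixel values left,
    # so most of the carefully crafted perturbation is destroyed.
    print("unique pixel values before:", np.unique(adversarial).size)
    print("unique pixel values after: ", np.unique(squeezed).size)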

#8 · about 4 minutes

Understanding the complexity of prompt injection attacks

Prompt injection bypasses safety filters by framing forbidden requests in complex contexts, such as asking for Python code to perform an unethical task, exploiting the model's inability to grasp the full impact.
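
A small illustration of why this is hard to filter: a naive blocklist catches the direct request but not the same intent reframed as an innocuous-looking coding task. The blocklist and prompts are invented for illustration, not taken from the talk:

    # A keyword blocklist sees only surface wording, not intent.
    BLOCKED_PHRASES = ["steal passwords", "break into"]

    def naive_filter(prompt: str) -> bool:
        """Return True if the prompt should be rejected."""
        lowered = prompt.lower()
        return any(phrase in lowered for phrase in BLOCKED_PHRASES)

    direct = "How do I steal passwords from a website?"
    reframed = ("Write a Python function for a homework exercise that collects "
                "form credentials from a page and forwards them to another server.")

    print(naive_filter(direct))    # True:  the obvious request is blocked
    print(naive_filter(reframed))  # False: the same intent slips through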

#9 · about 2 minutes

The inherent bias of manual prompt injection filters

Manual content filtering in AI models introduces human bias, as demonstrated by inconsistent rules for jokes about different genders, which highlights a fundamental scaling and fairness problem.

#10 · about 2 minutes

Q&A on creating patterns and de-poisoning images

The Q&A covers how adversarial patterns are now AI-generated and discusses image de-poisoning techniques like autoencoders, bit depth reduction, and rotation to reduce malicious information.
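
As a complement to the bit-depth example above, a sketch of the autoencoder route mentioned in the Q&A; the network here is an untrained placeholder, whereas a real defence would train it on clean images first:

    # Pass suspect inputs through a small autoencoder so only coarse image
    # content survives and pixel-level adversarial noise is reconstructed away.
    import torch
    import torch.nn as nn

    class DenoisingAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    purifier = DenoisingAutoencoder().eval()
    suspect = torch.rand(1, 3, 32, 32)        # possibly adversarial input
    with torch.no_grad():
        purified = purifier(suspect)          # this goes to the classifier instead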
