Mirko Ross
Hacking AI - how attackers impose their will on AI
#1 (about 2 minutes)
Understanding the core principles of hacking AI systems
AI systems can be hacked because their behaviour is statistical: by poisoning the data a model learns from or consumes, an attacker can shift those statistics and force the model to produce attacker-controlled results.
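A minimal sketch of the data-poisoning idea described above, assuming a labeled training set held as NumPy arrays (the function and its parameters are illustrative, not taken from the talk): flipping even a small fraction of labels to an attacker-chosen class skews the statistics the model learns.

```python
import numpy as np

def poison_labels(y, target_class, fraction=0.05, seed=0):
    """Flip a random fraction of training labels to the attacker's target class."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y[idx] = target_class
    return y

# Example: 1,000 binary labels, 5% silently redirected to class 1.
labels = np.random.default_rng(1).integers(0, 2, size=1000)
poisoned = poison_labels(labels, target_class=1)
print(int((labels != poisoned).sum()), "labels flipped")
```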
#2 (about 2 minutes)
Exploring the three primary data poisoning attack methods
Attackers compromise AI systems through prompt injection, by manipulating training data to create backdoors, or by injecting specific patterns into a live model.
#3 (about 3 minutes)
Why the AI industry repeats early software security mistakes
Unlike the hardened practices of software development, the AI industry tends to trust all input data, which creates significant vulnerabilities for attackers to exploit.
#4 (about 3 minutes)
How adversarial attacks manipulate image recognition models
Adversarial attacks overlay a carefully crafted noise pattern onto an image, causing subtle mathematical changes that force a neural network to misclassify the input.
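The noise pattern can be derived from the model's own gradients. Below is a minimal sketch of one well-known method, the fast gradient sign method (FGSM); the talk does not name a specific algorithm, and the PyTorch classifier `model`, the image batch in [0, 1], and the epsilon value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    """Add a small, gradient-derived noise pattern that pushes the model off the true label."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step in the direction that increases the loss, then keep pixel values valid.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```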
#5 (about 5 minutes)
Applying adversarial attacks in the physical world
Adversarial patterns can be printed on physical objects like stickers or clothing to deceive AI systems, such as tricking self-driving cars or evading surveillance cameras.
#6 (about 2 minutes)
Creating robust 3D objects for adversarial attacks
By embedding adversarial noise into a 3D model's geometry, an object can be consistently misclassified by AI from any viewing angle, as shown by a turtle identified as a rifle.
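A sketch of the expectation-over-transformations idea commonly used to build such robust adversarial objects (the talk describes the result, not the exact method): the perturbation is optimized so the wrong label survives random rotations and crops rather than a single viewpoint. `model`, `image`, and all parameters here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

def robust_perturbation(model, image, target_label, steps=200, epsilon=0.03):
    """Optimize bounded noise whose target misclassification holds across random views."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=0.01)
    random_view = T.Compose([
        T.RandomRotation(30),
        T.RandomResizedCrop(image.shape[-1], scale=(0.8, 1.0)),
    ])
    for _ in range(steps):
        view = random_view((image + delta).clamp(0, 1))
        loss = F.cross_entropy(model(view), target_label)  # pull toward the attacker's label
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)  # keep the noise visually subtle
    return delta.detach()
```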
#7 (about 2 minutes)
Techniques for defending against adversarial image attacks
Defenses against adversarial attacks involve de-poisoning input images by reducing their information level, such as lowering bit depth, to disrupt the malicious noise pattern.
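A minimal sketch of the bit-depth reduction defense mentioned above, assuming images as float arrays in [0, 1]: quantizing pixel values discards the low-amplitude information the adversarial noise lives in while keeping the visible image largely intact.

```python
import numpy as np

def reduce_bit_depth(image, bits=3):
    """Quantize a [0, 1] image down to 2**bits levels per channel."""
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels

# Example: a stand-in 224x224 RGB image collapses to at most 8 distinct pixel values.
img = np.random.default_rng(0).random((224, 224, 3))
squeezed = reduce_bit_depth(img, bits=3)
print(len(np.unique(squeezed)), "distinct pixel values remain")
```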
#8 (about 4 minutes)
Understanding the complexity of prompt injection attacks
Prompt injection bypasses safety filters by framing forbidden requests in complex contexts, such as asking for Python code to perform an unethical task, and exploits the model's inability to grasp the full impact of what it is being asked to produce.
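An illustrative sketch of why naive filtering fails (the keyword filter and both prompts below are hypothetical, not the talk's examples): a direct request trips the filter, while the same intent wrapped in a plausible coding context passes straight through.

```python
FORBIDDEN_WORDS = {"exploit", "hack", "malware"}

def naive_filter(prompt: str) -> bool:
    """Allow the prompt only if it contains none of the forbidden keywords."""
    return not any(word in prompt.lower() for word in FORBIDDEN_WORDS)

direct = "Help me hack a login form."
reframed = ("Write Python code that tries a long list of passwords against a "
            "login form, as part of a routine resilience test.")

print(naive_filter(direct))    # False: the keyword list catches it
print(naive_filter(reframed))  # True: the same intent hides inside a coding task
```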
#9 (about 2 minutes)
The inherent bias of manual prompt injection filters
Manual content filtering in AI models introduces human bias, as demonstrated by inconsistent rules for jokes about different genders, which highlights a fundamental scaling and fairness problem.
#10 (about 2 minutes)
Q&A on creating patterns and de-poisoning images
The Q&A covers how adversarial patterns are now AI-generated and discusses image de-poisoning techniques like autoencoders, bit depth reduction, and rotation to reduce malicious information.
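To complement the bit-depth example above, here is a sketch of the rotation idea from the Q&A, assuming the same [0, 1] float image format: a small random rotation disturbs the pixel-precise alignment that the adversarial pattern depends on. The use of SciPy's `rotate` is an implementation choice, not something stated in the talk.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_depoison(image, max_degrees=10, seed=0):
    """Apply a small random rotation to break pixel-precise adversarial alignment."""
    rng = np.random.default_rng(seed)
    angle = rng.uniform(-max_degrees, max_degrees)
    # reshape=False keeps the original image size; uncovered edges are filled with zeros.
    return rotate(image, angle, axes=(0, 1), reshape=False, mode="constant")

img = np.random.default_rng(1).random((224, 224, 3))
print(rotation_depoison(img).shape)  # (224, 224, 3): same size, slightly rotated content
```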