David vonThenen

Confuse, Obfuscate, Disrupt: Using Adversarial Techniques for Better AI and True Anonymity

What if a single pixel could trick your AI into seeing a cat as a dog? Learn how adversarial attacks can expose hidden flaws and build more resilient systems.

#1 (about 1 minute)

The importance of explainable AI and data quality

AI models are only as good as their training data, which is often plagued by bias, noise, and inaccuracies that explainable AI helps to uncover.

#2 (about 3 minutes)

Identifying common data inconsistencies in AI models

Models can be compromised by issues such as annotation errors, data imbalance, and adversarial samples, and interpretability tools like Captum can help surface them.

#3 (about 2 minutes)

The dual purpose of adversarial AI attacks

Intentionally introducing adversarial inputs can serve two ends: probing a model's boundaries to harden it, or obfuscating data to keep personal information private.

#4 (about 3 minutes)

How to confuse NLP models with creative inputs

Natural language processing models can be disrupted using techniques like encoding, code-switching, misspellings, and even metaphors to prevent accurate interpretation.
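A minimal sketch (not from the talk) of two of these tricks, character-transposition misspellings and homoglyph "encoding" swaps; the function names and substitution table are illustrative, and the perturbed strings can be fed to any NLP classifier to see whether its prediction changes:

```python
# Two simple text perturbations: swap letters for look-alike Unicode
# glyphs, and transpose adjacent characters inside a random word.
import random

# A few Latin letters mapped to visually similar Cyrillic characters.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "i": "\u0456"}

def homoglyph_swap(text: str, rate: float = 0.3) -> str:
    """Replace a fraction of eligible characters with look-alike glyphs."""
    out = []
    for ch in text:
        if ch.lower() in HOMOGLYPHS and random.random() < rate:
            out.append(HOMOGLYPHS[ch.lower()])
        else:
            out.append(ch)
    return "".join(out)

def random_misspell(text: str) -> str:
    """Transpose two adjacent characters inside one randomly chosen word."""
    words = text.split()
    idx = random.randrange(len(words))
    w = words[idx]
    if len(w) > 3:
        i = random.randrange(len(w) - 1)
        words[idx] = w[:i] + w[i + 1] + w[i] + w[i + 2:]
    return " ".join(words)

if __name__ == "__main__":
    sentence = "The service at this restaurant was absolutely wonderful"
    print(homoglyph_swap(sentence))   # looks the same, tokenizes differently
    print(random_misspell(sentence))  # e.g. "absolutley" instead of "absolutely"
```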

#5 (about 4 minutes)

Visualizing model predictions with the Captum library

The Captum library for PyTorch helps visualize which parts of an input, like words in a sentence or pixels in an image, contribute most to a model's final prediction.
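A small Captum sketch along these lines, using Integrated Gradients on a stock torchvision classifier; the model choice and the random input tensor are placeholders, not the demo shown in the talk:

```python
# Attribute an image classifier's prediction to input pixels with Captum.
import torch
from torchvision import models
from captum.attr import IntegratedGradients

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

# Stand-in input: one normalized 224x224 RGB image (replace with real data).
image = torch.rand(1, 3, 224, 224)

# Take the class the model currently predicts, then ask which pixels drove it.
with torch.no_grad():
    predicted_class = model(image).argmax(dim=1).item()

ig = IntegratedGradients(model)
attributions = ig.attribute(image, target=predicted_class, n_steps=50)

# Positive values mark pixels pushing the score up; large magnitudes matter most.
print(attributions.shape)        # torch.Size([1, 3, 224, 224])
print(attributions.abs().max())  # strongest single-pixel contribution
```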

#6 (about 6 minutes)

Manipulating model outputs with subtle input changes

Simple misspellings can flip a sentiment analysis result from positive to negative, and altering a single pixel can cause an image classifier to misidentify a cat as a dog.
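A hedged sketch of the image side of this idea: the published one-pixel attack uses differential evolution, but a naive random search over candidate pixels, as below, illustrates how single-pixel edits are proposed and scored. All names are illustrative and assume a [0, 1]-scaled image batch of size one:

```python
# Simplified single-pixel search: try random one-pixel edits and keep the
# one that most reduces the model's confidence in its original prediction.
import torch

def one_pixel_search(model, image, true_class, trials=500):
    """Return (perturbed_image, new_prediction) after the most damaging edit found."""
    best_img, best_conf = image, 1.0
    _, c, h, w = image.shape
    for _ in range(trials):
        candidate = image.clone()
        y, x = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        candidate[0, :, y, x] = torch.rand(c)   # overwrite one pixel
        with torch.no_grad():
            conf = torch.softmax(model(candidate), dim=1)[0, true_class].item()
        if conf < best_conf:                    # keep the most damaging edit
            best_img, best_conf = candidate, conf
    with torch.no_grad():
        new_prediction = model(best_img).argmax(dim=1).item()
    return best_img, new_prediction
```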

#7 (about 2 minutes)

Using an adversarial pattern t-shirt to evade detection

A t-shirt printed with a specific adversarial pattern can disrupt a real-time person detection model, effectively making the wearer invisible to the AI system.

#8 (about 2 minutes)

Techniques for defending models against adversarial attacks

Defenses against NLP attacks include input normalization and grammar checks, while vision attacks can be mitigated with image blurring, bit-depth reduction, or adversarial training on FGSM-generated examples.
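A rough sketch of two of these vision-side defenses, assuming inputs are image tensors scaled to [0, 1]: bit-depth reduction (feature squeezing) and generating FGSM examples to train on. Function names and the epsilon value are illustrative, not taken from the talk:

```python
# Bit-depth reduction and FGSM example generation for adversarial training.
import torch
import torch.nn.functional as F

def reduce_bit_depth(image: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Quantize pixel values to 2**bits levels, erasing tiny adversarial noise."""
    levels = 2 ** bits - 1
    return torch.round(image * levels) / levels

def fgsm_example(model, image, label, eps: float = 0.03) -> torch.Tensor:
    """Generate an FGSM adversarial example to include in the training set."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel in the direction that increases the loss, clamped to [0, 1].
    return (image + eps * image.grad.sign()).clamp(0, 1).detach()
```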

#9 (about 2 minutes)

Defeating a single-pixel attack with image blurring

Applying a simple Gaussian blur to an image containing an adversarial pixel smooths out the manipulation, allowing the model to correctly classify the image.
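A minimal sketch of that defense, assuming a torchvision-style pipeline; the kernel size and sigma are illustrative values, not ones from the talk:

```python
# Blur the (possibly attacked) image before classification so an isolated
# adversarial pixel is averaged away by its neighbors.
import torchvision.transforms as T

blur = T.GaussianBlur(kernel_size=5, sigma=1.0)

def classify_with_blur(model, image):
    """Apply a Gaussian blur, then classify as usual."""
    return model(blur(image)).argmax(dim=1)
```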
