Attacking Machine Learning Models

The Implications

Doesn’t look like anything to me!

Adversarial Attacks

What are the reasons for such a vulnerability? How can we come up with adversarial examples? How can we defend our models against such attacks?

So, why does such a vulnerability exist?

Generating Adversarial Examples

Fast Gradient Sign Method
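
To make the idea concrete, here is a minimal sketch of the fast gradient sign method: take one step of size epsilon in the direction of the sign of the loss gradient with respect to the input. The toy logistic-regression model, its weights, and the helper names (`sigmoid`, `loss_gradient`, `fgsm`) are all assumptions made for illustration; a real attack would obtain the gradient from a deep learning framework's autodiff.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the input x
    for a logistic-regression model p = sigmoid(w.x + b)."""
    p = sigmoid(np.dot(w, x) + b)
    return (p - y) * w

def fgsm(x, y, w, b, epsilon):
    """One-step attack: move x by epsilon along the sign of the gradient,
    which increases the loss on the true label y."""
    return x + epsilon * np.sign(loss_gradient(x, y, w, b))

# Hypothetical toy model and input (not taken from any real dataset).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.5, 0.2, 0.1])
y = 1.0  # true label

x_adv = fgsm(x, y, w, b, epsilon=0.3)
print("clean score:      ", sigmoid(np.dot(w, x) + b))
print("adversarial score:", sigmoid(np.dot(w, x_adv) + b))
```

On this toy example the clean input is scored above 0.5 and the perturbed input below 0.5, even though no feature moved by more than epsilon.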

Basic Iterative Method
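
The basic iterative method can be sketched as the same signed-gradient step applied several times with a small step size, clipping after each step so the result stays within epsilon of the original input. As before, the logistic-regression model and the names (`basic_iterative_method`, `alpha`, `steps`) are illustrative assumptions. The original formulation also clips to the valid pixel range, which this sketch leaves out for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the input x."""
    p = sigmoid(np.dot(w, x) + b)
    return (p - y) * w

def basic_iterative_method(x, y, w, b, epsilon, alpha, steps):
    """Repeat small signed-gradient steps of size alpha, keeping the
    total perturbation inside an epsilon-ball (max norm) around x."""
    x_adv = x.copy()
    for _ in range(steps):
        grad = loss_gradient(x_adv, y, w, b)
        x_adv = x_adv + alpha * np.sign(grad)
        # Project back so no feature drifts more than epsilon from x.
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

# Hypothetical toy model and input.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.5, 0.2, 0.1])
y = 1.0

x_adv = basic_iterative_method(x, y, w, b, epsilon=0.3, alpha=0.1, steps=5)
print("max perturbation: ", np.max(np.abs(x_adv - x)))
print("adversarial score:", sigmoid(np.dot(w, x_adv) + b))
```

For a linear model like this one the iterations end up at the same point as a single large step; the iterative version pays off on non-linear models, where the gradient direction changes along the way.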

Targeted Fast Gradient Sign Method
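
A targeted variant can be sketched by flipping the sign of the step: instead of increasing the loss on the true label, we decrease the loss on a label of our choosing. The three-class linear softmax model below, its weights, and the names (`target_loss_gradient`, `targeted_fgsm`) are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def target_loss_gradient(x, target, W, b):
    """Gradient w.r.t. x of the cross-entropy loss computed against
    the *target* class, for a linear softmax model."""
    p = softmax(W @ x + b)
    one_hot = np.zeros_like(p)
    one_hot[target] = 1.0
    return W.T @ (p - one_hot)

def targeted_fgsm(x, target, W, b, epsilon):
    """Step *against* the gradient so the loss on the target class drops."""
    return x - epsilon * np.sign(target_loss_gradient(x, target, W, b))

def predict(x, W, b):
    return int(np.argmax(W @ x + b))

# Hypothetical 3-class model over 2 features.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([1.0, 0.6])

x_adv = targeted_fgsm(x, target=1, W=W, b=b, epsilon=0.3)
print("clean prediction:      ", predict(x, W, b))
print("adversarial prediction:", predict(x_adv, W, b))
```

Here the clean input is classified as class 0 and the perturbed input as the chosen target, class 1.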

What can we do to defend our models?

Hide your gradients!

Be prepared with Adversarial Training
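
One common way to do adversarial training is to generate adversarial examples against the current model at every step and train on a mix of clean and perturbed inputs. The sketch below does this for a logistic-regression classifier on synthetic data; the data, the hyperparameters, and the names (`fgsm_batch`, `adversarial_train`) are assumptions for illustration, not a recipe from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_batch(X, y, w, b, epsilon):
    """FGSM applied row by row to a batch of inputs."""
    p = sigmoid(X @ w + b)
    grad = (p - y)[:, None] * w[None, :]  # d(loss)/d(x) for each row
    return X + epsilon * np.sign(grad)

def adversarial_train(X, y, epsilon=0.1, lr=0.1, epochs=200):
    """Gradient descent on clean examples plus FGSM examples that are
    regenerated against the current weights at every epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        X_adv = fgsm_batch(X, y, w, b, epsilon)
        X_mix = np.vstack([X, X_adv])
        y_mix = np.concatenate([y, y])
        p = sigmoid(X_mix @ w + b)
        w -= lr * X_mix.T @ (p - y_mix) / len(y_mix)
        b -= lr * np.mean(p - y_mix)
    return w, b

# Synthetic two-cluster data (an assumption, for demonstration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)),
               rng.normal(1.0, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = adversarial_train(X, y)
clean_acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print("clean accuracy:", clean_acc)
```

The design choice worth noting is that the adversarial examples are recomputed inside the loop: examples crafted against an earlier version of the model quickly stop being adversarial for the current one.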

Acknowledging Ignorance

How we assign null labels to examples that are perturbed with calculated noise.





