Tutorial on Adversarial Machine Learning - Part 1

The Good, The Bad, The Ugly

The Good

(a.k.a. The Age of Deep Neural Networks)

Deep Neural Networks (DNNs) have revolutionized various domains, showcasing remarkable achievements in computer vision, natural language processing, and speech processing. These powerful models have surpassed human-level accuracy in tasks like image classification and language processing, propelling them into real-world applications. Nowadays, DNNs are ubiquitous, powering the technology we use every day, from voice assistants like Siri to self-driving Tesla cars. Their impact is undeniable, transforming the way we interact with technology and reshaping industries across the globe.

The Bad

(a.k.a. The Dark Side of DNNs)

While DNNs have achieved unprecedented success and widespread adoption, their reliability and security remain a concern. DNNs are known as black-box models, meaning that their internal workings are not transparent to users, and even their creators. This lack of transparency makes it difficult to understand their behavior and trust their decisions.

For some low-stakes applications, such as fraudulent-transaction detection or movie recommendation, it is not a big deal if the model makes a mistake; the consequences of an incorrect prediction are not severe. However, for high-stakes applications such as autonomous driving, clinical diagnostics, or auto-trading bots, where the model's decisions can lead to life-threatening conditions or economic collapse, it is crucial to understand the model's behavior and trust its decisions. Just imagine: if you had a serious disease and a machine learning model recommended a specific medicine, would you trust its decision and take the medicine?

| AI application | AI risk | Consequence |
|---|---|---|
| Commercial ads recommendation | Matching user interests incorrectly | Users see irrelevant ads |
| Auto-trading bot | Triggering a wrong signal | Financial loss |
| Autopilot in Tesla | Misclassifying a "Stop" sign | Fatal crash |
| Autonomous drone swarms | Wrong targeting/attacking | Fatal mistake - many deaths |

Some real-life examples of catastrophic failures of unreliable DNNs:

Conclusion: The more autonomous the AI system is, the more important it is to understand the model’s behavior and trust its decisions.

The Ugly

(a.k.a. Adversarial Examples)

In addition to their lack of transparency, DNNs are also vulnerable to adversarial attacks, including backdoor attacks, poisoning attacks, and adversarial examples. A notable work by Szegedy et al. (2014) was the first to demonstrate that DNNs are susceptible to adversarial examples: subtle modifications to input data that can manipulate a model's behavior. And the worst part is that generating adversarial examples is easy and fast.

Adversarial examples (link to the demo).

The above example illustrates an adversarial example generated against a pre-trained ResNet50 model. The image on the left is the original image of a koala, which is correctly classified as a koala with nearly 50% confidence. The image in the middle is the adversarial perturbation, which is imperceptible to the human eye. The image on the right is the resulting adversarial example, which is misclassified as a balloon with nearly 100% confidence.
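To make this concrete, here is a minimal sketch of how such a perturbation can be crafted with a single-step FGSM attack against a torchvision ResNet50. This is not the exact code behind the demo; the `epsilon` budget is an arbitrary choice and the input normalization is omitted for brevity.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Pre-trained ResNet50 from torchvision (a stand-in for the model used in the demo).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

def fgsm_attack(image, label, epsilon=4/255):
    """One-step FGSM: nudge each pixel in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # The perturbation is epsilon times the sign of the gradient: small enough to be
    # imperceptible, yet often enough to flip the prediction.
    adv = image + epsilon * image.grad.sign()
    return adv.clamp(0, 1).detach()  # assumes pixel values in [0, 1]
```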

The Efforts

Since the discovery of adversarial examples [4], it has become an extensive research area, with the number of papers on this topic increasing exponentially, as shown in the figure below.

Number of adversarial examples papers published on arXiv from 2014 to May 2023. Data source: Carlini's post (link).

On the one hand, various attack methods have been proposed to enhance effectiveness, computational efficiency, and transferability among inputs or among models.

On the other hand, there is also an extremely large number of defense methods proposed to mitigate adversarial attacks, from all aspects of the machine learning pipeline.

Despite numerous defense strategies being proposed to counter adversarial attacks, no method has yet provided comprehensive protection or completely eliminated the vulnerabilities of DNNs.

The Difficulties

(of Evaluating Adversarial Robustness)

Evaluating robustness against adversarial examples is a lot trickier than your usual machine learning model evaluation. This is mainly because adversarial examples don't just pop up naturally; you have to craft them with an adversary. And let's just say, making these examples honestly reflect the threat model takes a lot of genuine effort.

Now, there's this thing called gradient masking, which defenses often rely on (sometimes without meaning to) to stop gradient information from being used to craft adversarial examples. Attacks like PGD, for instance, need the gradients of the model's loss function to create adversarial examples. But sometimes, due to the way the model is built or trained, you can't get the precise gradients you need, and this can throw a wrench in the works of gradient-based attacks.
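As a rough illustration of that gradient dependence, here is a bare-bones PGD loop in PyTorch. The `epsilon`, `alpha`, and `steps` defaults are placeholder values, not settings from any particular paper; the point is that if a defense masks or shatters the gradients, the `grad` tensor carries little signal and the loop stops making progress.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Projected Gradient Descent: repeatedly step along the sign of the loss
    gradient, then project back into the epsilon-ball around the original input."""
    x_orig = x.clone().detach()
    # Random start inside the epsilon-ball.
    x_adv = (x_orig + torch.empty_like(x_orig).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # If gradients are masked or shattered, `grad` is uninformative and the attack stalls here.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x_orig + (x_adv - x_orig).clamp(-epsilon, epsilon)  # project into the epsilon-ball
        x_adv = x_adv.clamp(0, 1)  # keep pixels in a valid [0, 1] range
    return x_adv.detach()
```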

Also, adversarial attacks can be quite picky about their specific settings. A PGD attack, for example, might work great with a certain step size, number of iterations, and scale of logits, but not so well with other settings. Transferred attacks, on the other hand, care a lot about which model you choose as the substitute. So you've got to make sure you're checking robustness under lots of different settings, as in the sketch below.
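One practical way to do that is to report the worst case over a grid of attack settings rather than a single configuration. The sketch below reuses the `pgd_attack` function from above; `evaluate_accuracy`, `model`, and `test_loader` are hypothetical placeholders rather than functions from any specific library.

```python
# Sketch: sweep a grid of PGD settings and keep the worst-case robust accuracy.
settings = [
    {"epsilon": 8/255, "alpha": 2/255, "steps": 10},
    {"epsilon": 8/255, "alpha": 1/255, "steps": 40},
    {"epsilon": 8/255, "alpha": 0.5/255, "steps": 100},
]

worst_case_acc = min(
    evaluate_accuracy(model, test_loader,
                      attack=lambda x, y, s=s: pgd_attack(model, x, y, **s))
    for s in settings
)
print(f"Worst-case robust accuracy: {worst_case_acc:.2%}")
```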

Carlini et al. (2019) came up with a really handy checklist. It points out common slip-ups people make when evaluating adversarial robustness and offers some tips to dodge these pitfalls.

The Privilege

(a.k.a. AML never dies and AML researchers are never unemployed :joy:)

This is my humble and fun opinion, so please don't take it seriously :smile:. IMHO, researchers in AML have several privileges that researchers in other fields don't have: