A significant amount of work has been done on understanding individual security and privacy risks in machine learning models. However, much less is understood about how a defense against one risk interacts with other, unrelated risks. We have been exploring this problem in our recent work, including a systematization of knowledge (SoK) paper at the 2024 IEEE Symposium on Security and Privacy. We are also building a software tool to facilitate systematic empirical exploration of such interactions.
Unintended Interactions between ML defenses and risks
Machine learning (ML) models are susceptible to a wide range of risks: security threats such as evasion, poisoning, and unauthorized model ownership; privacy breaches through inference attacks; and fairness issues such as discriminatory behavior. Various defenses have been proposed to protect against each of these risks separately, each evaluated primarily on its effectiveness against the specific risk it targets. Yet using a defense may inadvertently increase or decrease susceptibility to other risks, leading to unintended interactions. Despite their practical relevance, such interactions have not yet been systematically explored in the research literature.
We started by looking at pairwise interactions between simultaneously deployed defenses (AAAI 2023): specifically, how deploying a model ownership verification technique (such as watermarking or fingerprinting) interacts with simultaneously deploying a defense against a different risk, namely differential privacy or adversarial training. We found that these defenses often conflict, degrading the effectiveness of one or both defenses and/or model utility. This prompted us to systematically study potential unintended interactions between an ML defense (intended to address a specific risk) and other, unrelated risks.
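To make this setting concrete, here is a minimal PyTorch sketch, not the actual code from the AAAI 2023 paper, of a training loop that combines PGD-based adversarial training with a backdoor-style watermark trigger set. The model, data, and hyperparameters are illustrative placeholders.

```python
# Illustrative sketch only: combining PGD adversarial training with a
# backdoor-style watermark trigger set in one training step. The model,
# data, and hyperparameters are placeholders, not the AAAI 2023 setup.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Standard L-infinity PGD; assumes inputs are scaled to [0, 1]."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

def train_step(model, optimizer, batch, watermark_batch):
    """One step that mixes adversarial examples with watermark trigger samples."""
    x, y = batch                      # regular training data
    xw, yw = watermark_batch          # trigger set with owner-chosen labels
    model.train()
    x_adv = pgd_attack(model, x, y)   # adversarial-training component
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y) + F.cross_entropy(model(xw), yw)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Even in this toy setup, the two loss terms pull the model in different directions, which hints at why such combinations can end up conflicting in practice.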
As part of our systematization (IEEE SP 2024), we carried out an exhaustive survey of existing work. We conjectured that overfitting and memorization are the likely underlying causes of such unintended interactions, and identified several factors that influence overfitting and memorization, including characteristics of a model, its training data, and its objective function. We found that by examining how these factors make a defense or a risk more or less effective, we can anticipate unintended interactions between the defense and the risk.
Table: Overview of the defenses and risks considered in our SoK. For RD1 (Adversarial training), the interactions with all risks other than R1 (Evasion) are unintended interactions.
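As a rough illustration of how overfitting and memorization can be quantified in practice, the sketch below computes two simple proxies on a trained PyTorch classifier. It is not taken from the SoK; `model`, `train_loader`, and `test_loader` are assumed to exist.

```python
# Illustrative proxies (not from the SoK): train/test accuracy gap as an
# overfitting proxy, and the confidence gap on seen vs. unseen data as a
# rough memorization proxy (the kind of gap many inference attacks exploit).
import torch
import torch.nn.functional as F

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

@torch.no_grad()
def mean_confidence(model, loader, device="cpu"):
    """Average softmax probability assigned to the true label."""
    model.eval()
    confs = []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        probs = F.softmax(model(x), dim=1)
        confs.append(probs.gather(1, y.unsqueeze(1)).squeeze(1))
    return torch.cat(confs).mean().item()

# Assumed to exist: model, train_loader, test_loader.
# gap_acc  = accuracy(model, train_loader) - accuracy(model, test_loader)
# gap_conf = mean_confidence(model, train_loader) - mean_confidence(model, test_loader)
```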
We present a framework that summarizes how the effectiveness of a defense correlates with different influencing factors, and how a change in a factor correlates with susceptibility to different risks. Using this framework, we propose a guideline: examine the factors common to a given pair of defense and (unrelated) risk to conjecture the nature of their interaction, i.e., whether susceptibility to the risk increases or decreases when the defense is effective. We empirically evaluated two such pairs that had not been studied in prior work, and showed that our guideline is effective at predicting unintended interactions.
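The sketch below conveys the spirit of this guideline with a toy sign-based aggregation over shared factors. The defense, risk, and correlation values are hypothetical placeholders, not the actual tables from the SoK.

```python
# Hypothetical sketch of the guideline's core idea: compare the sign of each
# defense-factor correlation with the sign of the corresponding factor-risk
# correlation over the factors they share. Names and values are placeholders.

# +1: increases with the factor; -1: decreases; factors follow the SoK's themes.
DEFENSE_FACTORS = {
    "some_defense": {"overfitting": -1, "memorization": -1, "model_capacity": +1},
}
RISK_FACTORS = {
    "some_risk": {"overfitting": +1, "memorization": +1},
}

def conjecture(defense: str, risk: str) -> str:
    """Aggregate sign agreement over shared factors into a rough conjecture."""
    d, r = DEFENSE_FACTORS[defense], RISK_FACTORS[risk]
    shared = set(d) & set(r)
    score = sum(d[f] * r[f] for f in shared)
    if score > 0:
        return "susceptibility to the risk likely increases when the defense is effective"
    if score < 0:
        return "susceptibility to the risk likely decreases when the defense is effective"
    return "no clear conjecture from the shared factors"

print(conjecture("some_defense", "some_risk"))
```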
We are grateful to the program committee of the 2024 IEEE Symposium on Security and Privacy for recognizing this work with a Distinguished Paper Award.
Visit our project page for the papers, talks, and source code from this work.
Based on this work, we identified the need for a software tool for systematic empirical evaluation of defenses and risks in ML models. Such a tool can facilitate the analysis of previously unexplored unintended interactions between defenses and risks, and can also serve as a means for comparative evaluation of new defenses and attacks.
Next steps
We have been working on Amulet, an extensible PyTorch library for systematic empirical evaluation of ML defenses and attacks. Six defenses and eight attacks are already integrated. We designed Amulet to be extensible so that ML researchers and analysts can easily add support for new defenses, attacks, and datasets. This summer, we expect to make Amulet available as open source under the Apache license. We welcome the ML security/privacy community to try Amulet and to help make it better, both by giving us feedback and by integrating more defenses and attacks.
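To give a flavor of what "extensible" means here, the sketch below shows one common way such a library can expose a plug-in interface. It is a hypothetical illustration and does not reflect Amulet's actual API.

```python
# This is NOT Amulet's actual API: a hypothetical sketch of a plug-in style
# registry through which new attacks and defenses could be added by name.
from typing import Callable, Dict
import torch.nn as nn

ATTACKS: Dict[str, Callable] = {}
DEFENSES: Dict[str, Callable] = {}

def register(registry: Dict[str, Callable], name: str) -> Callable:
    """Decorator that adds an attack/defense factory to a registry under `name`."""
    def wrap(fn: Callable) -> Callable:
        registry[name] = fn
        return fn
    return wrap

@register(DEFENSES, "label_smoothing")
def label_smoothing_defense(model: nn.Module, smoothing: float = 0.1) -> nn.Module:
    """Example defense factory: returns a label-smoothing loss (model unused here)."""
    return nn.CrossEntropyLoss(label_smoothing=smoothing)

# An evaluation harness could then enumerate registered DEFENSES x ATTACKS pairs
# to study their pairwise interactions systematically.
```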
Acknowledgements: This work was supported in part by Intel (as part of the Private AI Collaborative Research Institute) and by the Government of Ontario.