Monday, 15 June 2026

Deployment Concerns for ML Systems: Unintended Interactions

Existing research focuses on defenses against individual machine learning (ML) risks. However, trustworthy ML deployment also requires addressing challenges that arise when mitigating multiple risks simultaneously, such as unintended interactions. We identify and explore three such unintended interactions in ML.

Overview

Current trustworthy ML research largely focuses on identifying and mitigating risks within individual pillars of trust, namely security, privacy, fairness, transparency, and safety. While essential, this is insufficient for trustworthy deployment, which also requires addressing additional problems that arise when mitigating multiple risks simultaneously. One such problem is unintended interactions in ML. Over the past two years, we have explored three types of unintended interaction: (a) a defense against one risk may increase or decrease other, unrelated risks; (b) conflicts among defenses can decrease their effectiveness when combined; and (c) colluding adversaries can execute one attack to amplify the effectiveness of others. We show that such interactions can be understood and predicted through the factors governing defense effectiveness and risk susceptibility, and we propose frameworks to study them.

Unintended Interactions among Defenses and Unrelated Risks

Defenses targeting one risk can inadvertently increase or decrease susceptibility to other, unrelated risks. For example, adversarial training (against the risk of evasion) may increase susceptibility to membership inference and to discriminatory behavior. However, the literature lacks a systematic understanding of the underlying causes of such interactions. Identifying these causes would enable practitioners to predict unexplored interactions without expensive empirical evaluation, but doing so is challenging because the behavior of ML models is difficult to interpret. This raises the following research question: what are the factors underlying unintended interactions among defenses and unrelated risks, and how can we effectively predict unexplored ones?

We developed the first framework to systematically explore unintended interactions, based on the conjecture that overfitting and memorization are the primary causes. The reasoning is that different risks depend on factors that influence overfitting and memorization, so an effective defense that changes these underlying factors in turn affects the susceptibility to those risks. Our framework includes various factors that influence overfitting and memorization: characteristics of the training dataset (e.g., size, distribution), the model (e.g., capacity), and the objective function (e.g., distance to boundary, loss curvature). Using this framework, we propose a guideline that explains interactions previously explored in the literature and conjectures about unexplored ones by examining how a defense changes a factor, and how a change in that factor correlates with a change in susceptibility to an unrelated risk. Using this guideline, we conjecture two previously unexplored interactions and empirically validate them. This work was published at the IEEE Symposium on Security and Privacy (S&P) 2024, and received a distinguished paper award (see blog). This work also resulted in Amulet, an open-source library for evaluating unintended interactions, which received funding from Intel.
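The two-step reasoning in the guideline (how a defense changes a factor, then how that factor correlates with an unrelated risk) can be illustrated with a minimal Python sketch. This is not the framework's actual implementation or Amulet's API: the names Direction, predict_interaction, and predict_over_factors are hypothetical, and reducing each relationship to a sign (+1, 0, -1) is a simplifying assumption made only for illustration.

```python
from enum import IntEnum


class Direction(IntEnum):
    """Qualitative direction of change (a simplifying assumption)."""
    DECREASE = -1
    NONE = 0
    INCREASE = 1


def predict_interaction(defense_effect_on_factor, factor_correlation_with_risk):
    """Conjecture how a defense changes susceptibility to an unrelated risk.

    defense_effect_on_factor: how the defense changes the factor.
    factor_correlation_with_risk: how an increase in the factor changes
        susceptibility to the risk.
    The conjectured direction is the product of the two signs.
    """
    sign = int(defense_effect_on_factor) * int(factor_correlation_with_risk)
    return Direction(sign)


def predict_over_factors(defense_effects, risk_correlations):
    """Apply the per-factor rule to every factor shared by both mappings.

    Both arguments map a (hypothetical) factor name to a Direction.
    Factors may disagree; resolving that still needs empirical evaluation.
    """
    return {
        factor: predict_interaction(effect, risk_correlations[factor])
        for factor, effect in defense_effects.items()
        if factor in risk_correlations
    }


if __name__ == "__main__":
    # Placeholder signs, not results from the paper.
    effects = {"factor_a": Direction.INCREASE, "factor_b": Direction.DECREASE}
    correlations = {"factor_a": Direction.INCREASE, "factor_b": Direction.INCREASE}
    for factor, direction in predict_over_factors(effects, correlations).items():
        print(factor, direction.name)
```

When the per-factor conjectures point in different directions, the sketch reports each one rather than forcing a single answer, since the source describes the guideline as a way to form conjectures that are then validated empirically.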

Combining ML Defenses without Conflicts

Real-world models must simultaneously protect against multiple risks to security, privacy, and fairness. Combining existing defenses is plausible, but conflicting objectives can reduce the effectiveness of the constituent defenses. This raises the following question: how can we effectively combine ML defenses while avoiding conflicts? We need an easy-to-use combination technique that is (a) accurate (correctly identifies whether a combination is effective), (b) scalable (allows two or more defenses to be combined), (c) non-invasive (requires no changes to the defenses being combined), and (d) general (applicable to different types of defenses).

We propose Def\Con, the first combination technique that meets all four requirements by combining defenses according to whether they will conflict with each other. Def\Con identifies potential conflicts based on where each defense is applied in the ML pipeline (pre-, in-, or post-training), and whether the underlying mechanisms of the defenses interfere (e.g., a later defense minimizes a risk used by an earlier defense). Since these steps do not depend on specific defenses, Def\Con is generally applicable to various defenses. This helps practitioners identify compatible defenses that can be effectively combined, i.e., whose effectiveness remains comparable to applying each defense separately. We show that Def\Con is more accurate than prior baselines, and correctly predicts seven of the eight previously explored combinations and 27 of the 30 unexplored combinations that we evaluated empirically. We also show, for the first time in the literature, that it is possible to effectively combine three defenses (i.e., Def\Con is scalable). Finally, Def\Con combines existing defenses without modification (i.e., Def\Con is non-invasive). This work was published in the Transactions on Machine Learning Research (TMLR), 2025.
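The two checks described above (pipeline stage, then mechanism interference) can be sketched roughly as follows. This is a simplified reading of the description, not Def\Con's actual algorithm: the Defense record, its relies_on and minimizes fields, and find_conflicts are hypothetical names, and modeling a mechanism as a set of labels is an assumption made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations

# Order in which defenses are applied in the ML pipeline.
STAGE_ORDER = {"pre": 0, "in": 1, "post": 2}


@dataclass(frozen=True)
class Defense:
    """Hypothetical description of a defense for conflict checking."""
    name: str
    stage: str                          # "pre", "in", or "post" training
    relies_on: frozenset = frozenset()  # properties the mechanism uses
    minimizes: frozenset = frozenset()  # properties the mechanism suppresses


def find_conflicts(defenses):
    """Return (undermined, undermining) name pairs among the given defenses.

    A pair is flagged when a defense applied later minimizes something that
    a defense applied earlier relies on. For defenses in the same stage,
    both directions are checked (an assumption of this sketch).
    """
    conflicts = []
    for a, b in combinations(defenses, 2):
        # sorted() is stable, so same-stage defenses keep their input order.
        earlier, later = sorted((a, b), key=lambda d: STAGE_ORDER[d.stage])
        if later.minimizes & earlier.relies_on:
            conflicts.append((earlier.name, later.name))
        elif a.stage == b.stage and earlier.minimizes & later.relies_on:
            conflicts.append((later.name, earlier.name))
    return conflicts


def is_compatible(defenses):
    """A combination is predicted compatible if no pair is flagged."""
    return not find_conflicts(defenses)
```

Because the check only reads a description of each defense, it needs no changes to the defenses themselves and extends to three or more defenses by checking every pair, which mirrors the non-invasive and scalable requirements stated earlier.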

Colluding Adversaries in ML Pipelines

So far, in studying the two types of unintended interaction above, we assumed that adversaries are independent actors in the ML pipeline with distinct objectives targeting different risks. However, practitioners must also consider the potential for collusion among adversaries within the ML pipeline, where one adversary executes an attack (by exploiting the vulnerability related to the corresponding risk) to amplify the effectiveness of attacks by other adversaries. For instance, poisoning can increase the effectiveness of privacy attacks (e.g., membership/attribute/distribution inference, data reconstruction), as can adversarial examples crafted for evasion. Existing work lacks a systematic framework to explore potential colluding adversaries, and to study the implications of the adversaries' characteristics for collusion. A unified framework can help practitioners and researchers (i) identify new threats against ML systems, (ii) design strong attacks for auditing, and (iii) design effective defenses against such attacks. This raises the following question: what are the underlying factors that aid the potential for collusion among adversaries, and how can we predict the collusion potential?

We present the first framework to explore two collusion types: (i) between train- and inference-time adversaries (or "train-inference collusion"), and (ii) among inference-time adversaries (or "inference-time collusion"). In both cases, we observe that a successful attack changes certain factors, which can influence the effectiveness of other attacks relying on those factors. However, these factors differ between the two collusion types. For train-inference collusion, poisoning changes the factors underlying overfitting and memorization, which can affect the effectiveness of inference-time attacks that rely on those factors. Here, the framework includes the factors underlying overfitting and memorization, which the train-time adversary can manipulate through poisoning. For inference-time collusion, knowledge inferred from the first attack is treated as an outcome that satisfies the prerequisites for the second attack. Here, the framework includes factors related to the adversary's knowledge (e.g., of training data, target model). Using the common factors among attacks, we present a guideline to predict the potential for collusion. We validate the guideline empirically on five unexplored cases of collusion, and analyze the effect of the adversary's characteristics on the collusion. This work was published in the USENIX Security Symposium (USENIX Sec), 2026.
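For inference-time collusion, the idea that the outcome of a first attack satisfies the prerequisites of a second can be sketched as a set comparison. The sketch below is only an illustration of that idea, not the framework itself: the Attack record, its requires and yields fields, and both functions are hypothetical, and representing adversary knowledge as a set of labels is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attack:
    """Hypothetical description of an inference-time attack."""
    name: str
    requires: frozenset = frozenset()  # knowledge the attack needs
    yields: frozenset = frozenset()    # knowledge a successful attack infers


def enabled_prerequisites(first, second, baseline_knowledge=frozenset()):
    """Prerequisites of `second` that only `first` supplies.

    baseline_knowledge is what the second adversary already knows without
    colluding. A non-empty result suggests potential for collusion.
    """
    missing = second.requires - baseline_knowledge
    return missing & first.yields


def collusion_is_sufficient(first, second, baseline_knowledge=frozenset()):
    """True if baseline knowledge plus the first attack's outcome covers
    every prerequisite of the second attack."""
    return second.requires <= (baseline_knowledge | first.yields)
```

Separating "helps" (some missing prerequisite is supplied) from "suffices" (all prerequisites are covered) keeps the prediction sensitive to the adversary's characteristics, here reduced to the baseline knowledge set.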

Acknowledgements: This work was supported in part by Intel (as part of the Private AI Collaborative Research Institute), the Wallenberg Visiting Professor Program, the Natural Sciences and Engineering Research Council of Canada (grant number RGPIN-2026-04826), and the Government of Ontario (RE011-038).
