Conference paper Year: 2024

REStore: Exploring a Black-Box Defense against DNN Backdoors using Rare Event Simulation

Abstract

Backdoor attacks pose a significant threat to deep neural networks because they allow an adversary to inject malicious behavior into a victim model during training. This paper addresses the challenge of defending against backdoor attacks in a black-box setting, where the defender has only limited access to a suspicious model. We introduce Importance Splitting, a Sequential Monte Carlo method previously used in neural network robustness certification, as an off-the-shelf tool for defending against backdoors. We demonstrate that a black-box defender can leverage rare event simulation to assess the presence of a backdoor, reconstruct its trigger, and finally purify test-time input data in real time. Our input purification defense, called REStore, proves effective in black-box scenarios because it uses triggers recovered with only query access to a model (observing only its logit, probit, or top-1 label outputs). We test our method on MNIST, CIFAR-10, and CASIA-Webface. We believe we are the first to demonstrate that backdoors can be studied through the lens of rare event simulation. Moreover, REStore is the first one-stage, black-box input purification defense that approaches the performance of more complex comparable defenses. REStore avoids gradient estimation, model reconstruction, or the vulnerable training of additional models.
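To give a feel for the rare event simulation idea mentioned in the abstract, the following is a minimal sketch of a generic adaptive splitting estimator on a toy problem. It is not the paper's procedure: the function name `importance_splitting`, its parameters, the Gaussian input distribution, and the toy score function are all assumptions made for illustration. The estimator only queries a score function (standing in for a black-box model output) and never uses gradients.

```python
import math
import random


def importance_splitting(score, dim, tau, n=500, keep=0.5,
                         mcmc_steps=10, strength=0.5, max_iter=200, rng=None):
    """Estimate P(score(X) > tau) for X ~ N(0, I_dim) using only score queries.

    Hypothetical helper for illustration; all parameter names are assumptions.
    """
    rng = rng or random.Random(0)
    n_keep = int(n * keep)
    n_kill = n - n_keep
    if n_keep < 1 or n_kill < 1:
        raise ValueError("keep must leave at least one survivor and one killed particle")

    # Initial population drawn from the nominal (non-rare) distribution.
    particles = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    scores = [score(x) for x in particles]
    prob = 1.0
    norm = math.sqrt(1.0 + strength ** 2)

    for _ in range(max_iter):
        # Rank particles by score; the lowest n_kill are discarded.
        order = sorted(range(n), key=scores.__getitem__)
        killed, survivors = order[:n_kill], order[n_kill:]
        level = scores[killed[-1]]  # intermediate threshold for this stage
        if level >= tau:
            break  # the population has reached the target threshold
        prob *= n_keep / n  # empirical estimate of P(score > level | previous level)

        for i in killed:
            # Clone a random survivor, then move the clone with a kernel that
            # leaves N(0, I) invariant, rejecting moves that fall below the level.
            j = rng.choice(survivors)
            x, s = list(particles[j]), scores[j]
            for _ in range(mcmc_steps):
                cand = [(v + strength * rng.gauss(0.0, 1.0)) / norm for v in x]
                s_cand = score(cand)
                if s_cand > level:
                    x, s = cand, s_cand
            particles[i], scores[i] = x, s

    # Final stage: fraction of the current population above the target threshold.
    return prob * sum(s > tau for s in scores) / n


if __name__ == "__main__":
    # Toy check: for score(x) = x[0], the exact answer is the Gaussian tail at tau.
    estimate = importance_splitting(lambda x: x[0], dim=2, tau=3.0)
    exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
    print(f"estimate={estimate:.2e} exact={exact:.2e}")
```

The estimate is a product of moderate conditional probabilities, one per intermediate level, which is why far fewer score queries are needed than with naive Monte Carlo sampling of an event this rare. In a backdoor setting the score would presumably come from the suspicious model's outputs, but that mapping is not specified here and is left as an assumption.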
Main file
REStore! Exploring a Black-Box Defense against DNN Backdoors using Rare Event Simulation.pdf (4.18 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04485197, version 1 (01-03-2024)

License

Attribution

Identifiers

  • HAL Id: hal-04485197, version 1

Cite

Quentin Le Roux, Kassem Kallas, Teddy Furon. REStore: Exploring a Black-Box Defense against DNN Backdoors using Rare Event Simulation. SaTML 2024 - 2nd IEEE Conference on Secure and Trustworthy Machine Learning, Apr 2024, Toronto, Canada. pp.1-22. ⟨hal-04485197⟩
