REStore: Exploring a Black-Box Defense against DNN Backdoors using Rare Event Simulation

Quentin Le Roux; Kassem Kallas; Teddy Furon

Conference paper. Year: 2024


Quentin Le Roux

Role: Author

Creating and exploiting explicit links between multimedia fragments

THALES [France]

Kassem Kallas

Role: Author
PersonId: 1172264

Creating and exploiting explicit links between multimedia fragments

Teddy Furon

Role: Author
PersonId: 3087
IdHAL: teddy-furon
IdRef: 078044758

Creating and exploiting explicit links between multimedia fragments

Abstract

Backdoor attacks pose a significant threat to deep neural networks because they allow an adversary to inject malicious behavior into a victim model during training. This paper addresses the challenge of defending against backdoor attacks in a black-box setting, where the defender has only limited access to a suspicious model. We introduce Importance Splitting, a Sequential Monte Carlo method previously used in neural network robustness certification, as an off-the-shelf tool for defending against backdoors. We demonstrate that a black-box defender can leverage rare event simulation to assess the presence of a backdoor, reconstruct its trigger, and finally purify test-time input data in real time. Our input purification defense, called REStore, proves effective in black-box scenarios because it uses triggers recovered with query access to a model (observing only its logit, probit, or top-1 label outputs). We test our method on MNIST, CIFAR-10, and CASIA-WebFace. We believe we are the first to demonstrate that backdoors may be considered through the lens of rare event simulation. Moreover, REStore is the first one-stage, black-box input purification defense that approaches the performance of more complex comparable defenses. REStore avoids gradient estimation, model reconstruction, and the vulnerable training of additional models.
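To make the rare event simulation idea concrete, here is a minimal sketch of an importance splitting estimator in plain Python. It is not the authors' implementation: the function name `importance_splitting`, its parameters, the standard Gaussian input distribution, and the toy score function are all assumptions chosen for illustration. In a black-box setting the score would instead be a value observed by querying the model (for instance a logit), but that wiring is not shown here.

```python
import math
import random


def importance_splitting(score, dim, tau, n=400, keep=0.5, mcmc_steps=10,
                         step=0.5, max_levels=100, rng=None):
    """Estimate P(score(X) >= tau) for X ~ N(0, I_dim) by adaptive splitting.

    All names and defaults are illustrative assumptions, not taken from the paper.
    """
    rng = rng or random.Random(0)
    particles = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    scores = [score(x) for x in particles]
    n_keep = max(1, int(keep * n))
    scale = math.sqrt(1.0 + step ** 2)
    prob = 1.0
    for _ in range(max_levels):
        # Rank particles by score and set the next level adaptively.
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        level = scores[order[n_keep - 1]]  # lowest score among survivors
        if level >= tau:
            break
        prob *= n_keep / n  # estimated probability of passing this level
        survivors = order[:n_keep]
        for i in order[n_keep:]:
            # Replace each discarded particle by a clone of a random survivor...
            j = rng.choice(survivors)
            x, s = list(particles[j]), scores[j]
            for _ in range(mcmc_steps):
                # ...then move it with a proposal that leaves N(0, I) invariant,
                # accepting only moves that stay at or above the current level.
                cand = [(xi + step * rng.gauss(0.0, 1.0)) / scale for xi in x]
                s_cand = score(cand)
                if s_cand >= level:
                    x, s = cand, s_cand
            particles[i], scores[i] = x, s
    # Final factor: fraction of the last population that reaches tau.
    return prob * sum(s >= tau for s in scores) / n


if __name__ == "__main__":
    # Toy check: P(X_0 >= 3) for a standard normal is about 1.35e-3.
    estimate = importance_splitting(lambda x: x[0], dim=2, tau=3.0)
    exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
    print(f"estimate={estimate:.2e}, exact={exact:.2e}")
```

The estimator only ever calls `score`, never a gradient, which is what makes this family of methods compatible with query-only access. The toy linear score is used because its exact tail probability is known, so the estimate can be checked.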

Keywords

deep neural networks; backdoor; defense; blackbox; trigger reconstruction; input purification

Domains

Computer Science [cs]; Artificial Intelligence [cs.AI]; Cryptography and Security [cs.CR]

Main file

REStore! Exploring a Black-Box Defense against DNN Backdoors using Rare Event Simulation.pdf (4.18 MB)

Origin: Files produced by the author(s)

Contributor: Teddy Furon

https://hal.science/hal-04485197

Submitted on: Friday, March 1, 2024, 10:41:48

Last modified on: Tuesday, March 5, 2024, 10:45:11

Dates and versions

hal-04485197 , version 1 (01-03-2024)

License

Attribution

Identifiers

HAL Id : hal-04485197 , version 1

Cite

Quentin Le Roux, Kassem Kallas, Teddy Furon. REStore: Exploring a Black-Box Defense against DNN Backdoors using Rare Event Simulation. SaTML 2024 - 2nd IEEE Conference on Secure and Trustworthy Machine Learning, Apr 2024, Toronto, Canada. pp.1-22. ⟨hal-04485197⟩


Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES ANR UR1-MATH-NUM CYBERSCHOOL

82 views

22 downloads
