Journal of Information Security Research ›› 2025, Vol. 11 ›› Issue (1): 21-.


Research of Invisible Backdoor Attack Based on Interpretability

Zheng Jiaxi, Chen Wei, Yin Ping, and Zhang Yiting   

  1. (School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023)
  • Online: 2025-01-24  Published: 2025-01-24

  • Corresponding author: Zheng Jiaxi, M.S. Main research interests include information security and AI security. 747508759@qq.com
  • About the authors: Zheng Jiaxi, M.S. Main research interests include information security and AI security. 747508759@qq.com. Chen Wei, Ph.D., professor. Main research interests include Web security, IoT security, and machine learning system security. chenwei@njupt.edu.cn. Yin Ping, M.S. Main research interests include information security and AI security. 2894632476@qq.com. Zhang Yiting, Ph.D., associate professor. Main research interests include network entity identification and network traffic analysis. zyt@njupt.edu.cn

Abstract: Deep learning has achieved remarkable success on a variety of critical tasks. However, recent work has shown that deep neural networks are vulnerable to backdoor attacks, in which attackers release backdoored models that behave normally on benign samples but misclassify any sample stamped with the trigger as the attacker-chosen target label. Unlike adversarial examples, backdoor attacks are mainly carried out during the model training phase: samples are perturbed with triggers and a backdoor is injected into the model. This paper proposes an invisible backdoor attack based on interpretability algorithms. Unlike existing works that set the trigger mask arbitrarily, this paper carefully designs an interpretability-based method for determining the trigger mask and adopts random pixel perturbation as the trigger style, so that trigger-stamped samples look natural and remain imperceptible, evading both human visual inspection and existing backdoor defense strategies. Extensive comparative experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the effectiveness and superiority of the attack. The SSIM index is also used to measure the difference between the backdoor samples designed in this paper and benign samples, yielding scores close to 0.99, which shows that the generated backdoor samples cannot be identified by visual inspection. Finally, this paper shows that the proposed attack is resistant to existing backdoor defense methods.
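The abstract does not give implementation details, so the following is a minimal, hypothetical Python sketch of the overall idea: an input-gradient saliency map stands in for the paper's interpretability algorithm to pick the trigger mask, small random pixel perturbations confined to that mask serve as the trigger style, and SSIM verifies that the poisoned sample stays visually close to the benign one. The function names, the top_k and epsilon parameters, and the use of plain gradient saliency are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch of an interpretability-guided invisible trigger.
# Assumes a PyTorch classifier `model` and CHW float images in [0, 1].
import numpy as np
import torch
from skimage.metrics import structural_similarity


def saliency_mask(model, image, label, top_k=64):
    """Derive a trigger mask from input-gradient saliency (a simple
    interpretability proxy): keep the top_k most influential pixels."""
    x = image.clone().unsqueeze(0).requires_grad_(True)            # (1, C, H, W)
    loss = torch.nn.functional.cross_entropy(model(x), torch.tensor([label]))
    loss.backward()
    saliency = x.grad.abs().sum(dim=1).squeeze(0)                  # (H, W)
    mask = torch.zeros_like(saliency)
    top_idx = torch.topk(saliency.flatten(), top_k).indices
    mask.view(-1)[top_idx] = 1.0
    return mask                                                    # binary (H, W)


def apply_trigger(image, mask, epsilon=4 / 255, seed=0):
    """Perturb only the masked pixels with small random noise so the
    poisoned image stays visually indistinguishable from the original."""
    rng = np.random.default_rng(seed)
    noise = torch.from_numpy(
        rng.uniform(-epsilon, epsilon, size=tuple(image.shape)).astype(np.float32))
    poisoned = image + noise * mask.unsqueeze(0)                   # broadcast over channels
    return poisoned.clamp(0.0, 1.0)


def ssim_score(benign, poisoned):
    """SSIM between benign and poisoned images; values near 1.0 indicate
    the trigger is effectively invisible."""
    a = benign.permute(1, 2, 0).numpy()
    b = poisoned.detach().permute(1, 2, 0).numpy()
    return structural_similarity(a, b, channel_axis=2, data_range=1.0)
```

In this sketch, a poisoned training sample would be produced by `apply_trigger(image, saliency_mask(model, image, label))` and relabeled with the target class; an SSIM score close to 0.99, as reported in the paper, would indicate the perturbation is imperceptible.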

Key words: deep learning, deep neural network, backdoor attack, trigger, interpretability, backdoor sample

