Journal of Information Security Research ›› 2025, Vol. 11 ›› Issue (1): 21-.


Research of Invisible Backdoor Attack Based on Interpretability

Zheng Jiaxi, Chen Wei, Yin Ping, and Zhang Yiting   

  1. (School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023)
  • Online: 2025-01-24  Published: 2025-01-24

  • Corresponding author: Zheng Jiaxi, M.S. Main research interests include information security and AI security. 747508759@qq.com
  • About the authors: Zheng Jiaxi, M.S. Main research interests include information security and AI security. 747508759@qq.com. Chen Wei, Ph.D., professor. Main research interests include Web security, IoT security, and machine learning system security. chenwei@njupt.edu.cn. Yin Ping, M.S. Main research interests include information security and AI security. 2894632476@qq.com. Zhang Yiting, Ph.D., associate professor. Main research interests include network entity identification and network traffic analysis. zyt@njupt.edu.cn

Abstract: Deep learning has achieved remarkable success on a variety of critical tasks. However, recent work has shown that deep neural networks are vulnerable to backdoor attacks, in which attackers release backdoored models that behave normally on benign samples but misclassify any sample stamped with the trigger as the attacker-chosen target label. Unlike adversarial examples, backdoor attacks are mainly carried out during the model training phase: samples are perturbed with triggers and a backdoor is injected into the model. This paper proposes an invisible backdoor attack based on interpretability algorithms. Unlike existing works that set the trigger mask arbitrarily, this paper carefully designs an interpretability-based method for determining the trigger mask and adopts random pixel perturbation as the trigger style, so that trigger-stamped samples look natural and remain imperceptible, evading both human visual inspection and existing backdoor defense strategies. Extensive comparative experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the effectiveness and superiority of the attack. The SSIM index is also used to measure the difference between the backdoor samples designed in this paper and benign samples, yielding scores close to 0.99, which shows that the generated backdoor samples cannot be identified by visual inspection. Finally, this paper shows that the proposed attack is resistant to existing backdoor defense methods.
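The abstract does not give implementation details, so the following is a minimal, hypothetical Python sketch of the overall idea: an input-gradient saliency map stands in for the paper's interpretability algorithm to pick the trigger mask, small random pixel perturbations confined to that mask serve as the trigger style, and SSIM verifies that the poisoned sample stays visually close to the benign one. The function names, the top_k and epsilon parameters, and the use of plain gradient saliency are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch of an interpretability-guided invisible trigger.
# Assumes a PyTorch classifier `model` and CHW float images in [0, 1].
import numpy as np
import torch
from skimage.metrics import structural_similarity


def saliency_mask(model, image, label, top_k=64):
    """Derive a trigger mask from input-gradient saliency (a simple
    interpretability proxy): keep the top_k most influential pixels."""
    x = image.clone().unsqueeze(0).requires_grad_(True)            # (1, C, H, W)
    loss = torch.nn.functional.cross_entropy(model(x), torch.tensor([label]))
    loss.backward()
    saliency = x.grad.abs().sum(dim=1).squeeze(0)                  # (H, W)
    mask = torch.zeros_like(saliency)
    top_idx = torch.topk(saliency.flatten(), top_k).indices
    mask.view(-1)[top_idx] = 1.0
    return mask                                                    # binary (H, W)


def apply_trigger(image, mask, epsilon=4 / 255, seed=0):
    """Perturb only the masked pixels with small random noise so the
    poisoned image stays visually indistinguishable from the original."""
    rng = np.random.default_rng(seed)
    noise = torch.from_numpy(
        rng.uniform(-epsilon, epsilon, size=tuple(image.shape)).astype(np.float32))
    poisoned = image + noise * mask.unsqueeze(0)                   # broadcast over channels
    return poisoned.clamp(0.0, 1.0)


def ssim_score(benign, poisoned):
    """SSIM between benign and poisoned images; values near 1.0 indicate
    the trigger is effectively invisible."""
    a = benign.permute(1, 2, 0).numpy()
    b = poisoned.detach().permute(1, 2, 0).numpy()
    return structural_similarity(a, b, channel_axis=2, data_range=1.0)
```

In this sketch, a poisoned training sample would be produced by `apply_trigger(image, saliency_mask(model, image, label))` and relabeled with the target class; an SSIM score close to 0.99, as reported in the paper, would indicate the perturbation is imperceptible.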

Key words: deep learning, deep neural network, backdoor attack, trigger, interpretability, backdoor sample

