Journal of Information Security Research ›› 2026, Vol. 12 ›› Issue (4): 359-.


Research on Adaptive Hierarchical Neural Network Backdoor Defense Method

Xu Yuanping, Ma Weifeng, and Zhang Yulai   

  1. (School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023)
  • Online: 2026-04-07  Published: 2026-04-07

  • Corresponding author: Zhang Yulai, PhD, associate professor, master's supervisor. Main research interest: artificial intelligence. zhangyulai@zust.edu.cn
  • About the authors: Xu Yuanping, master's student, main research interest: artificial intelligence security, xyp_xxx_00@163.com; Ma Weifeng, PhD, associate professor, master's supervisor, main research interest: computer applications, mawf@zust.edu.cn; Zhang Yulai, PhD, associate professor, master's supervisor, main research interest: artificial intelligence, zhangyulai@zust.edu.cn
  • Funding: Youth Science Fund of the National Natural Science Foundation of China (61803337)

Abstract: Backdoor attacks implant covert trigger patterns into the training data to force a deep learning model to output a preset result on specific inputs, seriously threatening model security. Traditional defense methods, such as pruning and fine-tuning, struggle to balance defense effectiveness against model performance because backdoor neurons partially overlap with normal neurons. To address this challenge, an adaptive hierarchical neural network backdoor defense (AHBD) method is proposed: it locates the backdoor through gradient direction consistency analysis and designs adaptive defense strategies based on the functional characteristics of different layers of the neural network, while adversarial training is further introduced to disrupt backdoor activation paths and improve model generalization. Experiments show that AHBD significantly reduces the attack success rate on the CIFAR10 and GTSRB datasets (average ASR drops to 2.63% and 1.71%, respectively) while maintaining the model's original classification accuracy (average ACC drops by less than 1%), outperforming existing mainstream defense methods.
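The abstract describes locating backdoor neurons via gradient direction consistency analysis. The paper's actual algorithm is not given here, but a minimal NumPy sketch of one plausible interpretation follows: for each neuron, measure how consistently the sign (direction) of its per-sample loss gradient agrees across inputs, then flag neurons whose gradients are highly aligned on trigger-stamped inputs but not on clean ones. The function names, the sign-agreement score, and the threshold are all hypothetical illustrations, not the authors' method.

```python
import numpy as np

def direction_consistency(grads: np.ndarray) -> np.ndarray:
    """Per-neuron gradient direction consistency.

    grads: (n_samples, n_neurons) array of per-sample loss gradients
    w.r.t. each neuron's activation. Returns one score per neuron in
    [0, 1]: 1.0 means every sample's gradient points the same way,
    values near 0 mean the directions are effectively random.
    """
    signs = np.sign(grads)            # direction of each sample's gradient
    return np.abs(signs.mean(axis=0)) # fraction of sign agreement

def flag_backdoor_neurons(clean_grads: np.ndarray,
                          trigger_grads: np.ndarray,
                          threshold: float = 0.9) -> np.ndarray:
    """Hypothetical proxy for backdoor-dedicated units: neurons whose
    gradient direction is highly consistent on triggered inputs but
    inconsistent on clean inputs. Returns their column indices."""
    clean_c = direction_consistency(clean_grads)
    trig_c = direction_consistency(trigger_grads)
    return np.where((trig_c > threshold) & (clean_c < threshold))[0]
```

In a defense pipeline, the flagged indices would drive a layer-aware response (e.g. pruning or targeted fine-tuning of those units), with the threshold adapted per layer as the abstract's "adaptive hierarchical" framing suggests.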

Key words: deep learning, deep neural network, backdoor attack, backdoor defense, artificial intelligence security

