Journal of Information Security Research ›› 2024, Vol. 10 ›› Issue (12): 1082-.

• Special Topic: Comprehensive Security Defense Systems •

Traffic Anomaly Detection Method Combining Second-Order Feature Extraction and Self-Distillation

Chen Wanzhi1, Zhao Lin1, and Wang Tianyuan2

  1 (College of Software, Liaoning Technical University, Huludao, Liaoning 125105)
  2 (State Grid Liaoning Electric Power Supply Co., Ltd., Yingkou, Liaoning 115002)
  • Online: 2024-12-25 Published: 2024-12-25
  • Corresponding author: Chen Wanzhi, chenwanzhi@lntu.edu.cn
  • About the authors:
    Chen Wanzhi, PhD, associate professor. Main research interests: artificial intelligence and intelligent information processing, network and information security, industrial control software and data analysis. chenwanzhi@lntu.edu.cn
    Zhao Lin, master's student. Main research interest: network security. 18242194878@163.com
    Wang Tianyuan, engineer. Main research interests: electric power security and auditing. 654771112@qq.com

Abstract: To address the low detection rate of deep learning models for minority-class attack traffic when processing imbalanced, massive, high-dimensional traffic data, a traffic anomaly detection method combining second-order feature extraction and self-distillation is proposed. First, isolation forest (iForest) is used to remove outliers from the normal-class samples, and the cleaned data are used to train an improved convolutional denoising autoencoder (CDAE); this reduces the influence of noise and outliers on model training and yields a low-dimensional enhanced representation of the original features. Second, ADASYN is applied to the outlier-free dataset to synthesize minority-class attack samples, which addresses the data imbalance. Then iForest is applied again to remove outliers from the newly generated samples, producing a new dataset. The trained CDAE performs the first round of feature extraction on this dataset, and the extracted features are fed into a self-distillation-based ResNet model, which performs the second round of feature extraction. Finally, the trained CDAE and ResNet models are combined to identify anomalous traffic. On the NSL-KDD dataset, the method reaches a best five-class accuracy of 91.52% and a best F1 score of 92.05%. Experimental results show that, compared with existing methods, the method effectively improves the detection rate for minority-class attack traffic.

Key words: traffic anomaly detection, convolutional denoising autoencoder, self-distillation, isolation forest, adaptive synthetic sampling
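
The adaptive synthetic sampling (ADASYN) step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `adasyn_sketch`, its parameters, and the choice to interpolate only toward the nearest minority-class neighbours are assumptions made for illustration, and it uses only the Python standard library. The idea shown is the ADASYN weighting rule: minority samples that have more majority-class points among their k nearest neighbours receive a larger share of the synthetic samples.

```python
import math
import random


def adasyn_sketch(minority, majority, k=5, beta=1.0, seed=0):
    """Illustrative ADASYN-style oversampling (hypothetical helper).

    minority, majority: lists of equal-length numeric tuples.
    Returns a list of synthetic minority samples.
    """
    rng = random.Random(seed)
    # Number of synthetic samples needed for the requested balance level.
    total = int((len(majority) - len(minority)) * beta)
    if total <= 0 or len(minority) < 2:
        return []

    labelled = [(x, 0) for x in majority] + [(x, 1) for x in minority]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for x in minority:
        neighbours = sorted(
            (p for p in labelled if p[0] is not x),
            key=lambda p: math.dist(x, p[0]),
        )[:k]
        n_major = sum(1 for _, label in neighbours if label == 0)
        ratios.append(n_major / len(neighbours))

    norm = sum(ratios)
    if norm == 0:
        # No minority point borders the majority class: use uniform weights.
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [r / norm for r in ratios]

    synthetic = []
    for x, w in zip(minority, weights):
        # Assumption: interpolate toward the k nearest minority neighbours.
        partners = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        for _ in range(round(w * total)):
            partner = rng.choice(partners)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(
                tuple(a + lam * (b - a) for a, b in zip(x, partner))
            )
    return synthetic
```

Because each per-sample count is rounded independently, the number of samples returned can differ slightly from the target. In the pipeline described above, the generated samples would then pass through a second iForest filter before feature extraction; that step is not shown here.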
