A Comparative Study of Machine Learning-Based Intrusion Detection Models

Abstract: Network threats are constantly evolving and becoming increasingly stealthy. Studying how multiple machine learning models perform on intrusion detection over modern traffic data, and what characterizes each of them, is therefore important for improving the timeliness of intrusion detection systems. This paper applies recent efficient machine learning models to intrusion detection on the public dataset UNSW-NB15. These include ensemble learning models (Random Forest, XGBoost, LightGBM) and deep learning models (CNN, LSTM, GRU, etc.). We describe the task workflow and experimental configuration in detail, compare the evaluation metrics of the different models, and summarize the characteristics of each model in network intrusion detection. The experimental results show that, on a 10% sample of UNSW-NB15, LightGBM performs best in both effectiveness and efficiency among the tested models for binary classification, with an F1 score of 0.897, an accuracy of 89.86%, a training time of 1.98 s, and a prediction time of 0.11 s. For multiclass classification, XGBoost is the most comprehensive detection model among those tested, with an overall F1 score of 0.7907, an accuracy of 75.96%, a training time of 144.79 s, and a prediction time of 0.21 s.

Keywords: intrusion detection, machine learning, ensemble learning, deep learning, binary classification, multiclass classification, UNSW-NB15
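
The abstract reports accuracy, F1 score, training time and prediction time on a 10% sample of UNSW-NB15, but it does not spell out how the sample was drawn or how F1 was averaged. The sketch below shows one plausible way to compute these quantities in plain Python. It assumes a per-class (stratified) 10% sample, F1 of the attack class (label 1) for the binary task, and support-weighted F1 for the multiclass task. The paper may instead have used random sampling or macro averaging. All function names are hypothetical, and the models themselves (LightGBM, XGBoost, etc.) are not shown.

```python
import random
import time
from collections import Counter, defaultdict


def stratified_sample(rows, labels, fraction=0.10, seed=0):
    """Keep about `fraction` of the rows of every class (assumed sampling scheme)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    keep = []
    for idxs in by_class.values():
        k = max(1, round(len(idxs) * fraction))  # at least one row per class
        keep.extend(rng.sample(idxs, k))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred):
    """One-vs-rest F1 = 2TP / (2TP + FP + FN) for every label seen."""
    scores = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def binary_f1(y_true, y_pred, positive=1):
    """F1 of the attack class, assuming label 1 = attack and 0 = normal."""
    return f1_per_class(y_true, y_pred).get(positive, 0.0)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    per_class = f1_per_class(y_true, y_pred)
    return sum(per_class[c] * n / len(y_true) for c, n in support.items())


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
    print(accuracy(y_true, y_pred), binary_f1(y_true, y_pred))  # 0.75 0.75
    multi_true = ["Normal", "DoS", "Exploits", "Normal"]
    multi_pred = ["Normal", "Exploits", "Exploits", "Normal"]
    print(round(weighted_f1(multi_true, multi_pred), 4))  # 0.6667
```

With a scikit-learn-style estimator, wrapping `model.fit(X_train, y_train)` and `model.predict(X_test)` in `timed` would give training and prediction times of the kind reported above. Such timings depend heavily on hardware and thread settings. scikit-learn's `accuracy_score` and `f1_score` (with `average="binary"` or `average="weighted"`) compute the same metrics.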

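Zhang Pengfei (张鹏飞). A comparative study of machine learning-based intrusion detection models[J]. Journal of Information Security Research (信息安全研究), 2023, 9(8): 739-.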
