Journal of Information Security Research ›› 2021, Vol. 7 ›› Issue (3): 225-232.

Research on Intrusion Detection Model based on Multiple Feature Selection Strategies

  

  • Received: 2021-03-09 Online: 2021-03-05 Published: 2021-03-17

He Hongyan, Huang Guoyan, Zhang Bing, Chen Yu

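  1. School of Information Science and Engineering, Yanshan University
  • Corresponding author: He Hongyan
  • About the authors:
    He Hongyan (female, born 1992) is currently pursuing a PhD in computer science and technology at Yanshan University. Research interests: data mining, network security, and intrusion detection. E-mail: hhongyan@stumail.ysu.edu.cn
    Huang Guoyan (male, born 1969) received his PhD from Yanshan University, Hebei, in 2006 and is now a professor at the School of Information Science and Engineering, Yanshan University. His research interests include network collaboration technology and software security. He is a senior member of the China Computer Federation and of the Association for Computing Machinery, and the principal investigator of National Natural Science Foundation of China project 61772451. E-mail: hgy@ysu.edu.cn
    Zhang Bing (male, born 1989) is a lecturer and postdoctoral researcher at the School of Information Science and Engineering, Yanshan University, China. Research interests: software security and data mining. E-mail: bingzhang@ysu.edu.cn
    Chen Yu (female, born 1994) received her master's degree from Yanshan University in 2019. Research interests: intrusion detection and network anomaly detection. E-mail: 1485092925@qq.com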

Abstract: Intrusion detection is an effective method for preventing host and network attacks. Intrusion detection systems make up for the shortcomings of traditional firewall, signature authentication, and access control technologies in security protection. However, mutual redundancy among the features of intrusion detection data samples interferes with the accuracy and efficiency of attack detection. Feature selection can effectively reduce the dimensionality of the data, eliminate redundant features, select the optimal feature subset, and improve the accuracy of network traffic anomaly detection. On this basis, this paper first uses the K-means clustering algorithm to extract typical data from the real traffic data set UNSW-NB15, producing a data set with typical data characteristics for feature selection. It then runs network intrusion detection experiments on that data set using intrusion detection models built with 9 different strategies. The experimental results show that the method detects and classifies traffic effectively: the binary classification accuracy for normal versus malicious traffic is 88.27%, higher than that of other machine learning algorithms. In addition, in multi-class classification, the detection rate improves for all attack types with few samples. These results verify the effectiveness of the method, which is also easy to use.

Key words: intrusion detection, feature selection, UNSW-NB15, recursive feature elimination (RFE), logistic regression (LR)
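
The abstract states that K-means is used to extract typical data from UNSW-NB15 but does not say how "typical" samples are chosen. The sketch below shows one plausible reading, offered only as an assumption: cluster the samples, then keep the few closest to each centroid. The function names `kmeans` and `typical_samples`, the parameters `k` and `per_cluster`, and the synthetic two-dimensional data (standing in for UNSW-NB15 feature vectors) are all hypothetical and are not taken from the paper.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def typical_samples(points, k, per_cluster, seed=0):
    """Indices of the `per_cluster` points closest to each centroid.

    Assumption: 'typical' means 'nearest to a cluster centre'.
    """
    centroids, labels = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: math.dist(points[i], centroids[c]))
        chosen.extend(members[:per_cluster])
    return sorted(chosen)


# Synthetic stand-in data: two well-separated blobs of 50 points each.
rng = random.Random(42)
blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
data = blob_a + blob_b

idx = typical_samples(data, k=2, per_cluster=5)
print(idx)
```

On real traffic records, the features would presumably need numeric encoding and scaling before distances are meaningful, and the reduced set would then feed the feature selection stage (for example recursive feature elimination with logistic regression, as the keywords suggest).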
