Journal of Information Security Research ›› 2020, Vol. 6 ›› Issue (6): 0-0.

Flow Anomaly Detection Based on Hierarchical Clustering Method

  

  • Received: 2020-06-08 Online: 2020-06-05 Published: 2020-06-09

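Jian Shijie (蹇诗婕)1, Lu Zhigang (卢志刚)1, Jiang Bo (姜波)1, Liu Yuling (刘玉岭)1, Liu Baoxu (刘宝旭)2

  1. Institute of Information Engineering, Chinese Academy of Sciences
  2. Institute of Information Engineering, Chinese Academy of Sciences
  • Corresponding author: Jian Shijie (蹇诗婕)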

Abstract: With the advent of the big data era, the volume of attack traffic on networks has risen sharply, and finding malicious traffic through flow anomaly detection has become an urgent problem. The anomaly detection equipment currently used in industry relies mainly on statistical analysis or simple machine learning methods, and it faces several problems: the volume of traffic data is huge, much of it is redundant normal data, precision is low, and the false alarm rate is high. To address these problems, this paper proposes a flow anomaly detection method based on hierarchical clustering that operates at the data processing stage. The method first applies a hierarchical clustering algorithm to reduce the data, and then builds hierarchical-clustering-based abnormal traffic detection models with seven different machine learning algorithms. Experimental results show that, on the DARPA dataset, the method detects abnormal behavior with a precision of up to 99% and a recall of up to 99%. In addition, while precision is kept at 90%, the best data reduction reaches 47.58%, which greatly improves detection efficiency.

Key words: flow anomaly detection, data preprocessing, data reduction, hierarchical clustering, machine learning methods
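
The two-stage idea in the abstract (reduce the data with hierarchical clustering, then train a detector) can be illustrated with a minimal sketch. The abstract does not specify the linkage, the stopping rule, or the seven classifiers, so everything below is an assumption made for illustration: centroid-linkage agglomerative clustering with a distance `threshold`, one centroid kept per cluster within each class, a 1-nearest-neighbour classifier standing in for the downstream model, and a tiny invented 2-D dataset (label 0 = normal, 1 = attack) rather than DARPA records.

```python
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def agglomerate(points, threshold):
    """Bottom-up clustering with centroid linkage (assumed, not from the paper).

    Repeatedly merges the two clusters whose centroids are closest and stops
    once the closest pair is farther apart than `threshold`.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def reduce_records(records, threshold):
    """Data reduction: replace each same-label cluster by its centroid."""
    reduced = []
    for label in sorted({lab for _, lab in records}):
        pts = [x for x, lab in records if lab == label]
        for cluster in agglomerate(pts, threshold):
            reduced.append((centroid(cluster), label))
    return reduced


def reduction_ratio(original, reduced):
    """Fraction of records removed by the reduction step."""
    return 1 - len(reduced) / len(original)


def predict(train, x):
    """1-nearest-neighbour stand-in for any downstream classifier."""
    return min(train, key=lambda rec: dist(rec[0], x))[1]


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the `positive` (attack) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data: (feature vector, label), label 0 = normal, 1 = attack.
train = [
    ((0.10, 0.20), 0), ((0.12, 0.21), 0), ((0.11, 0.19), 0),
    ((0.50, 0.50), 0), ((0.52, 0.49), 0),
    ((0.90, 0.95), 1), ((0.92, 0.93), 1), ((0.10, 0.90), 1),
]
test = [
    ((0.11, 0.20), 0), ((0.51, 0.50), 0),
    ((0.91, 0.94), 1), ((0.12, 0.88), 1),
]

reduced = reduce_records(train, threshold=0.1)
y_true = [label for _, label in test]
y_pred = [predict(reduced, x) for x, _ in test]
precision, recall = precision_recall(y_true, y_pred)
print(len(train), len(reduced), reduction_ratio(train, reduced), precision, recall)
```

In this sketch the `threshold` controls the trade-off the abstract reports: a larger value merges more records and raises the reduction ratio, at the risk of lowering precision. The pairwise search is quadratic per merge and only suits small examples; a library implementation would be needed for real traffic volumes.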
