Journal of Information Security Research (信息安全研究) ›› 2015, Vol. 1 ›› Issue (3): 272-277.

• Academic Papers •

Research on Short Text Classification in Security Attack Tracking and Analysis

黄克敏 (Huang Kemin), 方勇 (Fang Yong)

  1. College of Electronics and Information Engineering, Sichuan University
  2. Sichuan University
  • Received: 2015-11-23 Revised: 2015-11-30 Online: 2015-12-15 Published: 2016-01-18
  • Corresponding author: 黄克敏 (Huang Kemin)

Abstract: In recent years, as information technology has advanced rapidly in the era of big data, the field of information security research has also developed quickly, while more and more network information security attacks occur and are reported. To better safeguard network information security, it is particularly important to build a big-data-based system for tracking and analyzing network information security attacks. Such a system rests on the fact that a network attack always leaves a large number of traces on the network. Drawing on the ability of big data analysis platforms to process massive multi-source data quickly, it performs multi-dimensional, multi-angle correlation analysis in order to predict attacks that may occur and to trace and analyze attacks that have already occurred. One key technology in such a system is text classification. This paper adopts Naive Bayes as the text classification algorithm. Because the assumption of independence among feature terms in Naive Bayes rarely holds in practice, the paper proposes, in order to relax this assumption to some extent, a Naive Bayes classification method with improved feature-term weights. The method combines an improved chi-square statistical feature selection method with a weighted Naive Bayes classification algorithm, takes into account both how much each feature term contributes to classification and the dependencies among feature terms, and is evaluated experimentally on corpus samples. The experimental results show that the proposed method achieves some improvement in classification performance over the unimproved method.

Key words: big data, network information security, network attack, text classification, Naive Bayes classifier, improved weighted Naive Bayes classification method
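
The combination named in the abstract, chi-square feature selection feeding a weighted Naive Bayes classifier, can be illustrated with a minimal sketch. The abstract does not give the paper's improved chi-square formula or its weighting scheme, so the code below is an assumption throughout: it uses the standard chi-square statistic, keeps the top `k` terms, and uses each term's score divided by the mean selected score as its weight, so that the class score is log P(c) plus the sum of w_t · log P(t|c). The names `chi_square_scores`, `select_weights`, and `WeightedNaiveBayes` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def chi_square_scores(docs, labels):
    """Standard chi-square score per term, maximized over classes.

    docs is a list of token lists; labels is the parallel list of classes.
    (Assumption: the paper's *improved* statistic is not reproduced here.)
    """
    n = len(docs)
    class_count = Counter(labels)
    df = Counter()                   # documents containing the term
    df_class = defaultdict(Counter)  # same count, split by class
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_class[y][t] += 1
    scores = {}
    for t in df:
        best = 0.0
        for c, n_c in class_count.items():
            a = df_class[c][t]   # in class c, contains t
            b = df[t] - a        # other classes, contains t
            m = n_c - a          # in class c, lacks t
            d = n - a - b - m    # other classes, lacks t
            denom = (a + m) * (b + d) * (a + b) * (m + d)
            if denom:
                best = max(best, n * (a * d - b * m) ** 2 / denom)
        scores[t] = best
    return scores


def select_weights(scores, k):
    """Keep the k highest-scoring terms; weight = score / mean kept score."""
    top = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    mean = sum(scores[t] for t in top) / len(top) if top else 0.0
    return {t: (scores[t] / mean if mean else 1.0) for t in top}


class WeightedNaiveBayes:
    """Multinomial Naive Bayes whose log-likelihood terms are scaled by weights."""

    def __init__(self, weights, alpha=1.0):
        self.weights = weights  # term -> weight; unselected terms are ignored
        self.alpha = alpha      # Laplace smoothing constant

    def fit(self, docs, labels):
        self.n_docs = len(docs)
        self.class_count = Counter(labels)
        self.term_count = {c: Counter() for c in self.class_count}
        for tokens, y in zip(docs, labels):
            self.term_count[y].update(t for t in tokens if t in self.weights)
        return self

    def predict(self, tokens):
        best_label, best_lp = None, -math.inf
        for c, n_c in self.class_count.items():
            total = sum(self.term_count[c].values()) + self.alpha * len(self.weights)
            lp = math.log(n_c / self.n_docs)
            for t in tokens:
                if t in self.weights:
                    p = (self.term_count[c][t] + self.alpha) / total
                    lp += self.weights[t] * math.log(p)
            if lp > best_lp:
                best_label, best_lp = c, lp
        return best_label
```

With every weight set to 1, the classifier reduces to ordinary multinomial Naive Bayes over the selected terms, which makes it straightforward to compare the weighted and unweighted variants on the same corpus.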