信息安全研究 (Journal of Information Security Research) ›› 2016, Vol. 2 ›› Issue (12): 1105-1109.

• Academic Paper •

基于新闻流的信息安全事件发现

徐建忠   

  1. 杭州世平信息科技有限公司
  • Received: 2016-12-26 Online: 2016-12-15 Published: 2016-12-26
  • Corresponding author: 徐建忠
  • About the author: bachelor's degree, engineer; main research interests are network and information security.

Information Security Events Discovery Based on News Flow

Abstract: With the widespread adoption of the Internet, people can obtain information online more easily and can interact with the outside world anytime and anywhere. This convenience has also brought security problems such as information leakage and theft of account passwords, so information security is attracting growing public attention. Online news, one of today's mainstream media, covers many issues of public concern, including recent information security events. However, this information is often buried in massive volumes of Web documents, which makes it difficult for readers to learn quickly about recent major information security events at home and abroad. A method that automatically discovers and organizes information security events therefore has practical value. This paper treats the single sentence as the unit that expresses an "information security event", applies a machine learning algorithm to decide whether a sentence contains information related to such an event, and extracts the sentences that do from news documents as the desired result. Through manual construction of a training data set, sentence feature design, and support vector machine (SVM) model training, a method is established for automatically extracting sentences related to "information security events" from news documents. Experimental results show that the method achieves high precision and recall in discovering information security events, which verifies its effectiveness.
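The pipeline outlined in the abstract (split news text into sentences, classify each sentence, keep the positives, then score with precision and recall) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not describe the feature design or the SVM variant, so the binary bag-of-words features, the linear SVM trained by sub-gradient descent, the hyperparameters, the toy English training sentences, and all function names here are hypothetical. Chinese news text would additionally need a word segmenter in place of the simple tokenizer.

```python
import re

def split_sentences(text):
    """Split a document into sentences on ASCII and full-width end marks."""
    parts = re.split(r"[.!?。!?]+", text)
    return [p.strip() for p in parts if p.strip()]

def featurize(sentence):
    """Assumed feature design: the set of lowercase word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))

def train_linear_svm(samples, labels, epochs=50, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss.
    samples are feature sets; labels must be +1 or -1."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            margin = y * (sum(w.get(f, 0.0) for f in feats) + b)
            for f in w:                      # shrink step from the L2 penalty
                w[f] *= 1.0 - lr * lam
            if margin < 1.0:                 # hinge loss is active: move toward y
                for f in feats:
                    w[f] = w.get(f, 0.0) + lr * y
                b += lr * y
    return w, b

def is_security_sentence(sentence, w, b):
    """Classify one sentence by the sign of the linear decision function."""
    score = sum(w.get(f, 0.0) for f in featurize(sentence)) + b
    return score > 0.0

def extract_security_sentences(document, w, b):
    """Return the sentences of a document that the model labels positive."""
    return [s for s in split_sentences(document)
            if is_security_sentence(s, w, b)]

def precision_recall(predicted, gold):
    """Precision and recall for the positive class (label 1)."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)
    fp = sum(1 for p, g in pairs if p == 1 and g != 1)
    fn = sum(1 for p, g in pairs if p != 1 and g == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical hand-labelled training sentences (1 = security event, -1 = other).
TRAIN = [
    ("hackers stole passwords from the bank", 1),
    ("data breach leaked user accounts", 1),
    ("the team won the football match", -1),
    ("the city opened a new park", -1),
]
weights, bias = train_linear_svm(
    [featurize(s) for s, _ in TRAIN], [y for _, y in TRAIN]
)

news = "Hackers leaked passwords. The team opened a park."
security_sentences = extract_security_sentences(news, weights, bias)
```

Working at the sentence level keeps each training example short and lets a single news document contribute several extracted events. In practice a library SVM and a much larger labelled corpus would replace the toy trainer and data shown here.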