Information Security Events Discovery Based on News Flow

Journal of Information Security Research (信息安全研究) ›› 2016, Vol. 2 ›› Issue (12): 1105-1109.

Xu Jianzhong (徐建忠)

杭州世平信息科技有限公司

Received: 2016-12-26  Online: 2016-12-15  Published: 2016-12-26
Corresponding author: Xu Jianzhong
About the author: Bachelor's degree; engineer. Main research interests: network and information security.

Abstract

Abstract: As the Internet has become ubiquitous, people can obtain information online more easily and can interact with the outside world over the network anytime and anywhere. This convenience has also brought security problems such as information leakage and stolen account passwords, so information security is drawing growing public attention. Online news, one of today's mainstream media, covers many issues of public concern, including recent information security events. However, this information is often buried in massive volumes of Web documents, which makes it hard for readers to quickly learn about recent major information security events at home and abroad. An automatic method for discovering and organizing information security events therefore has practical value. This paper treats a single sentence as the unit that expresses an “information security event” and applies a machine learning algorithm to decide whether a sentence contains event-related information; the sentences that do are extracted from news documents as the desired result. Through manual construction of a training data set, sentence feature design, and support vector machine (SVM) model training, we build a method that automatically extracts sentences related to “information security events” from news documents. Experimental results show that the method achieves high precision and recall in discovering information security events, which supports its effectiveness.
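The pipeline summarized in the abstract (label sentences by hand, design sentence features, train an SVM, then measure precision and recall) can be sketched roughly as follows. This is only a minimal illustration under assumptions that the abstract does not state: bag-of-words features with a small stopword list, a tiny hand-written English training set, a linear SVM trained by subgradient descent on the hinge loss, and hypothetical helper names (`featurize`, `train_linear_svm`, `predict`, `precision_recall`). The paper's actual features, data, and SVM implementation are not described here.

```python
import re

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "for", "is", "in", "from",
             "after", "this", "and", "across"}

def featurize(sentence):
    """Map a sentence to bag-of-words counts (assumed feature design)."""
    feats = {}
    for tok in re.findall(r"[a-z0-9]+", sentence.lower()):
        if tok not in STOPWORDS:
            feats[tok] = feats.get(tok, 0) + 1
    return feats

def score(w, feats):
    """Linear decision value: dot product of weights and feature counts."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def train_linear_svm(labeled, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss.

    `labeled` is a list of (sentence, label) pairs, label +1 or -1.
    """
    data = [(featurize(s), y) for s, y in labeled]
    w = {}
    for _ in range(epochs):
        for feats, y in data:
            margin = y * score(w, feats)
            for f in w:                  # shrink step from the L2 penalty
                w[f] *= 1.0 - lr * lam
            if margin < 1.0:             # hinge loss is active: push toward y
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + lr * y * v
    return w

def predict(w, sentence):
    """Return +1 for an event sentence, -1 otherwise."""
    return 1 if score(w, featurize(sentence)) > 0 else -1

def precision_recall(gold, pred):
    """Precision and recall for the positive (+1) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g != 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical hand-labeled sentences: +1 = information security event.
TRAIN = [
    ("Hackers stole account passwords from the retailer database", 1),
    ("A data breach leaked personal information of millions of users", 1),
    ("Attackers exploited a vulnerability to steal customer data", 1),
    ("The city council approved a new budget for public parks", -1),
    ("The football team won the championship after extra time", -1),
    ("Heavy rain is expected across the region this weekend", -1),
]

model = train_linear_svm(TRAIN)
held_out = ["Hackers leaked millions of passwords",
            "The football team won this weekend"]
gold = [1, -1]
pred = [predict(model, s) for s in held_out]
print(pred, precision_recall(gold, pred))
```

On real news text, a tokenizer suited to the language and a far larger labeled set would be needed; the toy data here only shows how sentence-level classification and the two metrics fit together.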

Xu Jianzhong. Information Security Events Discovery Based on News Flow[J]. Journal of Information Security Research, 2016, 2(12): 1105-1109.
