Journal of Information Security Research (信息安全研究), 2020, Vol. 6, Issue (5): 396-403.

• Academic Paper •

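Research on News Topic Detection and Tracking Technology Based on Improved Single-Pass

Zhang Fan1, Pan Yaxiong2, Hu Yong3

    1. School of Cyberspace Security, Sichuan University
    2. Chengdu Science and Technology Development Center, China Academy of Engineering Physics
    3. Sichuan University
  • Received: 2020-04-29 Online: 2020-05-15 Published: 2020-04-29
  • Corresponding author: Zhang Fan
  • About the authors: Zhang Fan, master's student; main research interests: network data analysis and data security; 1105988653@qq.com. Pan Yaxiong, master's degree, senior engineer; main research interest: network information security; panyaxiong@163.com. Hu Yong, doctorate, research fellow; main research interest: network information security; huyong@scu.edu.cn.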

Abstract: To address the problem of detecting and tracking target topics in massive volumes of news reports, this paper studies the incremental clustering algorithm Single-Pass and improves it. The improvements are threefold: weight coefficients are introduced during feature-word selection to express the position of each feature word in the news text; temporal features are used alongside the text features when computing the similarity between news texts; and a sub-topic threshold check is added to the steps of the Single-Pass clustering algorithm. Experiments verify that the improved Single-Pass clustering algorithm not only yields topic clusters at different granularities but also improves clustering efficiency. The experimental results show that, under the same conditions, the missed detection rate and false detection rate of the improved Single-Pass clustering algorithm improve markedly.

Keywords: news topics, Single-Pass clustering algorithm, temporal features, similarity, sub-topic
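
The three improvements named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the title/body weights, the exponential time decay, both threshold values, the summed-vector centroids, and all function and field names. It is not the authors' implementation.

```python
import math
from collections import Counter

# All constants are illustrative assumptions, not values from the paper.
TITLE_WEIGHT = 2.0        # position weight for feature words in the title
BODY_WEIGHT = 1.0         # position weight for feature words in the body
TIME_SCALE_DAYS = 7.0     # decay scale of the temporal feature
TOPIC_THRESHOLD = 0.3     # below this, a report starts a new topic
SUBTOPIC_THRESHOLD = 0.6  # below this, a report starts a new sub-topic


def weighted_vector(title_tokens, body_tokens):
    """Term vector in which a word's weight depends on where it occurs."""
    vec = Counter()
    for tok in title_tokens:
        vec[tok] += TITLE_WEIGHT
    for tok in body_tokens:
        vec[tok] += BODY_WEIGHT
    return vec


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity(vec, day, cluster):
    """Text similarity damped by the time gap to the cluster's latest report."""
    time_factor = math.exp(-abs(day - cluster["last_day"]) / TIME_SCALE_DAYS)
    return cosine(vec, cluster["centroid"]) * time_factor


def best_match(vec, day, clusters):
    best, best_sim = None, -1.0
    for cluster in clusters:
        sim = similarity(vec, day, cluster)
        if sim > best_sim:
            best, best_sim = cluster, sim
    return best, best_sim


def absorb(cluster, vec, day):
    cluster["centroid"].update(vec)  # Counter.update adds the weights
    cluster["last_day"] = max(cluster["last_day"], day)


def single_pass(docs):
    """docs: iterable of (doc_id, title_tokens, body_tokens, day), in arrival order."""
    topics = []
    for doc_id, title_tokens, body_tokens, day in docs:
        vec = weighted_vector(title_tokens, body_tokens)
        # Coarse level: join the closest topic or open a new one.
        topic, sim = best_match(vec, day, topics)
        if topic is None or sim < TOPIC_THRESHOLD:
            topic = {"centroid": Counter(), "last_day": day, "subtopics": []}
            topics.append(topic)
        # Fine level: the additional sub-topic threshold check.
        sub, sub_sim = best_match(vec, day, topic["subtopics"])
        if sub is None or sub_sim < SUBTOPIC_THRESHOLD:
            sub = {"centroid": Counter(), "last_day": day, "docs": []}
            topic["subtopics"].append(sub)
        absorb(topic, vec, day)
        absorb(sub, vec, day)
        sub["docs"].append(doc_id)
    return topics


if __name__ == "__main__":
    sample = [
        (1, ["earthquake", "sichuan"], ["earthquake", "rescue", "team"], 0),
        (2, ["earthquake", "rescue"], ["rescue", "team", "sichuan"], 1),
        (3, ["football", "final"], ["football", "goal", "match"], 1),
        (4, ["earthquake", "donation"], ["donation", "fund", "sichuan"], 2),
    ]
    for topic in single_pass(sample):
        print([sub["docs"] for sub in topic["subtopics"]])
```

With these assumed settings, each report is read exactly once. A report that clears the topic threshold but not the sub-topic threshold stays in its topic and opens a new sub-topic, which is one way to obtain clusters at two granularities.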