Journal of Information Security Research (信息安全研究) ›› 2019, Vol. 5 ›› Issue (9): 843-846.

• Technical Applications •

Author Attribution of Short Texts Based on the Likelihood Ratio

Li Menglin (李孟林)

  1. Department of Cyber Crime Investigation, Criminal Investigation Police University of China
  • Received: 2019-09-06 Online: 2019-09-15 Published: 2019-09-06
  • Corresponding author: Li Menglin
  • About the author: Li Menglin is a master's student whose main research interest is network security law enforcement technology. 377906547@qq.com

Abstract: As information technology becomes part of daily life, short Internet texts increasingly appear as electronic evidence in cases. This problem has been studied extensively abroad, and a body of mature experience has accumulated. However, because of the characteristics and complexity of the Chinese language, results developed mainly for English text do not transfer well to Chinese. Studying author attribution algorithms for short text messages that suit Chinese application scenarios therefore has practical significance. Based on the N-gram model, this paper applies the likelihood ratio (LR) method and uses the distribution of word frequencies to determine the author of a short text. The experimental results show that the method achieves fairly good attribution performance.

Key words: short text, electronic data, N-gram, author attribution, likelihood ratio
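
The abstract names the ingredients (an N-gram model, word-frequency distributions, a likelihood ratio) but does not spell out the model. The following is therefore only a minimal sketch of how such a comparison could look, not the paper's implementation. It assumes character bigrams, a simple multinomial model over bigram frequencies with add-one smoothing, and exactly two candidate authors; all function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(texts, n=2):
    """Count n-gram frequencies over one author's known texts."""
    counts = Counter()
    for t in texts:
        counts.update(char_ngrams(t, n))
    return counts


def log_prob(text, counts, vocab_size, n=2):
    """Log-probability of text's n-grams under an add-one smoothed
    multinomial frequency model (an assumption, not the paper's model)."""
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + 1) / (total + vocab_size))
        for g in char_ngrams(text, n)
    )


def log_likelihood_ratio(text, counts_a, counts_b, n=2):
    """log LR = log P(text | author A) - log P(text | author B).

    Positive values favor author A, negative values favor author B.
    """
    # Shared vocabulary so both models are smoothed over the same support.
    vocab = set(counts_a) | set(counts_b) | set(char_ngrams(text, n))
    v = len(vocab)
    return log_prob(text, counts_a, v, n) - log_prob(text, counts_b, v, n)
```

To use the sketch, call `train` once per candidate author on that author's known messages, then pass the questioned text and both count tables to `log_likelihood_ratio`. Character n-grams are used here because they avoid a Chinese word-segmentation step; a word-level variant would need a segmenter, which is outside this sketch.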