Journal of Information Security Research, 2022, Vol. 8, Issue (9): 908-.

• Academic paper

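Research on User Viewpoint Positioning Method in Microblog Screenshots

Wang Guijiang, Huang Runcai, Ma Shiyu, Huang Xiaogang, Wang Chengmao

  1. (School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620)
  • Published: 2022-09-02 Online: 2022-09-02
  • Corresponding author: Wang Guijiang, master's student. Main research interests: pattern recognition and sentiment analysis. guijiang_wang@163.com
  • About the authors:
    Wang Guijiang, master's student. Main research interests: pattern recognition and sentiment analysis. guijiang_wang@163.com
    Huang Runcai, Ph.D., associate professor. Main research interests: machine learning, natural language processing, computer networks, and big data. runcai_huang@163.com
    Ma Shiyu, master's student. Main research interest: natural language processing. shiyu_ma_2021@163.com
    Huang Xiaogang, master's student. Main research interest: sentiment analysis. xiaoganghuang_2020@163.com
    Wang Chengmao, master's student. Main research interest: machine learning. chengmao_wang2020@163.com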

Abstract: The rapid development of the Internet has made daily life far more convenient, but it has also given harmful ideas a breeding ground and an easy channel for spreading. Screenshots have become a new means of passing information online. Extracting user viewpoints from them usually requires text recognition first, followed by data cleaning with natural language processing, but some key information may be lost during language processing, which distorts the data. Against this information security background, this paper proposes a method for locating user viewpoints in microblog screenshots by finding specific text regions in text images. First, transfer learning is applied to the character region awareness model (CRAFT) to improve its generalization on the target task. Second, the trained model performs character-level localization. Third, logical reasoning analyzes the shape of individual characters and identifies user viewpoint text lines, relying on the fact that different characters have different appearance features while text on the same line shares similar line features. Finally, the logic-based localization results are fused with the model's localization results. Experimental results show that the method filters user viewpoints in microblog screenshots well and locates them effectively, achieving the goal of extracting specific text regions from text images.

Key words: text detection, information security, logical reasoning, CRAFT, transfer learning
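
The abstract states that user viewpoint lines are identified from character-level boxes using the similar line features of text on the same line, but it does not give the actual rules. The following is only a minimal sketch of that general idea, not the authors' method. The `CharBox` type, both function names, the tolerance values, and the use of character height as the distinguishing feature are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CharBox:
    """Hypothetical axis-aligned character box: top-left (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def cy(self) -> float:
        """Vertical centre of the box."""
        return self.y + self.h / 2.0


def group_into_lines(boxes: List[CharBox], tol: float = 0.5) -> List[List[CharBox]]:
    """Greedily group character boxes into text lines (illustrative rule).

    Boxes are visited top to bottom. A box joins the most recent line when its
    vertical centre is within tol * (mean height of that line) of the line's
    mean vertical centre; otherwise it starts a new line.
    """
    lines: List[List[CharBox]] = []
    for box in sorted(boxes, key=lambda b: (b.cy, b.x)):
        if lines:
            last = lines[-1]
            mean_cy = sum(b.cy for b in last) / len(last)
            mean_h = sum(b.h for b in last) / len(last)
            if abs(box.cy - mean_cy) <= tol * mean_h:
                last.append(box)
                continue
        lines.append([box])
    # Order the characters of each line left to right.
    return [sorted(line, key=lambda b: b.x) for line in lines]


def select_lines_by_height(
    lines: List[List[CharBox]], target_h: float, rel_tol: float = 0.15
) -> List[List[CharBox]]:
    """Keep lines whose mean character height is within rel_tol of target_h.

    Assumption: viewpoint text is rendered at a known, roughly constant font size.
    """
    kept = []
    for line in lines:
        mean_h = sum(b.h for b in line) / len(line)
        if abs(mean_h - target_h) <= rel_tol * target_h:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up boxes: three body-text characters and two smaller characters below.
    demo = [
        CharBox(0, 10, 18, 20), CharBox(40, 10, 18, 20), CharBox(20, 10, 18, 20),
        CharBox(0, 50, 10, 12), CharBox(12, 50, 10, 12),
    ]
    demo_lines = group_into_lines(demo)
    print(len(demo_lines), len(select_lines_by_height(demo_lines, target_h=20)))
```

In a real pipeline the boxes would come from the character-level detector, and the selected lines would still need to be fused with the model's own localization results, a step the abstract mentions but this sketch does not cover.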