Journal of Information Security Research, 2024, Vol. 10, Issue (8): 760-.


Multilabel Classification Method of Open Source Threat Intelligence Text Based on BertTextCNN

Lu Jiali   

  1. (Beijing Topsec Network Security Technology Co., Ltd., Beijing 100193)
  • Online: 2024-08-08  Published: 2024-08-09

基于BertTextCNN的开源威胁情报文本的多标签分类方法

  • Corresponding author: Lu Jiali, lu_jiali@qq.com
  • About the author: Lu Jiali, M.S. Main research interests: network security, threat intelligence, and deep learning.

Abstract: Open source threat intelligence is very important for network security protection, but it is widely distributed, comes in many forms, and is very noisy. How to efficiently organize and analyze the massive volume of collected open source threat intelligence has therefore become an urgent problem. This paper explores a multilabel classification method based on the BertTextCNN model that jointly considers the title, the body text, and regular-expression judgment. Based on the characteristics of the texts published by each intelligence source, regular-expression judgment rules are set to make up for the deficiencies of the model. To fully reflect the threat topics involved in an open source threat intelligence text, separate BertTextCNN multilabel classification models are built for the title and for the body text, and the two sets of labels are then merged and deduplicated to obtain the final threat categories of the text. Compared with a BertTextCNN multilabel classification model built on the body text alone, the proposed model performs better, with a notable improvement in recall, and can provide a valuable reference for the classification of open source threat intelligence.
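The label-fusion step described in the abstract can be illustrated with a small sketch. It assumes each BertTextCNN model (one for the title, one for the body text) already outputs one logit per threat category, decodes each output independently with a sigmoid and a 0.5 threshold (a common multilabel convention, assumed here rather than taken from the paper), adds any labels fired by regex rules, and takes the set union so duplicate labels disappear. The label names, the regex patterns, the threshold, and every function name (`decode_multilabel`, `regex_labels`, `fuse_labels`, `micro_recall`) are hypothetical; the models themselves are not reproduced.

```python
import math
import re

# Hypothetical threat categories; the abstract does not list the paper's real label set.
LABELS = ["vulnerability", "malware", "phishing", "data_breach"]

# Hypothetical regex rules standing in for the paper's source-specific judgment rules.
REGEX_RULES = {
    "vulnerability": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "phishing": re.compile(r"\bphish(?:ing)?\b", re.IGNORECASE),
}


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits: list[float], threshold: float = 0.5) -> set[str]:
    """Multilabel decoding: each label is scored independently with a sigmoid,
    and every label whose probability reaches the threshold is kept."""
    return {label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold}


def regex_labels(text: str) -> set[str]:
    """Labels whose (hypothetical) regex rule matches somewhere in the text."""
    return {label for label, pattern in REGEX_RULES.items() if pattern.search(text)}


def fuse_labels(title_logits: list[float], body_logits: list[float],
                title: str, body: str, threshold: float = 0.5) -> list[str]:
    """Union of title-model, body-model, and regex labels; using a set
    removes duplicates, and sorting gives a deterministic order."""
    labels = decode_multilabel(title_logits, threshold)
    labels |= decode_multilabel(body_logits, threshold)
    labels |= regex_labels(title + "\n" + body)
    return sorted(labels)


def micro_recall(predicted: list, gold: list) -> float:
    """Micro-averaged recall: correctly predicted labels over all gold labels,
    pooled across documents."""
    tp = sum(len(set(p) & set(g)) for p, g in zip(predicted, gold))
    total = sum(len(set(g)) for g in gold)
    return tp / total if total else 0.0
```

With the hypothetical inputs title = "New CVE-2024-12345 exploited in the wild", body = "Attackers deliver a loader via phishing emails.", title logits [2.0, -1.0, -3.0, -2.0] and body logits [-1.0, 1.5, -0.5, -2.0], the title model contributes "vulnerability", the body model contributes "malware", and the phishing rule adds "phishing", so `fuse_labels` returns ["malware", "phishing", "vulnerability"]. Because the union can only add labels to what the body model predicts, recall can only stay the same or rise relative to the body-only model (here 1/3 versus 1.0 against the gold set {vulnerability, malware, phishing}), while precision may fall if the extra labels are wrong. This is consistent with, but not a reproduction of, the recall gain the abstract reports.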

Key words: Open source threat intelligence, multi-label classification, text classification, BERT model
