Journal of Information Security Research ›› 2023, Vol. 9 ›› Issue (11): 1067-.

A Unified Model for Information Extraction in the Field of Network Security for Small Samples

Bu Tian, Zhang Long, Gu Dujuan, Yuan Jun, Zhang Ruikang, and Li Wenjin

  1. (NSFOCUS Technologies Group Co., Ltd., Beijing 100089)

  • Online: 2023-11-06 Published: 2023-11-30

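  • Corresponding author: Bu Tian, master's degree. Main research interests: natural language processing, information extraction, and text generation. butian1997@163.com
  • About the authors: Bu Tian, master's degree. Main research interests: natural language processing, information extraction, and text generation. butian1997@163.com Zhang Long. Main research interests: natural language processing, information extraction, and knowledge graphs. zhanglong5@nsfocus.com Gu Dujuan, PhD. Main research interests: security architecture, threat modeling, and artificial intelligence. gudujuan@nsfocus.com Yuan Jun. Main research interests: threat modeling, knowledge graphs, attack and defense confrontation, network security, and data analysis. yuanjun@nsfocus.com Zhang Ruikang, master's degree. Main research interests: malware analysis, threat modeling, knowledge graphs, and graph representation learning. zhangruikang@nsfocus.com Li Wenjin, master's degree. Main research interests: threat detection technology, network attack and defense confrontation technology, and knowledge graphs. liwenjin@nsfocus.com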

Abstract: Threat intelligence on the Internet grows daily, and major network security platforms increasingly rely on automated means to extract important information from it. However, many previous studies modeled entity recognition, relation extraction, and event extraction as separate tasks, which leads to high multi-model management costs, large data requirements, and poor knowledge sharing. To address this, this paper applies unified extraction to the field of network security and proposes MRCUIE, an information extraction model for this field. In addition, it designs entity construction, prompt templates, and model optimization for the network security domain. Finally, MRCUIE is tested on multiple network security datasets. The results show that MRCUIE improves on 83% of the datasets; its F1 score is 1%~3% higher than that of single-task extraction models and 9%~14% higher than that of the unified extraction model. MRCUIE was then tested in a small-sample setting, where the model needed only 10 samples to achieve good results, which verifies its small-sample capability.

Key words: network security, information extraction, unified information extraction, few-shot, prompt template
