A Unified Model for Information Extraction in the Field of Network Security for Small Samples

Journal of Information Security Research, 2023, Vol. 9, Issue (11): 1067-.



Bu Tian, Zhang Long, Gu Dujuan, Yuan Jun, Zhang Ruikang, and Li Wenjin

(NSFOCUS Technologies Group Co., Ltd., Beijing 100089)

Online: 2023-11-06 Published: 2023-11-30


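Corresponding author: Bu Tian, master's degree. Main research interests: natural language processing, information extraction, and text generation. butian1997@163.com

About the authors:
Bu Tian, master's degree. Main research interests: natural language processing, information extraction, and text generation. butian1997@163.com
Zhang Long. Main research interests: natural language processing, information extraction, and knowledge graphs. zhanglong5@nsfocus.com
Gu Dujuan, PhD. Main research interests: security architecture, threat modeling, and artificial intelligence. gudujuan@nsfocus.com
Yuan Jun. Main research interests: threat modeling, knowledge graphs, attack and defense confrontation, network security, and data analysis. yuanjun@nsfocus.com
Zhang Ruikang, master's degree. Main research interests: malware analysis, threat modeling, knowledge graphs, and graph representation learning. zhangruikang@nsfocus.com
Li Wenjin, master's degree. Main research interests: threat detection technology, network attack and defense confrontation technology, and knowledge graphs. liwenjin@nsfocus.com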

Abstract

Abstract: Threat intelligence on the Internet grows daily, and major network security platforms increasingly rely on automated means to extract the important information it contains. However, many previous studies modeled tasks such as entity recognition, relation extraction, and event extraction separately, which leads to high multi-model management costs, large data requirements, and poor knowledge sharing. To address this, this paper applies unified extraction to the network security domain and proposes MRCUIE, an information extraction model for that domain. In addition, the paper covers entity construction, prompt template design, and model optimization for the network security domain. MRCUIE is then evaluated on multiple network security datasets. The results show that MRCUIE improves on 83% of the datasets: its F1 score is 1%~3% higher than that of single-task extraction models and 9%~14% higher than that of unified extraction models. MRCUIE was also tested in a small-sample setting, where the model needed only 10 samples to achieve good results, which verifies its few-shot capability.

Key words: network security, information extraction, unified information extraction, few-shot, prompt template


CLC Number: TP309

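Bu Tian, Zhang Long, Gu Dujuan, Yuan Jun, Zhang Ruikang, Li Wenjin. A Unified Model for Information Extraction in the Field of Network Security for Small Samples[J]. Journal of Information Security Research, 2023, 9(11): 1067-.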


