Research on Vulnerability Text Feature Classification Technology Based on BERT

Abstract

With the development of informatization and the growth of network applications, many software and hardware products are affected by various types of cybersecurity vulnerabilities. Vulnerability analysis and management often require large volumes of vulnerability intelligence texts to be classified manually. To determine efficiently and accurately the category of the vulnerability described in a vulnerability intelligence text, this paper proposes a cybersecurity vulnerability classification model based on BERT (Bidirectional Encoder Representations from Transformers). First, a vulnerability classification dataset is constructed, and a pretrained model represents each vulnerability intelligence text as a feature vector. Then, a classifier assigns each feature vector to a vulnerability category. Finally, the test set is used to evaluate classification performance. In the experiments, TextCNN, TextRNN, TextRNN_Att, fastText and the proposed model are each used to classify 48,000 vulnerability intelligence texts containing vulnerability descriptions. The experimental results show that the proposed model achieves the highest score on every classification evaluation metric on the test set, indicating that it can be applied effectively to cybersecurity vulnerability classification and can reduce manual workload.
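The classification and evaluation stages summarized above can be illustrated with a minimal sketch. It assumes each text has already been encoded into a fixed-length feature vector by the pretrained encoder (for example, BERT's final-layer [CLS] vector) and that the classifier is a single linear layer followed by softmax. The abstract names neither the classifier head nor the evaluation metrics, so accuracy and macro-averaged precision, recall and F1 are used here as assumed examples; all function names, weights and toy data are hypothetical.

```python
"""Sketch of the classifier and evaluation stages (hypothetical, not the paper's code).

Assumptions: each text is already encoded as a fixed-length feature vector
by a pretrained encoder; the classifier is one linear layer plus softmax;
metrics are accuracy and macro-averaged precision/recall/F1.
"""
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_head(x: Sequence[float], weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Compute one logit per class: w_c . x + b_c."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict(x, weights, bias) -> int:
    """Return the index of the most probable class."""
    probs = softmax(linear_head(x, weights, bias))
    return max(range(len(probs)), key=lambda c: probs[c])


def evaluate(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    prec, rec, f1 = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "macro_precision": sum(prec) / num_classes,
        "macro_recall": sum(rec) / num_classes,
        "macro_f1": sum(f1) / num_classes,
    }


if __name__ == "__main__":
    # Toy 2-D "feature vectors" and hand-set weights for 3 hypothetical classes.
    W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
    b = [0.0, 0.0, 0.0]
    X = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.9, 0.2]]
    y_true = [0, 1, 2, 1]
    y_pred = [predict(x, W, b) for x in X]
    print(y_pred)  # [0, 1, 2, 0]
    print(evaluate(y_true, y_pred, num_classes=3))
```

In the toy run the fourth vector is assigned to class 0 instead of class 1, so accuracy is 0.75 and macro-F1 is 7/9 (about 0.78). In a real pipeline the linear layer's weights would be learned, typically while fine-tuning the encoder, rather than set by hand; the fixed values only make the arithmetic easy to check.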

Key words: natural language processing systems, cybersecurity, feature extraction, classifier, deep learning

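Du Lin, Xu Chuanqi. Research on Vulnerability Text Feature Classification Technology Based on BERT [J]. Journal of Information Security Research, 2023, 9(7): 687-.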
