Research on Name Entity Recognition of Security Events Based on BERT

Journal of Information Security Research, 2021, Vol. 7, Issue 3: 242-249.


Received: 2021-03-09; Online: 2021-03-05; Published: 2021-03-17

基于BERT的安全事件命名实体识别研究

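Dou Yuchen¹, Hu Yong²

1. School of Cyberspace Security, Sichuan University
2. Sichuan University

Corresponding author: Dou Yuchen
About the authors: Dou Yuchen, master's student; main research interests: mobile security and Internet of Things security. douyuchen_jl@163.com. Hu Yong, Ph.D., research fellow; main research interests: information system security, mobile internet security, and applications of artificial intelligence in information security. huyong@scu.edu.cn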

Abstract

To recognize named entities in public safety events, we present a model that combines the BERT pre-trained model with a neural network. The Chinese Emergency Corpus (CEC) serves as the experimental dataset, and its entities are labeled with the BIO sequence labeling scheme. The BERT (bidirectional encoder representations from transformers) pre-trained model produces an embedding vector for each individual Chinese character, and a fused BiLSTM (bidirectional long short-term memory network) and CRF (conditional random field) model extracts features to identify the time, place, participants, and participant behavior of public safety events. CRF, BiLSTM, BiLSTM-CRF, and BERT-BiLSTM-CRF are compared experimentally. The results show that the method used in this paper reaches an accuracy above 90% and a recall and F1 score above 85%, which demonstrates that the model solves the problem of polysemy and can effectively extract important entity information from public safety events.

Keywords: public security, named entity recognition, BERT, conditional random field, bidirectional long short-term memory network, word embedding
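
The BIO sequence labeling scheme named in the abstract assigns each character a tag: B- for the first character of an entity, I- for its remaining characters, and O for everything else. The sketch below illustrates this at the character level. The example sentence, the label names `TIME` and `LOC`, and the helper `bio_tags` are assumptions made for illustration; they are not taken from the CEC annotation scheme or from the authors' code.

```python
from typing import List, Tuple


def bio_tags(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Return one BIO tag per character of `text`.

    `spans` holds (start, end, label) triples with `end` exclusive.
    Spans are assumed not to overlap.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return tags


# Hypothetical sentence ("A fire broke out in Chengdu yesterday") and spans.
sentence = "昨日成都发生火灾"
spans = [(0, 2, "TIME"), (2, 4, "LOC")]

tags = bio_tags(sentence, spans)
for ch, tag in zip(sentence, tags):
    print(ch, tag)
```

Tagging per character rather than per word matches the abstract's description of obtaining one vector for each Chinese character, and it avoids depending on a word segmenter.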

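
The abstract reports recall and F1 score alongside an accuracy figure. As a minimal sketch of how such scores are commonly computed for BIO-tagged output, the code below extracts entity spans from gold and predicted tag sequences and scores exact matches. Exact-match, entity-level scoring is an assumption here; the source does not state which scoring convention the experiments used. The function names and the toy tag sequences are hypothetical.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end exclusive, label)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a BIO tag sequence."""
    entities: Set[Entity] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            entities.add((start, i, label))  # close the open entity
        if tag.startswith("B-"):
            start, label = i, tag[2:]        # open a new entity
        else:
            start, label = None, None        # "O" or a stray "I-" tag
    return entities


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact span-and-label match."""
    g, p = extract_entities(gold), extract_entities(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the predicted LOC entity is one character too short.
gold = ["B-TIME", "I-TIME", "B-LOC", "I-LOC", "O", "O", "O", "O"]
pred = ["B-TIME", "I-TIME", "B-LOC", "O", "O", "O", "O", "O"]
print(precision_recall_f1(gold, pred))
```

Under exact matching, a boundary error such as the shortened LOC span counts as both a false positive and a false negative, which is why a single misplaced tag lowers precision and recall together.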

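Cite this article: Dou Yuchen, Hu Yong. Research on Name Entity Recognition of Security Events Based on BERT [J]. Journal of Information Security Research, 2021, 7(3): 242-249.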

