Journal of Information Security Research ›› 2021, Vol. 7 ›› Issue (3): 242-249.

Research on Named Entity Recognition of Security Events Based on BERT

  

  • Received: 2021-03-09 Online: 2021-03-05 Published: 2021-03-17

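Dou Yuchen1, Hu Yong2

  1. School of Cyber Science and Engineering, Sichuan University
  2. Sichuan University
  • Corresponding author: Dou Yuchen
  • About the authors: Dou Yuchen, master's student; main research interests: mobile security and Internet of Things security. douyuchen_jl@163.com Hu Yong, Ph.D., research fellow; main research interests: information system security, mobile Internet security, and applications of artificial intelligence in information security. huyong@scu.edu.cn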

Abstract: To recognize named entities in public safety events, we present a model that combines the BERT pre-training model with a neural network. The Chinese Emergency Corpus (CEC) was used as the experimental data set, and its entities were annotated with the BIO sequence labeling method. The BERT (bidirectional encoder representations from transformers) pre-training model is used to obtain a vector (embedding) for each individual Chinese character, and a fusion model of BiLSTM (bidirectional long short-term memory network) and CRF (conditional random field) extracts features to identify the time, place, participants, and participant behavior of public safety events. CRF, BiLSTM, BiLSTM-CRF, and BERT-BiLSTM-CRF were compared experimentally. The experimental results show that the method used in this paper reaches an accuracy of more than 90% and a recall and F1 score of more than 85%, which shows that the model solves the problem of polysemy and can effectively extract important entity information from public safety events.

Key words: public security, named entity recognition, BERT, conditional random field, bidirectional long short-term memory network, word embedding
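
The BIO labeling and the recall/F1 scoring mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The sample sentence, the label names `TIME` and `LOC`, the helper names, and exact-match entity-level scoring are all assumptions made for illustration; the abstract does not give the CEC tag set or the precise scoring procedure.

```python
from typing import List, Tuple

# (entity type, start index, end index exclusive); hypothetical span format
Span = Tuple[str, int, int]


def bio_encode(n_chars: int, spans: List[Span]) -> List[str]:
    """Turn entity spans into one BIO tag per character."""
    tags = ["O"] * n_chars
    for label, start, end in spans:
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters
    return tags


def bio_decode(tags: List[str]) -> List[Span]:
    """Recover entity spans from a BIO tag sequence."""
    spans: List[Span] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def prf1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span matching."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Made-up example sentence ("a fire broke out in Beijing yesterday"),
# tagged character by character as in character-level Chinese NER.
sentence = "昨天北京发生火灾"
gold = [("TIME", 0, 2), ("LOC", 2, 4)]
gold_tags = bio_encode(len(sentence), gold)
for char, tag in zip(sentence, gold_tags):
    print(char, tag)

# A hypothetical prediction that gets the place boundary wrong.
pred_tags = ["B-TIME", "I-TIME", "B-LOC", "O", "O", "O", "O", "O"]
print(prf1(gold, bio_decode(pred_tags)))
```

Under exact matching, a predicted entity counts as correct only if both its type and its boundaries agree with the annotation, so the truncated place span above is scored as an error.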
