信息安全研究 (Journal of Information Security Research) ›› 2023, Vol. 9 ›› Issue (4): 338-.

• Academic Papers •

Research on Adversarial Examples Generation Technology Based on Text Keywords

Wang Zhiqiang1,2, Du Yingying1, Lin Yuheng1, Chen Xudong1

  1. (Beijing Electronic Science and Technology Institute, Beijing 100070)
  2. (State Information Center, Beijing 100045)
  • Online: 2023-04-01  Published: 2023-03-30
  • Corresponding author: Wang Zhiqiang, PhD, associate professor. Main research interests include cyberspace security and vulnerability discovery. wangzq@besti.edu.cn
  • About the authors: Wang Zhiqiang, PhD, associate professor. Main research interests include cyberspace security and vulnerability discovery. wangzq@besti.edu.cn
    Du Yingying, master's degree candidate. Main research interests include cyberspace security and information security. 2353504882@qq.com
    Lin Yuheng, master's degree candidate. Main research interests include cyberspace security and information security. 297440224@qq.com
    Chen Xudong, master's degree candidate. Main research interests include cyberspace security and information security. ra949280237@163.com

Abstract: Deep learning models are widely used for natural language tasks, but recent research shows that adversarial attacks can severely reduce the accuracy of classification models and render their classification function ineffective. To address this vulnerability of deep learning models on natural language tasks, a new adversarial example generation method, KeywordsAttack, is proposed. The method uses a statistical algorithm to select a subset of words as the text's keyword set, and then iteratively replaces keywords in order of their contribution to the model's classification result until the classification model is successfully misled or the number of replacements reaches a preset threshold. Tailored to the characteristics of Chinese, the method generates adversarial examples by splitting Chinese characters into components and replacing characters with their pinyin. Finally, experiments are conducted on a public dataset of hotel and shopping reviews. The results show that the adversarial examples modify on average 18.2% of the original text, reduce the classification accuracy of the attacked BERT model by about 43%, and reduce the classification accuracy of the attacked LSTM model by about 30%. These results indicate that KeywordsAttack can successfully mislead the classification model with small perturbations to the text, while requiring relatively few queries to the model during adversarial example generation.
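The abstract outlines the attack as a query-based loop: select keywords, perturb them by character splitting or pinyin substitution, and re-query the target model until the predicted label flips or a replacement budget is exhausted. The Python sketch below only illustrates that loop under assumptions not stated in the abstract: jieba is used for word segmentation, pypinyin for pinyin substitution, query_model is a hypothetical stand-in for the black-box classifier (BERT or LSTM) returning the probability of the originally predicted label, the character-splitting table is a toy example, and a deletion-based contribution score replaces the paper's statistical keyword-selection algorithm.

import jieba                      # common Chinese word segmenter (assumed, not specified in the paper)
from pypinyin import lazy_pinyin  # converts Chinese characters to pinyin

SPLIT_TABLE = {"好": "女子", "明": "日月"}   # toy character-splitting examples

def query_model(text):
    # Hypothetical black-box interface to the attacked classifier:
    # returns the probability of the originally predicted label.
    raise NotImplementedError

def perturb(word):
    # Prefer splitting a character into components; fall back to pinyin.
    for ch in word:
        if ch in SPLIT_TABLE:
            return word.replace(ch, SPLIT_TABLE[ch])
    return "".join(lazy_pinyin(word))

def keywords_attack(text, max_replacements=10):
    words = list(jieba.cut(text))
    base = query_model(text)
    # Score each word by how much the original-label probability drops when
    # the word is removed (a simplified stand-in for the keyword-selection step).
    scores = []
    for i in range(len(words)):
        probe = "".join(words[:i] + words[i + 1:])
        scores.append((base - query_model(probe), i))
    scores.sort(reverse=True)

    adv = list(words)
    for _, i in scores[:max_replacements]:
        adv[i] = perturb(adv[i])          # iteratively replace the next keyword
        candidate = "".join(adv)
        if query_model(candidate) < 0.5:  # assumes binary sentiment classification
            return candidate              # model misled: attack succeeded
    return "".join(adv)                   # budget exhausted without flipping the label

Used this way, each keyword replacement costs one additional model query, which is why ordering the candidate words by estimated contribution helps keep the total number of queries small.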

Key words: adversarial examples, Chinese text, neural network, black-box attack, deep learning