信息安全研究 (Journal of Information Security Research) ›› 2023, Vol. 9 ›› Issue (4): 338-.

• Academic Papers •

Research on Adversarial Examples Generation Technology Based on Text Keywords

Wang Zhiqiang1,2, Du Yingying1, Lin Yuheng1, Chen Xudong1

  1. (Beijing Electronic Science and Technology Institute, Beijing 100070)
  2. (State Information Center, Beijing 100045)
  • Online: 2023-04-01  Published: 2023-03-30
  • Corresponding author: Wang Zhiqiang, PhD, associate professor. Main research interests include cyberspace security and vulnerability discovery. wangzq@besti.edu.cn
  • About the authors: Wang Zhiqiang, PhD, associate professor. Main research interests include cyberspace security and vulnerability discovery. wangzq@besti.edu.cn
    Du Yingying, master's degree candidate. Main research interests include cyberspace security and information security. 2353504882@qq.com
    Lin Yuheng, master's degree candidate. Main research interests include cyberspace security and information security. 297440224@qq.com
    Chen Xudong, master's degree candidate. Main research interests include cyberspace security and information security. ra949280237@163.com

Abstract: Deep learning models are widely used for natural language tasks, but recent research shows that adversarial attacks can severely reduce the accuracy of classification models and render their classification function ineffective. To address this vulnerability of deep learning models on natural language tasks, a new adversarial example generation method, KeywordsAttack, is proposed. The method uses a statistical algorithm to select a subset of words as the text's keyword set, and then iteratively replaces keywords in order of their contribution to the model's classification result until the classification model is successfully misled or the number of replacements reaches a preset threshold. Tailored to the characteristics of Chinese, the method generates adversarial examples by splitting Chinese characters into components and replacing characters with their pinyin. Finally, experiments are conducted on a public dataset of hotel and shopping reviews. The results show that the adversarial examples modify on average 18.2% of the original text, reduce the classification accuracy of the attacked BERT model by about 43%, and reduce the classification accuracy of the attacked LSTM model by about 30%. These results indicate that KeywordsAttack can successfully mislead the classification model with small perturbations to the text, while requiring relatively few queries to the model during adversarial example generation.
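The abstract outlines the attack as a query-based loop: select keywords, perturb them by character splitting or pinyin substitution, and re-query the target model until the predicted label flips or a replacement budget is exhausted. The Python sketch below only illustrates that loop under assumptions not stated in the abstract: jieba is used for word segmentation, pypinyin for pinyin substitution, query_model is a hypothetical stand-in for the black-box classifier (BERT or LSTM) returning the probability of the originally predicted label, the character-splitting table is a toy example, and a deletion-based contribution score replaces the paper's statistical keyword-selection algorithm.

import jieba                      # common Chinese word segmenter (assumed, not specified in the paper)
from pypinyin import lazy_pinyin  # converts Chinese characters to pinyin

SPLIT_TABLE = {"好": "女子", "明": "日月"}   # toy character-splitting examples

def query_model(text):
    # Hypothetical black-box interface to the attacked classifier:
    # returns the probability of the originally predicted label.
    raise NotImplementedError

def perturb(word):
    # Prefer splitting a character into components; fall back to pinyin.
    for ch in word:
        if ch in SPLIT_TABLE:
            return word.replace(ch, SPLIT_TABLE[ch])
    return "".join(lazy_pinyin(word))

def keywords_attack(text, max_replacements=10):
    words = list(jieba.cut(text))
    base = query_model(text)
    # Score each word by how much the original-label probability drops when
    # the word is removed (a simplified stand-in for the keyword-selection step).
    scores = []
    for i in range(len(words)):
        probe = "".join(words[:i] + words[i + 1:])
        scores.append((base - query_model(probe), i))
    scores.sort(reverse=True)

    adv = list(words)
    for _, i in scores[:max_replacements]:
        adv[i] = perturb(adv[i])          # iteratively replace the next keyword
        candidate = "".join(adv)
        if query_model(candidate) < 0.5:  # assumes binary sentiment classification
            return candidate              # model misled: attack succeeded
    return "".join(adv)                   # budget exhausted without flipping the label

Used this way, each keyword replacement costs one additional model query, which is why ordering the candidate words by estimated contribution helps keep the total number of queries small.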

Key words: adversarial examples, Chinese text, neural network, black-box attack, deep learning