[1] Liu Quan, Zhai Jianwei, Zhang Zongzhang, et al. A survey on deep reinforcement learning [J]. Chinese Journal of Computers, 2018, 41(1): 1-27. (in Chinese)
[2] Zhu Kai, Zhang Tao. Deep reinforcement learning based mobile robot navigation: a review [J]. Tsinghua Science and Technology, 2021, 26(5): 674-691.
[3] Xu Jincai, Ren Min, Li Qi, et al. A survey on the security of image adversarial examples [J]. Journal of Information Security Research, 2021, 7(4): 294-309. (in Chinese)
[4] Chen Yuefeng, Mao Xiaofeng, Li Yuhong, et al. AI security: a survey and applications of adversarial example techniques [J]. Journal of Information Security Research, 2019, 5(11): 1000-1007. (in Chinese)
[5] Zhang Qiang, Yang Jibin, Zhang Xiongwei, et al. Audio object classification adversarial attacks based on generative adversarial networks [J]. Journal of Nanjing University (Natural Science), 2021, 57(5): 793-800. (in Chinese)
[6] Huang S, Papernot N, Goodfellow I, et al. Adversarial attacks on neural network policies [C] //Proc of the 5th Int Conf on Learning Representations. La Jolla, CA: ICLR, 2017.
[7] Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms [J]. arXiv preprint arXiv:1707.06347, 2017.
[8] Papernot N, McDaniel P, Jha S, et al. The limitations of deep learning in adversarial settings [C] //Proc of the 2016 IEEE European Symposium on Security and Privacy. Piscataway, NJ: IEEE, 2016.
[9] Rone W, Ben-Tzvi P. Mapping, localization and motion planning in mobile multi-robotic systems [J]. Robotica, 2013, 31(1): 1-23.
[10] Zhao Xingyu, Ding Shifei. Research overview of deep reinforcement learning [J]. Computer Science, 2018, 45(7): 1-6. (in Chinese)
[11] Tai Lei, Paolo G, Liu Ming. Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation [C] //Proc of the IEEE/RSJ Int Conf on Intelligent Robots and Systems. Piscataway, NJ: IEEE, 2017: 31-36.
[12] Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks [C] //Proc of the 2nd Int Conf on Learning Representations. La Jolla, CA: ICLR, 2014.
[13] Behzadan V, Munir A. Vulnerability of deep reinforcement learning to policy induction attacks [G] //LNAI 10358: Proc of the 13th Int Conf on Machine Learning and Data Mining in Pattern Recognition. Berlin: Springer, 2017: 262-275.
[14] Lin Yenchen, Hong Zhangwei, Liao Yuanhong, et al. Tactics of adversarial attack on deep reinforcement learning agents [C] //Proc of the 26th Int Joint Conf on Artificial Intelligence. San Francisco, CA: Morgan Kaufmann, 2017: 3756-3762.
[15] Kos J, Song D. Delving into adversarial attacks on deep policies [C] //Proc of the 5th Int Conf on Learning Representations. La Jolla, CA: ICLR, 2017.
[16] Hussenot L, Geist M, Pietquin O. CopyCAT: taking control of neural policies with constant attacks [C] //Proc of the 19th Int Conf on Autonomous Agents and Multiagent Systems. Richland, SC: IFAAMAS, 2020: 548-556.
[17] Chen Tong, Niu Wenjia, Xiang Yingxiao, et al. Gradient band-based adversarial training for generalized attack immunity of A3C path finding [J]. arXiv preprint arXiv:1807.06752, 2018.
[18] Bai Xiaoxuan, Niu Wenjia, Liu Jiqiang, et al. Adversarial examples construction towards white-box Q-table variation in DQN pathfinding training [C] //Proc of the 3rd IEEE Int Conf on Data Science in Cyberspace (DSC). Piscataway, NJ: IEEE, 2018: 781-787.
[19] Qian Yaguan, Zhang Ximin, Wang Bin, et al. Adversarial training defense based on second-order adversarial examples [J]. Journal of Electronics & Information Technology, 2021, 43(11): 3367-3373. (in Chinese)