An Efficient Communication Mechanism for Multi-Agent Cooperative Learning

信息安全研究 (Journal of Information Security Research), 2020, Vol. 6, Issue 4: 345-349.


Zhao Yuhang¹, Ma Xiujun²

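1. Key Laboratory of Machine Perception and Intelligence, Ministry of Education (Peking University)
2. School of Information Science and Technology, Peking University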

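Received: 2020-04-06 Online: 2020-04-03 Published: 2020-04-06
Corresponding author: Zhao Yuhang
About the authors: Zhao Yuhang is a master's student whose main research interests are multi-agent systems and reinforcement learning. zhaoyuhang@pku.edu.cn Ma Xiujun is an associate professor whose main research interests are spatio-temporal data mining, intelligent agents, and intelligent systems. maxiujun@pku.edu.cn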

An Efficient Communication Framework in Multi-Agent Cooperating Learning Environment

Zhao Yuhang and Ma Xiujun


Abstract

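Abstract (Chinese version): Artificial intelligence is advancing rapidly, with notable breakthroughs in computer vision, natural language processing, and reinforcement learning. Most of this work, however, targets a single agent, with the goal of steadily raising that one agent's intelligence. Progress on multi-agent systems is better suited to complex problems such as the reproduction of animal populations or human teamwork. Even when no individual agent is especially intelligent, a group whose members communicate and cooperate efficiently can be highly intelligent as a whole. Multi-agent cooperative learning is usually studied within a reinforcement learning framework, but most work does not explicitly use a communication mechanism to improve the overall model. This paper proposes an Actor-Critic algorithm framework based on communication filtering that lets agents in a multi-agent environment communicate efficiently; even in the execution phase, where no Critic provides guidance, efficient communication still helps the agents cooperate. The framework uses a neural network to filter the information exchanged between agents, turning low-quality, redundant information into high-quality, low-dimensional information. Three experiments are designed to validate the model: two cooperative learning scenarios and a lane-changing task in autonomous driving. The results show that, in multi-agent cooperative learning with communication, the proposed model outperforms other similar models.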

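Keywords (Chinese version): multi-agent system, reinforcement learning, cooperative learning, artificial intelligence, autonomous driving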

Abstract: Reinforcement learning in cooperative multi-agent scenarios is important for real-world applications. While several earlier attempts tried to solve the problem without explicit communication, we present a communication-filtering actor-critic algorithm that trains decentralized policies that can exchange filtered information in multi-agent settings, using centrally computed critics. Communication is potentially an effective means of multi-agent cooperation. We hypothesize that, in the execution phase without central critics, high-quality communication between agents helps them perform better in cooperative situations. However, sharing information among all agents, or over the predefined communication architectures that existing methods adopt, can be problematic. Therefore, we use a neural network to filter the information passed between agents. Empirically, we show the strength of our model in two general cooperative settings and in vehicle lane-changing scenarios. Our approach outperforms several state-of-the-art models for multi-agent problems.

Keywords: multi-agent system, reinforcement learning, cooperative learning, artificial intelligence, autonomous driving
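
The abstract describes a neural network that filters inter-agent messages, compressing redundant information into a low-dimensional signal that each decentralized policy can use at execution time. The sketch below illustrates that data flow only. The abstract does not specify the architecture, so everything here is an assumption: the single tanh layer, the mean pooling over other agents, the dimensions, and the names `make_filter`, `filter_message`, and `actor_inputs` are hypothetical. The weights are random rather than trained; in the paper the filter would be learned together with the actors under centrally computed critics.

```python
import math
import random


def make_filter(d_in, d_out, seed=0):
    """Random weights for a one-layer filter (hypothetical stand-in for the paper's network)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]


def filter_message(weights, message):
    """Compress one raw message to len(weights) dimensions: tanh(W @ message)."""
    return [math.tanh(sum(w * x for w, x in zip(row, message))) for row in weights]


def actor_inputs(observations, weights):
    """Build each agent's policy input: its own observation concatenated with
    the mean of the filtered messages from all other agents (assumed pooling)."""
    filtered = [filter_message(weights, obs) for obs in observations]
    d_out = len(weights)
    inputs = []
    for i, obs in enumerate(observations):
        others = [f for j, f in enumerate(filtered) if j != i]
        if others:
            pooled = [sum(f[k] for f in others) / len(others) for k in range(d_out)]
        else:
            pooled = [0.0] * d_out  # a lone agent receives no messages
        inputs.append(list(obs) + pooled)
    return inputs


# Three agents with 8-dimensional observations; messages are compressed to 2 dimensions.
rng = random.Random(1)
observations = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
weights = make_filter(d_in=8, d_out=2)
inputs = actor_inputs(observations, weights)
print(len(inputs), len(inputs[0]))  # 3 10
```

Each agent here sees only its own observation plus a 2-dimensional summary of what the others sent, rather than all 16 raw values from the other two agents; that reduction is the role the abstract assigns to the filter.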

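Zhao Yuhang, Ma Xiujun. An Efficient Communication Mechanism for Multi-Agent Cooperative Learning[J]. 信息安全研究 (Journal of Information Security Research), 2020, 6(4): 345-349.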
