Multi-agent Autonomous Cooperative Confrontation based on Meta Curriculum Reinforcement Learning

doi:10.3969/j.issn.1009-086x.2022.05.006

Abstract

Multi-agent cooperation and competition is characterized by real-time, continuous actions, incomplete information, a huge search space, multiple complex tasks, and spatio-temporal reasoning, and it is one of the most challenging problems in artificial intelligence today. To address the long training time of large-scale multi-agent reinforcement learning, this paper proposes an Actor-Critic-based cooperative confrontation framework that uses meta curriculum reinforcement learning to extract meta-models of basic tasks in small-scale scenarios. The meta-models are then transferred to large-scale scenarios through curriculum learning, where training continues from the meta-models and ultimately yields a better cooperation strategy. Simulation experiments are conducted on the StarCraft II platform. The results show that multi-agent cooperative confrontation based on meta curriculum reinforcement learning effectively accelerates training and reaches a higher win rate in less time than traditional training methods, with training speed increased by about 40%. The method can effectively support the efficient generation of multi-agent cooperative confrontation strategies.

Keywords: multi-agent, reinforcement learning, cooperative confrontation, meta curriculum learning, high-efficiency training
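
The abstract describes training meta-models on small-scale scenarios, transferring them to larger scenarios, and continuing training from there. The following is a minimal sketch of one way such a warm start could work, not the authors' implementation. It assumes a single linear policy layer whose input size grows with the number of agents, and it enlarges the layer by copying the trained weights and padding the rest with small random values. The names `init_layer`, `expand_layer`, `run_curriculum`, and `train_fn`, as well as the stage sizes, are hypothetical.

```python
import random


def init_layer(n_in, n_out, rng, scale=0.1):
    """Return an (n_out x n_in) weight matrix of small random values."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]


def expand_layer(small, n_in_large, n_out_large, rng, scale=0.01):
    """Embed a trained small layer in the top-left corner of a larger one.

    Assumption: entries with no counterpart in the small layer start as
    small random values, so the enlarged layer initially behaves much like
    the meta-model on the inputs it was trained on.
    """
    n_out_small, n_in_small = len(small), len(small[0])
    if n_in_large < n_in_small or n_out_large < n_out_small:
        raise ValueError("target layer must be at least as large as the source")
    large = init_layer(n_in_large, n_out_large, rng, scale)
    for i in range(n_out_small):
        large[i][:n_in_small] = small[i]  # copies the values of row i
    return large


def run_curriculum(stages, obs_per_agent, n_actions, train_fn, seed=0):
    """Train on each stage in order, warm-starting from the previous stage.

    `stages` lists agent counts from small to large. `train_fn` stands in
    for the actor-critic updates (hypothetical) and must return a matrix of
    the same shape as the one it receives.
    """
    rng = random.Random(seed)
    weights = None
    for n_agents in stages:
        n_in = n_agents * obs_per_agent
        if weights is None:
            weights = init_layer(n_in, n_actions, rng)  # first, smallest stage
        else:
            weights = expand_layer(weights, n_in, n_actions, rng)
        weights = train_fn(weights, n_agents)
    return weights


if __name__ == "__main__":
    # Identity "training" keeps the example runnable without a simulator.
    final = run_curriculum([3, 5, 8], obs_per_agent=4, n_actions=6,
                           train_fn=lambda w, n: w)
    print(len(final), len(final[0]))  # prints: 6 32
```

In practice `train_fn` would wrap the reinforcement learning updates for a scenario with `n_agents` units. The copy-and-pad step is only one of several plausible ways to carry a small-scale model into a larger network.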

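Abstract (Chinese version):

Multi-agent cooperative games are characterized by real-time, continuous actions, incomplete-information play, a huge search space, multiple complex tasks, and spatio-temporal reasoning, and they are among the most challenging problems in artificial intelligence today. To address the long training time and difficult convergence of large-scale multi-agent reinforcement learning, an Actor-Critic-based multi-agent reinforcement learning framework for cooperative games is proposed. It uses meta curriculum reinforcement learning to extract meta-models of basic curricula from small-scale scenarios, transfers the models to large-scale scenarios through curriculum learning, continues training from the meta-models, and expands the meta-model policy network, ultimately obtaining a better cooperative game strategy. Simulation experiments on the StarCraft II platform show that multi-agent cooperative game technology based on meta curriculum reinforcement learning effectively accelerates training and reaches a higher win rate in less time than traditional training methods, with training speed improved by about 40%. The method can effectively support the efficient generation of multi-agent cooperative game strategies and lays a theoretical foundation for efficient reinforcement learning training under low-resource conditions.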


Ji-shi-yu DING, Ke-wu SUN, Bo DONG, Xi-rui YANG, Chang-chao FAN, Zhe MA. Multi-agent Autonomous Cooperative Confrontation based on Meta Curriculum Reinforcement Learning[J]. Modern Defense Technology, 2022, 50(5): 36-42.

丁季时雨, 孙科武, 董博, 杨皙睿, 范长超, 马喆. 基于元课程强化学习的多智能体协同博弈技术[J]. 现代防御技术, 2022, 50(5): 36-42.

Figures/Tables 8

Fig. 1 Multi-agent reinforcement learning autonomous coordination framework

Fig. 2 Meta curriculum construction based on StarCraft II

Fig. 3 Meta model extraction in basic tasks

Fig. 4 Meta model transfer based on curriculum learning

Fig. 5 "StarCraft II" map

Fig. 6 Training process based on the meta curriculum reinforcement learning

Fig. 7 Training process based on the meta curriculum reinforcement learning

Fig. 8 Win rate under different training steps
