Multi-UAV Path Planning Based on Reinforcement Learning

doi:10.3969/j.issn.1009-086x.2025.05.014

Modern Defense Technology ›› 2025, Vol. 53 ›› Issue (5): 136-144. DOI: 10.3969/j.issn.1009-086x.2025.05.014

• Target Characteristics and Detection and Tracking Technology •

Zhenhan WEI, Hui TANG, Yu YANG, Zhihong LIAO, Qihui LAI, Chen LU

Shenzhen University of Information Technology, Shenzhen 518172, Guangdong, China

Received: 2024-06-20 Revised: 2024-10-25 Online: 2025-10-28 Published: 2025-11-03
Corresponding author: Hui TANG
About the author: Zhenhan WEI (1989-), male, from Hechi, Guangxi. Laboratory technician, master's degree. Research interests: machine learning and reinforcement learning.
Funding:
Project funded by the Shenzhen Science and Technology Innovation Bureau (KJZD20240903103300002)

Abstract:

To address path planning for multiple unmanned aerial vehicles (UAVs), mathematical models of the UAVs and of the threat zones are built to simulate scenarios close to real-world conditions. On this basis, a dynamic scene trajectory planning algorithm based on multi-agent reinforcement learning (DSTP-MARL) is designed for intelligent path planning of multiple UAVs. By avoiding threat zones, the algorithm ensures that the UAVs reach their target locations safely and optimizes the mission routes. To evaluate its performance, DSTP-MARL is compared with the deep Q-network (DQN). Experimental results show that, in both simple and complex threat environments, DSTP-MARL achieves better obstacle avoidance and mission completion than DQN. It also converges faster and with a more stable process than DQN, which improves mission efficiency and indicates higher practical value and broader application potential.

Key words: mathematical modeling, multi-agent reinforcement learning, multi-unmanned aerial vehicle, path planning, threat zone
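The abstract mentions mathematical models of threat zones, and the figure captions name radar and mountain models, but this page does not give the equations. The following is only a minimal sketch of how such a threat-zone check and a per-step reward could look. The hemisphere and cone geometries, the class and function names, and all reward values are assumptions made for illustration, and the code is not the DSTP-MARL algorithm itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Radar:
    """Hypothetical radar threat: a hemisphere of radius r centred on the ground at (x, y)."""
    x: float
    y: float
    r: float

    def contains(self, p):
        px, py, pz = p
        return pz >= 0 and math.dist((px, py, pz), (self.x, self.y, 0.0)) <= self.r


@dataclass(frozen=True)
class Mountain:
    """Hypothetical mountain threat: a cone with base radius base_r and peak height `height`."""
    x: float
    y: float
    base_r: float
    height: float

    def contains(self, p):
        px, py, pz = p
        d = math.hypot(px - self.x, py - self.y)  # horizontal distance to the peak axis
        if d > self.base_r or pz < 0:
            return False
        # The cone surface drops linearly from `height` at the axis to 0 at base_r.
        return pz <= self.height * (1.0 - d / self.base_r)


def step_reward(pos, goal, threats, goal_tol=1.0):
    """Return (reward, done) for one UAV position; the numeric values are assumed."""
    if any(t.contains(pos) for t in threats):
        return -100.0, True              # entered a threat zone: penalise and end the episode
    dist = math.dist(pos, goal)
    if dist <= goal_tol:
        return 100.0, True               # reached the target
    return -0.01 * dist, False           # small penalty that shrinks as the UAV nears the goal
```

In a multi-UAV setting, a function of this kind would be evaluated once per agent per time step; how the rewards are actually shaped and shared between agents in DSTP-MARL is described in the full paper, not here.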

CLC number: TP713

Zhenhan WEI, Hui TANG, Yu YANG, Zhihong LIAO, Qihui LAI, Chen LU. Multi-UAV Path Planning Based on Reinforcement Learning[J]. Modern Defense Technology, 2025, 53(5): 136-144.

Figures/Tables: 7

Fig. 1 Scenario setting

Fig. 2 Radar modelling

Fig. 3 Mountain modelling

Fig. 4 Framework of DSTP-MARL algorithm

Fig. 5 Initial environment

Fig. 6 Comparison of algorithms for two scenarios of simple UAVs

Fig. 7 Comparison of algorithms for complex scenarios of complex UAVs
