现代防御技术 (Modern Defence Technology), 2025, Vol. 53, Issue 5: 136-144. DOI: 10.3969/j.issn.1009-086x.2025.05.014

• Target Characteristics and Detection and Tracking Technology •

Multi-UAV Path Planning Based on Reinforcement Learning

Zhenhan WEI (韦振汉), Hui TANG (唐辉), Yu YANG (杨煜), Zhihong LIAO (廖植泓), Qihui LAI (赖启辉), Chen LU (卢忱)

  1. Shenzhen University of Information Technology, Shenzhen 518172, Guangdong, China
  • Received: 2024-06-20 Revised: 2024-10-25 Online: 2025-10-28 Published: 2025-11-03
  • Contact: Hui TANG
  • About the author: Zhenhan WEI (born 1989), male, from Hechi, Guangxi. Experimentalist, master's degree. Research interests: machine learning and reinforcement learning.
  • Funding:
    Project funded by the Shenzhen Science and Technology Innovation Bureau (KJZD20240903103300002)

Abstract:

To address multi-unmanned aerial vehicle (UAV) path planning, UAVs and threat zones are modeled mathematically to simulate scenarios close to real-world conditions. On this basis, a dynamic scene trajectory planning algorithm based on multi-agent reinforcement learning (DSTP-MARL) is designed for intelligent path planning of multiple UAVs. By effectively avoiding threat zones, the algorithm ensures that the UAVs reach their target locations safely and optimizes the mission routes. To verify its performance, DSTP-MARL is compared with the deep Q-network (DQN). Experimental results show that DSTP-MARL achieves better obstacle avoidance and mission completion than DQN in both simple and complex threat zones. DSTP-MARL also converges faster and more stably than DQN, which improves mission execution efficiency and indicates higher practical value and broader application potential.

Key words: mathematical modeling, multi-agent reinforcement learning, multi-unmanned aerial vehicle, path planning, threat zone
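
The abstract does not give the threat-zone model or the reward used by DSTP-MARL, so the following is only a minimal sketch of how such a setup is commonly expressed. It assumes circular 2D threat zones and a simple per-step reward for one UAV. The names `ThreatZone` and `step_reward`, and every numeric constant, are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatZone:
    """Hypothetical circular threat zone (an assumption, not the paper's model)."""
    x: float
    y: float
    radius: float

    def contains(self, px: float, py: float) -> bool:
        # A point is threatened if it lies on or inside the circle.
        return math.hypot(px - self.x, py - self.y) <= self.radius


def step_reward(pos, next_pos, goal, zones, goal_tol=1.0):
    """Return (reward, done) for one UAV move; all constants are illustrative."""
    # Entering any threat zone ends the episode with a large penalty.
    if any(zone.contains(*next_pos) for zone in zones):
        return -100.0, True
    # Reaching the goal (within a tolerance) ends the episode with a bonus.
    if math.dist(next_pos, goal) <= goal_tol:
        return 100.0, True
    # Otherwise reward progress toward the goal, minus a small time cost.
    progress = math.dist(pos, goal) - math.dist(next_pos, goal)
    return progress - 0.1, False
```

In a multi-agent setting, each UAV would typically receive its own reward of this kind, possibly with extra terms for inter-UAV collisions; whether DSTP-MARL does so is not stated in the abstract.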

CLC number: