Modern Defence Technology (现代防御技术), 2026, Vol. 54, Issue 3: 71-81. DOI: 10.3969/j.issn.1009-086x.2026.03.007

Reinforcement Learning-Based Cooperative Trajectory Planning for Unmanned Combat Aerial Vehicles and Decoy UAVs

Haozhe QI (祁昊哲)1, Mingfa ZHENG (郑明发)2, Xiaorong HU (胡小荣)3, Nan YANG (杨楠)1

  1. Air Traffic Control and Navigation School, Air Force Engineering University, Xi'an 710051, Shaanxi, China
  2. Fundamentals Department, Air Force Engineering University, Xi'an 710051, Shaanxi, China
  3. National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China
  • Received: 2025-05-20 Revised: 2025-11-23 Online: 2026-06-28 Published: 2026-07-03
  • Corresponding author: Mingfa ZHENG
  • About the author: Haozhe QI (born 2005), male, from Yuncheng, Shanxi. Undergraduate student; research interest: UAV combat mission planning.

Abstract:

In modern warfare, cooperative operations among unmanned aerial vehicles (UAVs) have become a key technology for improving battlefield effectiveness. Among these, the cooperative mode pairing unmanned combat aerial vehicles (UCAVs) with decoy UAVs has attracted wide attention for its tactical value. For the mission in which a UCAV and a decoy UAV jointly strike key enemy targets, this paper proposes a cooperative trajectory planning method based on the proximal policy optimization (PPO) algorithm. A Markov decision process (MDP) model that incorporates dynamic threat assessment is constructed; it integrates UAV kinematics and battlefield environment constraints, and the state space, action space, and a hierarchical reward function are designed accordingly. Simulation experiments show that the proposed method effectively guides the UCAV and the decoy to cooperate efficiently in complex battlefield environments, significantly raising the mission success rate and lowering the risk of interception by enemy air defense systems. The work provides theoretical and technical support for intelligent path planning in cooperative UAV operations.

Key words: UAV cooperative operations, trajectory planning, reinforcement learning, proximal policy optimization (PPO), Markov decision process
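
The abstract names PPO as the training algorithm but does not give its loss. As a minimal sketch, assuming the standard clipped surrogate objective from the PPO literature (the authors' exact variant, hyperparameters, and network design are not stated here), the per-batch policy loss could be computed as below. The function name `ppo_clip_loss`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions, not taken from the paper.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps]
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic (lower) bound of the two surrogate terms
        total += min(ratio * adv, clipped * adv)
    # Return a loss to minimise, hence the sign flip
    return -total / len(advantages)
```

With a positive advantage the clip caps how much an increased action probability is rewarded, and with a negative advantage it caps how much a decreased probability is rewarded. This keeps each policy update close to the policy that collected the data.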
