Modern Defense Technology ›› 2026, Vol. 54 ›› Issue (3): 71-81.DOI: 10.3969/j.issn.1009-086x.2026.03.007
• PAPERS • Previous Articles Next Articles
Haozhe QI1, Mingfa ZHENG2, Xiaorong HU3, Nan YANG1
Received:2025-05-20
Revised:2025-11-23
Online:2026-06-28
Published:2026-07-03
Contact:
Mingfa ZHENG
通讯作者:
郑明发
作者简介:祁昊哲(2005-),男,山西运城人。本科生,研究方向为无人机作战任务规划。
CLC Number:
Haozhe QI, Mingfa ZHENG, Xiaorong HU, Nan YANG. Reinforcement Learning-Based Cooperative Trajectory Planning for Unmanned Combat Aerial Vehicles and Decoy UAVs[J]. Modern Defense Technology, 2026, 54(3): 71-81.
祁昊哲, 郑明发, 胡小荣, 杨楠. 基于强化学习的无人攻击机与诱饵机协同航迹规划[J]. 现代防御技术, 2026, 54(3): 71-81.
Add to citation manager EndNote|Ris|BibTeX
URL: https://www.xdfyjs.cn/EN/10.3969/j.issn.1009-086x.2026.03.007
| 事件 | 描述 | 取值 |
|---|---|---|
| 胜 | 消灭敌方地导或目标 | 1 000 |
| 负 | 我方无人机被击落 | -1 000 |
| 平 | 双方无对抗 | -10 |
Table 1 Episode reward configuration
| 事件 | 描述 | 取值 |
|---|---|---|
| 胜 | 消灭敌方地导或目标 | 1 000 |
| 负 | 我方无人机被击落 | -1 000 |
| 平 | 双方无对抗 | -10 |
| 事件 | 描述 | 取值 |
|---|---|---|
| 击落 | 击落敌方地导 | 1 000 |
| 被击落 | 战斗机被敌方击落 | -1 000 |
| 被击落 | 诱饵机被敌方击落 | -100 |
Table 2 Critical events reward configuration
| 事件 | 描述 | 取值 |
|---|---|---|
| 击落 | 击落敌方地导 | 1 000 |
| 被击落 | 战斗机被敌方击落 | -1 000 |
| 被击落 | 诱饵机被敌方击落 | -100 |
| 事件 | 描述 | 取值 |
|---|---|---|
| 每步 | 每个时间步给予负奖励 | -1 |
| 诱饵机吸引火力 | 牵制敌方雷达 | 200 |
| 被锁定 | 被敌方锁定 | -100 |
| 战斗机靠近 | 在诱饵机牵制期间靠近目标 | 100 |
Table 3 Single-step reward configuration
| 事件 | 描述 | 取值 |
|---|---|---|
| 每步 | 每个时间步给予负奖励 | -1 |
| 诱饵机吸引火力 | 牵制敌方雷达 | 200 |
| 被锁定 | 被敌方锁定 | -100 |
| 战斗机靠近 | 在诱饵机牵制期间靠近目标 | 100 |
| 参数类别 | 参数名称 | 取值 范围 | 说明 |
|---|---|---|---|
| 基础参数 | 最大训练步数 | 100 000 | 终止条件 |
| 回合最大步长 | 1 000 | 单局最大执行步数 | |
| 优化参数 | 学习率 | 0.000 3 | Adam优化器初始学习率 |
| 新旧策略裁剪阈值 | 0.2 | PPO Clip约束参数 | |
| 折扣参数 | 奖励折扣因子γ | 0.995 | 未来奖励衰减系数 |
| GAE参数λ | 0.95 | 广义优势估计参数 | |
| 损失函数参数 | 策略损失权重 | 1.0 | 策略梯度权重 |
| 值函数损失权重 | 0.5 | 价值网络损失权重 | |
| 熵正则化系数 | 0.01 | 策略熵项权重 | |
| 数据处理参数 | 小批量样本数 | 64 | 梯度更新批次大小 |
| 随机种子数 | 3 | 初始化随机种子 |
Table 4 Deep reinforcement learning training parameter configuration
| 参数类别 | 参数名称 | 取值 范围 | 说明 |
|---|---|---|---|
| 基础参数 | 最大训练步数 | 100 000 | 终止条件 |
| 回合最大步长 | 1 000 | 单局最大执行步数 | |
| 优化参数 | 学习率 | 0.000 3 | Adam优化器初始学习率 |
| 新旧策略裁剪阈值 | 0.2 | PPO Clip约束参数 | |
| 折扣参数 | 奖励折扣因子γ | 0.995 | 未来奖励衰减系数 |
| GAE参数λ | 0.95 | 广义优势估计参数 | |
| 损失函数参数 | 策略损失权重 | 1.0 | 策略梯度权重 |
| 值函数损失权重 | 0.5 | 价值网络损失权重 | |
| 熵正则化系数 | 0.01 | 策略熵项权重 | |
| 数据处理参数 | 小批量样本数 | 64 | 梯度更新批次大小 |
| 随机种子数 | 3 | 初始化随机种子 |
| 算法 | TSR/% | ASP/% | APL_UCAV/km | APL_Decoy/km | ASC (kSteps) | AR |
|---|---|---|---|---|---|---|
| PPO Clip | 92.0 | 88.0 | 142.3 | 98.7 | 40.0 | 856.2 |
| PPO | 85.0 | 82.0 | 138.5 | 95.2 | 50.0 | 782.4 |
| DDPG | 78.0 | 75.0 | 145.8 | 101.5 | >60.0 | 701.8 |
| A3C | 80.0 | 77.0 | 140.1 | 97.3 | 55.0 | 725.6 |
| DQN | 65.0 | 68.0 | 130.5 | 90.8 | 45.0 | 620.3 |
Table 5 Quantitative performance comparison of algorithms
| 算法 | TSR/% | ASP/% | APL_UCAV/km | APL_Decoy/km | ASC (kSteps) | AR |
|---|---|---|---|---|---|---|
| PPO Clip | 92.0 | 88.0 | 142.3 | 98.7 | 40.0 | 856.2 |
| PPO | 85.0 | 82.0 | 138.5 | 95.2 | 50.0 | 782.4 |
| DDPG | 78.0 | 75.0 | 145.8 | 101.5 | >60.0 | 701.8 |
| A3C | 80.0 | 77.0 | 140.1 | 97.3 | 55.0 | 725.6 |
| DQN | 65.0 | 68.0 | 130.5 | 90.8 | 45.0 | 620.3 |
| [1] | 韦振汉, 唐辉, 杨煜, 等. 基于强化学习的多无人机航迹规划[J]. 现代防御技术, 2025, 53(5): 136-144. |
| WEI Zhenhan, TANG Hui, YANG Yu, et al. Multi-UAV Path Planning Based on Reinforcement Learning[J]. Modern Defence Technology, 2025, 53(5): 136-144. | |
| [2] | 费陈, 赵亮, 贺拥亮, 等. 城市环境下无人机群目标打击航迹规划[J]. 现代防御技术, 2025, 53(1): 1-10. |
| FEI Chen, ZHAO Liang, HE Yongliang, et al. Trajectory Planning for UAV Swarm Target Strikes in Urban Environments[J]. Modern Defence Technology, 2025, 53(1): 1-10. | |
| [3] | 潘楠, 刘海石, 陈启用, 等. 多基地多目标无人机协同任务规划算法研究[J]. 现代防御技术, 2021, 49(2): 49-56. |
| PAN Nan, LIU Haishi, CHEN Qiyong, et al. Study on Cooperative Mission Planning Algorithm for Multi-base and Multi-target UAV[J]. Modern Defence Technology, 2021, 49(2): 49-56. | |
| [4] | 房霄, 曾贲, 宋祥祥, 等. 基于深度强化学习的舰艇空中威胁行为建模[J]. 现代防御技术, 2020, 48(5): 59-66. |
| FANG Xiao, ZENG Ben, SONG Xiangxiang, et al. Modeling of Air Target Threat to Warship Based on Deep Reinforcement Learning[J]. Modern Defence Technology, 2020, 48(5): 59-66. | |
| [5] | LI Qiuxiang, WU Jianping. Efficient Network Attack Path Optimization Method Based on Prior Knowledge-Based PPO Algorithm[J]. Cybersecurity, 2025, 8(1): 15. |
| [6] | 屈文涛, 谢韩彧, 刘鑫, 等. 基于改进遗传算法的油气管道无人机航迹规划[J]. 科学技术与工程, 2024, 24(27): 11901-11908. |
| QU Wentao, XIE Hanyu, LIU Xin, et al. Path Planning of UAV in Oil and Gas Pipeline Based on Improved Genetic Algorithm[J]. Science Technology and Engineering, 2024, 24(27): 11901-11908. | |
| [7] | 唐颂, 吴建源. 基于改进遗传算法的协同航迹规划方法[J]. 电光与控制, 2024, 31(7): 8-12, 26. |
| TANG Song, WU Jianyuan. A Cooperative Trajectory Planning Method Based on Improved Genetic Algorithm[J]. Electronics Optics & Control, 2024, 31(7): 8-12, 26. | |
| [8] | 王瑶, 任安虎, 任洋洋. 改进蚁群算法的无人机航迹规划[J]. 电光与控制, 2024, 31(4): 43-48. |
| WANG Yao, REN Anhu, REN Yangyang. An Improved Ant Colony Algorithm for UAV Trajectory Planning[J]. Electronics Optics & Control, 2024, 31(4): 43-48. | |
| [9] | XU Chentao, ZHOU Shiqi, LIANG Maohan, et al. Reliable Vessel Trajectory Clustering: A Maritime Shipping Network-Driven Computational Method[J]. Ocean Engineering, 2025, 336: 121691. |
| [10] | 郝昱. 基于强化学习的无人机火灾救援航迹规划[D]. 赣州: 江西理工大学, 2023. |
| HAO Yu. Route Planning for Fire Rescue of Unmanned Aerial Vehicles Based on Reinforcement Learning[D]. Ganzhou: Jiangxi University of Science and Technology, 2023. | |
| [11] | 周枫. 基于智能算法的无人机航迹规划研究[D]. 镇江: 江苏科技大学, 2023. |
| ZHOU Feng. Research on Unmanned Aerial Vehicle Path Planning Based on Intelligent Algorithms[D]. Zhenjiang: Jiangsu University of Science and Technology, 2023. | |
| [12] | 王力, 赵全海, 黄石磊. 面向物流机器人的改进Q-Learning动态避障算法研究[J]. 计算机测量与控制, 2025, 33(3): 267-274. |
| WANG Li, ZHAO Quanhai, HUANG Shilei. Improved Q-Learning Dynamic Obstacle Avoidance Algorithm for Logistics Robots[J]. Computer Measurement & Control, 2025, 33(3): 267-274. | |
| [13] | WANG Meng, GUI Xueqian, YAN Huaicheng, et al. Event-Triggered Optimal Bipartite Consensus Control for Constrained Multiagent Systems via Internal Reinforce Q-Learning[J]. IEEE Transactions on Cybernetics, 2025, 55(8): 3852-3865. |
| [14] | 杨淞匀, 王杭先, 林鹏. 基于DDPG算法的无人船避障路径规划[J]. 信息技术, 2025, 49(3): 1-7, 15. |
| YANG Songyun, WANG Hangxian, LIN Peng. Obstacle Avoidance Path Planning of Unmanned Surface Vessels Based on DDPG Algorithm[J]. Information Technology, 2025, 49(3): 1-7, 15. | |
| [15] | 桑垚, 马晓宁. 改进奖励函数的深度强化学习路径规划方法[J]. 计算机应用与软件, 2025, 42(1): 271-276. |
| SANG Yao, MA Xiaoning. Path Planning Method of Deep Reinforcement Learning with Improved Reward Function[J]. Computer Applications and Software, 2025, 42(1): 271-276. | |
| [16] | 周从航, 李建兴, 石宇静, 等. 深度强化学习在无人机编队路径规划中的应用[J]. 电光与控制, 2024, 31(10): 27-33. |
| ZHOU Conghang, LI Jianxing, SHI Yujing, et al. Application of Deep Reinforcement Learning in Path Planning of UAV Formation[J]. Electronics Optics & Control, 2024, 31(10): 27-33. | |
| [17] | 李京涛. 基于强化学习的多智能体救援策略研究[D]. 杭州: 浙江科技大学, 2024. |
| LI Jingtao. Research on Multi-agent Rescue Strategy Based on Reinforcement Learning[D]. Hangzhou: Zhejiang University of Science and Technology, 2024. | |
| [18] | LUO Wangbin, WANG Xiang, HAN Fang, et al. Research on LSTM-PPO Obstacle Avoidance Algorithm and Training Environment for Unmanned Surface Vehicles[J]. Journal of Marine Science and Engineering, 2025, 13(3): 479. |
| [19] | WU Yaohuan, XIE Nan. Design of Digital Low-Carbon System for Smart Buildings Based on PPO Algorithm[J]. Sustainable Energy Research, 2025, 12(1): 9. |
| [20] | ZHANG Xiaoya, ZHANG Yuyang, DONG Ping, et al. PPORM: A PPO-Assisted Packet Reordering Mechanism of Heterogeneous VANETs for Enhancing Goodput and Stability in Fog Computing[J]. Vehicular Communications, 2025, 53: 100894. |
| [21] | WANG Junwei, ZENG Zilin, SHANG Peng. Smooth Clip Advantage PPO in Reinforcement Learning[J]. Journal of Physics: Conference Series, 2023, 2513(1): 012005. |
| [1] | Wei GUO, Yuan CHANG, Fang CHENG, Qingyun WANG, Chong WANG. Research and Reflection on Intelligent Guidance Based on Deep Reinforcement Learning [J]. Modern Defense Technology, 2026, 54(1): 73-84. |
| [2] | Yaoluo HUI, Bo XU, Xiumin LI, Luyao ZANG, Liu LIU. A Review of Development of Intelligent Penetration Technology for Missile Cluster [J]. Modern Defense Technology, 2025, 53(6): 69-81. |
| [3] | Zhenhan WEI, Hui TANG, Yu YANG, Zhihong LIAO, Qihui LAI, Chen LU. Multi-UAV Path Planning Based on Reinforcement Learning [J]. Modern Defense Technology, 2025, 53(5): 136-144. |
| [4] | Shixiang YAN, Haijun LIU. Sensor-Weapon-Target Assignment Method Based on Deep Reinforcement Learning [J]. Modern Defense Technology, 2025, 53(4): 10-17. |
| [5] | Wanchun CHEN, Jia ZHENG, Xuehe ZHENG, Qi YU, Peng ZENG, Chao WANG. Trajectory Planning Method for Boost Phase Interceptor Based on Adaptive Energy Allocation [J]. Modern Defense Technology, 2025, 53(3): 103-111. |
| [6] | Zhongyu WANG, Xiaopeng XU, Dong WANG. Policy Transfer Reinforcement Learning Method for Partially Observable Conditions [J]. Modern Defense Technology, 2024, 52(2): 63-71. |
| [7] | Zhehao WANG, Tian LI, Haidong YAN, Yudong HU, Changsheng GAO. Improved Drag Acceleration Profile Design for Hypersonic Vehicles [J]. Modern Defense Technology, 2024, 52(2): 115-123. |
| [8] | Futai LIANG, Yan ZHOU, Chenhao ZHANG, Zihao SONG, Xiaorui ZHAO. Threat Assessment Method of Aerial Targets under Confrontational Conditions [J]. Modern Defense Technology, 2024, 52(1): 147-154. |
| [9] | Zonglei BAI, Xiuhua LIU, Tianxiang BAI, Kewu SUN. Research on Multi-task Controllable Emergence Mechanism for Air and Space Defense System [J]. Modern Defense Technology, 2023, 51(3): 39-48. |
| [10] | Ping YANG, Shao-qiang YAN, Jiang-peng WANG, Feng-xuan WU, Zhe YAN, Song YAN. An Improved UAV Low Altitude Penetration Model Based on Safe Flight Space [J]. Modern Defense Technology, 2022, 50(6): 124-131. |
| [11] | Ji-shi-yu DING, Ke-wu SUN, Bo DONG, Xi-rui YANG, Chang-chao FAN, Zhe MA. Multi-agent Autonomous Cooperative Confrontation based on Meta Curriculum Reinforcement Learning [J]. Modern Defense Technology, 2022, 50(5): 36-42. |
| [12] | FANG Xiao, ZENG Bi, SONG Xiang-xiang, JIA Zheng-xuan. Modeling of Air Target Threat to Warship Based on Deep Reinforcement Learning [J]. Modern Defense Technology, 2020, 48(5): 59-66. |
| [13] | TANG Run-ze, ZHANG Cheng-long, LI Lin-lin. Application of Artificial Intelligence on Situation Assessment and Game Countermeasure in Unmanned Battlefield [J]. Modern Defense Technology, 2020, 48(5): 25-31. |
| Viewed | ||||||
|
Full text |
|
|||||
|
Abstract |
|
|||||