Simulation of Game-Theoretic Decision-Making for Beyond-Visual-Range Combat with UCAVs

doi:10.3969/j.issn.1009-086x.2026.03.009

Abstract

The performance of reinforcement learning (RL) in beyond-visual-range (BVR) air combat is limited by the quality of the training opponents. To address this, a rule-based agent decision framework is proposed as the training opponent for RL agents. Simulations show that agents trained against this framework efficiently master typical air combat tactics and achieve clearly improved combat effectiveness. The basic maneuvers of fighter aircraft are introduced, and an air combat simulation module and a cooperative strategy training module are established. Because existing rule-based decision trees suffer from incomplete rule coverage and are cumbersome to organize, a decision logic framework based on state-machine transitions is proposed. It uses event-condition pairs to trigger state transitions and decisions, and in comparative simulations it shows stronger air combat decision-making capability than a conventional decision tree. Finally, a single-aircraft BVR air combat RL agent is built and trained against the state-machine-based framework as its opponent. Guided by rule-based expert knowledge, the trained agent autonomously learns typical maneuvers and shows better decision adaptability and combat capability, offering ideas for further research on BVR air combat decision systems.

Key words: beyond-visual-range air combat, unmanned aerial vehicle (UAV), autonomous decision-making, rule-based decision-making, reinforcement learning, expert knowledge

Tongyu SHI, Hao WANG, Youkun WANG, Maolong LÜ. Simulation of Game-Theoretic Decision-Making for Beyond-Visual-Range Combat with UCAVs[J]. Modern Defense Technology, 2026, 54(3): 93-103.

史桐雨, 王昊, 王酉琨, 吕茂隆. 无人作战飞机超视距空战博弈对抗决策仿真[J]. 现代防御技术, 2026, 54(3): 93-103.

Figures/Tables 21

Fig. 1 Flowchart for generating task strategies in one-on-one air combat cooperative confrontation

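Table 1 Tactical actions of the beyond-visual-range air combat decision framework

A	Description	A	Description
a₁	Straight-line acceleration	a₇	Split-S defense
a₂	Climb	a₈	Large-loop defense
a₃	Target position pursuit	a₉	90° beam-turn maneuver
a₄	High-intensity turn	a₁₀	Tail-placing maneuver
a₅	Low-intensity turn	a₁₁	Bias (offset) strategy
a₆	High-angle pursuit	a₁₂	Serpentine (S-turn) maneuver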

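Table 2 Typical situations of the beyond-visual-range air combat decision framework

Q	Description
q₁	Long-range standoff (distance between own aircraft and enemy ≥ 30 km)
q₂	Medium-to-long-range standoff (10 km ≤ distance < 30 km)
q₃	Medium-to-short-range standoff (5 km ≤ distance < 10 km)
q₄	Short-range standoff (distance < 5 km)
q₅	Low-energy state (altitude and speed too low)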
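The distance thresholds in Table 2 can be read as a simple classifier. The following is a minimal sketch, not the authors' implementation: the names `Situation` and `classify_situation` are hypothetical, the low-energy flag is supplied by the caller because Table 2 gives no numeric altitude or speed thresholds, and giving q₅ priority over the distance bands is an assumption.

```python
from enum import Enum


class Situation(Enum):
    """Situations q1..q5 from Table 2 (enum names are illustrative)."""
    Q1_LONG = "q1"
    Q2_MEDIUM_LONG = "q2"
    Q3_MEDIUM_SHORT = "q3"
    Q4_SHORT = "q4"
    Q5_LOW_ENERGY = "q5"


def classify_situation(distance_km: float, low_energy: bool = False) -> Situation:
    """Map the own-to-enemy distance (km) to a Table 2 situation.

    Assumption: a low-energy flag overrides the distance bands.
    """
    if low_energy:
        return Situation.Q5_LOW_ENERGY
    if distance_km >= 30.0:
        return Situation.Q1_LONG
    if distance_km >= 10.0:
        return Situation.Q2_MEDIUM_LONG
    if distance_km >= 5.0:
        return Situation.Q3_MEDIUM_SHORT
    return Situation.Q4_SHORT
```

The boundaries follow the table: exactly 30 km falls in q₁, exactly 10 km in q₂, and exactly 5 km in q₃.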

Table 3 Typical events of the beyond-visual-range air combat decision framework

E	Description	E	Description
e₁	Change in antenna train angle	e₃	Missile lock warning
e₂	Energy assessment	e₄	Altitude and speed assessment

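Table 4 Typical conditions of the beyond-visual-range air combat decision framework

C	Description
c₁	Own aircraft energy < 0.6 × target energy
c₂	Antenna train angle > 70°
c₃	Antenna train angle ≤ 15°
c₄	Antenna train angle > 120° and aspect angle > 150°
c₅	Distance < 300 m and antenna train angle > 60°
c₆	Antenna train angle > 30°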
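The six conditions in Table 4 are plain threshold tests, so they can be sketched as predicates over a small geometry record. This is an illustrative sketch only: `Geometry`, its field names, and `active_conditions` are hypothetical, and how energy is computed is left to the caller because the table does not define it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class Geometry:
    """Inputs needed by the Table 4 conditions (field names are assumptions)."""
    own_energy: float
    target_energy: float
    ata_deg: float      # antenna train angle, degrees
    aspect_deg: float   # aspect angle, degrees
    distance_m: float   # own-to-enemy distance, metres


# One predicate per condition c1..c6, thresholds copied from Table 4.
CONDITIONS: Dict[str, Callable[[Geometry], bool]] = {
    "c1": lambda g: g.own_energy < 0.6 * g.target_energy,
    "c2": lambda g: g.ata_deg > 70.0,
    "c3": lambda g: g.ata_deg <= 15.0,
    "c4": lambda g: g.ata_deg > 120.0 and g.aspect_deg > 150.0,
    "c5": lambda g: g.distance_m < 300.0 and g.ata_deg > 60.0,
    "c6": lambda g: g.ata_deg > 30.0,
}


def active_conditions(g: Geometry) -> Set[str]:
    """Return the names of all conditions that hold for this geometry."""
    return {name for name, predicate in CONDITIONS.items() if predicate(g)}
```

Note that the conditions are not mutually exclusive: any geometry satisfying c₂, c₄, or c₅ also satisfies c₆.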

Table 5 Rule set of the beyond-visual-range air combat decision framework

Rule	Q	E	C	A	Q'
Rule₁	q₁	e₁	c₂	a₃	q₂
Rule₂	q₁	e₁	¬c₂	a₁₁	q₁
Rule₃	q₂	e₁	c₃	a₁₂	q₃
Rule₄	q₂	—	—	a₃	q₂
Rule₅	q₃	e₂	c₁	a₁	q₃
Rule₆	q₃	e₁	c₄	a₄	q₄
Rule₇	q₃	e₁	c₆	a₅	q₄
Rule₈	q₃	e₃	—	a₉	q₄
Rule₉	q₃	—	—	a₃	q₃
Rule₁₀	q₄	e₁	c₆	a₃	q₄
Rule₁₁	q₄	e₁	c₄	a₇	q₄
Rule₁₂	q₄	e₁	¬c₄ ∨ ¬c₅ ∨ ¬c₆	a₆	q₄
Rule₁₃	q₄	e₁	c₅	a₈	q₄
Rule₁₄	q₄	e₃	—	a₁₀	q₄
Rule₁₅	q₄	e₄	—	a₂	q₁
Rule₁₆	q₅	—	—	a₂	q₁
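Each row of Table 5 reads as "in situation Q, on event E, if condition C holds, take action A and move to situation Q'". A minimal sketch of that lookup is given below. Several points are assumptions rather than facts from the paper: rules are tried in table order and the first match wins, a dash in the E or C column is treated as "any event" or "always true", and the helper names (`Rule`, `has`, `lacks`, `any_of`, `step`) are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, NamedTuple, Optional

Cond = Callable[[FrozenSet[str]], bool]


def has(name: str) -> Cond:
    return lambda active: name in active


def lacks(name: str) -> Cond:
    return lambda active: name not in active


def any_of(*conds: Cond) -> Cond:
    return lambda active: any(c(active) for c in conds)


def always(active: FrozenSet[str]) -> bool:
    return True


class Rule(NamedTuple):
    name: str
    q: str
    e: Optional[str]   # None stands for a dash in the E column
    cond: Cond
    action: str
    q_next: str


RULES = [
    Rule("Rule1", "q1", "e1", has("c2"), "a3", "q2"),
    Rule("Rule2", "q1", "e1", lacks("c2"), "a11", "q1"),
    Rule("Rule3", "q2", "e1", has("c3"), "a12", "q3"),
    Rule("Rule4", "q2", None, always, "a3", "q2"),
    Rule("Rule5", "q3", "e2", has("c1"), "a1", "q3"),
    Rule("Rule6", "q3", "e1", has("c4"), "a4", "q4"),
    Rule("Rule7", "q3", "e1", has("c6"), "a5", "q4"),
    Rule("Rule8", "q3", "e3", always, "a9", "q4"),
    Rule("Rule9", "q3", None, always, "a3", "q3"),
    Rule("Rule10", "q4", "e1", has("c6"), "a3", "q4"),
    Rule("Rule11", "q4", "e1", has("c4"), "a7", "q4"),
    Rule("Rule12", "q4", "e1", any_of(lacks("c4"), lacks("c5"), lacks("c6")), "a6", "q4"),
    Rule("Rule13", "q4", "e1", has("c5"), "a8", "q4"),
    Rule("Rule14", "q4", "e3", always, "a10", "q4"),
    Rule("Rule15", "q4", "e4", always, "a2", "q1"),
    Rule("Rule16", "q5", None, always, "a2", "q1"),
]


def step(q: str, event: Optional[str], active: Iterable[str]) -> Optional[Rule]:
    """Return the first rule (in table order) matching situation, event, and conditions."""
    active_set = frozenset(active)
    for rule in RULES:
        if rule.q != q:
            continue
        if rule.e is not None and rule.e != event:
            continue
        if rule.cond(active_set):
            return rule
    return None  # no rule in Table 5 covers this combination
```

For example, `step("q1", "e1", {"c2"})` selects Rule₁ (action a₃, next situation q₂). Under the assumed first-match ordering, Rule₁₁ and Rule₁₃ can never fire, because c₄ and c₅ each imply c₆ and Rule₁₀ is tried first; the original framework presumably resolves such overlaps with a priority scheme that the table alone does not show.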

Fig. 2 Situation transition diagram under the decision logic framework

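Table 6 Initialization data of the simulation experiment environment

Parameter	Value
Initial distance/km	100~120
Initial altitude/km	8~10
Initial Mach number	1.0~1.2
Initial azimuth/(°)	150~180
Number of short-range missiles carried	2
Number of medium-range missiles carried	3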
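Table 6 gives ranges for the initial distance, altitude, Mach number, and azimuth, plus fixed missile loads. One way to draw an episode start from those ranges is sketched below. The table does not state the sampling distribution, so uniform sampling is an assumption, and `InitialConditions` and `sample_initial_conditions` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class InitialConditions:
    distance_km: float
    altitude_km: float
    mach: float
    azimuth_deg: float
    short_range_missiles: int = 2    # fixed load from Table 6
    medium_range_missiles: int = 3   # fixed load from Table 6


def sample_initial_conditions(rng: random.Random) -> InitialConditions:
    """Draw one initial state; uniform sampling over each range is assumed."""
    return InitialConditions(
        distance_km=rng.uniform(100.0, 120.0),
        altitude_km=rng.uniform(8.0, 10.0),
        mach=rng.uniform(1.0, 1.2),
        azimuth_deg=rng.uniform(150.0, 180.0),
    )
```

Passing a seeded `random.Random` instance, for example `sample_initial_conditions(random.Random(0))`, keeps the draws reproducible across runs.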

Table 7 Arithmetic mean results of the simulation experiment

Side	Medium-range missiles fired	Short-range missiles fired	Win rate/%
Red	2.7	1.8	17
Blue	2.5	1.5	71

Fig. 3 Example of a beyond-visual-range air combat simulation engagement

Fig. 4 Process diagram for converting action numbers to one-hot encoding

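Table 8 State space input parameters

Parameter	Description
$Loc_h$	3D coordinates of own aircraft
$v_h$	Velocity vector of own aircraft
$Loc_t$	3D coordinates of enemy aircraft
$v_t$	Velocity vector of enemy aircraft
$d_t$	Distance vector between own and enemy aircraft
$ATA$	Antenna train angle
$AOT$	Angle off tail
$d_m$	Distance vector between own aircraft and enemy missile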
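Two of the Table 8 inputs can be derived from the raw positions and velocities. The sketch below computes the distance vector and the antenna train angle, taking the latter as the angle between the own velocity vector and the line of sight to the enemy. That is a common definition, but the paper's exact convention is not shown here, so it is an assumption; the function names are hypothetical, and the angle off tail and the missile distance vector are left out.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def distance_vector(loc_h: Sequence[float], loc_t: Sequence[float]) -> Vec3:
    """Vector from own aircraft to enemy aircraft."""
    return (loc_t[0] - loc_h[0], loc_t[1] - loc_h[1], loc_t[2] - loc_h[2])


def antenna_train_angle_deg(loc_h: Sequence[float],
                            v_h: Sequence[float],
                            loc_t: Sequence[float]) -> float:
    """Angle in degrees between own velocity and the line of sight (assumed ATA definition)."""
    los = distance_vector(loc_h, loc_t)
    dot = sum(a * b for a, b in zip(v_h, los))
    norm_v = math.sqrt(sum(a * a for a in v_h))
    norm_los = math.sqrt(sum(a * a for a in los))
    if norm_v == 0.0 or norm_los == 0.0:
        raise ValueError("velocity and line of sight must be non-zero")
    # Clamp against rounding error before taking the arc cosine.
    cos_ata = max(-1.0, min(1.0, dot / (norm_v * norm_los)))
    return math.degrees(math.acos(cos_ata))
```

With this convention an enemy dead ahead gives 0°, an enemy abeam gives 90°, and an enemy directly behind gives 180°.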

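Table 9 Tactical action set of the action space

A	Description	A	Description
a₁	Straight-line acceleration	a₆	High-angle pursuit
a₂	Climb	a₇	Split-S defense
a₃	Target position pursuit	a₈	Large-loop defense
a₄	High-intensity turn	a₉	Serpentine (S-turn) maneuver
a₅	Low-intensity turn	a₁₀	3-9 maneuver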
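Fig. 4 refers to converting action numbers into one-hot vectors, and Table 9 lists ten actions. A minimal sketch of that conversion follows. It assumes the action numbers are the 1-based subscripts a₁ to a₁₀ and that a₁ maps to the first vector position; the function name `one_hot` is hypothetical.

```python
from typing import List


def one_hot(action_number: int, num_actions: int = 10) -> List[int]:
    """Encode a 1-based action number (a1..a10 in Table 9) as a one-hot list."""
    if not 1 <= action_number <= num_actions:
        raise ValueError(f"action_number must be in 1..{num_actions}")
    vector = [0] * num_actions
    vector[action_number - 1] = 1  # a1 -> index 0 (assumed ordering)
    return vector
```

For instance, `one_hot(3)` marks the third position, which corresponds to target position pursuit (a₃).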

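Table 10 Reward function events and values

Reward type	Event	Value
Episode reward	Win	10
	Draw	0
	Loss	-8
	Enemy shot down	2
	Own aircraft shot down	-2
Key-event reward	Lock on enemy	0.05
	Locked by enemy	-0.05
	Missile evaded	1
	Missile missed	-1
Situational-advantage reward	Energy advantage	0.05
	Altitude advantage	0.04
	Angle advantage	0.03
Altitude protection	Safe flight	0.005
Altitude protection	Dangerous flight	-0.5
Per-step reward	Each step	-0.01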
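The values in Table 10 can be combined into a per-step reward by summing the entries for whichever events occur. The sketch below does exactly that. Additive combination is an assumption (the table lists values but not how they are aggregated), and the event keys and the name `step_reward` are hypothetical labels for the table rows.

```python
from typing import Iterable

# Event values copied from Table 10; the keys are illustrative labels.
REWARDS = {
    "win": 10.0,
    "draw": 0.0,
    "loss": -8.0,
    "enemy_shot_down": 2.0,
    "own_shot_down": -2.0,
    "lock": 0.05,
    "locked": -0.05,
    "missile_evaded": 1.0,
    "missile_missed": -1.0,
    "energy_advantage": 0.05,
    "altitude_advantage": 0.04,
    "angle_advantage": 0.03,
    "safe_flight": 0.005,
    "dangerous_flight": -0.5,
}
STEP_PENALTY = -0.01  # applied on every step


def step_reward(events: Iterable[str]) -> float:
    """Sum the per-step penalty and the values of all events seen this step."""
    return STEP_PENALTY + sum(REWARDS[event] for event in events)
```

An unknown event key raises `KeyError`, which surfaces labelling mistakes early. The constant per-step penalty is small next to the win and loss rewards, which suggests it mainly discourages needlessly long engagements.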

Fig. 5 Line chart of reward values during initial training

Fig. 6 Line chart of reward values after adversarial strategy optimization training

Fig. 7 Escape with climb

Fig. 8 Straight flight after a 3-9 maneuver

Fig. 9 Flowchart of the agent countering the bias strategy

Fig. 10 Flowchart of the agent countering the millstone strategy

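Table 11 Statistics of the simulation experiment data

Evaluation dimension	Metric	Rule set	State-machine transition
Combat effectiveness	Win rate/%	70.8	78.0
Weapon usage	Missiles launched per target hit	1.9	1.7
Decision capability	Threat identification accuracy/%	91.1	92.7
	Tactical response latency/ms	240	233
Robustness	Strategy transfer success rate/%	89	83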
