[1] ZHONG Hua.Foreign Military Training Close to Actual Combat[J].National Defense Science and Technology,2014,35(4):104.
[2] KOU Ying-xin,LI Zhan-wu,LI Jun-bing,et al.Mission Management and Decision of Modern Fighter[M].Beijing:National Defense Industry Press,2017.
[3] LIU Chi,WANG Zhan-jian,DAI Zi-peng,et al.Deep Reinforcement Learning:Research Frontiers and Practical Applications[M].Beijing:Mechanical Industry Press.
[4] POLI R,KENNEDY J,BLACKWELL T.Particle Swarm Optimization:An Overview[J].Swarm Intelligence,2007,1(1).
[5] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level Control Through Deep Reinforcement Learning[J].Nature,2015,518:529.
[6] TESAURO G.Temporal Difference Learning and TD-Gammon[J].Communications of the ACM,1995,38:58-68.
[7] BELLEMARE M G,DABNEY W,MUNOS R.A Distributional Perspective on Reinforcement Learning[C]//International Conference on Machine Learning,2017:449-458.
[8] SILVER D,LEVER G,HEESS N,et al.Deterministic Policy Gradient Algorithms[C]//International Conference on Machine Learning,2014.
[9] KINGMA D,BA J.Adam:A Method for Stochastic Optimization[C]//International Conference on Learning Representations,2015.
[10] LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous Control with Deep Reinforcement Learning[J].arXiv preprint arXiv:1509.02971,2015.
[11] SILVER D,HUANG A,MADDISON C J,et al.Mastering the Game of Go with Deep Neural Networks and Tree Search[J].Nature,2016,529:484.
[12] VINYALS O,BABUSCHKIN I,CZARNECKI W M,et al.Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning[J].Nature,2019,575:350-354.doi:10.1038/s41586-019-1724-z.
[13] GIBIANSKY A.Bringing HPC Techniques to Deep Learning[R].Baidu Research,Tech.Rep.,2017.
[14] BARTH-MARON G,HOFFMAN M W,et al.Distributed Distributional Deterministic Policy Gradients[J].arXiv preprint arXiv:1804.08617.
[15] SCHULMAN J,LEVINE S,ABBEEL P,et al.Trust Region Policy Optimization[C]//Proceedings of the 32nd International Conference on Machine Learning (ICML-15),2015:1889-1897.