To address path planning for multiple unmanned aerial vehicles (UAVs), a mathematical model of the UAVs and threat zones was developed to simulate scenarios close to real-world conditions. Building on this model, a dynamic scene trajectory planning algorithm based on multi-agent reinforcement learning (DSTP-MARL) was designed for intelligent multi-UAV path planning. The algorithm enables UAVs to reach their target destinations safely by avoiding threat zones, and it helps optimize mission routes. To evaluate its performance, DSTP-MARL was compared with the deep Q-network (DQN). Experimental results show that, in both simple and complex threat environments, DSTP-MARL demonstrates superior obstacle avoidance and mission completion capabilities. DSTP-MARL also converges significantly faster and more stably than DQN, which enhances mission efficiency. These results indicate that DSTP-MARL has higher practical value and broader application potential.