|
This paper investigates the trajectory design problem in wireless communications aided by multiple unmanned aerial vehicles (UAVs). A multi-UAV trajectory design method, called multi-agent twin delayed deep deterministic policy gradient (MA-TD3), is proposed that designs continuous trajectories without prior knowledge of global information such as user locations and channel conditions. MA-TD3 integrates the multi-agent deep deterministic policy gradient (MADDPG) algorithm with the twin delayed deep deterministic policy gradient (TD3) algorithm within the actor-critic reinforcement learning (RL) framework. Specifically, the multi-UAV trajectory design problem is first formulated as a stochastic game (SG) whose objective is to maximize the completion rate of the transmission tasks. The MA-TD3 method is then applied to this game, and the UAV trajectories are learned successively. Numerical results show that, compared with traditional single-agent RL methods, the proposed MA-TD3 method achieves a higher completion rate of the transmission tasks by enabling cooperation among multiple UAVs through centralized training and distributed execution. |
|
Keywords: communication and information system; trajectory design; multi-UAV-aided communication; multi-agent reinforcement learning |
|