Communication in Multi-Agent Reinforcement Learning: Intention Sharing

Title: Communication in Multi-Agent Reinforcement Learning: Intention Sharing

Authors: Woojun Kim, Jongeui Park and Youngchul Sung

To be presented at the International Conference on Learning Representations (ICLR) 2021

 

Communication is a core component of learning coordinated behavior in multi-agent systems. In this work, W. Kim et al. propose a new communication scheme, Intention Sharing (IS), for multi-agent reinforcement learning to enhance coordination among agents. In the proposed scheme, each agent generates an imagined trajectory by modeling the environment dynamics and the other agents’ actions. The imagined trajectory is a simulated future trajectory of the agent, produced by the learned model of the environment dynamics and the other agents, and it represents the agent’s future action plan. Each agent then compresses its imagined trajectory into an intention message for communication, using an attention mechanism that learns the relative importance of the trajectory’s components based on the messages received from the other agents. Numerical results show that the proposed IS scheme significantly outperforms other communication schemes in multi-agent reinforcement learning.
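The two steps described above (rolling out an imagined trajectory, then compressing it with attention) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear `policy` and `dynamics` stand-ins, the projection matrices `W_q`, `W_k`, `W_v`, the use of scaled dot-product attention, and all dimensions are assumptions chosen only to make the data flow concrete. In particular, the single `dynamics` function here stands in for the learned model of both the environment and the other agents.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def imagine_trajectory(obs, policy, dynamics, horizon):
    """Roll out `horizon` imagined steps from the current observation.

    `policy` and `dynamics` are hypothetical stand-ins for the agent's
    policy and its learned model (assumption, not from the source).
    """
    steps = []
    for _ in range(horizon):
        action = policy(obs)
        steps.append(np.concatenate([obs, action]))
        obs = dynamics(obs, action)  # predicted next observation
    return np.stack(steps)  # shape: (horizon, obs_dim + act_dim)

def intention_message(trajectory, received, W_q, W_k, W_v):
    """Compress an imagined trajectory into one message vector.

    The received message forms the query; each imagined step provides
    a key and a value (scaled dot-product attention, an assumption).
    """
    query = W_q @ received                    # (d,)
    keys = trajectory @ W_k.T                 # (horizon, d)
    values = trajectory @ W_v.T               # (horizon, msg_dim)
    weights = softmax(keys @ query / np.sqrt(query.shape[0]))
    return weights @ values, weights

# Illustrative usage with random parameters (all sizes are arbitrary).
rng = np.random.default_rng(0)
obs_dim, act_dim, msg_dim, d, horizon = 4, 2, 8, 16, 5
P = rng.normal(size=(act_dim, obs_dim))
A = rng.normal(size=(obs_dim, obs_dim)) * 0.3
B = rng.normal(size=(obs_dim, act_dim)) * 0.3
policy = lambda o: np.tanh(P @ o)
dynamics = lambda o, a: A @ o + B @ a

trajectory = imagine_trajectory(rng.normal(size=obs_dim), policy, dynamics, horizon)
received = rng.normal(size=msg_dim)  # messages from other agents (placeholder)
message, weights = intention_message(
    trajectory, received,
    rng.normal(size=(d, msg_dim)),
    rng.normal(size=(d, obs_dim + act_dim)),
    rng.normal(size=(msg_dim, obs_dim + act_dim)),
)
print(message.shape, weights.round(3))
```

The attention weights form a probability distribution over the imagined steps, so the message is a weighted summary of the planned trajectory rather than a copy of any single step.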

 

Fig. 1. The overall structure of the proposed IS scheme from the perspective of Agent i

Fig. 2. Performance: MADDPG (blue), DIAL (green), TarMAC (red), Comm-OA (purple), ATOC (cyan), and the proposed IS method (black). (PP: Predator-and-Prey, CN: Cooperative Navigation, TJ: Traffic Junction)

Fig. 3. Imagined trajectories and attention weights of each agent on PP (N=3): 1st row: Agent 1 (red), 2nd row: Agent 2 (green), 3rd row: Agent 3 (blue). The black squares, the circle inside the times (×) icon, and the other circles denote the prey, the agent’s current position, and its estimated future positions, respectively. The brightness of each circle is proportional to its attention weight.