Communication with factorized policy gradients in multi-agent deep reinforcement learning

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

In multi-agent deep reinforcement learning (MADRL), agents can learn to communicate to broaden their view and understanding of the environment and of their teammates. Previous work on communication in MADRL mainly relies on centralized or independent value functions for learning communication, and these cannot differentiate how each communicating agent individually contributes to the overall learning process. Moreover, environments with continuous state and action spaces have received limited attention in previous research. In this paper, we propose a novel architecture for communicating agents and use centralized but factorized value functions, through which gradients are backpropagated, to differentiate how each agent contributes to learning during communication. Additionally, to address the complexity introduced by communication, we investigate an attention mechanism that aggregates incoming messages so that each policy receives an input of fixed length. We then present a new policy gradient method, termed communication with factorized policy gradients (CFPG), which backpropagates gradients fully from the factorized value functions into the communicating agents' architecture. We demonstrate that, compared with other methods for learning communication, CFPG can improve performance and accelerate learning in continuous predator–prey scenarios and in multi-agent MuJoCo.
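The abstract does not specify how CFPG's attention module is parameterized. As a minimal sketch, assuming plain scaled dot-product attention in which the receiving agent's own embedding acts as the query and the incoming messages serve as both keys and values (no learned projections), the Python below illustrates why attention-based aggregation gives a policy input of fixed length regardless of how many messages arrive. The names `aggregate_messages`, `dot`, and `softmax` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_messages(query: Sequence[float],
                       messages: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention over a variable number of incoming messages.

    Hypothetical simplification: keys and values are the raw messages, with no
    learned projections. The result always has len(query) entries.
    """
    d = len(query)
    if not messages:
        # No messages received: return a zero vector of the same fixed length.
        return [0.0] * d
    if any(len(m) != d for m in messages):
        raise ValueError("every message must have the same dimension as the query")
    # Attention scores: similarity of the query to each message, scaled by sqrt(d).
    scores = [dot(query, m) / math.sqrt(d) for m in messages]
    weights = softmax(scores)
    # Weighted average of the messages, computed component by component.
    return [sum(w * m[i] for w, m in zip(weights, messages)) for i in range(d)]


if __name__ == "__main__":
    q = [1.0, 0.0]
    from_two = aggregate_messages(q, [[1.0, 0.0], [0.0, 1.0]])
    from_five = aggregate_messages(q, [[1.0, 0.0]] * 5)
    print(len(from_two), len(from_five))  # 2 2
```

Because the output is a convex combination of the messages, its length equals the query dimension whether two or five teammates send messages, so a downstream policy network can keep a fixed input size. In a trained model the query, key, and value projections would normally be learned; this sketch only illustrates the fixed-length property and is not presented as the paper's exact module.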

Original language: English
Pages (from-to): 18933-18956
Number of pages: 24
Journal: Neural Computing and Applications
Volume: 37
Issue number: 23
Early online date: 2025
Publication status: Published - 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025.

Keywords

  • Communication
  • Continuous multi-agent environments
  • Multi-agent reinforcement learning
