Abstract
Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that, on multi-agent tasks, our automated approach achieves sample efficiency comparable to, if not better than, that of reward machines generated by hand.
| Original language | English |
|---|---|
| Title of host publication | Multi-Agent Systems - 20th European Conference, EUMAS 2023, Proceedings |
| Subtitle of host publication | 20th European Conference, EUMAS 2023, Naples, Italy, September 14–15, 2023, Proceedings |
| Editors | Vadim Malvone, Aniello Murano |
| Pages | 328–344 |
| Number of pages | 17 |
| ISBN (Electronic) | 978-3-031-43264-4 |
| Publication status | Published - 7 Sept 2023 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 14282 LNAI |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
- multi-agent reinforcement learning
- reward machines
- automatic synthesis