Abstract
Learning to coordinate between multiple agents is an important problem in many reinforcement learning settings. Key to learning to coordinate is exploiting loose couplings, i.e., conditional independences between agents. In this paper we study learning in repeated fully cooperative games, multi-agent multi-armed bandits (MAMABs), in which the expected rewards can be expressed as a coordination graph. We propose multi-agent upper confidence exploration (MAUCE), a new algorithm for MAMABs that exploits loose couplings, which enables us to prove a regret bound that is logarithmic in the number of arm pulls and only linear in the number of agents. We empirically compare MAUCE to sparse cooperative Q-learning and a state-of-the-art combinatorial bandit approach, and show that it performs much better in a variety of settings, including learning control policies for wind farms.
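To make the abstract's setting concrete, here is a minimal Python sketch of a UCB-style learner on a coordination graph. The three-agent chain, the per-group UCB1-style bonus, and all names are illustrative assumptions; MAUCE itself uses a tighter joint confidence bound and maximises it with variable elimination rather than brute-force enumeration.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical loosely coupled problem: 3 agents, 2 actions each, and two
# local reward functions whose scopes overlap only on agent 1.
N_AGENTS = 3
N_ACTIONS = 2
GROUPS = [(0, 1), (1, 2)]  # agents in the scope of each local reward

counts = [defaultdict(int) for _ in GROUPS]   # pulls per local joint action
means = [defaultdict(float) for _ in GROUPS]  # running mean local reward

def ucb_value(t, joint_action):
    """Sum of local means plus a UCB1-style bonus per group
    (a simplification of MAUCE's joint confidence bound)."""
    total = 0.0
    for g, group in enumerate(GROUPS):
        local = tuple(joint_action[i] for i in group)
        n = counts[g][local]
        if n == 0:
            return float("inf")  # force exploration of unseen local actions
        total += means[g][local] + math.sqrt(2.0 * math.log(t) / n)
    return total

def select_joint_action(t):
    """Brute-force argmax over joint actions; MAUCE exploits the graph
    structure with variable elimination instead."""
    joint_actions = itertools.product(range(N_ACTIONS), repeat=N_AGENTS)
    return max(joint_actions, key=lambda a: ucb_value(t, a))

def update(joint_action, local_rewards):
    """Incrementally update the per-group statistics from the observed
    local rewards (one reward per coordination-graph component)."""
    for g, group in enumerate(GROUPS):
        local = tuple(joint_action[i] for i in group)
        counts[g][local] += 1
        means[g][local] += (local_rewards[g] - means[g][local]) / counts[g][local]
```

Because statistics are kept per group rather than per joint arm, the number of estimated values grows linearly with the number of agents (four entries per group in this chain) instead of exponentially with the size of the joint action space, which is the loose-coupling advantage the abstract refers to.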
Original language | English |
---|---|
Title of host publication | International Conference on Machine Learning, 10-15 July 2018, Stockholmsmässan, Stockholm, Sweden |
Subtitle of host publication | Proceedings of the 35th International Conference on Machine Learning, ICML 2018 |
Place of Publication | Stockholm |
Pages | 482-490 |
Number of pages | 9 |
Publication status | Published - 2018 |
Event | International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden. Duration: 9 Jul 2018 → … Conference number: 35. https://icml.cc/ |
Publication series
Name | Proceedings of Machine Learning Research |
---|---|
Volume | 80 |
ISSN (Electronic) | 1938-7228 |
Conference
Conference | International Conference on Machine Learning |
---|---|
Abbreviated title | ICML |
Country/Territory | Sweden |
City | Stockholm |
Period | 9/07/18 → … |
Internet address | https://icml.cc/ |
Keywords
- Multi-agent
- Reinforcement learning
- Coordination
- Wind energy