Learning to coordinate with coordination graphs in repeated single-stage multi-agent decision problems

Eugenio Bargiacchi*, Timothy Verstraeten, Diederik M. Roijers, Ann Nowé, Hado van Hasselt

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Learning to coordinate between multiple agents is an important problem in many reinforcement learning problems. Key to learning to coordinate is exploiting loose couplings, i.e., conditional independences between agents. In this paper we study learning in repeated fully cooperative games, multi-agent multi-armed bandits (MAMABs), in which the expected rewards can be expressed as a coordination graph. We propose multi-agent upper confidence exploration (MAUCE), a new algorithm for MAMABs that exploits loose couplings, which enables us to prove a regret bound that is logarithmic in the number of arm pulls and only linear in the number of agents. We empirically compare MAUCE to sparse cooperative Q-learning, and a state-of-the-art combinatorial bandit approach, and show that it performs much better on a variety of settings, including learning control policies for wind farms.
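The abstract describes MAUCE only at a high level. The sketch below is a minimal illustration of the underlying idea, not the authors' implementation: a factored UCB-style rule for a toy MAMAB that keeps separate reward estimates per local component of the coordination graph and selects joint arms optimistically. The group structure, the exploration bonus, and all identifiers (`groups`, `ucb_value`, etc.) are assumptions made for illustration; the paper's actual bound uses per-component reward ranges, and its maximization step uses variable elimination rather than enumeration.

```python
import itertools

import numpy as np

# Toy multi-agent multi-armed bandit (MAMAB): 3 agents, 2 arms each.
# The coordination graph factors the reward into two local components,
# one over agents (0, 1) and one over agents (1, 2). All names here
# are illustrative, not taken from the paper.

rng = np.random.default_rng(0)
n_agents, n_arms = 3, 2
groups = [(0, 1), (1, 2)]  # scopes of the local reward functions
true_means = [rng.uniform(size=(n_arms, n_arms)) for _ in groups]

counts = [np.zeros((n_arms, n_arms)) for _ in groups]
sums = [np.zeros((n_arms, n_arms)) for _ in groups]

def ucb_value(joint, t):
    """Sum of local mean estimates plus a simple exploration bonus."""
    total = 0.0
    for g, (i, j) in enumerate(groups):
        arm = (joint[i], joint[j])
        n = counts[g][arm]
        if n == 0:
            return np.inf  # try every local joint arm at least once
        total += sums[g][arm] / n + np.sqrt(np.log(t + 1) / (2 * n))
    return total

for t in range(2000):
    # Exhaustive maximization over joint arms, fine for the 2^3 = 8 here.
    # MAUCE instead maximizes via variable elimination on the coordination
    # graph, which is what keeps the method linear in the number of agents.
    joint = max(itertools.product(range(n_arms), repeat=n_agents),
                key=lambda ja: ucb_value(ja, t))
    for g, (i, j) in enumerate(groups):  # observe noisy local rewards
        arm = (joint[i], joint[j])
        counts[g][arm] += 1
        sums[g][arm] += true_means[g][arm] + rng.normal(scale=0.1)
```

The key design point the sketch tries to convey is that estimates and confidence bonuses live on the small local joint-arm tables, not on the exponential space of full joint arms, which is what exploiting loose couplings buys.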

Original language: English
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Jennifer Dy, Andreas Krause
Publisher: International Machine Learning Society (IMLS)
Pages: 810-818
Number of pages: 9
Volume: 2
ISBN (Electronic): 9781510867963
Publication status: Published - 1 Jan 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: 10 Jul 2018 - 15 Jul 2018

Conference

Conference: 35th International Conference on Machine Learning, ICML 2018
Country/Territory: Sweden
City: Stockholm
Period: 10/07/18 - 15/07/18
