Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems

  • Eugenio Bargiacchi
  • Timothy Verstraeten
  • D.M. Roijers
  • Ann Nowé
  • Hado Van Hasselt

Research output: Contribution to Journal, Article, Academic, peer-review

Abstract

Learning to coordinate between multiple agents is an important problem in many reinforcement learning problems. Key to learning to coordinate is exploiting loose couplings, i.e., conditional independences between agents. In this paper we study learning in repeated fully cooperative games, multi-agent multi-armed bandits (MAMABs), in which the expected rewards can be expressed as a coordination graph. We propose multi-agent upper confidence exploration (MAUCE), a new algorithm for MAMABs that exploits loose couplings, which enables us to prove a regret bound that is logarithmic in the number of arm pulls and only linear in the number of agents. We empirically compare MAUCE to sparse cooperative Q-learning, and a state-of-the-art combinatorial bandit approach, and show that it performs much better on a variety of settings, including learning control policies for wind farms.
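
As a rough illustration of the setting described in the abstract, the sketch below models a tiny multi-agent multi-armed bandit whose expected reward decomposes over a coordination graph. It is not an implementation of MAUCE: the group structure, the reward values, and the names `GROUPS`, `LOCAL_MEANS`, `expected_reward`, and `best_joint_action` are all assumptions made up for this example, and the maximization is a plain brute-force search.

```python
from itertools import product

# Hypothetical toy problem: 3 agents, 2 actions each, and two local groups
# that share agent 1 (a chain-shaped coordination graph).
N_ACTIONS = 2
GROUPS = [(0, 1), (1, 2)]  # agents that participate in each local reward

# Assumed expected local rewards, indexed by the local joint action.
LOCAL_MEANS = [
    {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9},
    {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.8, (1, 1): 0.6},
]


def expected_reward(joint_action):
    """Sum the local expected rewards over all groups."""
    return sum(
        means[tuple(joint_action[i] for i in group)]
        for group, means in zip(GROUPS, LOCAL_MEANS)
    )


def best_joint_action(n_agents=3):
    """Brute-force search over all joint actions (exponential in n_agents)."""
    return max(product(range(N_ACTIONS), repeat=n_agents), key=expected_reward)


if __name__ == "__main__":
    action = best_joint_action()
    print(action, expected_reward(action))
```

The loose coupling shows up in `GROUPS`: each local reward depends only on the agents in its group, so the number of local entries to estimate grows with the number of groups, while the number of joint actions enumerated by `best_joint_action` grows exponentially with the number of agents.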
Original language: English
Pages (from-to): 482-490
Number of pages: 9
Journal: Proceedings of Machine Learning Research
Volume: 80
Early online date: Jul 2018
Publication status: Published - 2018
Event: International Conference on Machine Learning - Stockholmsmässan, Stockholm, Sweden
Duration: 9 Jul 2018 → …
Conference number: 35
https://icml.cc/

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 17 - Partnerships for the Goals

Keywords

  • Multi-agent
  • reinforcement learning
  • Coordination
  • Wind Energy
