Thompson Sampling for Factored Multi-Agent Bandits
Host Publication: The 19th International Conference on Autonomous Agents and Multi-Agent Systems
Authors: T. Verstraeten, E. Bargiacchi, P. Libin, D. M. Roijers and A. Nowé
Publisher: IFAAMAS
Publication Date: May 2020
Number of Pages: 2
Abstract: Multi-agent decision making is prevalent in many real-world applications, such as wind farm control [11], traffic light control [12] and warehouse commissioning [5]. In these settings, agents need to cooperate to maximize a shared team reward [3]. Coordination in multi-agent settings is challenging, as the joint action space grows combinatorially with the number of agents, making it computationally intractable to consider all agents' actions jointly. Fortunately, in many real-world settings, an agent is only directly influenced by a small subset of neighboring agents. In this case, the team reward can be factorized over the groups of agents that influence each other, and such a sparse factorization must be exploited to keep multi-agent decision problems tractable. In this work, we consider learning to coordinate in multi-agent multi-armed bandit problems (Section 1) and propose the Multi-Agent Thompson Sampling (MATS) algorithm (Section 2), which exploits sparse interactions in multi-agent systems. We assume that the groups of interacting agents are known beforehand, which is often the case in real-world applications. Our method uses the exploration-exploitation mechanism of Thompson sampling (TS), which has been shown to achieve strong empirical performance [4]. Moreover, TS is a Bayesian method, which allows prior knowledge to be specified through belief distributions; we argue that this is an important property in many practical applications, such as influenza mitigation [7, 8]. We compare MATS against Sparse Cooperative Q-Learning (SCQL) and Multi-Agent Upper Confidence Exploration (MAUCE) on two synthetic settings, the Bernoulli 0101-Chain and Gem Mining (Section 3). MATS improves upon the state of the art with respect to accuracy, learning speed, and computational speed.
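To make the idea of Thompson sampling on a factored team reward concrete, the sketch below implements a small toy version of the approach described in the abstract: each group of interacting agents has its own Bernoulli reward, a Beta posterior is kept per local joint action, posterior samples are drawn each round, and the joint action that maximizes the sum of sampled local means is played. This is only an illustrative sketch, not the authors' MATS implementation; the group structure, reward probabilities, and the brute-force maximization over joint actions are assumptions made for this example (the paper exploits the sparse factorization to coordinate efficiently instead of enumerating all joint actions).

```python
# Toy Thompson sampling on a factored multi-agent Bernoulli bandit.
# Illustrative sketch only; group structure and rewards are assumed here.
import itertools
import numpy as np

rng = np.random.default_rng(0)

n_agents = 3                   # each agent chooses action 0 or 1
groups = [(0, 1), (1, 2)]      # overlapping groups sharing a local reward
# True (unknown) Bernoulli means, one per group and local joint action.
true_means = [rng.uniform(size=(2, 2)) for _ in groups]

# Beta(1, 1) priors per group and local joint action.
alpha = [np.ones((2, 2)) for _ in groups]
beta = [np.ones((2, 2)) for _ in groups]

for t in range(1000):
    # 1. Sample a mean for every local arm from its Beta posterior.
    theta = [rng.beta(a, b) for a, b in zip(alpha, beta)]

    # 2. Choose the joint action maximizing the sum of sampled local means
    #    (brute force for this tiny example; MATS exploits the sparse
    #    structure to avoid enumerating the joint action space).
    best_action, best_value = None, -np.inf
    for joint in itertools.product([0, 1], repeat=n_agents):
        value = sum(th[joint[i], joint[j]] for th, (i, j) in zip(theta, groups))
        if value > best_value:
            best_action, best_value = joint, value

    # 3. Observe a local Bernoulli reward per group and update its posterior.
    for g, (i, j) in enumerate(groups):
        a_i, a_j = best_action[i], best_action[j]
        reward = rng.binomial(1, true_means[g][a_i, a_j])
        alpha[g][a_i, a_j] += reward
        beta[g][a_i, a_j] += 1 - reward
```

Because each group's posterior only conditions on that group's local actions and rewards, the number of parameters grows with the size of the groups rather than with the full joint action space, which is what keeps the factored approach tractable.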