Abstract
To deal with situations they were not specifically designed for ("unknown unknowns"), self-adaptive systems need
to learn during operation the best, or at least a good enough, action to perform in each context they
encounter. While different methods for online learning have been proposed (e.g. Q-learning [1], genetic algorithms [2]), they typically rely on specific knowledge of the system and are therefore
hard to reuse across systems and require tuning per system. Additionally, they do not always provide
a bound on their performance to guide their usage. In this thesis, we propose using a family of reinforcement
learning algorithms called multi-armed bandits (MAB) for online learning within self-adaptive systems.
In particular, we identify the facets of a self-adaptive system relevant to the usage of MAB policies. These
include, among others, the number of adaptation choices, the presence and nature of context changes in the
environment, and the evaluative functions. Further, we explore through a set of systematic experiments the
effect that differences in these facets have on the performance of MAB policies and, by extension, on the
potential performance of a system in achieving its adaptation goals. We do so through two self-adaptive
system exemplars, the Emergent Web Server and the Simulator of Web Infrastructure and Management.
We contribute a Python library of MAB policies that is not specific to any exemplar, along with
simple interfaces for using it with the exemplars. This software provides the tools with which to enact
the proposed ARHC process. Our results indicate that using MAB policies with self-adaptive systems
is not only viable in a general, exemplar-independent way, but can also approximate the provided offline
solutions. The latter is assessed through direct comparison with solutions provided by the original authors
of the systems under a realistic usage scenario.
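
As an illustration of what a MAB policy does when choosing among adaptation actions, the following is a minimal sketch of UCB1, a standard bandit policy with a known regret bound. It is not taken from the thesis's Python library: the class name `UCB1`, the helper `run`, and the Bernoulli reward model (each arm pays 1 with a fixed probability) are assumptions made purely for illustration.

```python
import math
import random


class UCB1:
    """Minimal UCB1 policy over a fixed set of arms (hypothetical example class)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of times each arm was chosen
        self.means = [0.0] * n_arms   # running mean reward per arm

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Score = empirical mean + exploration bonus that shrinks with use.
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return scores.index(max(scores))

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run(true_means, rounds, seed=0):
    """Simulate the policy against Bernoulli arms (an assumed reward model)."""
    rng = random.Random(seed)
    policy = UCB1(len(true_means))
    for _ in range(rounds):
        arm = policy.select()
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        policy.update(arm, reward)
    return policy


if __name__ == "__main__":
    policy = run([0.1, 0.3, 0.9], rounds=5000)
    print(policy.counts)
```

In a self-adaptive setting, each arm would stand for one adaptation choice and the reward would come from the system's evaluative function rather than from a simulated coin flip.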
| Original language | English |
|---|---|
| Place of Publication | scripties.uba.uva.nl/ |
| Publisher | University of Amsterdam |
| Media of output | Online |
| Publication status | Published - 31 Mar 2022 |
Fingerprint
Dive into the research topics of 'Adapting with Regret: Using Multi-armed Bandits with Self-adaptive Systems'. Together they form a unique fingerprint.