Learning to Behave: Reinforcement Learning in Human Contexts

Research output: PhD Thesis (PhD-Thesis - Research and graduation internal)

Abstract

Reinforcement learning (RL) has recently attracted significant attention with applications such as improving microchip designs, predicting the behaviour of protein structures and beating humanity’s best in the games of Go, chess and StarCraft II. These impressive and inspiring successes show how RL can improve our lives. However, they have so far been seen mostly in settings that involve humans only to a very limited extent. This thesis looks into the usage of RL in human contexts. First, we provide a systematic literature review of the usage of RL for personalisation, i.e. the adaptation of systems to individuals. Next, we show how RL can be used to personalise a conversational recommender system and find, in a simulation-based study, that it outperforms existing approaches, including a gold standard and task-specific solutions. Since simulators may not be available for all conversational systems that could benefit from personalisation, we next look into the collection of user satisfaction ratings for dialogue data. We consolidate best practices in a UI for user satisfaction annotation and show that high-quality ratings can be obtained. Next, we look into the usage of RL for strategic workforce planning. Here, we find that RL is robust to the uncertainties that are an inherent part of this problem and that RL enables the specification of goals that are intuitive to domain experts. Having looked into these use cases, we then turn toward the inclusion of safety constraints in RL. We propose how safety constraints from a medical guideline can be taken into account in an observational study on the optimisation of ventilator settings for ICU patients. Next, we look into safety constraints that contain a temporal component. We find that these may make the learning problem infeasible and propose a solution based on reward shaping to address this issue.
Finally, building on the options framework, we propose how RL can benefit from instructions that break a full task into smaller pieces, and we propose an approach for learning reusable behaviours from instructions to greatly reduce data requirements.
Original language: English
Qualification: PhD
Awarding Institution
  • Vrije Universiteit Amsterdam
Supervisors/Advisors
  • van Harmelen, Frank, Supervisor
  • Hoogendoorn, Mark, Supervisor
  • Francois Lavet, Vincent, Co-supervisor
Award date: 14 Nov 2023
Print ISBNs: 9789464834628
Publication status: Published - 14 Nov 2023

Keywords

  • reinforcement learning
  • artificial intelligence
  • linear temporal logic
  • shielding
  • strategic workforce planning
  • dialogue systems
  • personalization
  • safety
