Planning for potential: efficient safe reinforcement learning

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Deep reinforcement learning (DRL) has shown remarkable success in artificial domains and in some real-world applications. However, substantial challenges remain, such as learning efficiently under safety constraints. Adherence to safety constraints is a hard requirement in many high-impact application domains such as healthcare and finance. These constraints are preferably represented symbolically to ensure clear semantics at a suitable level of abstraction. Existing approaches to safe DRL assume that being unsafe leads to low rewards. We show that this is a special case of symbolically constrained RL and analyze a generic setting in which total reward and being safe may or may not be correlated. We analyze the impact of symbolic constraints and identify a connection between expected future reward and distance towards a goal in an automaton representation of the constraints. We use this connection in an algorithm for learning complex behaviors safely and efficiently. This algorithm relies on symbolic reasoning over safety constraints to improve the efficiency of a subsymbolic learner with a symbolically obtained measure of progress. We measure sample efficiency on a grid world and a conversational product recommender with real-world constraints. The so-called Planning for Potential algorithm converges quickly and significantly outperforms all baselines. Specifically, we find that symbolic reasoning is necessary for safety during and after learning and can be effectively used to guide a neural learner towards promising areas of the solution space. We conclude that RL can be applied both safely and efficiently when combined with symbolic reasoning.
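
The abstract links expected future reward to distance towards a goal in an automaton representation of the constraints, and lists reward shaping among the keywords. The sketch below illustrates that general idea with standard potential-based reward shaping; it is not the published Planning for Potential algorithm, whose details are not given on this page. The automaton, its state and symbol names, the choice of potential as negative distance, and the discount factor are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical constraint automaton: state -> {symbol: next_state}.
# States, symbols, and the goal are invented for this illustration.
TRANSITIONS = {
    "q0": {"greet": "q1"},
    "q1": {"ask_need": "q2", "greet": "q1"},
    "q2": {"recommend": "q_goal", "ask_need": "q2"},
    "q_goal": {},
}
GOAL = "q_goal"


def distances_to_goal(transitions, goal):
    """Breadth-first search on reversed edges.

    Returns the fewest automaton transitions needed to reach `goal`
    from each state that can reach it.
    """
    reverse = {state: set() for state in transitions}
    for src, edges in transitions.items():
        for dst in edges.values():
            reverse[dst].add(src)

    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        for pred in reverse[state]:
            if pred not in dist:
                dist[pred] = dist[state] + 1
                queue.append(pred)
    return dist


def shaping_bonus(dist, q, q_next, gamma=0.99):
    """Potential-based shaping term F = gamma * phi(q_next) - phi(q).

    Assumes phi(q) = -distance(q, goal) and that both states can reach
    the goal (otherwise the lookup raises KeyError).
    """
    phi_q = -float(dist[q])
    phi_next = -float(dist[q_next])
    return gamma * phi_next - phi_q


DIST = distances_to_goal(TRANSITIONS, GOAL)
```

In this toy setting, a learner would add `shaping_bonus(DIST, q, q_next)` to the environment reward after each step: moves that bring the automaton closer to the goal earn a positive bonus, and with `gamma=1.0` a self-loop earns none. Keeping the agent safe would still require separately blocking actions that have no transition in the automaton, which this sketch does not do.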

Original language: English
Pages (from-to): 2255-2274
Number of pages: 20
Journal: Machine Learning
Volume: 111
Issue number: 6
Early online date: 23 Mar 2022
Publication status: Published - Jun 2022

Bibliographical note

Funding Information:
This study was funded by ING Bank N.V.

Publisher Copyright:
© 2022, The Author(s).

Keywords

  • Reinforcement learning
  • Reward shaping
  • Safety constraints
  • Symbolic planning
