"Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Reinforcement Learning (RL) has achieved great success in sequential decision-making problems but often requires extensive agent-environment interactions. To improve sample efficiency, methods like Reinforcement Learning from Expert Demonstrations (RLED) incorporate external expert demonstrations to aid agent exploration during the learning process. However, these demonstrations, typically collected from human users, are costly and thus often limited in quantity. Therefore, how to select the set of human demonstrations that most effectively aids learning becomes a critical concern. This paper introduces EARLY (Episodic Active Learning from demonstration querY), an algorithm designed to enable a learning agent to generate optimized queries for expert demonstrations in a trajectory-based feature space. EARLY employs a trajectory-level estimate of uncertainty in the agent's current policy to determine the optimal timing and content for feature-based queries. By querying episodic demonstrations instead of isolated state-action pairs, EARLY enhances the human teaching experience and achieves better learning performance. We validate the effectiveness of our method across three simulated navigation tasks of increasing difficulty. Results indicate that our method achieves expert-level performance in all three tasks, converging over 50% faster than four other baseline methods when demonstrations are generated by simulated oracle policies. A follow-up pilot user study (N = 18) further supports that our method maintains significantly better convergence with human expert demonstrators, while also providing a better user experience in terms of perceived task load and requiring significantly less human time.
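
To make the idea of a trajectory-level uncertainty query concrete, the following is a minimal Python sketch. It is not the EARLY algorithm: the abstract does not specify how uncertainty is estimated or how queries are encoded, so everything here is an assumption for illustration. In particular, uncertainty is approximated as the disagreement of a hypothetical ensemble of value estimates averaged over a rollout, a query is issued only when that score exceeds a hypothetical `threshold`, and the names `trajectory_uncertainty` and `select_query` are invented for this example.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a trajectory is a list of time steps, and each
# time step holds the value estimates produced by an ensemble of learners.
Trajectory = List[Sequence[float]]


def trajectory_uncertainty(step_estimates: Trajectory) -> float:
    """Average ensemble disagreement (population std) over all time steps.

    This is an assumed stand-in for a trajectory-level uncertainty estimate.
    """
    return mean([pstdev(estimates) for estimates in step_estimates])


def select_query(
    candidates: List[Tuple[str, Trajectory]], threshold: float
) -> Optional[Tuple[str, float]]:
    """Pick the candidate feature whose rollout is most uncertain.

    Returns (feature, uncertainty) if the highest uncertainty exceeds the
    threshold (the "when to query" decision), otherwise None (no query).
    """
    scored = [(feature, trajectory_uncertainty(traj)) for feature, traj in candidates]
    feature, score = max(scored, key=lambda pair: pair[1])
    return (feature, score) if score > threshold else None


# Two candidate query features (for example, start locations in a navigation
# task), each with the ensemble estimates along a rollout from that feature.
candidates = [
    ("start_A", [[1.0, 1.0], [2.0, 2.0]]),  # ensemble members agree
    ("start_B", [[0.0, 2.0], [1.0, 3.0]]),  # ensemble members disagree
]
query = select_query(candidates, threshold=0.5)
```

In this sketch the agent would ask the expert for a full episode demonstrating behavior from `start_B`, rather than for an action at a single state, which mirrors the episodic querying the abstract describes.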

Original language: English
Title of host publication: HAI '24
Subtitle of host publication: Proceedings of the 12th International Conference on Human-Agent Interaction
Publisher: Association for Computing Machinery, Inc
Pages: 287-295
Number of pages: 9
ISBN (Electronic): 9798400711787
Publication status: Published - 2024
Event: 12th International Conference on Human-Agent Interaction, HAI 2024 - Swansea, United Kingdom
Duration: 24 Nov 2024 to 27 Nov 2024

Conference

Conference: 12th International Conference on Human-Agent Interaction, HAI 2024
Country/Territory: United Kingdom
City: Swansea
Period: 24/11/24 to 27/11/24

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Keywords

  • active reinforcement learning
  • human-agent interaction
  • human-in-the-loop machine learning
  • learning from demonstrations
