Abstract
Reinforcement Learning (RL) has achieved great success in sequential decision-making problems but often requires extensive agent-environment interactions. To improve sample efficiency, methods like Reinforcement Learning from Expert Demonstrations (RLED) incorporate external expert demonstrations to aid agent exploration during the learning process. However, these demonstrations, typically collected from human users, are costly and thus often limited in quantity. Therefore, how to select the optimal set of human demonstrations that most effectively aids learning becomes a critical concern. This paper introduces EARLY (Episodic Active Learning from demonstration querY), an algorithm designed to enable a learning agent to generate optimized queries for expert demonstrations in a trajectory-based feature space. EARLY employs a trajectory-level estimate of uncertainty in the agent's current policy to determine the optimal timing and content for feature-based queries. By querying episodic demonstrations instead of isolated state-action pairs, EARLY enhances the human teaching experience and achieves better learning performance. We validate the effectiveness of our method across three simulated navigation tasks of increasing difficulty. Results indicate that our method achieves expert-level performance in all three tasks, converging over 50% faster than four other baseline methods when demonstrations are generated by simulated oracle policies. A follow-up pilot user study (N = 18) further supports that our method maintains significantly better convergence with human expert demonstrators, while also providing a better user experience in terms of perceived task load and requiring significantly less human time.
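The abstract does not specify how the trajectory-level uncertainty is computed, so the following is only an illustrative sketch of the general idea of uncertainty-driven episodic querying, not the EARLY algorithm itself. It assumes, purely for illustration, that uncertainty is measured as the disagreement within a hypothetical ensemble of value estimators, averaged over a rollout of the current policy, and that a query is issued only when the most uncertain candidate exceeds a hypothetical threshold. The names `trajectory_uncertainty`, `choose_query`, `ensemble`, and `threshold` are all assumptions introduced here.

```python
import statistics
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple

# Hypothetical types: a step is a (state, action) pair and an estimator
# maps a step to a scalar value estimate.
Step = Tuple[float, float]
Estimator = Callable[[Step], float]


def trajectory_uncertainty(traj: Sequence[Step],
                           ensemble: Sequence[Estimator]) -> float:
    """Average, over the steps of one trajectory, of the population
    standard deviation of the ensemble's value estimates (assumed proxy
    for policy uncertainty)."""
    if not traj:
        raise ValueError("trajectory must contain at least one step")
    per_step = [statistics.pstdev([q(step) for q in ensemble]) for step in traj]
    return sum(per_step) / len(per_step)


def choose_query(candidates: Dict[Hashable, Sequence[Step]],
                 ensemble: Sequence[Estimator],
                 threshold: float) -> Optional[Hashable]:
    """Decide when and what to query.

    `candidates` maps a query feature (for example, a start configuration)
    to a rollout of the current policy from that feature. Returns the
    feature whose rollout is most uncertain, or None when no rollout
    exceeds `threshold` (meaning: do not query the expert yet)."""
    scored = {f: trajectory_uncertainty(t, ensemble) for f, t in candidates.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > threshold else None


if __name__ == "__main__":
    # Two toy estimators that disagree more for larger state values.
    ensemble = [lambda s: s[0], lambda s: 2 * s[0]]
    candidates = {"near": [(1.0, 0.0), (1.0, 0.0)], "far": [(4.0, 0.0)]}
    print(choose_query(candidates, ensemble, threshold=1.0))
```

In this sketch the returned feature would be handed to the demonstrator as the starting point for a full episodic demonstration, in contrast to asking for a single state-action label.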
Original language | English |
---|---|
Title of host publication | HAI '24 |
Subtitle of host publication | Proceedings of the 12th International Conference on Human-Agent Interaction |
Publisher | Association for Computing Machinery, Inc |
Pages | 287-295 |
Number of pages | 9 |
ISBN (Electronic) | 9798400711787 |
DOIs | |
Publication status | Published - 2024 |
Event | 12th International Conference on Human-Agent Interaction, HAI 2024 - Swansea, United Kingdom; Duration: 24 Nov 2024 → 27 Nov 2024 |
Conference
Conference | 12th International Conference on Human-Agent Interaction, HAI 2024 |
---|---|
Country/Territory | United Kingdom |
City | Swansea |
Period | 24/11/24 → 27/11/24 |
Bibliographical note
Publisher Copyright: © 2024 Owner/Author.
Keywords
- active reinforcement learning
- human-agent interaction
- human-in-the-loop machine learning
- learning from demonstrations