Abstract
Policy gradient methods are successful for a wide range of reinforcement learning tasks. Traditionally, such methods use the score function as the stochastic gradient estimator. We investigate the effect of replacing the score function with a measure-valued derivative within an on-policy actor-critic algorithm. The hypothesis is that measure-valued derivatives reduce the need for the score-function variance reduction techniques that are common in policy gradient algorithms. We adapt the actor-critic framework to measure-valued derivatives and develop a novel algorithm, which keeps the computational complexity of the measure-valued derivative within bounds by using a parameterized state-value function approximation. We show empirically that measure-valued derivatives perform comparably to score functions on the Pendulum and MountainCar environments. The empirical results of this study suggest that measure-valued derivatives can serve as a low-variance alternative to score functions in on-policy actor-critic algorithms and indeed reduce the need for variance reduction techniques.
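The two estimators contrasted in the abstract can be illustrated on a toy problem. The sketch below is not the paper's implementation; it is a generic, standalone comparison of (a) the score-function estimator and (b) the measure-valued derivative (MVD) of a Gaussian with respect to its mean, using the known decomposition of the Gaussian-mean derivative into a positive and a negative measure (a shifted Weibull pair). The function `f`, the parameter values, and the sample sizes are all illustrative choices.

```python
import numpy as np

# Toy problem: estimate d/dmu E_{x ~ N(mu, sigma^2)}[f(x)] for f(x) = x^2.
# The exact gradient is d/dmu (mu^2 + sigma^2) = 2 * mu, so both
# estimators can be checked against a closed-form answer.

def score_function_grad(f, mu, sigma, n, rng):
    """Score-function (likelihood-ratio) estimator: E[f(x) * (x - mu) / sigma^2]."""
    x = rng.normal(mu, sigma, n)
    return np.mean(f(x) * (x - mu) / sigma**2)

def mvd_grad(f, mu, sigma, n, rng):
    """Measure-valued derivative estimator for the Gaussian mean.

    The derivative of N(mu, sigma^2) w.r.t. mu decomposes into a
    positive and a negative measure: with w ~ Weibull(shape=2,
    scale=sqrt(2)),
        grad = c * (E[f(mu + sigma * w)] - E[f(mu - sigma * w)]),
    where c = 1 / (sigma * sqrt(2 * pi)). Common random numbers
    couple the two one-sided terms to reduce variance.
    """
    w = np.sqrt(2.0) * rng.weibull(2.0, n)  # shape 2, scale sqrt(2)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return c * np.mean(f(mu + sigma * w) - f(mu - sigma * w))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, sigma, n = 0.5, 1.0, 200_000
    f = lambda x: x**2
    print("exact gradient:", 2 * mu)  # = 1.0 for mu = 0.5
    print("score function:", score_function_grad(f, mu, sigma, n, rng))
    print("mvd:           ", mvd_grad(f, mu, sigma, n, rng))
```

On this example the MVD estimate has markedly lower per-sample variance than the score-function estimate, which is the effect the paper investigates inside an actor-critic loop.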
| Original language | English |
|---|---|
| Title of host publication | 2022 Winter Simulation Conference (WSC) |
| Subtitle of host publication | [Proceedings] |
| Editors | B. Feng, G. Pedrielli, Y. Peng, S. Shashaani, E. Song, C.G. Corlu, L.H. Lee, E.P. Chew, T. Roeder, P. Lendermann |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 2736-2747 |
| Number of pages | 12 |
| ISBN (Electronic) | 9798350309713 |
| ISBN (Print) | 9781665476621 |
| Publication status | Published - 2022 |
| Event | 2022 Winter Simulation Conference, WSC 2022, Guilin, China, 11 Dec 2022 → 14 Dec 2022 |
Publication series
| Name | Proceedings - Winter Simulation Conference |
|---|---|
| Volume | 2022-December |
| ISSN (Print) | 0891-7736 |
Conference
| Conference | 2022 Winter Simulation Conference, WSC 2022 |
|---|---|
| Country/Territory | China |
| City | Guilin |
| Period | 11/12/22 → 14/12/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
Title

Analysis of Measure-Valued Derivatives in a Reinforcement Learning Actor-Critic Framework