Analysis of Measure-Valued Derivatives in a Reinforcement Learning Actor-Critic Framework

Kim Van Den Houten, Emile Van Krieken, Bernd Heidergott

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Policy gradient methods are successful for a wide range of reinforcement learning tasks. Traditionally, such methods use the score function as the stochastic gradient estimator. We investigate the effect of replacing the score function with a measure-valued derivative within an on-policy actor-critic algorithm. The hypothesis is that measure-valued derivatives reduce the need for the score-function variance reduction techniques that are common in policy gradient algorithms. We adapt the actor-critic to measure-valued derivatives and develop a novel algorithm. This method keeps the computational complexity of the measure-valued derivative within bounds by using a parameterized state-value function approximation. We show empirically that measure-valued derivatives perform comparably to score functions on the Pendulum and MountainCar environments. The empirical results of this study suggest that measure-valued derivatives can serve as a low-variance alternative to score functions in on-policy actor-critic and indeed reduce the need for variance reduction techniques.
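For readers unfamiliar with the estimator the abstract refers to: a measure-valued derivative writes the derivative of an expectation over a parameterized distribution as a weighted difference of expectations over two related distributions, with no score-function term. The sketch below (not the authors' code; a minimal illustration under standard assumptions) shows this for the mean of a Gaussian, whose derivative decomposes into a positive and a negative part reachable by a Weibull-distributed offset.

```python
import numpy as np

def mvd_grad_mean(f, mu, sigma, n_samples=100_000, rng=None):
    """Measure-valued derivative estimate of d/dmu E_{X~N(mu, sigma^2)}[f(X)].

    The derivative of the Gaussian density in mu decomposes as
    c * (p_plus - p_minus), where both parts are obtained from the same
    Weibull(scale=sqrt(2), shape=2) offset W applied in opposite directions.
    """
    rng = np.random.default_rng(rng)
    # Weibull with shape 2 and scale sqrt(2); numpy's weibull has unit scale.
    w = np.sqrt(2.0) * rng.weibull(2.0, size=n_samples)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))  # normalizing constant of the decomposition
    # Common random numbers: the same W drives the positive and negative samples,
    # which is the coupling that keeps the estimator's variance low.
    return c * np.mean(f(mu + sigma * w) - f(mu - sigma * w))
```

For f(x) = x^2 the true gradient is d/dmu (mu^2 + sigma^2) = 2*mu, which the estimator recovers without ever differentiating f or the log-density, in contrast to the score-function estimator.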

Original language: English
Title of host publication: 2022 Winter Simulation Conference (WSC)
Subtitle of host publication: [Proceedings]
Editors: B. Feng, G. Pedrielli, Y. Peng, S. Shashaani, E. Song, C.G. Corlu, L.H. Lee, E.P. Chew, T. Roeder, P. Lendermann
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2736-2747
Number of pages: 12
ISBN (Electronic): 9798350309713
ISBN (Print): 9781665476621
DOIs
Publication status: Published - 2022
Event: 2022 Winter Simulation Conference, WSC 2022 - Guilin, China
Duration: 11 Dec 2022 - 14 Dec 2022

Publication series

Name: Proceedings - Winter Simulation Conference
Volume: 2022-December
ISSN (Print): 0891-7736

Conference

Conference: 2022 Winter Simulation Conference, WSC 2022
Country/Territory: China
City: Guilin
Period: 11/12/22 - 14/12/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

