TY - GEN
T1 - Bootstrapping LPs in Value Iteration for Multi-Objective and Partially Observable MDPs
AU - Roijers, Diederik M.
AU - Walraven, Erwin
AU - Spaan, Matthijs T.J.
PY - 2018
Y1 - 2018
N2 - Iteratively solving a set of linear programs (LPs) is a common strategy for solving various decision-making problems in Artificial Intelligence, such as planning in multi-objective or partially observable Markov Decision Processes (MDPs). A prevalent feature is that the solutions to these LPs become increasingly similar as the solving algorithm converges, because the solution computed by the algorithm approaches the fixed point of a Bellman backup operator. In this paper, we propose to speed up the solving process of these LPs by bootstrapping based on similar LPs solved previously. We use these LPs to initialize a subset of relevant LP constraints, before iteratively generating the remaining constraints. The resulting algorithm is the first to consider such information sharing across iterations. We evaluate our approach on planning in Multi-Objective MDPs (MOMDPs) and Partially Observable MDPs (POMDPs), showing that it solves fewer LPs than the state of the art, which leads to a significant speed-up. Moreover, for MOMDPs we show that our method scales better in both the number of states and the number of objectives, which is vital for multi-objective planning.
AB - Iteratively solving a set of linear programs (LPs) is a common strategy for solving various decision-making problems in Artificial Intelligence, such as planning in multi-objective or partially observable Markov Decision Processes (MDPs). A prevalent feature is that the solutions to these LPs become increasingly similar as the solving algorithm converges, because the solution computed by the algorithm approaches the fixed point of a Bellman backup operator. In this paper, we propose to speed up the solving process of these LPs by bootstrapping based on similar LPs solved previously. We use these LPs to initialize a subset of relevant LP constraints, before iteratively generating the remaining constraints. The resulting algorithm is the first to consider such information sharing across iterations. We evaluate our approach on planning in Multi-Objective MDPs (MOMDPs) and Partially Observable MDPs (POMDPs), showing that it solves fewer LPs than the state of the art, which leads to a significant speed-up. Moreover, for MOMDPs we show that our method scales better in both the number of states and the number of objectives, which is vital for multi-objective planning.
UR - http://www.scopus.com/inward/record.url?scp=85055005643&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055005643&partnerID=8YFLogxK
UR - https://aaai.org/Library/ICAPS/icaps18contents.php
M3 - Conference contribution
AN - SCOPUS:85055005643
SN - 9781577357971
T3 - AAAI Press Proceedings
SP - 218
EP - 226
BT - Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling
A2 - de Weerdt, Mathijs
A2 - Koenig, Sven
A2 - Röger, Gabriele
A2 - Spaan, Matthijs
PB - The AAAI Press
CY - Palo Alto, CA
T2 - 28th International Conference on Automated Planning and Scheduling, ICAPS 2018
Y2 - 24 June 2018 through 29 June 2018
ER -