
A Pseudo-Gradient Approach for Model-Free Markov Chain Optimization

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

In this paper, we develop a first-order (pseudo-)gradient approach for optimizing functions over the stationary distribution of discrete-time Markov chains (DTMCs). We give insights into why solving this optimization problem is challenging and show how transformations can be used to circumvent the hard constraints inherent in the optimization problem. The optimization framework is model-free since no explicit model of the interdependence of the row elements of the Markov chain transition matrix is required. Building on this transformation, we develop an extension of the simultaneous perturbation stochastic approximation (SPSA) algorithm, called stochastic matrix SPSA (SM-SPSA), to solve the optimization problem. The performance of the SM-SPSA gradient search is compared with a benchmark commercial solver. Numerical examples show that SM-SPSA scales better, which makes it the preferred solution method for large problem instances. We also apply the algorithm to the maximization of web-page rankings in web-graphs based on a real-life data set. As we explain in the paper, when applying a first-order gradient search, one typically encounters a phenomenon which we call “inflection points,” that is, jumps in the optimization trajectories between periods of almost stationary behavior that slow down the optimization. We propose a heuristic for avoiding such inflection points and present a metastudy on a wide range of networks showing the positive effect of our heuristic on the convergence properties of SM-SPSA gradient search.
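The general idea described in the abstract (map an unconstrained parameter matrix to a stochastic matrix, then run SPSA on the unconstrained parameters) can be illustrated with a minimal sketch. This is not the authors' SM-SPSA: the abstract does not specify the transformation, so a row-wise softmax is assumed here, and the objective (stationary mass of one state), the gain constants, and all function names are hypothetical choices made for illustration only.

```python
import math
import random


def row_softmax(theta):
    """Map an unconstrained matrix to a row-stochastic matrix.
    Assumption: the paper's actual transformation may differ."""
    P = []
    for row in theta:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        P.append([e / s for e in exps])
    return P


def stationary(P, iters=200):
    """Approximate the stationary distribution by power iteration.
    All entries of a softmax matrix are positive, so the iteration converges."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def objective(theta, target=0):
    """Hypothetical objective: negative stationary mass of one state,
    so that minimizing it maximizes that state's stationary probability."""
    return -stationary(row_softmax(theta))[target]


def spsa(theta, f, steps=300, a=1.0, c=0.1, seed=0):
    """Plain SPSA on the unconstrained matrix, using only evaluations of f."""
    rng = random.Random(seed)
    theta = [row[:] for row in theta]
    for k in range(1, steps + 1):
        a_k = a / k ** 0.602  # step-size gain with a commonly used decay exponent
        c_k = c / k ** 0.101  # perturbation size with a commonly used decay exponent
        # One random +/-1 perturbation applied to all entries simultaneously.
        delta = [[rng.choice((-1.0, 1.0)) for _ in row] for row in theta]
        plus = [[t + c_k * d for t, d in zip(tr, dr)]
                for tr, dr in zip(theta, delta)]
        minus = [[t - c_k * d for t, d in zip(tr, dr)]
                 for tr, dr in zip(theta, delta)]
        diff = (f(plus) - f(minus)) / (2.0 * c_k)
        # Pseudo-gradient estimate for entry (i, j) is diff / delta[i][j].
        theta = [[t - a_k * diff / d for t, d in zip(tr, dr)]
                 for tr, dr in zip(theta, delta)]
    return theta


if __name__ == "__main__":
    theta0 = [[0.0] * 3 for _ in range(3)]  # softmax of zeros gives a uniform chain
    theta_opt = spsa(theta0, objective)
    print("initial stationary mass of state 0:", -objective(theta0))
    print("final stationary mass of state 0:", -objective(theta_opt))
```

Because the softmax output is row-stochastic for any real-valued input, the row-sum and non-negativity constraints never need to be enforced during the search, and each iteration costs two objective evaluations regardless of the matrix size.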
Original language: English
Article number: 2550038
Journal: Asia-Pacific Journal of Operational Research
Publication status: E-pub ahead of print - 8 Sept 2025

Keywords

  • SPSA
  • Markov chain
  • SM-SPSA
  • stationary distribution
