TY - JOUR
T1 - Low-Dimensional Perturb-and-MAP Approach for Learning Restricted Boltzmann Machines
AU - Tomczak, Jakub M.
AU - Zaręba, Szymon
AU - Ravanbakhsh, Siamak
AU - Greiner, Russell
PY - 2019/10/1
Y1 - 2019/10/1
AB - This paper introduces a new approach to maximum likelihood learning of the parameters of a restricted Boltzmann machine (RBM). The proposed method is based on the Perturb-and-MAP (PM) paradigm, which enables sampling from the Gibbs distribution. PM is a two-step process: (i) perturb the model using Gumbel perturbations, then (ii) find the maximum a posteriori (MAP) assignment of the perturbed model. We show that under certain conditions the resulting MAP configuration of the perturbed model is an unbiased sample from the original distribution. However, this approach requires an exponential number of perturbations, which is computationally intractable. Here, we apply an approximate approach based on first-order (low-dimensional) PM to calculate the gradient of the log-likelihood in a binary RBM. Our approach relies on optimizing the energy function with respect to the observable and hidden variables using a greedy procedure: for each variable, we determine whether flipping its value decreases the energy, and we then use the resulting local maximum of the perturbed distribution to approximate the gradient. Moreover, we show that in some cases our approach works better than the standard coordinate-descent procedure for finding the MAP assignment, and we compare it with the Contrastive Divergence algorithm. We investigate the quality of our approach empirically, first on toy problems and then on various image datasets and a text dataset.
KW - Greedy optimization
KW - Gumbel perturbation
KW - Restricted Boltzmann machine
KW - Unsupervised deep learning
UR - http://www.scopus.com/inward/record.url?scp=85054498287&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054498287&partnerID=8YFLogxK
DO - 10.1007/s11063-018-9923-4
M3 - Article
AN - SCOPUS:85054498287
SN - 1370-4621
VL - 50
SP - 1401
EP - 1419
JO - Neural Processing Letters
JF - Neural Processing Letters
IS - 2
ER -