Learning from a lot: Empirical Bayes for high-dimensional model-based prediction

Mark A. van de Wiel, Dennis E. Te Beest, Magnus M. Münch

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Empirical Bayes is a versatile approach to “learn from a lot” in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss “formal” empirical Bayes methods that maximize the marginal likelihood but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables termed “co-data”. In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes–full Bayes ridge regression approach for estimation of the posterior predictive interval.
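The "formal" empirical Bayes idea described above can be illustrated with a minimal sketch (not code from the paper): in a linear model y = Xβ + ε with prior β_j ~ N(0, τ²) and known noise variance σ², the marginal distribution of y is N(0, τ²XXᵀ + σ²I), so the prior variance τ² — equivalently the ridge penalty σ²/τ² — can be estimated by maximizing the marginal likelihood. The simulated data and function names below are illustrative assumptions, not part of the article.

```python
import numpy as np

# Simulate a high-dimensional linear model (p >> n) under the assumed prior.
rng = np.random.default_rng(0)
n, p, sigma2, tau2_true = 50, 200, 1.0, 0.05
X = rng.standard_normal((n, p))
beta = rng.normal(0.0, np.sqrt(tau2_true), p)
y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)

def neg_log_marginal(tau2):
    """-log N(y; 0, tau2 * X X^T + sigma2 * I), up to an additive constant."""
    K = tau2 * (X @ X.T) + sigma2 * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (logdet + y @ np.linalg.solve(K, y))

# "Formal" empirical Bayes: maximize the marginal likelihood over tau2.
# A grid search suffices for this one-dimensional problem.
grid = np.exp(np.linspace(np.log(1e-4), np.log(10.0), 200))
tau2_hat = grid[np.argmin([neg_log_marginal(t) for t in grid])]

# The plug-in posterior mean equals the ridge estimate with penalty
# lambda = sigma2 / tau2_hat, linking empirical Bayes to penalized regression.
lam = sigma2 / tau2_hat
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Cross-validation would instead tune lam by repeated refitting; here a single one-dimensional marginal-likelihood maximization replaces it, which is part of the appeal of empirical Bayes in high dimensions.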

Original language: English
Pages (from-to): 2-25
Number of pages: 24
Journal: Scandinavian Journal of Statistics
Volume: 46
Issue number: 1
ISSN: 0303-6898
Publisher: Wiley-Blackwell
Early online date: 1 Jun 2018
DOI: 10.1111/sjos.12335
Publication status: Published - March 2019

Keywords

  • co-data
  • empirical Bayes
  • marginal likelihood
  • prediction
  • variable selection
