Abstract
Variable selection is challenging for high-dimensional data, in particular when the sample size is low. It is widely recognized that external information in the form of complementary data on the variables, ‘co-data’, may improve results. Examples are known variable groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories, and are likely equally relevant for other applications. Yet, the uptake of prediction methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, which are crucial for the performance of those learners. We discuss technical aspects, but also the applicability in terms of the types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating variable selection. Moreover, we demonstrate the versatility of the guided shrinkage methodology by showing how to ‘do-it-yourself’: we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving variable selection in genetics studies. We conclude with a real data example.
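To make the idea of co-data-adapted shrinkage concrete, the following is a minimal sketch of a generalized ridge estimator in which each variable's penalty is set by its co-data group. It is an illustration under stated assumptions, not the implementation reviewed in the paper: the function name `group_adaptive_ridge`, the toy data, and the hand-picked group penalties are all hypothetical. In the reviewed methods the penalties would be learned from the data and co-data rather than fixed by hand.

```python
import numpy as np


def group_adaptive_ridge(X, y, groups, group_penalties):
    """Generalized ridge with one penalty per co-data group (illustrative only).

    groups[j] is the co-data group label of variable j;
    group_penalties maps each group label to its ridge penalty (assumed given).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Map each variable to the penalty of its co-data group.
    penalties = np.array([group_penalties[g] for g in groups], dtype=float)
    # Solve (X'X + diag(penalties)) beta = X'y.
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)


# Toy example with an orthogonal design, so the shrinkage is easy to read off:
# each coefficient equals y_j / (1 + penalty_j).
X = np.eye(4)
y = np.array([2.0, 2.0, 2.0, 2.0])
groups = [0, 0, 1, 1]  # hypothetical co-data: two known variable groups

beta = group_adaptive_ridge(X, y, groups, {0: 1.0, 1: 3.0})
beta_equal = group_adaptive_ridge(X, y, groups, {0: 1.0, 1: 1.0})
print(beta)        # group 1 is shrunk more strongly than group 0
print(beta_equal)  # equal penalties reduce to ordinary ridge
```

With these toy inputs the variables in the more heavily penalized group receive coefficients of 0.5, against 1.0 for the other group, whereas equal penalties shrink all four coefficients identically.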
| Original language | English |
|---|---|
| Pages (from-to) | 271-283 |
| Number of pages | 13 |
| Journal | International Journal of Biostatistics |
| Volume | 21 |
| Issue number | 2 |
| Early online date | 8 Sept 2025 |
| Publication status | Published - Nov 2025 |
| Title | Leveraging external information by guided adaptive shrinkage to improve variable selection in high-dimensional regression settings |