TY - JOUR
T1 - Validated inference of smoking habits from blood with a finite DNA methylation marker set
AU - Maas, Silvana C E
AU - Vidaki, Athina
AU - Wilson, Rory
AU - Teumer, Alexander
AU - Liu, Fan
AU - van Meurs, Joyce B J
AU - Uitterlinden, André G
AU - Boomsma, Dorret I
AU - de Geus, Eco J C
AU - Willemsen, Gonneke
AU - van Dongen, Jenny
AU - van der Kallen, Carla J H
AU - Slagboom, P Eline
AU - Beekman, Marian
AU - van Heemst, Diana
AU - van den Berg, Leonard H
AU - Duijts, Liesbeth
AU - Jaddoe, Vincent W V
AU - Ladwig, Karl-Heinz
AU - Kunze, Sonja
AU - Peters, Annette
AU - Ikram, M Arfan
AU - Grabe, Hans J
AU - Felix, Janine F
AU - Waldenberger, Melanie
AU - Franco, Oscar H
AU - Ghanbari, Mohsen
AU - Kayser, Manfred
AU - BIOS Consortium
PY - 2019/11
Y1 - 2019/11
N2 - Inferring a person's smoking habit and history from blood is relevant for complementing or replacing self-reports in epidemiological and public health research, and for forensic applications. However, a finite DNA methylation marker set and a validated statistical model based on a large dataset are not yet available. Employing 14 epigenome-wide association studies for marker discovery, and using data from six population-based cohorts (N = 3764) for model building, we identified 13 CpGs most suitable for inferring smoking versus non-smoking status from blood with a cumulative Area Under the Curve (AUC) of 0.901. Internal fivefold cross-validation yielded an average AUC of 0.897 ± 0.137, while external model validation in an independent population-based cohort (N = 1608) achieved an AUC of 0.911. These 13 CpGs also provided accurate inference of current (average AUCcrossvalidation 0.925 ± 0.021, AUCexternalvalidation 0.914), former (0.766 ± 0.023, 0.699) and never smoking (0.830 ± 0.019, 0.781) status, allowed inferring pack-years in current smokers (10 pack-years 0.800 ± 0.068, 0.796; 15 pack-years 0.767 ± 0.102, 0.752) and inferring smoking cessation time in former smokers (5 years 0.774 ± 0.024, 0.760; 10 years 0.766 ± 0.033, 0.764; 15 years 0.767 ± 0.020, 0.754). Model application to children revealed highly accurate inference of the true non-smoking status (6 years of age: accuracy 0.994, N = 355; 10 years: 0.994, N = 309), suggesting that prenatal and passive smoking exposure has no impact on model applications in adults. The finite set of DNA methylation markers allows accurate inference of smoking habit, with accuracy comparable to that of plasma cotinine use, and of smoking history from blood, which we envision becoming useful in epidemiology and public health research, and in medical and forensic applications.
AB - Inferring a person's smoking habit and history from blood is relevant for complementing or replacing self-reports in epidemiological and public health research, and for forensic applications. However, a finite DNA methylation marker set and a validated statistical model based on a large dataset are not yet available. Employing 14 epigenome-wide association studies for marker discovery, and using data from six population-based cohorts (N = 3764) for model building, we identified 13 CpGs most suitable for inferring smoking versus non-smoking status from blood with a cumulative Area Under the Curve (AUC) of 0.901. Internal fivefold cross-validation yielded an average AUC of 0.897 ± 0.137, while external model validation in an independent population-based cohort (N = 1608) achieved an AUC of 0.911. These 13 CpGs also provided accurate inference of current (average AUCcrossvalidation 0.925 ± 0.021, AUCexternalvalidation 0.914), former (0.766 ± 0.023, 0.699) and never smoking (0.830 ± 0.019, 0.781) status, allowed inferring pack-years in current smokers (10 pack-years 0.800 ± 0.068, 0.796; 15 pack-years 0.767 ± 0.102, 0.752) and inferring smoking cessation time in former smokers (5 years 0.774 ± 0.024, 0.760; 10 years 0.766 ± 0.033, 0.764; 15 years 0.767 ± 0.020, 0.754). Model application to children revealed highly accurate inference of the true non-smoking status (6 years of age: accuracy 0.994, N = 355; 10 years: 0.994, N = 309), suggesting that prenatal and passive smoking exposure has no impact on model applications in adults. The finite set of DNA methylation markers allows accurate inference of smoking habit, with accuracy comparable to that of plasma cotinine use, and of smoking history from blood, which we envision becoming useful in epidemiology and public health research, and in medical and forensic applications.
UR - http://www.scopus.com/inward/record.url?scp=85071837008&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071837008&partnerID=8YFLogxK
U2 - 10.1007/s10654-019-00555-w
DO - 10.1007/s10654-019-00555-w
M3 - Article
C2 - 31494793
SN - 0393-2990
VL - 34
SP - 1055
EP - 1074
JO - European Journal of Epidemiology
JF - European Journal of Epidemiology
IS - 11
ER -