How linkage error affects hidden Markov models: a sensitivity analysis

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Latent class models (LCM) are increasingly used to estimate and correct for classification error in categorical data, without the need for a “gold standard”, error-free, data source. To accomplish this, LCMs require multiple indicators of the same phenomenon within one data collection wave – “latent structure model” – or multiple observations over time on a single indicator – “hidden Markov model (HMM)” – and assume that the errors in these indicators are conditionally independent. Unfortunately, this “local independence” assumption is often unrealistic, untestable, and a source of serious bias. Linking independent data sources can solve this problem by making the local independence assumption plausible across sources, while potentially allowing for local dependence within sources. However, record linkage introduces a new problem: the records may be erroneously linked. In this paper we investigate the effects of linkage error on HMM estimates of employment contract types. Our data come from linking a labor force survey to administrative employer records; this linkage yields two indicators per time point that are plausibly conditionally independent. Our results indicate that false-negative linkage error (exclusion) turns out to be problematic only if it is large and highly correlated with the dependent variable. Moreover, under many conditions, false-positive linkage error (mislinkage) turns out to act as another source of misclassification that the HMM can absorb into the error-rate estimates, leaving the latent transition estimates unbiased. In these cases, measurement error modeling already accounts for linkage error. Our results also indicate where these conditions break down and more complex methods would be needed.
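
The claim that mislinkage can behave like additional misclassification can be illustrated with a small simulation. The sketch below is not the authors' sensitivity-analysis design and does not fit an HMM; it looks at a single time point only. It assumes a binary contract type, a register indicator with a symmetric error rate `err`, and a false-positive linkage rate `lam` under which a record receives the register value of a randomly chosen donor, independently of its own true state. All names and parameter values are hypothetical. Under these assumptions the error rate seen after linkage should be close to (1 - lam) * err + lam * P(donor value differs from the true state), which is what an error-rate parameter in a measurement model could absorb.

```python
import random


def simulate_mislinkage(n=20000, lam=0.10, err=0.05, p_true=0.30, seed=0):
    """Return (observed, predicted) error rates of a linked binary indicator.

    Hypothetical setup: `lam` is the false-positive linkage rate, `err` the
    register's own misclassification rate, `p_true` the share of state 1.
    """
    rng = random.Random(seed)
    # Latent (true) state per person, e.g. 1 = temporary, 0 = permanent.
    truth = [1 if rng.random() < p_true else 0 for _ in range(n)]
    # Register indicator: the true state, flipped with probability `err`.
    register = [1 - t if rng.random() < err else t for t in truth]
    # Linkage: with probability `lam` the person is mislinked and receives
    # the register value of a randomly drawn record (assumed independent
    # of the person's own true state).
    linked = [
        register[rng.randrange(n)] if rng.random() < lam else register[i]
        for i in range(n)
    ]
    # Error rate of the linked indicator relative to the truth.
    observed = sum(y != t for y, t in zip(linked, truth)) / n
    # Mixture prediction: correct links keep `err`, mislinks disagree with
    # the truth as often as a random register value would.
    p_hat = sum(truth) / n
    q_hat = sum(register) / n
    donor_mismatch = p_hat * (1 - q_hat) + (1 - p_hat) * q_hat
    predicted = (1 - lam) * err + lam * donor_mismatch
    return observed, predicted


if __name__ == "__main__":
    observed, predicted = simulate_mislinkage()
    print(f"observed error rate after linkage: {observed:.4f}")
    print(f"predicted mixture error rate:      {predicted:.4f}")
```

If mislinkage depended on the true state, or if whole indicator sequences were swapped between records over time, this simple mixture would no longer describe the linked data, which is in line with the abstract's remark that the favourable conditions can break down.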
Original language: English
Journal: Journal of Survey Statistics and Methodology
Publication status: Published - 31 May 2019

Keywords

  • linkage error
  • classification error
  • measurement error
  • latent class model (LCM)
  • hidden Markov model (HMM)
  • misclassification

Cite this

@article{8475ae76f9da4e739af197f01c9ecfeb,
title = "How linkage error affects hidden Markov models: a sensitivity analysis",
abstract = "Latent class models (LCM) are increasingly used to estimate and correct for classification error in categorical data, without the need for a “gold standard”, error-free, data source. To accomplish this, LCMs require multiple indicators of the same phenomenon within one data collection wave – “latent structure model” – or multiple observations over time on a single indicator – “hidden Markov model (HMM)” – and assume that the errors in these indicators are conditionally independent. Unfortunately, this “local independence” assumption is often unrealistic, untestable, and a source of serious bias. Linking independent data sources can solve this problem by making the local independence assumption plausible across sources, while potentially allowing for local dependence within sources. However, record linkage introduces a new problem: the records may be erroneously linked. In this paper we investigate the effects of linkage error on HMM estimates of employment contract types. Our data come from linking a labor force survey to administrative employer records; this linkage yields two indicators per time point that are plausibly conditionally independent. Our results indicate that false-negative linkage error (exclusion) turns out to be problematic only if it is large and highly correlated with the dependent variable. Moreover, under many conditions, false-positive linkage error (mislinkage) turns out to act as another source of misclassification that the HMM can absorb into the error-rate estimates, leaving the latent transition estimates unbiased. In these cases, measurement error modeling already accounts for linkage error. Our results also indicate where these conditions break down and more complex methods would be needed.",
keywords = "linkage error, classification error, measurement error, latent class model (LCM), hidden Markov model (HMM), misclassification",
author = "Pankowska, P.K.P. and Bakker, Bart F.M. and Oberski, D.L. and Pavlopoulos, D.",
year = "2019",
month = "5",
day = "31",
language = "English",
journal = "Journal of Survey Statistics and Methodology",
issn = "2325-0984",
publisher = "Oxford University Press",
}

How linkage error affects hidden Markov models: a sensitivity analysis. / Pankowska, P.K.P.; Bakker, Bart F.M.; Oberski, D.L.; Pavlopoulos, D.

In: Journal of Survey Statistics and Methodology, 31.05.2019.

TY - JOUR

T1 - How linkage error affects hidden Markov models: a sensitivity analysis

AU - Pankowska, P.K.P.

AU - Bakker, Bart F.M.

AU - Oberski, D.L.

AU - Pavlopoulos, D.

PY - 2019/5/31

Y1 - 2019/5/31

AB - Latent class models (LCM) are increasingly used to estimate and correct for classification error in categorical data, without the need for a “gold standard”, error-free, data source. To accomplish this, LCMs require multiple indicators of the same phenomenon within one data collection wave – “latent structure model” – or multiple observations over time on a single indicator – “hidden Markov model (HMM)” – and assume that the errors in these indicators are conditionally independent. Unfortunately, this “local independence” assumption is often unrealistic, untestable, and a source of serious bias. Linking independent data sources can solve this problem by making the local independence assumption plausible across sources, while potentially allowing for local dependence within sources. However, record linkage introduces a new problem: the records may be erroneously linked. In this paper we investigate the effects of linkage error on HMM estimates of employment contract types. Our data come from linking a labor force survey to administrative employer records; this linkage yields two indicators per time point that are plausibly conditionally independent. Our results indicate that false-negative linkage error (exclusion) turns out to be problematic only if it is large and highly correlated with the dependent variable. Moreover, under many conditions, false-positive linkage error (mislinkage) turns out to act as another source of misclassification that the HMM can absorb into the error-rate estimates, leaving the latent transition estimates unbiased. In these cases, measurement error modeling already accounts for linkage error. Our results also indicate where these conditions break down and more complex methods would be needed.

KW - linkage error

KW - classification error

KW - measurement error

KW - latent class model (LCM)

KW - hidden Markov model (HMM)

KW - misclassification

M3 - Article

JO - Journal of Survey Statistics and Methodology

JF - Journal of Survey Statistics and Methodology

SN - 2325-0984

ER -