### Abstract

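Latent class models (LCM) are increasingly used to estimate and correct for classification error in categorical data, without the need for a “gold standard”, error-free, data source. To accomplish this, LCMs require multiple indicators of the same phenomenon within one data collection wave – “latent structure model” – or multiple observations over time on a single indicator – “hidden Markov model (HMM)” – and assume that the errors in these indicators are conditionally independent. Unfortunately, this “local independence” assumption is often unrealistic, untestable, and a source of serious bias. Linking independent data sources can solve this problem by making the local independence assumption plausible across sources, while potentially allowing for local dependence within sources. However, record linkage introduces a new problem: the records may be erroneously linked. In this paper we investigate the effects of linkage error on HMM estimates of employment contract types. Our data come from linking a labor force survey to administrative employer records; this linkage yields two indicators per time point that are plausibly conditionally independent. Our results indicate that false-negative linkage error (exclusion) turns out to be problematic only if it is large and highly correlated with the dependent variable. Moreover, under many conditions, false-positive linkage error (mislinkage) turns out to act as another source of misclassification that the HMM can absorb into the error-rate estimates, leaving the latent transition estimates unbiased. In these cases, measurement error modeling already accounts for linkage error. Our results also indicate where these conditions break down and more complex methods would be needed.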
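
As a rough illustration of why mislinkage can look like ordinary misclassification, the sketch below mixes a misclassification matrix with a mislinkage rate. It is not the authors' method: the function name `effective_error_matrix`, the two-state setup, and all numbers are hypothetical, and it assumes that a wrongly linked record comes from a random other unit whose true state is independent of the target unit's state.

```python
# Hypothetical illustration; all numbers are made up, not taken from the paper.

def effective_error_matrix(error, mislink_rate, state_probs):
    """Combine misclassification with random mislinkage.

    error[i][j]  = P(observed = j | true state = i) for a correctly linked record
    mislink_rate = assumed probability that a record is linked to the wrong unit
    state_probs  = assumed true-state distribution among wrongly linked units
    """
    n = len(error)
    # Distribution of the observed value when the record belongs to a random other unit.
    donor = [sum(state_probs[k] * error[k][j] for k in range(n)) for j in range(n)]
    # Mixture of the correctly linked and the mislinked case, per true state.
    return [
        [(1 - mislink_rate) * error[i][j] + mislink_rate * donor[j] for j in range(n)]
        for i in range(n)
    ]

error = [[0.95, 0.05],
         [0.10, 0.90]]
effective = effective_error_matrix(error, mislink_rate=0.1, state_probs=[0.7, 0.3])
for row in effective:
    print([round(p, 4) for p in row], "row sum:", round(sum(row), 4))
```

Under these assumptions each row of the result still sums to one, so the mislinked indicator behaves like an indicator with a larger error rate, which a model with free error-rate parameters could in principle absorb. If mislinkage depended on the true state or were correlated over time, this simple mixture would no longer apply.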

| Original language | English |
|---|---|
| Journal | Journal of Survey Statistics and Methodology |
| Publication status | Published - 31 May 2019 |

### Keywords

- linkage error
- classification error
- measurement error
- latent class model (LCM)
- hidden Markov model (HMM)
- misclassification

### Cite this

**How linkage error affects hidden Markov models: a sensitivity analysis.** / Pankowska, P.K.P.; Bakker, Bart F.M.; Oberski, D.L.; Pavlopoulos, D.

Research output: Contribution to Journal › Article › Academic › peer-review

TY - JOUR

T1 - How linkage error affects hidden Markov models: a sensitivity analysis

AU - Pankowska, P.K.P.

AU - Bakker, Bart F.M.

AU - Oberski, D.L.

AU - Pavlopoulos, D.

PY - 2019/5/31

Y1 - 2019/5/31

N2 - Latent class models (LCM) are increasingly used to estimate and correct for classification error in categorical data, without the need for a “gold standard”, error-free, data source. To accomplish this, LCMs require multiple indicators of the same phenomenon within one data collection wave – “latent structure model” – or multiple observations over time on a single indicator – “hidden Markov model (HMM)” – and assume that the errors in these indicators are conditionally independent. Unfortunately, this “local independence” assumption is often unrealistic, untestable, and a source of serious bias. Linking independent data sources can solve this problem by making the local independence assumption plausible across sources, while potentially allowing for local dependence within sources. However, record linkage introduces a new problem: the records may be erroneously linked. In this paper we investigate the effects of linkage error on HMM estimates of employment contract types. Our data come from linking a labor force survey to administrative employer records; this linkage yields two indicators per time point that are plausibly conditionally independent. Our results indicate that false-negative linkage error (exclusion) turns out to be problematic only if it is large and highly correlated with the dependent variable. Moreover, under many conditions, false-positive linkage error (mislinkage) turns out to act as another source of misclassification that the HMM can absorb into the error-rate estimates, leaving the latent transition estimates unbiased. In these cases, measurement error modeling already accounts for linkage error. Our results also indicate where these conditions break down and more complex methods would be needed.

KW - linkage error

KW - classification error

KW - measurement error

KW - latent class model (LCM)

KW - hidden Markov model (HMM)

KW - misclassification

M3 - Article

JO - Journal of Survey Statistics and Methodology

JF - Journal of Survey Statistics and Methodology

SN - 2325-0984

ER -