Prediction of heart failure 1 year before diagnosis in general practitioner patients using machine learning algorithms: a retrospective case-control study

Frank C. Bennis*, Mark Hoogendoorn, Claire Aussems, Joke C. Korevaar

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Objectives: Heart failure (HF) is a common health problem with high mortality and morbidity. Detecting potential cases earlier may allow earlier intervention, which may slow progression in some patients. Ideally, screening of all persons in an age group would reuse data that have already been recorded, such as general practitioner (GP) data. Furthermore, it is essential to evaluate the number of people needed to screen to find one patient using true incidence rates, as this indicates how well the model generalises to the true population. Therefore, we aim to create a machine learning model for the prediction of HF using GP data and to evaluate the number needed to screen with true incidence rates.

Design, settings and participants: GP data from 8543 patients (from 2 years to 1 year before diagnosis) and controls aged 70+ years were obtained retrospectively from the Nivel Primary Care Database for the period 01 January 2012 to 31 December 2019. Codes for chronic illnesses, complaints, diagnostics and medication were obtained. Data were split into a training set and a test set. Datasets were created describing demographics, the presence of codes (non-sequential) and codes that follow one another (sequential). Logistic regression, random forest and XGBoost models were trained. The predicted outcome was the presence of HF after 1 year. The case:control ratio in the test set matched true incidence rates (1:45).

Results: Demographics alone performed moderately (area under the curve (AUC) 0.692, CI 0.677 to 0.706). Adding non-sequential information, combined with a logistic regression model, performed best and significantly improved performance (AUC 0.772, CI 0.759 to 0.785, p<0.001). Further adding sequential information did not change performance significantly (AUC 0.767, CI 0.754 to 0.780, p=0.07). The number needed to screen, expressed as false positives per true positive, dropped from 14.11 to 5.99.

Conclusion: This study created a model able to identify patients with pending HF a year before diagnosis.
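The abstract reports the number needed to screen as false positives per true positive, evaluated at a case:control ratio of 1:45. The following is a minimal sketch of that arithmetic, not the authors' code: it converts a false-positive-per-true-positive figure into a positive predictive value, and shows how such a figure follows from sensitivity, specificity and the control:case ratio. The function names and the operating point (sensitivity 0.5, specificity 0.9) are hypothetical and chosen only for illustration; only the values 14.11, 5.99 and 45 come from the abstract.

```python
def ppv_from_fp_per_tp(fp_per_tp: float) -> float:
    """Positive predictive value implied by a false-positive:true-positive ratio.

    PPV = TP / (TP + FP) = 1 / (1 + FP/TP).
    """
    if fp_per_tp < 0:
        raise ValueError("fp_per_tp must be non-negative")
    return 1.0 / (1.0 + fp_per_tp)


def fp_per_tp(sensitivity: float, specificity: float, controls_per_case: float) -> float:
    """False positives per true positive at a given control:case ratio.

    Per case, the expected true positives are `sensitivity`, and the expected
    false positives are (1 - specificity) * controls_per_case.
    """
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be in [0, 1]")
    return (1.0 - specificity) * controls_per_case / sensitivity


if __name__ == "__main__":
    # Figures reported in the abstract (false positives per true positive).
    for reported in (14.11, 5.99):
        print(f"FP per TP = {reported}: PPV = {ppv_from_fp_per_tp(reported):.3f}")

    # Hypothetical operating point (NOT from the study), at the 1:45 ratio.
    example = fp_per_tp(sensitivity=0.5, specificity=0.9, controls_per_case=45)
    print(f"Hypothetical operating point: {example:.2f} false positives per true positive")
```

Read this way, the reported drop from 14.11 to 5.99 corresponds to the share of flagged patients who are true cases rising from roughly 1 in 15 to roughly 1 in 7.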

Original language: English
Article number: 060458
Pages (from-to): 1-12
Number of pages: 12
Journal: BMJ Open
Volume: 12
Issue number: 8
Publication status: Published - 30 Aug 2022

Bibliographical note

Funding Information:
FCB was partly supported by the Netherlands Cardiovascular Research Initiative (CVON) (CVON2017-15 RESCUED) for a different, but related project. This research itself received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Publisher Copyright:
© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Keywords

  • heart failure
  • preventive medicine
  • public health
