Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer

Mark Hoogendoorn, Peter Szolovits, Leon M G Moons, Mattijs E. Numans

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Objective: Machine learning techniques can be used to extract predictive models for diseases from electronic medical records (EMRs). However, the nature of EMRs makes it difficult to apply off-the-shelf machine learning techniques while still exploiting the rich content of the EMRs. In this paper, we explore the usage of a range of natural language processing (NLP) techniques to extract valuable predictors from uncoded consultation notes and study whether they can help to improve predictive performance.

Methods: We study a number of existing techniques for the extraction of predictors from the consultation notes, namely a bag-of-words-based approach and topic modeling. In addition, we develop a dedicated technique to match the uncoded consultation notes with a medical ontology. We apply these techniques as an extension to an existing pipeline to extract predictors from EMRs. We evaluate them in the context of predictive modeling for colorectal cancer (CRC), a disease known to be difficult to diagnose before performing an endoscopy.

Results: Our results show that we are able to extract useful information from the consultation notes. The predictive performance of the ontology-based extraction method moves significantly beyond the benchmark of age and gender alone (area under the receiver operating characteristic curve (AUC) of 0.870 versus 0.831). We also observe more accurate predictive models by adding features derived from processing the consultation notes compared to solely using coded data (AUC of 0.896 versus 0.882), although the difference is not significant. The extracted features from the notes are shown to be equally predictive (i.e. there is no significant difference in performance) compared to the coded data of the consultations.

Conclusion: It is possible to extract useful predictors from uncoded consultation notes that improve predictive performance. Techniques linking text to concepts in medical ontologies to derive these predictors are shown to perform best for predicting CRC in our EMR dataset.

Original language: English
Pages (from-to): 53-61
Number of pages: 9
Journal: Artificial Intelligence in Medicine
Volume: 69
DOIs: 10.1016/j.artmed.2016.03.003
Publication status: Published - 6 Nov 2015

Fingerprint

Electronic medical equipment
Electronic Health Records
Colorectal Neoplasms
Referral and Consultation
Ontology
Learning systems
Area Under Curve
Endoscopy
Natural Language Processing
Processing
Benchmarking
Pipelines
ROC Curve

Keywords

  • Colorectal cancer
  • Natural language processing
  • Predictive modeling
  • Uncoded consultation notes

Cite this

@article{cdf9640547004efd974e1aa625f6b234,
title = "Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer",
abstract = "Objective: Machine learning techniques can be used to extract predictive models for diseases from electronic medical records (EMRs). However, the nature of EMRs makes it difficult to apply off-the-shelf machine learning techniques while still exploiting the rich content of the EMRs. In this paper, we explore the usage of a range of natural language processing (NLP) techniques to extract valuable predictors from uncoded consultation notes and study whether they can help to improve predictive performance. Methods: We study a number of existing techniques for the extraction of predictors from the consultation notes, namely a bag-of-words-based approach and topic modeling. In addition, we develop a dedicated technique to match the uncoded consultation notes with a medical ontology. We apply these techniques as an extension to an existing pipeline to extract predictors from EMRs. We evaluate them in the context of predictive modeling for colorectal cancer (CRC), a disease known to be difficult to diagnose before performing an endoscopy. Results: Our results show that we are able to extract useful information from the consultation notes. The predictive performance of the ontology-based extraction method moves significantly beyond the benchmark of age and gender alone (area under the receiver operating characteristic curve (AUC) of 0.870 versus 0.831). We also observe more accurate predictive models by adding features derived from processing the consultation notes compared to solely using coded data (AUC of 0.896 versus 0.882), although the difference is not significant. The extracted features from the notes are shown to be equally predictive (i.e. there is no significant difference in performance) compared to the coded data of the consultations. Conclusion: It is possible to extract useful predictors from uncoded consultation notes that improve predictive performance. Techniques linking text to concepts in medical ontologies to derive these predictors are shown to perform best for predicting CRC in our EMR dataset.",
keywords = "Colorectal cancer, Natural language processing, Predictive modeling, Uncoded consultation notes",
author = "Hoogendoorn, Mark and Szolovits, Peter and Moons, {Leon M G} and Numans, {Mattijs E.}",
year = "2015",
month = "11",
day = "6",
doi = "10.1016/j.artmed.2016.03.003",
language = "English",
volume = "69",
pages = "53--61",
journal = "Artificial Intelligence in Medicine",
issn = "0933-3657",
publisher = "Elsevier",
}

Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer. / Hoogendoorn, Mark; Szolovits, Peter; Moons, Leon M G; Numans, Mattijs E.

In: Artificial Intelligence in Medicine, Vol. 69, 06.11.2015, p. 53-61.

TY - JOUR

T1 - Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer

AU - Hoogendoorn, Mark

AU - Szolovits, Peter

AU - Moons, Leon M G

AU - Numans, Mattijs E.

PY - 2015/11/6

Y1 - 2015/11/6

AB - Objective: Machine learning techniques can be used to extract predictive models for diseases from electronic medical records (EMRs). However, the nature of EMRs makes it difficult to apply off-the-shelf machine learning techniques while still exploiting the rich content of the EMRs. In this paper, we explore the usage of a range of natural language processing (NLP) techniques to extract valuable predictors from uncoded consultation notes and study whether they can help to improve predictive performance. Methods: We study a number of existing techniques for the extraction of predictors from the consultation notes, namely a bag-of-words-based approach and topic modeling. In addition, we develop a dedicated technique to match the uncoded consultation notes with a medical ontology. We apply these techniques as an extension to an existing pipeline to extract predictors from EMRs. We evaluate them in the context of predictive modeling for colorectal cancer (CRC), a disease known to be difficult to diagnose before performing an endoscopy. Results: Our results show that we are able to extract useful information from the consultation notes. The predictive performance of the ontology-based extraction method moves significantly beyond the benchmark of age and gender alone (area under the receiver operating characteristic curve (AUC) of 0.870 versus 0.831). We also observe more accurate predictive models by adding features derived from processing the consultation notes compared to solely using coded data (AUC of 0.896 versus 0.882), although the difference is not significant. The extracted features from the notes are shown to be equally predictive (i.e. there is no significant difference in performance) compared to the coded data of the consultations. Conclusion: It is possible to extract useful predictors from uncoded consultation notes that improve predictive performance. Techniques linking text to concepts in medical ontologies to derive these predictors are shown to perform best for predicting CRC in our EMR dataset.

KW - Colorectal cancer

KW - Natural language processing

KW - Predictive modeling

KW - Uncoded consultation notes

UR - http://www.scopus.com/inward/record.url?scp=84964319416&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964319416&partnerID=8YFLogxK

U2 - 10.1016/j.artmed.2016.03.003

DO - 10.1016/j.artmed.2016.03.003

M3 - Article

VL - 69

SP - 53

EP - 61

JO - Artificial Intelligence in Medicine

JF - Artificial Intelligence in Medicine

SN - 0933-3657

ER -