Trace Clustering on Very Large Event Data in Healthcare Using Frequent Sequence Patterns

Xixi Lu, Seyed Amin Tabatabaei, Mark Hoogendoorn, Hajo A. Reijers

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Trace clustering has increasingly been applied to find homogenous process executions. However, current techniques have difficulties in finding a meaningful and insightful clustering of patients on the basis of healthcare data. The resulting clusters are often not in line with those of medical experts, nor do the clusters guarantee to help return meaningful process maps of patients’ clinical pathways. After all, a single hospital may conduct thousands of distinct activities and generate millions of events per year. In this paper, we propose a novel trace clustering approach by using sample sets of patients provided by medical experts. More specifically, we learn frequent sequence patterns on a sample set, rank each patient based on the patterns, and use an automated approach to determine the corresponding cluster. We find each cluster separately, while the frequent sequence patterns are used to discover a process map. The approach is implemented in ProM and evaluated using a large data set obtained from a university medical center. The evaluation shows F1-scores of 0.7 for grouping kidney injury, 0.9 for diabetes, and 0.64 for head/neck tumor, while the process maps show meaningful behavioral patterns of the clinical pathways of these groups, according to the domain experts.
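
The pipeline the abstract outlines (learn frequent sequence patterns on an expert-provided sample set, rank each patient by those patterns, then cut off a cluster and compare it with the expert grouping via an F1-score) can be illustrated with a minimal sketch. This is not the authors' ProM implementation: the activity names, the patient ids, the brute-force pattern miner limited to patterns of length 2, the support-weighted score, and the fixed cutoff of 0.5 (standing in for the paper's automated cluster determination) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, trace):
    """True if the pattern's activities occur in the trace in order (gaps allowed)."""
    remaining = iter(trace)
    return all(activity in remaining for activity in pattern)


def frequent_patterns(sample_traces, min_support=0.6, max_len=2):
    """Return {pattern: support} for ordered patterns found in at least
    min_support of the sample traces. Brute force, for illustration only."""
    counts = Counter()
    for trace in sample_traces:
        seen = set()
        for n in range(1, max_len + 1):
            # combinations() keeps the original order, so every tuple it
            # yields is a subsequence of the trace.
            seen.update(combinations(trace, n))
        counts.update(seen)  # each pattern counted once per trace
    size = len(sample_traces)
    return {p: c / size for p, c in counts.items() if c >= min_support * size}


def score(trace, patterns):
    """Support-weighted share of the patterns that the trace contains."""
    total = sum(patterns.values())
    matched = sum(s for p, s in patterns.items() if is_subsequence(p, trace))
    return matched / total if total else 0.0


def f1(predicted, actual):
    """F1-score of a predicted patient set against a reference set."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample set, as if selected by a medical expert.
sample = [
    ["intake", "glucose_test", "insulin"],
    ["intake", "glucose_test", "diet_advice", "insulin"],
    ["glucose_test", "insulin"],
]
# Hypothetical event log: one trace of activities per patient.
log = {
    "p1": ["intake", "glucose_test", "insulin"],
    "p2": ["intake", "xray", "cast"],
    "p3": ["glucose_test", "ecg", "insulin"],
}

patterns = frequent_patterns(sample)
scores = {patient: score(trace, patterns) for patient, trace in log.items()}
# Fixed cutoff (an assumption); the paper determines the cluster automatically.
cluster = {patient for patient, s in scores.items() if s >= 0.5}
```

Because each cluster is derived from its own sample set, a sketch like this would be run once per patient group, which mirrors the abstract's statement that each cluster is found separately.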

Original language: English
Title of host publication: Business Process Management
Subtitle of host publication: 17th International Conference, BPM 2019, Proceedings
Editors: Thomas Hildebrandt, Boudewijn F. van Dongen, Maximilian Röglinger, Jan Mendling
Publisher: Springer Verlag
Pages: 198-215
Number of pages: 18
ISBN (Print): 9783030266189
DOIs: https://doi.org/10.1007/978-3-030-26619-6_14
Publication status: Published - 2019
Event: 17th International Conference on Business Process Management, BPM 2019 - Vienna, Austria
Duration: 1 Sep 2019 to 6 Sep 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11675 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Business Process Management, BPM 2019
Country: Austria
City: Vienna
Period: 1/09/19 to 6/09/19

Keywords

  • Frequent sequential patterns
  • Machine learning
  • Process mining
  • Trace clustering

Cite this

Lu, X., Tabatabaei, S. A., Hoogendoorn, M., & Reijers, H. A. (2019). Trace Clustering on Very Large Event Data in Healthcare Using Frequent Sequence Patterns. In T. Hildebrandt, B. F. van Dongen, M. Röglinger, & J. Mendling (Eds.), Business Process Management: 17th International Conference, BPM 2019, Proceedings (pp. 198-215). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11675 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-26619-6_14
@inproceedings{294cb95bdabc4046a62c97c7f40c3ab3,
title = "Trace Clustering on Very Large Event Data in Healthcare Using Frequent Sequence Patterns",
abstract = "Trace clustering has increasingly been applied to find homogenous process executions. However, current techniques have difficulties in finding a meaningful and insightful clustering of patients on the basis of healthcare data. The resulting clusters are often not in line with those of medical experts, nor do the clusters guarantee to help return meaningful process maps of patients’ clinical pathways. After all, a single hospital may conduct thousands of distinct activities and generate millions of events per year. In this paper, we propose a novel trace clustering approach by using sample sets of patients provided by medical experts. More specifically, we learn frequent sequence patterns on a sample set, rank each patient based on the patterns, and use an automated approach to determine the corresponding cluster. We find each cluster separately, while the frequent sequence patterns are used to discover a process map. The approach is implemented in ProM and evaluated using a large data set obtained from a university medical center. The evaluation shows F1-scores of 0.7 for grouping kidney injury, 0.9 for diabetes, and 0.64 for head/neck tumor, while the process maps show meaningful behavioral patterns of the clinical pathways of these groups, according to the domain experts.",
keywords = "Frequent sequential patterns, Machine learning, Process mining, Trace clustering",
author = "Xixi Lu and Tabatabaei, {Seyed Amin} and Mark Hoogendoorn and Reijers, {Hajo A.}",
year = "2019",
doi = "10.1007/978-3-030-26619-6_14",
language = "English",
isbn = "9783030266189",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "198--215",
editor = "Thomas Hildebrandt and {van Dongen}, {Boudewijn F.} and Maximilian R{\"o}glinger and Jan Mendling",
booktitle = "Business Process Management",
address = "Germany",
}
