TY - GEN
T1 - Trace Clustering on Very Large Event Data in Healthcare Using Frequent Sequence Patterns
AU - Lu, Xixi
AU - Tabatabaei, Seyed Amin
AU - Hoogendoorn, Mark
AU - Reijers, Hajo A.
PY - 2019
Y1 - 2019
AB - Trace clustering has increasingly been applied to find homogenous process executions. However, current techniques have difficulties in finding a meaningful and insightful clustering of patients on the basis of healthcare data. The resulting clusters are often not in line with those of medical experts, nor do the clusters guarantee to help return meaningful process maps of patients’ clinical pathways. After all, a single hospital may conduct thousands of distinct activities and generate millions of events per year. In this paper, we propose a novel trace clustering approach by using sample sets of patients provided by medical experts. More specifically, we learn frequent sequence patterns on a sample set, rank each patient based on the patterns, and use an automated approach to determine the corresponding cluster. We find each cluster separately, while the frequent sequence patterns are used to discover a process map. The approach is implemented in ProM and evaluated using a large data set obtained from a university medical center. The evaluation shows F1-scores of 0.7 for grouping kidney injury, 0.9 for diabetes, and 0.64 for head/neck tumor, while the process maps show meaningful behavioral patterns of the clinical pathways of these groups, according to the domain experts.
KW - Frequent sequential patterns
KW - Machine learning
KW - Process mining
KW - Trace clustering
UR - http://www.scopus.com/inward/record.url?scp=85072854261&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072854261&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-26619-6_14
DO - 10.1007/978-3-030-26619-6_14
M3 - Conference contribution
AN - SCOPUS:85072854261
SN - 9783030266189
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 198
EP - 215
BT - Business Process Management
A2 - Hildebrandt, Thomas
A2 - van Dongen, Boudewijn F.
A2 - Röglinger, Maximilian
A2 - Mendling, Jan
PB - Springer Verlag
T2 - 17th International Conference on Business Process Management, BPM 2019
Y2 - 1 September 2019 through 6 September 2019
ER -