
Computationally efficient multi-sample flow cytometry data analysis using Gaussian mixture models

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Background: An important challenge in flow cytometry (FCM) data analysis is comparing corresponding cell populations across multiple FCM samples. One solution is to fit a single statistical mixture model to multiple samples simultaneously: such a multi-sample model can characterize a heterogeneous set of samples and facilitates direct comparison of cell populations across them. The multi-sample approach to statistical mixture modeling has been explored in a number of reports, mostly within a Bayesian framework. Although these approaches are effective, they are computationally demanding and therefore scale poorly, whereas scalability is essential in the multi-sample setting. This limits their utility in the analysis of large sets of large FCM samples.

Results: We show that basic Gaussian mixture models can be extended to large data sets consisting of multiple samples, using a computationally efficient implementation of the expectation-maximization algorithm. We show that the multi-sample Gaussian mixture model (MSGMM) is competitive with other models, in both rare cell detection and sample classification accuracy. This allows us to further explore the utility of MSGMMs in the analysis of heterogeneous sets of samples. We demonstrate how simple heuristics on MSGMM model output can directly reveal structural patterns in a collection of FCM samples.
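The idea of fitting one mixture model to several samples at once can be illustrated with a minimal sketch. The abstract does not state which parameters are shared across samples, so the code below assumes one common choice: component means and covariances are shared, while each sample has its own mixing weights. The function names (`log_gauss`, `fit_msgmm`), the initialization, and the synthetic data are hypothetical and are not the authors' implementation.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)           # shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def fit_msgmm(samples, k, n_iter=50, reg=1e-6, seed=0):
    """EM for a GMM with shared components and per-sample mixing weights.

    Assumption (not taken from the paper): means and covariances are shared
    across samples; only the mixing weights differ per sample.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack(samples)
    sample_id = np.repeat(np.arange(len(samples)), [len(s) for s in samples])
    n, d = X.shape

    # Initialize means at random events and covariances at the pooled covariance.
    means = X[rng.choice(n, size=k, replace=False)]
    pooled = np.atleast_2d(np.cov(X.T)) + reg * np.eye(d)
    covs = np.array([pooled.copy() for _ in range(k)])
    weights = np.full((len(samples), k), 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities, using the weights of each event's own sample.
        log_p = np.stack([log_gauss(X, means[j], covs[j]) for j in range(k)], axis=1)
        log_p += np.log(weights[sample_id])
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: shared means and covariances from all events pooled together.
        nk = resp.sum(axis=0) + 1e-12
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)

        # M-step: mixing weights estimated separately for each sample.
        for s in range(len(samples)):
            weights[s] = resp[sample_id == s].mean(axis=0)
        weights = np.clip(weights, 1e-12, None)
        weights /= weights.sum(axis=1, keepdims=True)

    return means, covs, weights

# Synthetic stand-in for two FCM samples with different population proportions.
rng = np.random.default_rng(1)

def make_sample(n, frac_a):
    n_a = int(n * frac_a)
    a = rng.normal(0.0, 1.0, size=(n_a, 2))
    b = rng.normal(8.0, 1.0, size=(n - n_a, 2))
    return np.vstack([a, b])

samples = [make_sample(500, 0.8), make_sample(500, 0.2)]
means, covs, weights = fit_msgmm(samples, k=2)
print(np.round(weights, 2))  # one row of mixing weights per sample
```

Because the components are shared, row `s` of `weights` can be read as the estimated proportions of the same cell populations in sample `s`, which is what makes populations directly comparable across samples under this assumption.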

Conclusions: We recover the efficiency and utility of the basic MSGMM, which underlies more complex and non-parametric Bayesian hierarchical mixture models. The possibility of fitting GMMs to large sets of FCM samples provides opportunities for the discovery of associations between sample composition and sample meta-data such as treatment responses and clinical outcomes.
Original language: English
Article number: 262
Pages (from-to): 1-18
Number of pages: 18
Journal: BMC Bioinformatics
Volume: 26
Issue number: 1
Early online date: 23 Oct 2025
Publication status: Published - Dec 2025

Funding

This work has been funded by Stichting Cancer Center Amsterdam (grant ID CCA2020-9-68).

Funder: Cancer Center Amsterdam
Funder number: CCA2020-9-68
