Abstract
Background: An important challenge in flow cytometry (FCM) data analysis is making comparisons of corresponding cell populations across multiple FCM samples. An interesting solution is creating a statistical mixture model for multiple samples simultaneously, as such a multi-sample model can characterize a heterogeneous set of samples, and facilitates direct comparison of cell populations across the data samples. The multi-sample approach to statistical mixture modeling has been explored in a number of reports, mostly within a Bayesian framework and with high computational complexity. Although these approaches are effective, they are also computationally demanding, and therefore do not relate well to the requirement of scalability, which is essential in the multi-sample setting. This limits their utility in the analysis of large sets of large FCM samples.
Results: We show that basic Gaussian mixture models can be extended to large data sets consisting of multiple samples, using a computationally efficient implementation of the expectation-maximization algorithm. We show that the multi-sample Gaussian mixture model (MSGMM) is competitive with other models, in both rare cell detection and sample classification accuracy. This allows us to further explore the utility of MSGMMs in the analysis of heterogeneous sets of samples. We demonstrate how simple heuristics on MSGMM model output can directly reveal structural patterns in a collection of FCM samples.
Conclusions: We recover the efficiency and utility of the basic MSGMM which underlies more complex and non-parametric Bayesian hierarchical mixture models. The possibility of fitting GMMs to large sets of FCM samples provides opportunities for the discovery of associations between sample composition and sample meta-data such as treatment responses and clinical outcomes.
Results: We show that basic Gaussian mixture models can be extended to large data sets consisting of multiple samples, using a computationally efficient implementation of the expectation-maximization algorithm. We show that the multi-sample Gaussian mixture model (MSGMM) is competitive with other models, in both rare cell detection and sample classification accuracy. This allows us to further explore the utility of MSGMMs in the analysis of heterogeneous sets of samples. We demonstrate how simple heuristics on MSGMM model output can directly reveal structural patterns in a collection of FCM samples.
Conclusions: We recover the efficiency and utility of the basic MSGMM which underlies more complex and non-parametric Bayesian hierarchical mixture models. The possibility of fitting GMMs to large sets of FCM samples provides opportunities for the discovery of associations between sample composition and sample meta-data such as treatment responses and clinical outcomes.
| Original language | English |
|---|---|
| Article number | 262 |
| Pages (from-to) | 1-18 |
| Number of pages | 18 |
| Journal | BMC Bioinformatics |
| Volume | 26 |
| Issue number | 1 |
| Early online date | 23 Oct 2025 |
| DOIs | |
| Publication status | Published - Dec 2025 |
Funding
This work has been funded by Stichting Cancer Center Amsterdam (grant ID CCA2020-9-68).
| Funders | Funder number |
|---|---|
| Cancer Center Amsterdam | CCA2020-9-68 |
Fingerprint
Dive into the research topics of 'Computationally efficient multi-sample flow cytometry data analysis using Gaussian mixture models'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver