Abstract
Motivation
In pathway analysis, we aim to establish a connection between the activity of a particular biological pathway and a difference in phenotype. There are many available methods to perform pathway analysis, many of them rely on an upstream differential expression analysis, and many model the relations between the abundances of the analytes in a pathway as linear relationships.
Results
Here, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and, therefore, does not model the association between pathway activity and phenotype, resulting in relatively few assumptions. For this, we construct a graph of the data points for each pathway using a nearest-neighbor approach and score the association between the structure of this graph and the phenotype of these same samples using Mutual Information while adjusting for the effects of random chance in each score. The initial nearest neighbor approach evades individual gene-level comparisons, hence making the method scalable and less vulnerable to missing values. These properties make our method particularly useful for single-cell data. We benchmarked our method on several single-cell datasets, comparing it to established and new methods, and found that it produces robust, reproducible, and meaningful scores.
In pathway analysis, we aim to establish a connection between the activity of a particular biological pathway and a difference in phenotype. There are many available methods to perform pathway analysis, many of them rely on an upstream differential expression analysis, and many model the relations between the abundances of the analytes in a pathway as linear relationships.
Results
Here, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and, therefore, does not model the association between pathway activity and phenotype, resulting in relatively few assumptions. For this, we construct a graph of the data points for each pathway using a nearest-neighbor approach and score the association between the structure of this graph and the phenotype of these same samples using Mutual Information while adjusting for the effects of random chance in each score. The initial nearest neighbor approach evades individual gene-level comparisons, hence making the method scalable and less vulnerable to missing values. These properties make our method particularly useful for single-cell data. We benchmarked our method on several single-cell datasets, comparing it to established and new methods, and found that it produces robust, reproducible, and meaningful scores.
Original language | English |
---|---|
Article number | btad776 |
Pages (from-to) | 1-10 |
Number of pages | 10 |
Journal | Bioinformatics |
Volume | 40 |
Issue number | 1 |
Early online date | 9 Jan 2024 |
DOIs | |
Publication status | Published - Jan 2024 |
Bibliographical note
Funding Information:This work was supported by a grant from the Swedish Foundation for Strategic Research [BD15-0043 to L.K.].
Publisher Copyright:
© 2024 The Author(s). Published by Oxford University Press.
Keywords
- pathway analysis
- information theory
- single cell