Bayesian mixture regression analysis for regulation of Pluripotency in ES cells

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Background: Observed levels of gene expression depend strongly on both the activity of DNA-binding transcription factors (TFs) and chromatin state through different histone modifications (HMs). Regression methods have proven useful for recovering the functional relationship between local chromatin state, TF binding and observed levels of gene expression. They have been successfully applied to predict mRNA levels from genome-wide experimental data, and they provide insight into context-dependent gene regulatory mechanisms. However, heterogeneity arising from gene-set-specific regulatory interactions is often overlooked.

Results: We show that regression models that predict gene expression from experimentally derived ChIP-seq profiles of TFs can be significantly improved by mixture modelling. To find biologically relevant gene clusters, we employ a Bayesian allocation procedure that allows us to integrate additional biological information, such as the three-dimensional nuclear organization of chromosomes and gene function. The data integration procedure involves transforming the additional data into gene similarity values. We propose a generic similarity measure that is especially suitable when the additional data are of both continuous and discrete type, and compare its performance with that of similar measures in the context of mixture modelling.

Conclusions: We applied the proposed method to data from mouse embryonic stem cells (ESC). We find that including additional data yields mixture components that correspond to biologically meaningful gene clusters and provides valuable insight into the heterogeneity of the regulatory interactions.
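The abstract describes transforming additional data of mixed continuous and discrete type into gene similarity values, but it does not spell out the proposed measure. As a rough illustration of the general idea only, the sketch below uses a Gower-type similarity, which is a standard stand-in and not the measure proposed in the article. The function name `gower_similarity`, the toy feature values and the equal weighting of features are all assumptions made for this example.

```python
import numpy as np


def gower_similarity(continuous, discrete):
    """Gower-type similarity between genes described by mixed features.

    continuous: (n_genes, p) array of real-valued features
    discrete:   (n_genes, q) array of integer-coded categories
    Returns an (n_genes, n_genes) matrix with entries in [0, 1].
    NOTE: illustrative stand-in, not the measure proposed in the article.
    """
    continuous = np.asarray(continuous, dtype=float)
    discrete = np.asarray(discrete)

    # Range of each continuous feature; constant features get range 1
    # so that the division below is always defined.
    ranges = continuous.max(axis=0) - continuous.min(axis=0)
    ranges[ranges == 0] = 1.0

    # Continuous part: 1 - |x_i - x_j| / range, shape (n, n, p).
    diffs = np.abs(continuous[:, None, :] - continuous[None, :, :])
    cont_sim = 1.0 - diffs / ranges

    # Discrete part: 1 if the categories match, else 0, shape (n, n, q).
    disc_sim = (discrete[:, None, :] == discrete[None, :, :]).astype(float)

    # Equal-weight average over all p + q features (an assumption).
    return np.concatenate([cont_sim, disc_sim], axis=2).mean(axis=2)


# Hypothetical toy data: one continuous score and one category per gene.
toy_continuous = [[0.0], [5.0], [10.0]]
toy_discrete = [[0], [0], [1]]
similarity = gower_similarity(toy_continuous, toy_discrete)
print(similarity)
```

In a setting like the one the abstract describes, a matrix of this kind could inform how genes are allocated to mixture components, with more similar genes favoured to share a component. How the article actually couples similarity values to the Bayesian allocation is not stated in this record.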

Original language: English
Article number: 3
Pages (from-to): 1-13
Number of pages: 13
Journal: BMC Bioinformatics
Volume: 21
DOI: 10.1186/s12859-019-3331-2
Publication status: Published - 2 Jan 2020

Fingerprint

Regression Analysis
Genes
Gene
Transcription Factors
Cell
Multigene Family
Transcription Factor
Gene Expression
Chromatin
Mixture Modeling
Histone Code
Regulator Genes
Predict
Functional Relationship
Stem Cells
Chromosomes
Data Integration

Keywords

  • Bayesian analysis
  • Data integration
  • Mixture regression
  • Pluripotency
  • Transcription regulation

Cite this

@article{dd55fd2cbef349129375234baef5712d,
title = "Bayesian mixture regression analysis for regulation of Pluripotency in ES cells",
keywords = "Bayesian analysis, Data integration, Mixture regression, Pluripotency, Transcription regulation",
author = "Mehran Aflakparast and Geert Geeven and {De Gunst}, {Mathisca C.M.}",
year = "2020",
month = "1",
day = "2",
doi = "10.1186/s12859-019-3331-2",
language = "English",
volume = "21",
pages = "1--13",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

Bayesian mixture regression analysis for regulation of Pluripotency in ES cells. / Aflakparast, Mehran; Geeven, Geert; De Gunst, Mathisca C.M.

In: BMC Bioinformatics, Vol. 21, Article 3, 02.01.2020, p. 1-13.

TY - JOUR

T1 - Bayesian mixture regression analysis for regulation of Pluripotency in ES cells

AU - Aflakparast, Mehran

AU - Geeven, Geert

AU - De Gunst, Mathisca C.M.

PY - 2020/1/2

Y1 - 2020/1/2

KW - Bayesian analysis

KW - Data integration

KW - Mixture regression

KW - Pluripotency

KW - Transcription regulation

UR - http://www.scopus.com/inward/record.url?scp=85077467390&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077467390&partnerID=8YFLogxK

U2 - 10.1186/s12859-019-3331-2

DO - 10.1186/s12859-019-3331-2

M3 - Article

VL - 21

SP - 1

EP - 13

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 3

ER -