Batch correction of genomic data in chronic fatigue syndrome using CMA-ES

A.L. Rincon, A.D. Kraneveld, A. Tonda

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

© 2020 Owner/Author. Modern genomic sequencing machines can measure thousands of probes from different specimens. Nevertheless, theoretically comparable datasets can show considerably different properties depending on both platform and specimen, a phenomenon known as the batch effect. Batch correction is the technique that aims to remove this effect from the data. One possible approach to batch correction is to find a transformation function between different datasets, but optimizing the weights of such a function is not trivial: as there is no explicit gradient to follow, traditional optimization techniques would fail. In this work, we propose to use a state-of-the-art evolutionary algorithm, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), to optimize the weights of a transformation function for batch correction. The fitness function is driven by the classification accuracy of an ensemble of algorithms on the transformed data. The case study selected to test the proposed approach is mRNA gene expression data of Chronic Fatigue Syndrome, a disease for which there is currently no established diagnostic test. The transformation function obtained from three datasets, produced from different specimens, remarkably improves the performance of classifiers on the task of diagnosing Chronic Fatigue Syndrome. The presented results are an important stepping stone towards a reliable diagnostic test for this syndrome.
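
The idea in the abstract, searching for transformation weights with an evolutionary algorithm whose fitness is classification accuracy on the transformed data, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes synthetic data in place of gene expression measurements, a per-feature affine map as the transformation function, a single nearest-centroid classifier in place of the ensemble, and a simplified (mu, lambda) evolution strategy with a fixed step-size decay in place of CMA-ES, which would additionally adapt a full covariance matrix. All names (`make_batch`, `transform`, `fitness`, `evolve`) are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
DIM = 3                 # number of probes (features) in this toy example

def make_batch(n, shift, scale):
    """Toy samples: class 0 centred at 0, class 1 at 2, then distorted per batch."""
    data = []
    for i in range(n):
        label = i % 2
        x = [scale * rng.gauss(2.0 * label, 0.5) + shift for _ in range(DIM)]
        data.append((x, label))
    return data

reference = make_batch(40, shift=0.0, scale=1.0)  # batch the classifier is trained on
shifted = make_batch(40, shift=1.5, scale=1.3)    # batch with a simulated batch effect

def centroids(data):
    """Per-class feature means: the training step of a nearest-centroid classifier."""
    out = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        out[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return out

CENTROIDS = centroids(reference)

def transform(x, w):
    """Per-feature affine map; w holds DIM scales followed by DIM offsets."""
    return [w[j] * x[j] + w[DIM + j] for j in range(DIM)]

def fitness(w):
    """Accuracy of the reference-trained classifier on the transformed batch."""
    hits = 0
    for x, y in shifted:
        z = transform(x, w)
        dist = {c: sum((a - b) ** 2 for a, b in zip(z, m)) for c, m in CENTROIDS.items()}
        hits += min(dist, key=dist.get) == y
    return hits / len(shifted)

def evolve(generations=60, lam=20, mu=5, sigma=0.5):
    """Simplified (mu, lambda) evolution strategy standing in for CMA-ES."""
    mean = [1.0] * DIM + [0.0] * DIM  # start from the identity transform
    best_w, best_f = list(mean), fitness(mean)
    for _ in range(generations):
        # Sample candidates around the current mean and rank them by accuracy.
        pop = [[m + sigma * rng.gauss(0, 1) for m in mean] for _ in range(lam)]
        pop.sort(key=fitness, reverse=True)
        top_f = fitness(pop[0])
        if top_f > best_f:
            best_w, best_f = list(pop[0]), top_f
        # Recombine the mu best candidates into the next mean.
        mean = [sum(col) / mu for col in zip(*pop[:mu])]
        sigma *= 0.97  # fixed decay, not CMA-ES step-size control
    return best_w, best_f

base_acc = fitness([1.0] * DIM + [0.0] * DIM)
best_w, best_acc = evolve()
print(f"accuracy before: {base_acc:.2f}, after: {best_acc:.2f}")
```

Because accuracy is piecewise constant in the weights, the fitness gives no gradient to follow, which is why a population-based search is used here. The sketch keeps the best candidate seen so far, so the reported accuracy after the search is never lower than that of the identity transform.
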
Original language: English
Title of host publication: GECCO 2020 Companion - Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
Publisher: Association for Computing Machinery, Inc
Pages: 277-278
ISBN (Electronic): 9781450371278
Publication status: Published - 8 Jul 2020
Externally published: Yes
Event: 2020 Genetic and Evolutionary Computation Conference, GECCO 2020 - Cancun, Mexico
Duration: 8 Jul 2020 - 12 Jul 2020

Conference

Conference: 2020 Genetic and Evolutionary Computation Conference, GECCO 2020
Country/Territory: Mexico
City: Cancun
Period: 8/07/20 - 12/07/20
