Stepwise classification of cancer samples using clinical and molecular data

A. Obulkasim, G.A. Meijer, M.A. van de Wiel

    Research output: Contribution to Journal › Article › Academic › peer-review

    Abstract

    Background: Combining clinical and molecular data types may improve the prediction accuracy of a classifier. However, there is currently a shortage of effective and efficient statistical and bioinformatic tools for truly integrative data analysis. Existing integrative classifiers have two main disadvantages. First, coarse combination may cause subtle contributions of one data type to be overshadowed by more obvious contributions of the other. Second, the need to measure both data types for all patients may be both impractical and inefficient, particularly in terms of cost.

    Results: We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to the two data types independently, starting with the traditional clinical risk factors. We turn to the relatively expensive molecular data only when the uncertainty of the prediction from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that need to be re-classified using molecular data depends on how much we expect the predictive accuracy to increase when re-classifying those samples.

    Conclusions: Our method yields a more cost-efficient classifier that is at least as good as, and sometimes better than, one based on clinical or molecular data alone. Hence our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals; moreover, for these individuals a test result would be available quickly, which may reduce waiting times (for diagnosis) and hence lower patients' distress. Stepwise classification is implemented in the R package stepwiseCM, which is available from the Bioconductor website. © 2011 Obulkasim et al; licensee BioMed Central Ltd.
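    The two-stage decision rule described in the Results can be illustrated with a minimal Python sketch. It assumes a binary outcome, a clinical-only classifier that outputs a probability p for class 1, and an uncertainty measure of 1 - |2p - 1| (0 for a confident prediction, 1 at p = 0.5). The names `stepwise_classify`, `clinical_uncertainty` and `molecular_predict` are hypothetical; they are not the stepwiseCM API, and the uncertainty criterion used in the paper may differ.

```python
from typing import Callable, List, Sequence, Tuple


def clinical_uncertainty(p: float) -> float:
    """Uncertainty of a binary prediction with P(class 1) = p.

    Assumed measure (not taken from the paper): 0 when p is 0 or 1,
    1 when p = 0.5.
    """
    return 1.0 - abs(2.0 * p - 1.0)


def stepwise_classify(
    clinical_probs: Sequence[float],
    molecular_predict: Callable[[int], int],
    limit: float,
) -> Tuple[List[int], List[int]]:
    """Two-stage classification sketch.

    clinical_probs: P(class 1) from a clinical-only classifier, one per sample.
    molecular_predict: hypothetical callback returning the molecular-data label
        for sample i; it is called only for samples that are re-classified.
    limit: predefined uncertainty limit in [0, 1].

    Returns (final labels, indices of samples re-classified with molecular data).
    """
    labels: List[int] = []
    reclassified: List[int] = []
    for i, p in enumerate(clinical_probs):
        if clinical_uncertainty(p) > limit:
            # Clinical prediction too uncertain: use the (costly) molecular data.
            labels.append(molecular_predict(i))
            reclassified.append(i)
        else:
            # Clinical prediction confident enough: keep it, skip the molecular test.
            labels.append(1 if p >= 0.5 else 0)
    return labels, reclassified


if __name__ == "__main__":
    clinical_probs = [0.95, 0.55, 0.10, 0.48]
    molecular_labels = [0, 0, 1, 1]  # hypothetical molecular-classifier output
    labels, reclassified = stepwise_classify(
        clinical_probs, lambda i: molecular_labels[i], limit=0.5
    )
    print(labels, reclassified)  # [1, 0, 0, 1] [1, 3]
    print(len(reclassified) / len(clinical_probs))  # 0.5
```

    Only the two uncertain samples (indices 1 and 3) receive a molecular label; lowering `limit` sends a larger proportion of samples to the molecular step, which mirrors the trade-off between cost and expected accuracy gain described in the abstract.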
    Original language: English
    Article number: 422
    Journal: BMC Bioinformatics
    Volume: 12
    Issue number: 1
    DOI: 10.1186/1471-2105-12-422
    Publication status: Published - 2011

    Cite this

    Obulkasim, A.; Meijer, G.A.; van de Wiel, M.A. / Stepwise classification of cancer samples using clinical and molecular data. In: BMC Bioinformatics. 2011; Vol. 12, No. 1.
    @article{67be6f629ddb41cfb864699204464d29,
    title = "Stepwise classification of cancer samples using clinical and molecular data",
    author = "A. Obulkasim and G.A. Meijer and {van de Wiel}, M.A.",
    year = "2011",
    doi = "10.1186/1471-2105-12-422",
    language = "English",
    volume = "12",
    journal = "BMC Bioinformatics",
    issn = "1471-2105",
    publisher = "BioMed Central",
    number = "1",

    }

    Stepwise classification of cancer samples using clinical and molecular data. / Obulkasim, A.; Meijer, G.A.; van de Wiel, M.A.

    In: BMC Bioinformatics, Vol. 12, No. 1, 422, 2011.

    TY  - JOUR
    T1  - Stepwise classification of cancer samples using clinical and molecular data
    AU  - Obulkasim, A.
    AU  - Meijer, G.A.
    AU  - van de Wiel, M.A.
    PY  - 2011
    Y1  - 2011
    U2  - 10.1186/1471-2105-12-422
    DO  - 10.1186/1471-2105-12-422
    M3  - Article
    VL  - 12
    JO  - BMC Bioinformatics
    JF  - BMC Bioinformatics
    SN  - 1471-2105
    IS  - 1
    M1  - 422
    ER  - 