Computational analyses to characterise hidden information in short and long read sequencing data of human genomes: there’s more than meets the reference

Jasper Linthorst

    Research output: PhD Thesis › PhD-Thesis - Research and graduation internal


    Abstract

    Next-generation sequencing (NGS) has enabled us to accurately determine the nucleotide sequence of short fragments of DNA at a massive scale, which has led to various clinical applications of human genome sequencing. To extract information from these NGS experiments, virtually all analyses map the sequenced reads to a reference assembly of the human genome. Importantly, in these experiments a large fraction (~12%) of the sequenced DNA fragments is ignored because their origin cannot be traced back to a (single) position on the reference assembly. The origin of these ignored or unmapped fragments is twofold. On the one hand, these fragments originate from sequence that occurs more than once (repeats). On the other hand, these fragments originate from sequence that is absent from the reference assembly. In practice, many of these unmapped fragments originate from so-called structural variations (SVs), where the sequenced genome differs from the reference assembly. In Part 1 of this thesis, we study this source of sequence variation using so-called long-read sequencing technology and introduce methods to do so. In Part 2 of this thesis, we specifically study the DNA fragments that cannot be traced back to the human reference assembly, but instead seem to originate from DNA viruses.
    Original language: English
    Qualification: PhD
    Awarding Institution
    • Vrije Universiteit Amsterdam
    Supervisors/Advisors
    • Sistermans, E.A., Supervisor
    • Reinders, M.J.T., Supervisor
    • Holstege, H., Co-supervisor
    Award date: 9 Dec 2022
    Place of Publication: s.l.
    Publisher
    Publication status: Published - 9 Dec 2022

    Keywords

    • long-read sequencing
    • structural variation
    • de-novo assembly
    • next-generation sequencing
    • viral DNA
    • cell-free DNA
    • non-invasive prenatal testing
    • NIPT
