De novo approaches to haplotype-aware genome assembly

Jasmijn Anne Baaijens

Research output: Thesis › Doctoral thesis 2 (Research NOT UU / Graduation UU)

Abstract

An organism's genome serves as a genetic blueprint, storing all information needed to build and maintain the organism. Genomes often come in several copies, where each copy stems from one of the ancestors. Within a population, genomes show genetic variation as a result of mutation and recombination; the copies of the genome within a single individual also differ in the genetic variants they carry. These copy-specific sequences are referred to as haplotypes, and their analysis plays an important role in genetics, medicine, and various other disciplines. Sequencing technologies enable reading genomic sequences, but only in relatively short pieces. The goal of haplotype-aware genome assembly is to reconstruct each of the individual haplotypes from a given set of short sequences obtained from a sequencing machine. This, however, is a major challenge: sequencing technologies are error-prone and haplotypes may occur only rarely within a population. Beyond distinguishing between errors and true sequence variants, one needs to assign the true variants to the different genome copies. Existing methods typically use a so-called "reference genome", an established genome sequence resulting from earlier studies, as a starting point for haplotype reconstruction. This strategy leads to reference-induced biases, which can have a great impact on assembly quality when dealing with divergent haplotypes and can hamper the discovery of novel sequences. We avoid such issues by developing de novo methods, meaning that we do not require any prior information such as a reference genome. We present several new approaches to de novo assembly of individual haplotypes from mixed samples, which can be combined to form the first de novo approach for full-length viral genome reconstruction. This can be applied to the analysis of viral infections from patient samples, such as Zika virus, HIV, Ebola virus, and hepatitis C virus.
In addition, we enable accurate reconstruction of heavily divergent regions in mammalian (including human) genomes.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Utrecht University
Supervisors/Advisors
  • Schönhuth, A., Primary supervisor
Award date: 25 Sept 2019
Place of Publication: Utrecht
Publisher
Print ISBNs: 9789463237437
Electronic ISBNs: 9789463237437
Publication status: Published - 25 Sept 2019

Keywords

  • genome assembly
  • viral quasispecies
  • haplotype
  • de novo assembly
  • overlap graph
  • variation graph
