Abstract
An organism's genome serves as a genetic blueprint, storing all information needed to build and maintain the organism. Genomes often come in multiple copies, each stemming from one of the organism's ancestors. Within a population, genomes show genetic variation as a result of mutation and recombination; the copies of the genome within a single individual likewise differ in the genetic variants they carry. These copy-specific sequences are referred to as haplotypes, and their analysis plays an important role in genetics, medicine, and various other disciplines.
Sequencing technologies enable reading genomic sequences, but only for relatively short pieces of sequence. The goal of haplotype-aware genome assembly is to reconstruct each of the individual haplotypes from a given set of short sequences obtained from a sequencing machine. This, however, is a major challenge: sequencing technologies are error-prone and haplotypes may occur only rarely within a population. Beyond distinguishing between errors and true sequence variants, one needs to assign the true variants to the different genome copies.
Existing methods typically make use of a so-called "reference genome", an established genome sequence resulting from earlier studies, as a starting point for haplotype reconstruction. This strategy leads to reference-induced biases, which can have a great impact on assembly quality when dealing with divergent haplotypes and can hamper the discovery of novel sequences. We avoid such issues by developing de novo methods, meaning that we do not require any prior information such as a reference genome.
We present several new approaches to de novo assembly of individual haplotypes from mixed samples, which can be combined to form the first de novo approach for full-length viral genome reconstruction. This approach can be applied to the analysis of viral infections from patient samples, such as Zika virus, HIV, Ebola virus, and hepatitis C virus. In addition, we enable accurate reconstruction of heavily divergent regions in mammalian (including human) genomes.
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Awarding Institution | |
Supervisors/Advisors | |
Award date | 25 Sept 2019 |
Place of Publication | Utrecht |
Publisher | |
Print ISBNs | 9789463237437 |
Electronic ISBNs | 9789463237437 |
Publication status | Published - 25 Sept 2019 |
Keywords
- genome assembly
- viral quasispecies
- haplotype
- de novo assembly
- overlap graph
- variation graph