Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis
Dissertation
(Language: English)
This PhD thesis provides novel solutions to major topics within the analysis of next-generation sequencing data, focusing on parallelization, scalability and reproducibility.
Product details
Back-cover text of "Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis"
The analysis of next-generation sequencing (NGS) data is a major topic in bioinformatics: short reads obtained from DNA, the molecule encoding the genome of living organisms, are processed to provide insight into biological or medical questions. This thesis provides novel solutions to major topics within the analysis of NGS data, focusing on parallelization, scalability and reproducibility.
The read mapping problem is to find the origin of the short reads within a given reference genome. We contribute the q-group index, a novel data structure for read mapping with a particularly small memory footprint. The q-group index comes with massively parallel build and query algorithms targeted at modern graphics processing units (GPUs). On top of this, we present the read mapping software PEANUT, which outperforms state-of-the-art read mappers in speed while maintaining their accuracy.
The variant calling problem is to infer (i.e., call) the genetic variants of individuals relative to a reference genome using mapped reads. It is usually solved in a Bayesian way. In this work, we show how to integrate the filtering of variants into the calling with an algebraic approach, and we provide an intuitive solution for controlling the false discovery rate, along with solutions to other challenges of variant calling such as scaling with a growing set of biological samples.
Depending on the research question, the analysis of NGS data entails many other steps, typically involving diverse tools, data transformations and aggregation of results. These steps can be orchestrated by workflow management. We present the general-purpose workflow system Snakemake, which provides an easy-to-read domain-specific language for defining and documenting workflows. Snakemake provides an execution environment that allows a workflow to be scaled to the available resources, including parallelization across CPU cores or cluster nodes and restrictions on memory usage or on the number of available coprocessors such as GPUs.
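The blurb frames read mapping as locating each short read within a reference genome. As a point of reference only, the following minimal Python sketch shows a plain q-gram index, a standard hash-based structure that records the start position of every length-q substring of the reference, together with a simple diagonal-voting lookup for candidate read origins. It is not the q-group index or PEANUT's method, whose details are not given here; the function names and toy sequences are hypothetical.

```python
# Hypothetical illustration: a plain q-gram index with diagonal voting.
# This is a textbook structure, not the thesis's q-group index.
from collections import defaultdict


def build_qgram_index(reference: str, q: int) -> dict[str, list[int]]:
    """Map every length-q substring of the reference to its start positions (ascending)."""
    index: defaultdict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return dict(index)


def candidate_starts(read: str, index: dict[str, list[int]], q: int) -> dict[int, int]:
    """Count, for each possible read start in the reference, how many read q-grams support it.

    A q-gram found at offset i in the read and at position p in the reference
    suggests that the read begins at reference position p - i.
    """
    votes: defaultdict[int, int] = defaultdict(int)
    for offset in range(len(read) - q + 1):
        for pos in index.get(read[offset:offset + q], []):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return dict(votes)


reference = "ACGTACGTTAGCCGATAGGCT"
index = build_qgram_index(reference, q=4)
votes = candidate_starts("TAGCCGAT", index, q=4)
best_start = max(votes, key=votes.get)
print(best_start, votes[best_start])  # 8 5
```

Because such an index stores one position entry for almost every reference position, its memory use grows with genome length, which is the kind of cost a compact index design aims to reduce.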
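The blurb states that variant calling is usually solved in a Bayesian way and mentions controlling the false discovery rate. One generic way to connect the two, assumed here for illustration and not taken from the thesis (whose algebraic filtering approach is not reproduced), is to turn each candidate's posterior probability into a posterior error probability (PEP) and accept the largest set of calls whose mean PEP, which is the posterior expected FDR of that set, stays at or below a threshold alpha. The likelihoods would normally come from a model of the mapped reads, which is omitted; the function names and variant identifiers are hypothetical.

```python
# Hypothetical illustration of Bayesian FDR control over variant calls.
# A generic technique, not the thesis's algebraic filtering approach.


def posterior_variant(prior: float, lik_variant: float, lik_reference: float) -> float:
    """Bayes' theorem for two hypotheses: 'variant present' vs. 'reference only'.

    The likelihoods P(reads | hypothesis) would come from a model of the mapped
    reads, which this sketch does not implement.
    """
    numerator = prior * lik_variant
    return numerator / (numerator + (1.0 - prior) * lik_reference)


def control_fdr(posteriors: dict[str, float], alpha: float) -> list[str]:
    """Return the largest set of calls whose posterior expected FDR is <= alpha.

    The posterior error probability (PEP) of a call is 1 - P(variant | reads).
    For a fixed set of k calls, the expected number of false discoveries is the
    sum of their PEPs, so the expected FDR is their mean. For every k, the k calls
    with the smallest PEPs minimize that mean, and along this ordering the running
    mean never decreases, so we stop at the first call that pushes it above alpha.
    """
    ranked = sorted(posteriors, key=lambda call: 1.0 - posteriors[call])
    selected: list[str] = []
    pep_sum = 0.0
    for call in ranked:
        pep_sum += 1.0 - posteriors[call]
        if pep_sum / (len(selected) + 1) > alpha:
            break
        selected.append(call)
    return selected


# Toy posteriors keyed by hypothetical "chromosome:position" identifiers.
posteriors = {"chr1:100": 0.99, "chr1:200": 0.95, "chr2:50": 0.80, "chr3:7": 0.30}
print(control_fdr(posteriors, alpha=0.05))  # ['chr1:100', 'chr1:200']
print(posterior_variant(prior=0.5, lik_variant=0.9, lik_reference=0.1))
```

Raising alpha to 0.1 admits the third call as well, since the mean PEP of the first three calls is about 0.087.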
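The blurb says Snakemake scales a workflow to the available resources such as CPU cores, memory and GPUs. The toy Python sketch below illustrates that idea under simplifying assumptions: jobs form a dependency graph, each declares integer resource demands, and ready jobs are admitted greedily into concurrent waves without exceeding the given limits. It is not Snakemake's scheduler, and the `Job` class, the `schedule_in_waves` function and the job names are hypothetical.

```python
# Hypothetical toy scheduler: illustrates resource-limited parallel execution,
# not Snakemake's actual scheduling algorithm.
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    depends_on: list[str] = field(default_factory=list)  # jobs whose output this job needs
    resources: dict[str, int] = field(default_factory=dict)  # e.g. {"threads": 4, "gpu": 1}


def schedule_in_waves(jobs: list[Job], limits: dict[str, int]) -> list[list[str]]:
    """Group jobs into waves whose members may run concurrently.

    A job is ready once every job it depends on has finished. Ready jobs are
    admitted greedily, largest thread demand first, while the summed demand for
    each limited resource stays within `limits`. Resources that have no entry in
    `limits` are treated as unconstrained.
    """
    done: set[str] = set()
    pending = {job.name: job for job in jobs}
    waves: list[list[str]] = []
    while pending:
        ready = [job for job in pending.values() if all(dep in done for dep in job.depends_on)]
        ready.sort(key=lambda job: -job.resources.get("threads", 0))  # stable: ties keep input order
        used = {key: 0 for key in limits}
        wave: list[str] = []
        for job in ready:
            demand = {key: job.resources.get(key, 0) for key in limits}
            if all(used[key] + demand[key] <= limits[key] for key in limits):
                for key in limits:
                    used[key] += demand[key]
                wave.append(job.name)
        if not wave:
            raise ValueError("remaining jobs cannot run: dependency cycle or demand above limits")
        for name in wave:  # assume every job in a wave finishes before the next wave starts
            done.add(name)
            del pending[name]
        waves.append(wave)
    return waves


jobs = [
    Job("map_sample_A", resources={"threads": 4, "gpu": 1}),
    Job("map_sample_B", resources={"threads": 4, "gpu": 1}),
    Job("call_variants", depends_on=["map_sample_A", "map_sample_B"], resources={"threads": 2}),
    Job("report", depends_on=["call_variants"], resources={"threads": 1}),
]
print(schedule_in_waves(jobs, {"threads": 8, "gpu": 1}))
# [['map_sample_A'], ['map_sample_B'], ['call_variants'], ['report']]
print(schedule_in_waves(jobs, {"threads": 8, "gpu": 2}))
# [['map_sample_A', 'map_sample_B'], ['call_variants'], ['report']]
```

With a single GPU the two mapping jobs are serialized; allowing two lets them share a wave. In Snakemake itself, rules declare their needs with `threads` and `resources` directives, and the limits are passed on the command line (for example via `--cores` and `--resources`).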
About the author: Johannes Köster
Johannes Köster is a computer scientist with a focus on algorithm engineering and data analysis in bioinformatics. He currently works as a Postdoctoral Research Fellow in the groups of Shirley Liu (Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health) and Myles Brown (Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute).
Bibliographic details
- Author: Johannes Köster
- Recommended age: 18 and up
- 2015, 132 pages, dimensions: 14.8 x 21 cm, paperback, English
- Publisher: epubli
- ISBN-10: 3737537771
- ISBN-13: 9783737537773
Language: English