dc.contributor.advisor | Willassen, Nils-Peder | |
dc.contributor.author | Robertsen, Espen Mikal | |
dc.date.accessioned | 2017-06-27T07:18:53Z | |
dc.date.available | 2017-06-27T07:18:53Z | |
dc.date.issued | 2017-05-19 | |
dc.description.abstract | With the rapid advances in sequencing technology over the last decade, the field of metagenomics has progressed immensely. Sampling and sequencing of metagenomes is now commonplace, and data sets ranging from ordinary soil and water environments to exotic niche habitats such as geothermal hot springs are readily available through public sequence data repositories such as the European Nucleotide Archive. Meanwhile, the computational resources required for a complete and comprehensive analysis of metagenomic data have escalated dramatically as data set sizes have grown. To analyze and make sense of these samples, researchers can turn to public resources for metagenomic analysis. However, most of these resources provide generic analyses and are not suited to applications such as bioprospecting, or to samples from complex habitats such as the marine domain.
In this thesis, we introduce a metagenomic analysis pipeline called META-pipe. With META-pipe, we aim to provide a public analysis resource tailored to the marine domain, with an emphasis on the analysis of full-length genes. META-pipe offers pre-processing, assembly, taxonomic classification and functional analysis of metagenomic sequence data. The pipeline has gone through several iterations, in terms of both functionality and implementation. In Paper 1 we describe the initial version of META-pipe, including its biological functionality, implementation details, and integration with identity provider services, distributed storage, distributed computation and the Galaxy workflow manager. We evaluate the performance of META-pipe through two separate use cases, presented in Paper 2 and Paper 3. These use cases demonstrate the usability of META-pipe and gave us an opportunity to refine and enhance the pipeline by evaluating its biological results and computational performance characteristics.
In summary, this dissertation gives an overview of common strategies for metagenomic analysis in a pipeline context. It discusses the iterative development and refinement of META-pipe and presents its current version. The pipeline is now a deliverable to the ELIXIR infrastructure; future versions of META-pipe will therefore continue to improve and expand in both functionality and public usage, providing a sustainable resource for metagenomic analysis in the years to come. | en_US |
dc.description.doctoraltype | ph.d. | en_US |
dc.description.popularabstract | We have developed a metagenomic analysis pipeline called META-pipe. This software uses distributed computer systems to harness extensive hardware resources and is accessible through a standalone web portal. Using this public analysis resource, researchers can make sense of their environmental samples with minimal effort. We have evaluated META-pipe through two biological use cases, in terms of both performance scalability and biological results. The first is a pilot project in collaboration with the European Bioinformatics Institute, in which we compare our pipeline with theirs and refine it based on the evaluation. In the second, we apply artificial neural networks, trained by supervised learning, to ease the burden of submitting metadata for users of public metagenomic resources. Both use cases demonstrate the process of developing a state-of-the-art analysis resource and refining it through evaluation. META-pipe is a deliverable to the ELIXIR infrastructure and will continue to expand and evolve in the years to come. | en_US |
dc.description.sponsorship | UiT The Arctic University of Norway | en_US |
dc.description | The papers of this thesis are not available in Munin. <br>
<p>
Paper 1: Robertsen, E. M., Kahlke, T., Raknes, I. A., Pedersen, E., Semb, E. K., Ernstsen, M., Bongo, L. A., Willassen, N. P.: “META-pipe – Pipeline annotation, analysis and visualization of marine metagenomic sequence data”. Available at <a href= https://arxiv.org/abs/1604.04103> https://arxiv.org/abs/1604.04103. </a>
<p>
Paper 2: Robertsen, E. M., Denise, H., Mitchell, A., Finn, R. D., Bongo, L. A., Willassen, N. P.: “ELIXIR pilot action: Marine metagenomics – towards a domain specific set of sustainable services”. (Manuscript). Available at <a href=http://dx.doi.org/10.12688/f1000research.10443.1> 10.12688/f1000research.10443.1. </a>
<p>
Paper 3: Robertsen, E. M., Bongo, L. A., Willassen, N. P.: “Automatic Contextual Data Curation – Applying Artificial Neural Nets to Taxonomic Classifications of Metagenomes”. (Manuscript). | en_US |
dc.identifier.isbn | 978-82-8236-262-7 (print) and 978-82-8236-263-4 (pdf) | |
dc.identifier.uri | https://hdl.handle.net/10037/11180 | |
dc.language.iso | eng | en_US |
dc.publisher | UiT Norges arktiske universitet | en_US |
dc.publisher | UiT The Arctic University of Norway | en_US |
dc.rights.accessRights | openAccess | en_US |
dc.rights.holder | Copyright 2017 The Author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/3.0 | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) | en_US |
dc.subject | Metagenomics | en_US |
dc.subject | Bioinformatics | en_US |
dc.subject | VDP::Matematikk og Naturvitenskap: 400::Basale biofag: 470::Bioinformatikk: 475 | en_US |
dc.subject | VDP::Mathematics and natural science: 400::Basic biosciences: 470::Bioinformatics: 475 | en_US |
dc.subject | VDP::Medisinske Fag: 700::Basale medisinske, odontologiske og veterinærmedisinske fag: 710::Medisinsk genetikk: 714 | en_US |
dc.subject | VDP::Medical disciplines: 700::Basic medical, dental and veterinary science disciplines: 710::Medical genetics: 714 | en_US |
dc.title | META-pipe – Distributed Pipeline Analysis of Marine Metagenomic Sequence Data | en_US |
dc.type | Doctoral thesis | en_US |
dc.type | Doktorgradsavhandling | en_US |