dc.contributor.advisor: Willassen, Nils-Peder
dc.contributor.author: Robertsen, Espen Mikal
dc.date.accessioned: 2017-06-27T07:18:53Z
dc.date.available: 2017-06-27T07:18:53Z
dc.date.issued: 2017-05-19
dc.description.abstract: With the accelerated advances in sequencing technology over the last decade, the field of metagenomics has progressed immensely. Sampling and sequencing of metagenomic data is now prevalent, and public data sets from mundane soil and water environments to exotic niche habitats such as geothermal hot springs are readily available through sequence data repositories such as the European Nucleotide Archive. Meanwhile, the computational resource requirements for a complete and comprehensive analysis of metagenomic data have escalated dramatically, due to a tremendous increase in data set sizes. To analyze and make sense of these samples, researchers can choose to employ public resources for metagenomic analysis. However, most of the available public resources provide generic analyses and are not suited for applications such as bioprospecting or for samples from complex habitats such as the marine domain. In this thesis, we introduce a metagenomic analysis pipeline coined META-pipe. With META-pipe, we aim to supply a public analysis resource tailored to the marine domain, with an emphasis on analysis of full-length genes. META-pipe offers pre-processing, assembly, taxonomic classification and functional analysis of metagenomic sequence data. The pipeline has gone through several iterations, both in terms of functionality and implementation. In Paper 1 we describe the initial version of META-pipe, including biological functionality, implementation details and integration with identity provider services, distributed storage, distributed computation and the Galaxy workflow manager. We evaluate the performance of META-pipe through two separate use cases, as presented in Paper 2 and Paper 3. These use cases demonstrate the usability of META-pipe and gave us an opportunity to refine and enhance the pipeline through evaluation of biological results and computational performance characteristics.
In summary, this dissertation gives an overview of common strategies for metagenomic analysis in a pipeline context. It discusses the development of META-pipe through refinement and presents the current version. The pipeline is now a deliverable to the ELIXIR infrastructure; hence, future versions of META-pipe will continue to improve and expand both in functionality and public usage, providing a sustainable resource for metagenomic analysis in years to come. [en_US]
dc.description.doctoraltype: ph.d. [en_US]
dc.description.popularabstract: We have developed a metagenomic analysis pipeline coined META-pipe. This software uses distributed computer systems to utilize extensive hardware resources and is accessible through a standalone web portal. Using this public analysis resource, researchers can make sense of their environmental samples with minimal effort. We have evaluated META-pipe through two biological use cases, both in terms of performance scalability and biological results. The first is a pilot project in collaboration with the European Bioinformatics Institute, in which we compare our pipeline with theirs and refine ours based on the evaluation. In the second use case, we apply artificial neural nets, trained in a supervised manner, to alleviate the burden of submitting metadata for users of public metagenomic resources. Both use cases demonstrate the process of developing a state-of-the-art analysis resource and refining it through evaluation. META-pipe is a deliverable to the ELIXIR infrastructure and will continue to expand and evolve in years to come. [en_US]
dc.description.sponsorship: UiT The Arctic University of Norway [en_US]
dc.description: The papers of this thesis are not available in Munin.
Paper 1: Robertsen, E. M., Kahlke, T., Raknes, I. A., Pedersen, E., Semb, E. K., Ernstsen, M., Bongo, L. A., Willassen, N. P.: “META-pipe – Pipeline annotation, analysis and visualization of marine metagenomic sequence data”. Available at https://arxiv.org/abs/1604.04103.
Paper 2: Robertsen, E. M., Denise, H., Mitchell, A., Finn, R. D., Bongo, L. A., Willassen, N. P.: “ELIXIR pilot action: Marine metagenomics – towards a domain specific set of sustainable services”. (Manuscript). Available at http://dx.doi.org/10.12688/f1000research.10443.1.
Paper 3: Robertsen, E. M., Bongo, L. A., Willassen, N. P.: “Automatic Contextual Data Curation – Applying Artificial Neural Nets to Taxonomic Classifications of Metagenomes”. (Manuscript). [en_US]
dc.identifier.isbn: 978-82-8236-262-7 (print) and 978-82-8236-263-4 (pdf)
dc.identifier.uri: https://hdl.handle.net/10037/11180
dc.language.iso: eng [en_US]
dc.publisher: UiT Norges arktiske universitet [en_US]
dc.publisher: UiT The Arctic University of Norway [en_US]
dc.rights.accessRights: openAccess [en_US]
dc.rights.holder: Copyright 2017 The Author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/3.0 [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) [en_US]
dc.subject: Metagenomics [en_US]
dc.subject: Bioinformatics [en_US]
dc.subject: VDP::Matematikk og Naturvitenskap: 400::Basale biofag: 470::Bioinformatikk: 475 [en_US]
dc.subject: VDP::Mathematics and natural science: 400::Basic biosciences: 470::Bioinformatics: 475 [en_US]
dc.subject: VDP::Medisinske Fag: 700::Basale medisinske, odontologiske og veterinærmedisinske fag: 710::Medisinsk genetikk: 714 [en_US]
dc.subject: VDP::Medical disciplines: 700::Basic medical, dental and veterinary science disciplines: 710::Medical genetics: 714 [en_US]
dc.title: META-pipe – Distributed Pipeline Analysis of Marine Metagenomic Sequence Data [en_US]
dc.type: Doctoral thesis [en_US]
dc.type: Doktorgradsavhandling [en_US]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)