Abstract
With the rapid advances in sequencing technology over the last decade, the field of metagenomics has progressed immensely. Sampling and sequencing of metagenomic data are now widespread. Data sets from a range of environments are publicly available through sequence data repositories such as the European Nucleotide Archive, from common soil and water environments to exotic niche habitats such as geothermal hot springs. Meanwhile, the computational resources required for a complete and comprehensive analysis of metagenomic data have escalated dramatically, because data set sizes have grown tremendously. To analyze and make sense of these samples, researchers can use public resources for metagenomic analysis. However, most of these resources provide generic analyses. They are not suited for applications such as bioprospecting, or for samples from complex habitats such as the marine domain.
In this thesis, we introduce a metagenomic analysis pipeline named META-pipe. With META-pipe, we aim to provide a public analysis resource tailored to the marine domain, with an emphasis on the analysis of full-length genes. META-pipe offers pre-processing, assembly, taxonomic classification and functional analysis of metagenomic sequence data. The pipeline has gone through several iterations, in terms of both functionality and implementation. In Paper 1 we describe the initial version of META-pipe. This includes its biological functionality and implementation details. It also covers its integration with identity provider services, distributed storage, distributed computation and the Galaxy workflow manager. We evaluate the performance of META-pipe through two separate use cases, presented in Paper 2 and Paper 3. These use cases demonstrate the usability of META-pipe. They also gave us an opportunity to refine and enhance the pipeline by evaluating its biological results and computational performance characteristics.
In summary, this dissertation gives an overview of common strategies for metagenomic analysis in a pipeline context. It describes the development of META-pipe through iterative refinement and presents its current version. The pipeline is now a deliverable to the ELIXIR infrastructure. Future versions of META-pipe will therefore continue to improve and expand in both functionality and public usage, providing a sustainable resource for metagenomic analysis in the years to come.
Description
The papers of this thesis are not available in Munin.
Paper 1: Robertsen, E. M., Kahlke, T., Raknes, I. A., Pedersen, E., Semb, E. K., Ernstsen, M., Bongo, L. A., Willassen, N. P.: "META-pipe – Pipeline annotation, analysis and visualization of marine metagenomic sequence data". Available at https://arxiv.org/abs/1604.04103.
Paper 2: Robertsen, E. M., Denise, H., Mitchell, A., Finn, R. D., Bongo, L. A., Willassen, N. P.: "ELIXIR pilot action: Marine metagenomics – towards a domain specific set of sustainable services". (Manuscript). DOI: 10.12688/f1000research.10443.1.
Paper 3: Robertsen, E. M., Bongo, L. A., Willassen, N. P.: "Automatic Contextual Data Curation – Applying Artificial Neural Nets to Taxonomic Classifications of Metagenomes". (Manuscript).