Transparent Incremental Updates for Genomics Data Analysis Pipelines

Pedersen, Edvard; Willassen, Nils Peder; Bongo, Lars Ailo

Date

2014

Type

Book chapter

Authors

Pedersen, Edvard; Willassen, Nils Peder; Bongo, Lars Ailo

Abstract

A large, up-to-date compendium of integrated genomic data is often required for biological data analysis. The compendium can be tens of terabytes in size and must be updated frequently with new experimental data or metadata. Manually updating the compendium is cumbersome, requires a lot of unnecessary computation, and may introduce errors or inconsistencies. We propose a transparent, file-based approach for adding incremental update capabilities to unmodified genomics data analysis tools and pipeline workflow managers. This approach is implemented in the GeStore system. We evaluate GeStore using a real-world genomics compendium. Our results show that it is easy to add incremental updates to genomics data processing pipelines, and that incremental updates can reduce the computation time enough to make it practical to maintain large-scale, up-to-date genomics compendia on small clusters.

Description

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-54420-0_31

Publisher

Springer Berlin Heidelberg

Series

Euro-Par 2013: Parallel Processing Workshops, Lecture Notes in Computer Science, Vol. 8374, 2014, pp. 311-320

Collections

Articles, reports and other (computer science) [486]

The following license file is associated with this item:

Original license