
dc.contributor.author: Bongo, Lars Ailo
dc.contributor.author: Pedersen, Edvard
dc.contributor.author: Ernstsen, Martin
dc.date.accessioned: 2016-03-09T14:08:04Z
dc.date.available: 2016-03-09T14:08:04Z
dc.date.issued: 2015-11-18
dc.description.abstract: Biological data analysis is typically implemented using a deep pipeline that combines a wide array of tools and databases. These pipelines must scale to very large datasets, and consequently require parallel and distributed computing. It is therefore important to choose a hardware platform and underlying data management and processing systems well suited for processing large datasets. There are many infrastructure systems for such data-intensive computing. However, in our experience, most biological data analysis pipelines do not leverage these systems. We give an overview of data-intensive computing infrastructure systems, and describe how we have leveraged these for: (i) scalable fault-tolerant computing for large-scale biological data; (ii) incremental updates to reduce the resource usage required to update a large-scale compendium; and (iii) interactive data analysis and exploration. We provide lessons learned and describe problems we have encountered during development and deployment. We also provide a literature survey on the use of data-intensive computing systems for biological data processing. Our results show how unmodified biological data analysis tools can benefit from infrastructure systems for data-intensive computing. [en_US]
dc.description: Accepted manuscript version. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-24462-4_22. [en_US]
dc.identifier.citation: Lecture Notes in Computer Science 2015, 8623:259-272 [en_US]
dc.identifier.cristinID: FRIDAID 1319765
dc.identifier.doi: 10.1007/978-3-319-24462-4_22
dc.identifier.issn: 1611-3349
dc.identifier.uri: https://hdl.handle.net/10037/8816
dc.identifier.urn: URN:NBN:no-uit_munin_8358
dc.language.iso: eng [en_US]
dc.publisher: Springer [en_US]
dc.rights.accessRights: openAccess
dc.subject: data-intensive computing [en_US]
dc.subject: biological data analysis [en_US]
dc.subject: flexible pipelines [en_US]
dc.subject: infrastructure systems [en_US]
dc.subject: VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551 [en_US]
dc.title: Data-intensive computing infrastructure systems for unmodified biological data analysis pipelines [en_US]
dc.type: Journal article [en_US]
dc.type: Tidsskriftartikkel [en_US]
dc.type: Peer reviewed [en_US]


