dc.contributor.author | Bongo, Lars Ailo | |
dc.contributor.author | Pedersen, Edvard | |
dc.contributor.author | Ernstsen, Martin | |
dc.date.accessioned | 2016-03-09T14:08:04Z | |
dc.date.available | 2016-03-09T14:08:04Z | |
dc.date.issued | 2015-11-18 | |
dc.description.abstract | Biological data analysis is typically implemented using a deep pipeline that combines a wide array of tools and databases. These pipelines must scale to very large datasets, and consequently require parallel and distributed computing. It is therefore important to choose a hardware platform and underlying data management and processing systems well suited for processing large datasets. There are many infrastructure systems for such data-intensive computing. However, in our experience, most biological data analysis pipelines do not leverage these systems.
We give an overview of data-intensive computing infrastructure systems, and describe how we have leveraged these for: (i) scalable fault-tolerant computing for large-scale biological data; (ii) incremental updates to reduce the resource usage required to update a large-scale compendium; and (iii) interactive data analysis and exploration. We provide lessons learned and describe problems we have encountered during development and deployment. We also provide a literature survey on the use of data-intensive computing systems for biological data processing. Our results show how unmodified biological data analysis tools can benefit from infrastructure systems for data-intensive computing. | en_US |
dc.description | Accepted manuscript version. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-24462-4_22. | en_US |
dc.identifier.citation | Lecture Notes in Computer Science 2015, 8623:259-272 | en_US |
dc.identifier.cristinID | FRIDAID 1319765 | |
dc.identifier.doi | 10.1007/978-3-319-24462-4_22 | |
dc.identifier.issn | 1611-3349 | |
dc.identifier.uri | https://hdl.handle.net/10037/8816 | |
dc.identifier.urn | URN:NBN:no-uit_munin_8358 | |
dc.language.iso | eng | en_US |
dc.publisher | Springer | en_US |
dc.rights.accessRights | openAccess | |
dc.subject | data-intensive computing | en_US |
dc.subject | biological data analysis | en_US |
dc.subject | flexible pipelines | en_US |
dc.subject | infrastructure systems | en_US |
dc.subject | VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 | en_US |
dc.title | Data-intensive computing infrastructure systems for unmodified biological data analysis pipelines | en_US |
dc.type | Journal article | en_US |
dc.type | Tidsskriftartikkel | en_US |
dc.type | Peer reviewed | en_US |