The metagenomic data life-cycle: standards and best practices
(Journal article; Peer reviewed, 2017-08-01)
Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonized way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (i) material sampling, (ii) material sequencing, (iii) data analysis, and (iv) data archiving and publishing. Taking examples from marine ...
IMP: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks
(Journal article; Peer reviewed, 2012)
Integrative multi-species prediction (IMP) is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides a framework for biologists to analyze their candidate gene sets in the context of functional networks, as they expand or ...
Transparent Incremental Updates for Genomics Data Analysis Pipelines
(Chapter, 2014)
A large up-to-date compendium of integrated genomic data is often required for biological data analysis. The compendium can be tens of terabytes in size, and must often be frequently updated with new experimental or meta-data. Manual compendium update is cumbersome, requires a lot of unnecessary computation, and it may result in errors or inconsistencies in the compendium. We propose a transparent ...
Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs
(Journal article; Peer reviewed, 2019-06-06)
We have attempted to reproduce the results in "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs", published in JAMA 2016; 316(22), using publicly available data sets. We re-implemented the main method in the original study since the source code is not available. The original study used non-public fundus images from EyePACS ...
Kvik: three-tier data exploration tools for flexible analysis of genomic data in epidemiological studies
(Journal article; Peer reviewed, 2015-03-30)
Kvik is an open-source system that we developed for explorative analysis of functional genomics data from large epidemiological studies. Creating such studies requires a significant amount of time and resources. It is therefore usual to reuse the data from one study for several research projects. Often each project requires implementing new analysis code, integration with specific knowledge bases, ...
Using a virtual event space to understand parallel application communication behavior
(Research report, 2003)
We have developed EventSpace, a configurable data collection, management, and observation system for monitoring low-level synchronization and communication events with the purpose of understanding the behavior of parallel applications on clusters and multi-clusters. Applications are instrumented by adding data-collecting code in the form of event collectors to an application's communication paths. ...
Evaluating the performance of the allreduce collective operation on clusters. Approach and results
(Research report, 2004)
The performance of the collective operations provided by a communication library is important for many applications run on clusters. The communication structure of collective operations can be organized as a tree. Performance can be improved by configuring and mapping the tree to the clusters in use. We describe and demonstrate an approach for evaluating the performance of different configurations ...
The Longcut Wide Area Network Emulator. Design and Evaluation
(Research report, 2005)
Experiments run on a Grid, consisting of clusters administered by multiple organizations connected by shared wide area networks (WANs), may not be reproducible. First, traffic on the WAN cannot be controlled. Second, allocating the same resources for subsequent experiments can be difficult. Longcut solves both problems by splitting a single cluster into several parts, and for each part having one ...
Mr. Clean: A Tool for Tracking and Comparing the Lineage of Scientific Visualization Code
(Conference object, 2014)