| Abstract: | Graphics processing units have in recent years evolved into inexpensive, high-performance many-core computing units. Previously accessible only through graphics APIs, these devices can now be programmed using arbitrary data types and standard languages like C, thanks to new hardware architectures and programming tools. This thesis investigates the development process and performance of image and video processing algorithms on graphics processing units, independent of vendor. The tool used for programming the graphics processing units is OpenCL, a relatively new specification for heterogeneous computing. Two image algorithms are investigated: the bilateral filter and the histogram. In addition, an attempt was made to build a template-based solution for generation and auto-optimization of device code, but this approach had shortcomings that make it insufficiently usable at this time. |
| URI: | http://hdl.handle.net/10037/4346 |
| Abstract: | Genomics is the study of the genomes of organisms. Metagenomics is the study of environmental genomic samples. For both genomics and metagenomics, DNA sequencing and the analysis of the resulting sequences are important tools. This analysis is done through integration of sequence data with existing meta-data collections. Genomics is the study of the genomes of organisms, and involves cultivating organisms in a lab and analyzing them. Metagenomics is the study of genomic samples collected directly from the environment, allowing researchers to study organisms that are difficult to cultivate in a petri dish. DNA sequencing and the analysis of these sequences are important tools for both genomics and metagenomics. The integration of the data produced by sequencing with existing meta-data collections is particularly interesting for metagenomics, as a single biological sample can contain thousands of different organisms. Recent developments in DNA sequencing technology mean that the volume of data that can be produced per dollar is increasing faster than the volume of data that can be analyzed and stored per dollar. This data growth means that the initial analysis of these massive data sets becomes increasingly expensive. In addition, there is a need to periodically update old results using new meta-data from the many knowledge bases (meta-data collections) for biological data. Today, this typically requires rerunning the experimental analysis. Such incremental analysis is interesting for metagenomics since environmental samples potentially contain thousands of organisms. In metagenomic analysis, different sets of tools are used depending on the type of information required. These tools are generally arranged in a pipeline, where the output files of one tool act as the input for the next. The analysis done by some steps depends on different meta-data collections.
When meta-data is updated, these steps and all subsequent steps typically need to be executed again. Incremental updates can save significant computation time by running these pipelines against the updated segments, rather than the full meta-data collections. We believe that systems for incremental updates for metagenomic analysis pipelines have the following requirements: (i) reduce the computational resource requirements by using incremental update techniques; (ii) the meta-data collections should be accessible without the use of proprietary or computationally expensive techniques; (iii) do the incremental updates on demand, due to the different needs of experiments, through handling meta-data updates and generating arbitrary delta meta-data collections; (iv) support most genomic analysis tools and run on most job management systems; (v) no changes should be made to the tools that the pipeline is composed of, since modifying the many available tools is impractical; (vi) the changes to the job management and resource allocation system should be minimal, to save implementation time for the pipeline system maintainer; (vii) maintain a view of previous meta-data collections, so old experiments can be repeated with the correct meta-data collection version. To our knowledge, no existing incremental update system satisfies all seven requirements. Often they do not support on-demand processing or maintaining views of old data. In addition, many systems require computations to be done within a specific framework or programming language. In this thesis we describe the GeStore incremental update system, which satisfies all seven requirements. GeStore reduces the size of the meta-data collections, and thus the computational requirements for the pipeline, by leveraging incremental update techniques, satisfying requirements (i) and (iii).
In addition, it reduces the storage requirements of the meta-data collections, while still maintaining a complete view of the meta-data collection in a plain-text format, fulfilling requirements (ii) and (vii). It also presents a simple interface to the application programmer, so that integrating the system with existing pipeline solutions does not require large changes to the pipeline system or tools, in accordance with requirements (iv), (v) and (vi). GeStore has been implemented using the MapReduce framework, along with HBase, to provide scalable meta-data processing. We demonstrate the system by generating subsets of meta-data collections for use by the widely used genomic tool BLAST. In our evaluation, we have integrated GeStore with an existing pipelining system, GePan, a metagenomic pipeline system developed for a local biotech company in Tromsø, Norway, and used real-world data to evaluate the performance and benefits of GeStore. Our experimental results show that GeStore is able to reduce the runtime of the incremental updates by up to 65% when compared to unmodified GePan, while introducing a low storage overhead and requiring minimal changes to GePan. We believe that efficient on-demand updates of metagenomic data, as provided by GeStore, will be useful to our biology collaborators. |
| URI: | http://hdl.handle.net/10037/4272 |
| Abstract: | The ability to deliver computing as a metered service has made the cloud an attractive platform for deployment of applications. Using the cloud, enterprises experience decreased maintenance overhead and faster deployment, and can exploit cloud elasticity to meet fluctuating resource demands. |
| URI: | http://hdl.handle.net/10037/4263 |
| Abstract: | The domain of sports analysis is a huge field in sports science. Several different computer systems are available for doing analysis, both expensive and less expensive. Some specialize in specific sports such as football or ice hockey, while others are sports agnostic. However, a common property of most of these systems is that they try to give in-depth and detailed analysis of the sport in question. This thesis proposes and describes a system that provides the user with the ability to annotate interesting happenings during a live sporting event, through a non-invasive mobile device interface. The device permits focus on important happenings by filtering out unnecessary detail. Our system provides corresponding video of the annotations on the same mobile device, thereby facilitating the process of giving video feedback to the involved coaches and players. We have implemented a prototype of the system that enables evaluation of this idea, and through case studies with Tromsø Idrettslag, a Norwegian Premier League football club, we show its usefulness and applicability. |
| URI: | http://hdl.handle.net/10037/4262 |
| Abstract: | In recent years, social network providers have become one of the largest industries in the world. These networks created a new arena for sharing information over the Internet, and thus changed the way people interact with each other. Hundreds of millions of social network users are updating statuses and sending messages to each other every day. These interactions produce vast amounts of social data. This data is the core of the social network providers' business model, and it is sold to large companies to perform personalized advertisement, brand monitoring and viral marketing. The price of this data can be intimidating, and some might be unable or unwilling to pay it. If the data were freely available, research that could benefit from it could be carried out more freely, leading to new knowledge. This thesis presents Harvest, a collaborative system for retrieving social data. Harvest is a peer-to-peer system consisting of contributing social network users, inspired by public resource computing. Harvest shares social network account-bound resources to retrieve large social data sets. Contribution is achieved by running an application on the contributor's computer, as in other public resource computing systems such as the @home systems. The system implements retrieval of data from Twitter. Experiments on real Twitter data show that the system scales with increased contribution. The data retrieval bandwidth per contributing user is quite low, and the number of contributors needed to achieve a considerably large data retrieval bandwidth is high, but there are no financial costs associated with the system. Harvest would benefit greatly from retrieving data from more sources, as this would increase its data retrieval bandwidth in addition to offering more abundant data. |
| URI: | http://hdl.handle.net/10037/4248 |