Now showing items 12-23 of 23
| Abstract: | Genomics is the study of the genomes of organisms, and involves cultivating organisms in a lab and analyzing them. Metagenomics is the study of genomic samples collected directly from the environment, allowing researchers to study organisms that are difficult to cultivate in a petri dish. DNA sequencing, and the analysis of the resulting sequences, is an important tool for both genomics and metagenomics. Integrating the data produced by sequencing with existing meta-data collections is particularly interesting for metagenomics, as a single biological sample can contain thousands of different organisms. Recent developments in DNA sequencing technology mean that the volume of data that can be produced per dollar is increasing faster than the volume of data that can be analyzed and stored per dollar. As a result, the initial analysis of these massive data sets becomes increasingly expensive. In addition, old results need to be updated periodically using new meta-data from the many knowledge bases (meta-data collections) for biological data. Today, this typically requires rerunning the experimental analysis. Incremental analysis, which updates old results without a full rerun, is interesting for metagenomics since environmental samples potentially contain thousands of organisms. In metagenomic analysis, different sets of tools are used depending on the type of information required. These tools are generally arranged in a pipeline, where the output files of one tool act as the input for the next. The analysis done by some steps depends on different meta-data collections.
When meta-data is updated, these steps and all subsequent steps typically need to be executed again. Incremental updates can save significant computation time by running these pipelines against only the updated segments, rather than the full meta-data collections. We believe that systems for incremental updates of metagenomic analysis pipelines have the following requirements: (i) reduce the computational resource requirements by using incremental update techniques; (ii) make the meta-data collections accessible without the use of proprietary or computationally expensive techniques; (iii) perform the incremental updates on demand, since experiments have different needs, by handling meta-data updates and generating arbitrary delta meta-data collections; (iv) support most genomic analysis tools and run on most job management systems; (v) require no changes to the tools that make up the pipeline, since modifying the many available tools is impractical; (vi) require only minimal changes to the job management and resource allocation system, to save implementation time for the pipeline system maintainer; (vii) maintain a view of previous meta-data collections, so that old experiments can be repeated with the correct meta-data collection version. To our knowledge, no existing incremental update system satisfies all seven requirements. Often they do not support on-demand processing or maintaining views of old data. In addition, many systems require computations to be done within a specific framework or programming language. In this thesis we describe the GeStore incremental update system, which satisfies all seven requirements. GeStore reduces the size of the meta-data collections, and thus the computational requirements for the pipeline, by leveraging incremental update techniques, satisfying requirements (i) and (iii).
In addition, it reduces the storage requirements of the meta-data collections while still maintaining a complete view of the meta-data collection in a plain-text format, fulfilling requirements (ii) and (vii). It also presents a simple interface to the application programmer, so that integrating the system with existing pipeline solutions does not require large changes to the pipeline system or tools, in accordance with requirements (iv), (v) and (vi). GeStore has been implemented using the MapReduce framework, along with HBase, to provide scalable meta-data processing. We demonstrate the system by generating subsets of meta-data collections for use by the widely used genomic tool BLAST. In our evaluation, we have integrated GeStore with an existing pipelining system, GePan, a metagenomic pipeline system developed for a local biotech company in Tromsø, Norway, and used real-world data to evaluate the performance and benefits of GeStore. Our experimental results show that GeStore is able to reduce the runtime of the incremental updates by up to 65% when compared to unmodified GePan, while introducing a low storage overhead and requiring minimal changes to GePan. We believe that efficient on-demand updates of metagenomic data, as provided by GeStore, will be useful to our biology collaborators. |
| URI: | http://hdl.handle.net/10037/4272 |
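The idea of a delta meta-data collection in the GeStore abstract above can be illustrated with a small sketch. This is not GeStore's implementation (which the abstract says is built on MapReduce and HBase); the `Entry` layout and the `view_at` and `delta` helpers are hypothetical names chosen for illustration, assuming each meta-data record carries the version at which it was written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One record in a meta-data collection (hypothetical layout)."""
    key: str      # e.g. a sequence identifier
    value: str    # e.g. the sequence or annotation text
    version: int  # collection version at which this value was written


def view_at(history: list[Entry], version: int) -> dict[str, str]:
    """Reconstruct the collection as it looked at `version`."""
    view: dict[str, str] = {}
    # Replay writes in version order; later writes overwrite earlier ones.
    for entry in sorted(history, key=lambda e: e.version):
        if entry.version <= version:
            view[entry.key] = entry.value
    return view


def delta(history: list[Entry], old: int, new: int) -> dict[str, str]:
    """Records that were added or changed between versions `old` and `new`."""
    before = view_at(history, old)
    after = view_at(history, new)
    return {k: v for k, v in after.items() if before.get(k) != v}
```

Under these assumptions, a pipeline step would be rerun against `delta(history, old, new)` instead of the full collection, while `view_at` keeps older versions reproducible for repeating earlier experiments. Deleted records are not handled in this sketch.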
| Abstract: | In recent years, social network providers have become one of the largest industries in the world. These networks created a new arena for sharing information over the Internet, and thus changed the way people interact with each other. Hundreds of millions of social network users update statuses and send messages to each other every day. These interactions produce vast amounts of social data. This data is the core of the social network providers' business model, and it is sold to large companies for personalized advertisement, brand monitoring and viral marketing. The price of this data can be intimidating, and some might be unable or unwilling to pay for it. If the data were freely available, research that could benefit from it could be carried out more freely, leading to new knowledge. This thesis presents Harvest, a collaborative system for retrieving social data. Harvest is a peer-to-peer system consisting of contributing social network users, inspired by public resource computing. Harvest shares social network account-bound resources to retrieve large social data sets. Users contribute by running an application on their own computer, as in other public resource computing systems such as the @home systems. The system implements retrieval of data from Twitter. Experiments on real Twitter data show that the system scales with increased contribution. The data retrieval bandwidth per contributing user is quite low, and the number of contributors needed to achieve a considerably large data retrieval bandwidth is high, but the system has no associated financial costs. Harvest would benefit greatly from retrieving data from more sources, as this would increase its data retrieval bandwidth and offer more abundant data. |
| URI: | http://hdl.handle.net/10037/4248 |
| Abstract: | The ability to deliver computing as a metered service has made the cloud an attractive platform for deployment of applications. Using the cloud, enterprises experience reduced maintenance overhead and faster deployment, and can exploit cloud elasticity to meet fluctuating resource demands. |
| URI: | http://hdl.handle.net/10037/4263 |
| Abstract: | Graphics processing units have in recent years evolved into inexpensive high-performance many-core computing units. Previously accessible only through graphics APIs, these devices can now be programmed using arbitrary data types and standard languages like C, thanks to new hardware architectures and programming tools. This thesis investigates the development process and performance of image and video processing algorithms on graphics processing units, regardless of vendor. The tool used for programming the graphics processing units is OpenCL, a relatively new specification for heterogeneous computing. Two image algorithms are investigated: the bilateral filter and the histogram. In addition, an attempt has been made to build a template-based solution for generation and auto-optimization of device code, but this approach has shortcomings that keep it from being usable enough at this time. |
| URI: | http://hdl.handle.net/10037/4346 |
| Abstract: | In looking at the XMPP protocol as an alternative to the ordinary way of transferring files within a health network setting, namely e-mail, performance and security are important factors to consider. For security reasons we preferred to use in-band over out-of-band file transfer. The tradeoff is that this method puts a higher strain on the XMPP server and is significantly slower than its counterpart, out-of-band. In researching a specific XMPP implementation, the Openfire XMPP server, and looking into how it deals with in-band file transfers, we have found some ways to increase in-band file transfer performance, but not in the originally intended way, which would be through improvements in the Openfire source code concerning in-band file transfers. |
| URI: | http://hdl.handle.net/10037/2243 |
| Abstract: | Images are commonly used on a daily basis for research, information and entertainment. The introduction of digital cameras, and especially the incorporation of cameras in mobile phones, lets people snap photos almost anywhere at any time, since they almost always carry their mobile phone with them. The fast evolution in hardware enables users to store large image collections without high costs. Making use of these image collections requires efficient image retrieval techniques. Traditional techniques such as text-based and content-based image retrieval have shortcomings. New techniques, or combinations of existing techniques, must be established to provide users with adequate image retrieval functionality. This thesis describes two systems enabling users to retrieve information such as images, textual information, WAP-links or videos using SMS or MMS. One of the systems, M2S, is meant for tourists who want to retrieve information about attractions in Lofoten. M2S uses content-based image retrieval to retrieve the requested information. This service was designed and implemented in cooperation with Telenor R&I. The other system, CAIR, is meant for users who want to retrieve images from an image collection using SMS. CAIR uses context-based image retrieval to retrieve images. This system is designed, but not yet implemented. |
| URI: | http://hdl.handle.net/10037/1141 |
| Abstract: | This thesis investigates mapping mechanisms between ANSAware applications and a federated environment. The work is carried out within the framework of the ODS group's work on interoperating information systems. To gain knowledge about the problem domain and about how mapping can be performed efficiently, a framework for modelling, design and implementation of mapping mechanisms is developed. We focus in particular on how ANSAware objects can be made to appear persistent within the federation. We rely on a persistence model in which different requirements are placed on object identity. Only a few objects need permanent identity. For these, mechanisms for transparent management (activation/passivation) are needed. We investigate two logical components that cooperate on mapping: the object adapter, which is responsible for management and object identity, and language bindings, which represent the programming interface for the client environment in question and realize access transparency by means of stubs. We introduce the concept of a proxy object, which represents the identification of objects in the object adapter, and present a conceptual model for interaction with the client. A reusable object-oriented framework has been realized. It represents the design and partial implementation of the object adapter, which is extended with application-specific software to become complete. A notation for defining the management of permanent objects is developed. Properties of language bindings are investigated. A distinction is made between direct binding, where the client is in the same process, and binding via an explicit interface (canonical language). A language binding to C++ has been realized, and a framework is outlined for binding to FRIL, a functional and object-oriented language for integration and interoperation between different information systems. |
| Description: | This is a master's thesis (hovedoppgave) |
| URI: | http://hdl.handle.net/10037/1248 |
| Abstract: | The domain of sports analysis is a huge field in sports science. Several different computer systems are available for doing analysis, both expensive and less expensive. Some specialize in specific sports such as football or ice hockey, while others are sports agnostic. However, a common property of most of these systems is that they try to give in-depth and detailed analysis of the sport in question. This thesis proposes and describes a system that provides the user with the ability to annotate interesting happenings during a live sporting event, through a non-invasive mobile device interface. The device permits focus on important happenings by filtering out unnecessary detail. Our system provides corresponding video of the annotations on the same mobile device, thereby facilitating the process of giving video feedback to the involved coaches and players. We have implemented a prototype of the system that enables evaluation of this idea, and through case studies with Tromsø Idrettslag, a Norwegian Premier League football club, we show its usefulness and applicability. |
| URI: | http://hdl.handle.net/10037/4262 |
| Abstract: | Existing networked filesystems are usually either client/server (allowing storage only on one node), hard to use, or both. The advanced ones also tend to use their own on-disk format, complicating migration both ways. Skynet attempts to remedy this. It is a distributed filesystem with master/slave redundancy that is easy to use, relatively safe for your data and can be easily converted to/from a non-distributed filesystem. It uses an existing filesystem for file storage, takes care of its own maintenance as far as possible, and supports approximate POSIX semantics, with POSIX, eventual and session coherency modes. It favors speed over correctness where this would rarely be noticed. The project also includes a cryptographically secure message-passing middleware for Haskell called Hermes, with distribution transparency and a gossip system. High-level SHA2 and AES bindings are also included. The Hermes portion of the project is complete and usable; the Skynet portion is not. Skynet and Hermes are designed for small-to-medium networks, achieving optimal performance at this size and suffering significant degradation in large networks. They are meant to be used for non-administrated applications in home networks. Skynet offers single-bit security: the network is encrypted and authenticated, but there is no further security inside the network. |
| URI: | http://hdl.handle.net/10037/2543 |
| Abstract: | This dissertation studies composing a super sensor network from the combination of three functional sensor networks: a sensor data producing network, a sensor data computing network and a sensor controlling network. The target devices are today labeled as large sensor nodes. Communication is based on an IP network using HTTP as the main protocol. Bonjour is used for service discovery, with some adjustments for technical reasons. This allows for naming and location of available services without centralized servers, and it is implementable in small devices. A super sensor network for meteorological observations is emulated using a computer cluster. The emulated measurements are taken from stations available through observation collection systems accessible on the Internet. Images from web cameras are one of the observation types used. The implemented system uses Python for rapid prototyping and for its support for multiple operating systems. This dissertation demonstrates that the selected technology and architecture may handle some of the demands in a sensor network, and that the architecture gives new opportunities for handling updates and sensor network control. The implemented system also demonstrates that using standard Internet protocols can make access to services in the sensor network easy. A web browser may become the preferred user interface for controlling and accessing all parts of the sensor network, as it has for controlling printers and simple network devices. |
| URI: | http://hdl.handle.net/10037/1445 |
| Abstract: | Today, electronic mail is a natural way to communicate. Unfortunately, some user groups cannot make use of this technology. Our focus is children aged 4-8. The Symbo project aims to give these users that opportunity. The goal of this project is to create a symbol-based language called SymboL. This is a collection of symbols that, through use, provides a limited vocabulary, sentence structure and grammar. SymboL is then to be used in an application called Symbo, which sends and receives symbol-based e-mail. In developing SymboL we collaborated with children aged 4-8. We carried out low-tech prototyping to obtain the data that gave us our criteria for the design of SymboL. The children contributed information about how the symbols should look and how they should be grouped. After design and implementation, SymboL was tested together with the Symbo application. Here too, the children took part and evaluated both SymboL and Symbo. It was essential for us to use published methods for developing software for children. Among other things, the use of a video camera to document our work was indispensable. We achieved our goal of giving the children a symbol-based e-mail system through which they could communicate with others. The children both liked and mastered this form of communication. There was little time to test the SymboL language properly, but we received indications that SymboL can be improved. |
| Description: | This is a master's thesis (hovedoppgave) |
| URI: | http://hdl.handle.net/10037/1043 |
| Abstract: | To achieve low overhead, traditional cluster monitoring systems sample data at low frequencies and with coarse granularity. However, interactive monitoring requires frequent (up to 60 Hz) sampling of fine-grained data and visualization tools that can explore and display data in near real-time. This makes traditional cluster monitoring systems unsuited for interactive monitoring of distributed cluster applications: they fail to capture short-duration events, which makes it difficult to understand the performance relationship between processes on the same or different nodes. To address this issue, we developed WallMon, a tool for interactive visual exploration of performance behaviors in distributed systems. For gathering data, WallMon is centered around an abstraction of collectors and handlers: collectors gather data of interest, such as CPU and memory usage, and forward it to handlers in a push-based fashion, while handlers take action upon the data. WallMon captures and visualizes data for every process on every node, as well as overall node statistics. Data is visualized using a technique inspired by the concept of information flocking. WallMon's design is based on the client-server model, and it is extensible through a module system that encapsulates functionality specific to monitoring (collectors) and visualization (handlers). A set of experiments has been carried out on a cluster of 29 nodes with 180 processes per node. Performance results show 7% (of 100) CPU usage at a 64 Hz sampling rate when performing process-level monitoring with WallMon. Using WallMon's interactive visualization, we have observed interesting patterns in different parallel and distributed systems, such as an unexpected ratio of user- and kernel-level execution among processes in a particular distributed system. |
| URI: | http://hdl.handle.net/10037/3991 |
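The collector/handler abstraction described in the WallMon abstract above can be sketched in a few lines. This is an illustrative single-process sketch, not WallMon's code or API: the `Sample`, `Collector` and `run` names are hypothetical, and a real deployment would push samples over the network from each node to a server rather than call handlers directly.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    source: str       # what was measured, e.g. "cpu" (hypothetical label)
    value: float      # the measured value
    timestamp: float  # wall-clock time of the measurement


class Collector:
    """Gathers one kind of data and pushes each sample to its handlers."""

    def __init__(self, source: str, read: Callable[[], float]):
        self.source = source
        self.read = read
        self.handlers: list[Callable[[Sample], None]] = []

    def subscribe(self, handler: Callable[[Sample], None]) -> None:
        self.handlers.append(handler)

    def collect_once(self) -> Sample:
        sample = Sample(self.source, self.read(), time.time())
        for handler in self.handlers:
            handler(sample)  # push-based: the collector initiates delivery
        return sample


def run(collectors: list[Collector], hz: float, ticks: int) -> None:
    """Sample every collector `ticks` times at roughly `hz` samples per second."""
    period = 1.0 / hz
    for _ in range(ticks):
        start = time.monotonic()
        for collector in collectors:
            collector.collect_once()
        # Sleep for whatever remains of this sampling period.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```

A handler here is any callable that accepts a `Sample`, so a visualization or logging component can be attached with `subscribe` without changing the collector.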
The University Library of Tromsø, N-9037 Tromsø
Tel: +47 77 64 40 00, E-mail: munin@ub.uit.no