Harvest : a collaborative system for distributed retrieval of social data

Title: Harvest : a collaborative system for distributed retrieval of social data
Author: Kreutzer, Tor
Date: 11-Jun-2012
Type: Master thesis
Abstract: In recent years, social network providers have become one of the largest industries in the world. These networks created a new arena for sharing information over the Internet, and thus changed the way people interact with each other. Hundreds of millions of social network users update statuses and send messages to each other every day. These interactions produce vast amounts of social data. This data is the core of the social network providers' business model, and it is sold to large companies for personalized advertising, brand monitoring and viral marketing. The price of this data can be intimidating, and some may be unable or unwilling to pay it. If the data were freely available, research that could benefit from it could be carried out more freely, leading to new knowledge. This thesis presents Harvest, a collaborative system for retrieving social data. Harvest is a peer-to-peer system consisting of contributing social network users, inspired by public resource computing. Harvest shares social network account-bound resources to retrieve large social data sets. Users contribute by running an application on their own computer, as in other public resource computing systems such as the @home systems. The system implements retrieval of data from Twitter. Experiments on real Twitter data show that the system scales with increased contribution. The data retrieval bandwidth per contributing user is quite low, and the number of contributors needed to achieve a considerably large data retrieval bandwidth is high, but the system has no associated financial costs. Harvest would benefit greatly from retrieving data from more sources, as this would increase its data retrieval bandwidth in addition to offering more abundant data.
Publisher: Universitetet i Tromsø; University of Tromsø
URI: http://hdl.handle.net/10037/4248

File(s) in this item

File         Size     Format   Description
Harvest.zip  107.3Kb  Unknown  Source code of the thesis
thesis.pdf   916.5Kb  PDF
