Harvest : a collaborative system for distributed retrieval of social data
In recent years, social network providers have become one of the largest industries in the world. These networks created a new arena for sharing information over the Internet, and thus changed the way people interact with each other. Hundreds of millions of social network users update their statuses and send messages to each other every day. These interactions produce vast amounts of social data. This data is the core of the social network providers' business model, and it is sold to large companies for personalized advertising, brand monitoring and viral marketing. The price of this data can be prohibitive, and some may be unable or unwilling to pay for it. If the data were freely available, research that could benefit from it could be carried out more readily, leading to new knowledge. This thesis presents Harvest, a collaborative system for retrieving social data. Harvest is a peer-to-peer system made up of contributing social network users, inspired by public resource computing. Harvest shares resources bound to social network accounts to retrieve large social data sets. Users contribute by running an application on their own computers, as in other public resource computing systems such as the @home systems. The system implements retrieval of data from Twitter. Experiments on real Twitter data show that the system scales with increased contribution. The data retrieval bandwidth per contributing user is quite low, and the number of contributors needed to achieve a considerably large data retrieval bandwidth is high, but the system has no associated financial costs. Harvest would benefit greatly from retrieving data from more sources, as this would increase its data retrieval bandwidth in addition to offering more abundant data.
Publisher: Universitetet i Tromsø (University of Tromsø)