Improving freshness of web-extracted metadata
Permanent link
https://hdl.handle.net/10037/2400
Date
2009-12-18
Type
Master thesis (Mastergradsoppgave)
Author
Heimdal, Tord-Arne
Abstract
Live video search is emerging as a platform for multimedia production and entertainment services.
Such systems rely on a stream of live video and metadata describing the video content.
A high-quality source for such metadata can be found on the web.
Identifying and extracting metadata from web pages can be done by crawling and scraping.
However, general crawler politeness rules limit per-site polling frequency, and therefore the freshness of the retrieved data is also limited.
In this thesis we present a metadata extraction system that achieves high metadata freshness while adhering to polling politeness rules.
To achieve this, the proposed solution uses a pool of web sources that contain overlapping information and polls them in a round-robin fashion.
Our experiments and analysis show that our system keeps the average metadata freshness higher than any single-source solution while still adhering to polling politeness rules.
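The round-robin idea can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes every source in the pool carries the same information and shares one per-site minimum polling interval; the site names, the 60-second interval, and the function name `round_robin_schedule` are hypothetical.

```python
from itertools import cycle


def round_robin_schedule(sources, per_site_interval, num_polls):
    """Return (time_in_seconds, source) pairs for a round-robin polling plan.

    Polls are spaced per_site_interval / len(sources) apart, so each
    individual source is still visited only once per per_site_interval.
    """
    step = per_site_interval / len(sources)
    order = cycle(sources)
    return [(i * step, next(order)) for i in range(num_polls)]


# Hypothetical pool of three overlapping sources, 60 s politeness interval.
schedule = round_robin_schedule(
    ["site_a", "site_b", "site_c"], per_site_interval=60.0, num_polls=6
)
for t, source in schedule:
    print(f"t={t:5.1f}s poll {source}")
```

Under these assumptions, a pool of three sources yields a fresh observation every 20 seconds, while each individual site is still contacted only once per 60 seconds.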
Publisher
Universitetet i Tromsø (University of Tromsø)
Copyright 2009 The Author(s)