Improving freshness of web-extracted metadata
Live video search is emerging as a platform for multimedia production and entertainment services. Such systems rely on a stream of live video and on metadata describing the video content. A high-quality source for such metadata can be found on the web, where it can be identified and extracted from web pages by crawling and scraping. However, general crawler politeness rules limit the per-site polling frequency, which in turn limits the freshness of the retrieved data. In this thesis we present a metadata extraction system that combines high metadata freshness with adherence to polling politeness rules. To achieve this, the proposed solution polls a pool of web sources containing overlapping information, scheduled in a round-robin fashion. Our experiments and analysis show that the system can keep the average metadata freshness higher than any single-source solution while still adhering to polling politeness rules.
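The round-robin idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes every source in the pool carries equivalent information and shares the same politeness interval, and the names `round_robin_schedule`, `min_interval`, and `horizon` as well as the source labels are hypothetical. Under those assumptions, a pool of N sources lets the system fetch fresh data every `min_interval / N` seconds while each individual site is still contacted only once per `min_interval`.

```python
def round_robin_schedule(sources, min_interval, horizon):
    """Return a list of (time, source) poll events up to `horizon` seconds.

    Assumption (not from the thesis): all sources share one politeness
    interval `min_interval`, the minimum time between two polls of the
    same site. Polls are spread evenly over the pool, so some source is
    polled every min_interval / len(sources) seconds.
    """
    n = len(sources)
    events = []
    k = 0
    while k * min_interval / n < horizon:
        # Poll number k goes to the next source in cyclic order.
        events.append((k * min_interval / n, sources[k % n]))
        k += 1
    return events


def gaps_per_source(events):
    """Map each source to the list of gaps between its consecutive polls."""
    last_seen = {}
    gaps = {}
    for time, source in events:
        if source in last_seen:
            gaps.setdefault(source, []).append(time - last_seen[source])
        last_seen[source] = time
    return gaps


# Hypothetical pool of three overlapping sources, 60 s politeness interval.
events = round_robin_schedule(["site_a", "site_b", "site_c"], 60, 120)
for time, source in events:
    print(f"{time:6.1f}s  poll {source}")
```

In this example the pool as a whole is polled every 20 seconds, whereas a single-source crawler obeying the same rule could refresh only every 60 seconds. Real sources are unlikely to overlap perfectly or to update simultaneously, so the actual freshness gain depends on the pool.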
Publisher: Universitetet i Tromsø (University of Tromsø)