
Released

Conference Paper

The Gist of Everything New: Personalized Top-k Processing over Web 2.0 Streams

MPS-Authors

Michel, Sebastian
Databases and Information Systems, MPI for Informatics, Max Planck Society

Citation

Haghani, P., Michel, S., & Aberer, K. (2010). The Gist of Everything New: Personalized Top-k Processing over Web 2.0 Streams. In X. J. Huang, G. Jones, N. Koudas, X. Wu, & K. Collins-Thompson (Eds.), Proceedings of the 19th ACM Conference on Information and Knowledge Management (pp. 489-498). New York, NY: ACM. doi:10.1145/1871437.1871502.


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-153D-C
Abstract
Web 2.0 portals have made content generation easier than ever, with millions of users contributing news stories in the form of weblog posts or short textual snippets, as on Twitter. Efficient and effective filtering solutions are key to allowing users to stay tuned to this ever-growing ocean of information, releasing only relevant trickles of personal interest. In classical information filtering systems, user interests are formulated using standard IR techniques, and data from all available information sources is filtered against a predefined, absolute, quality-based threshold. In contrast to this restrictive approach, which may still overwhelm the user with the returned stream of data, we envision a system that continuously keeps the user updated with only the top-k relevant new information. Freshness of data is guaranteed by considering it valid for a particular time interval, controlled by a sliding window. Treating relevance as relative to the existing pool of new information creates a highly dynamic setting. We present POL-filter, which, together with our maintenance module, constitutes an efficient solution to this kind of problem. Comprehensive performance evaluations using real-world data obtained from a weblog crawl show that our approach brings performance gains compared to the state of the art.
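To make the problem setting concrete (a continuous top-k over a sliding window, where an item's rank depends on the other items currently in the window), here is a deliberately naive baseline. It is NOT POL-filter or the paper's maintenance module. The class name `NaiveWindowTopK`, the term-weight user profile, and the additive scoring function are all hypothetical assumptions for illustration. The sketch simply re-ranks the whole window on every query, so it only illustrates the semantics, not an efficient solution.

```python
from collections import deque
import heapq


class NaiveWindowTopK:
    """Hypothetical naive baseline (not the paper's POL-filter): keeps every
    item in the sliding window and recomputes the top-k on each query."""

    def __init__(self, profile, k, window):
        self.profile = profile  # assumed user profile: term -> weight
        self.k = k              # number of items to report
        self.window = window    # window length, in the same units as timestamps
        self.items = deque()    # (timestamp, item_id, score) in arrival order

    def score(self, terms):
        # Assumed relevance model: sum of profile weights of the item's terms.
        return sum(self.profile.get(t, 0.0) for t in terms)

    def insert(self, timestamp, item_id, terms):
        # Items must arrive in non-decreasing timestamp order.
        self.items.append((timestamp, item_id, self.score(terms)))
        self._expire(timestamp)

    def _expire(self, now):
        # Keep only items inside the window (now - window, now].
        while self.items and self.items[0][0] <= now - self.window:
            self.items.popleft()

    def top_k(self):
        # Rank relative to whatever is currently in the window
        # (as of the most recent insert).
        best = heapq.nlargest(self.k, self.items, key=lambda it: it[2])
        return [item_id for _, item_id, _ in best]
```

For example, with `profile = {"stream": 2.0, "top-k": 3.0, "weblog": 1.0}`, `k=2`, and `window=10`, inserting items at times 0, 3, and 5 and then one at time 14 expires the first two. An item that was not in the top-k earlier can therefore enter it later without any change to its own score, which is the dynamic behavior the abstract describes.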