Journal Article

TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data

MPS-Authors
Theobald,  Martin
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Bast,  Holger
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Majumdar,  Debapriyo
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Schenkel,  Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Weikum,  Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Theobald, M., Bast, H., Majumdar, D., Schenkel, R., & Weikum, G. (2008). TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data. VLDB Journal, 17(1), 81-115. doi:10.1007/s00778-007-0072-z.


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-1D3B-3
Abstract
Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-$k$ retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the $k$ top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: 1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, 2) efficient and effective top-$k$ query processing for semistructured data, 3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and 4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
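
The early-termination idea in the abstract can be illustrated with a generic threshold-style top-$k$ scan. The sketch below is not the TopX algorithm itself. It is a minimal, textbook-style threshold algorithm over per-condition score lists, assuming summation as the monotonic aggregation function, non-negative local scores, and a score of 0 for an item missing from a list. The function name `threshold_topk` and the input layout are hypothetical, chosen only for illustration.

```python
import heapq


def threshold_topk(score_maps, k):
    """Generic threshold-style top-k scan (illustrative, not TopX itself).

    score_maps: one dict per query condition, mapping item id -> local score.
    Assumes non-negative scores and sum as the monotonic aggregation.
    Returns (ranked list of (item, total score), scan depth reached).
    """
    # One list per query condition, sorted by descending local score.
    sorted_lists = [sorted(m.items(), key=lambda kv: -kv[1]) for m in score_maps]
    max_depth = max((len(lst) for lst in sorted_lists), default=0)

    seen = set()
    heap = []  # min-heap of (total score, item) holding the current top-k
    depth = 0
    while depth < max_depth:
        # Upper bound on the total score of any item not yet seen.
        threshold = 0.0
        for lst in sorted_lists:
            if depth >= len(lst):
                continue  # exhausted list contributes 0 to unseen items
            item, local = lst[depth]
            threshold += local
            if item not in seen:
                seen.add(item)
                # Random access: look up the item in every list.
                total = sum(m.get(item, 0.0) for m in score_maps)
                if len(heap) < k:
                    heapq.heappush(heap, (total, item))
                elif total > heap[0][0]:
                    heapq.heapreplace(heap, (total, item))
        depth += 1
        # Safe to stop: no unseen item can beat the current k-th best score.
        if len(heap) == k and heap[0][0] >= threshold:
            break

    ranked = sorted(heap, reverse=True)
    return [(item, total) for total, item in ranked], depth
```

Because the aggregation is monotonic, the sum of the scores at the current scan depth bounds the total score of every unseen item, so the scan can stop before reading the full lists once the $k$-th best total reaches that bound.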