TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data

Theobald, Martin; Bast, Holger; Majumdar, Debapriyo; Schenkel, Ralf; Weikum, Gerhard

doi:10.1007/s00778-007-0072-z

Item

ITEM ACTIONSEXPORT

Add to Basket

Local TagsRelease HistoryDetailsSummary

Released

Journal Article

TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data

MPS-Authors

/persons/resource/persons45609

Theobald, Martin
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons44076

Bast, Holger
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons44972

Majumdar, Debapriyo
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons45380

Schenkel, Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45720

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

External Resource

No external resources are shared

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

There are no public fulltexts stored in PuRe

Supplementary Material (public)

There is no public supplementary material available

Citation

Theobald, M., Bast, H., Majumdar, D., Schenkel, R., & Weikum, G. (2008). TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data. VLDB Journal, 17(1), 81-115. doi:10.1007/s00778-007-0072-z.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-1D3B-3

Abstract

Recent IR extensions to XML query languages such as Xpath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-$k$ retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the $k$ top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dy\-namic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: 1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, 2) efficient and effective top-$k$ query processing for semistructured data, 3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and \linebreak query expansion, and 4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.