TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data

Theobald, Martin; Bast, Holger; Majumdar, Debapriyo; Schenkel, Ralf; Weikum, Gerhard

doi:10.1007/s00778-007-0072-z

Journal Article

MPG Authors

/persons/resource/persons45609

Theobald, Martin
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons44076

Bast, Holger
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons44972

Majumdar, Debapriyo
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons45380

Schenkel, Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45720

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Theobald, M., Bast, H., Majumdar, D., Schenkel, R., & Weikum, G. (2008). TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data. VLDB Journal, 17(1), 81-115. doi:10.1007/s00778-007-0072-z.

Citation link: https://hdl.handle.net/11858/00-001M-0000-000F-1D3B-3

Abstract

Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-$k$ retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the $k$ top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper are fourfold: 1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, 2) efficient and effective top-$k$ query processing for semistructured data, 3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and 4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
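
The early-termination idea in the abstract can be illustrated with a small sketch. The code below is not the TopX algorithm; it is a generic threshold-style top-$k$ procedure (in the spirit of the classic threshold algorithm), assuming summation as the monotonic aggregation function, one score-sorted list per query dimension, and cheap random access to each list. The function name `top_k_threshold` and the sample data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

ScoredList = List[Tuple[str, float]]  # (item_id, score), sorted by descending score


def top_k_threshold(index_lists: List[ScoredList], k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Return (top-k items with aggregated scores, number of rows scanned).

    Assumptions of this sketch: scores are non-negative, the aggregation is a
    plain sum, and an item missing from a list contributes 0 in that dimension.
    """
    if k <= 0:
        return [], 0
    # One lookup table per dimension stands in for random access to the index.
    lookups: List[Dict[str, float]] = [dict(lst) for lst in index_lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the current best k
    depth = 0
    max_depth = max((len(lst) for lst in index_lists), default=0)
    while depth < max_depth:
        threshold = 0.0  # upper bound on the score of any item not yet seen
        for lst in index_lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen items score 0 here
            item, score = lst[depth]
            threshold += score
            if item not in seen:
                seen.add(item)
                total = sum(lookup.get(item, 0.0) for lookup in lookups)
                if len(top) < k:
                    heapq.heappush(top, (total, item))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, item))
        depth += 1
        # Safe stop: no unseen item can beat the weakest of the current top k.
        if len(top) == k and top[0][0] >= threshold:
            break
    ranked = sorted(top, reverse=True)
    return [(item, score) for score, item in ranked], depth


# Hypothetical per-condition score lists for four result elements.
list_a = [("d1", 1.0), ("d2", 0.75), ("d3", 0.25), ("d4", 0.125)]
list_b = [("d2", 1.0), ("d1", 0.5), ("d4", 0.25), ("d3", 0.125)]
results, rows_scanned = top_k_threshold([list_a, list_b], k=2)
print(results, rows_scanned)
```

On this sample input the scan stops after two of the four rows, because the sum of the scores at the current scan position already falls below the weakest score in the current top 2. Monotonicity of the aggregation function is what makes that bound valid.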