A Reproducible Benchmark for P2P Retrieval

Neumann, Thomas; Bender, Matthias; Michel, Sebastian; Weikum, Gerhard; Bonnet, Philippe; Manolescu, Ioana

Released

Conference Paper

MPG Authors

Neumann, Thomas
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Bender, Matthias
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Michel, Sebastian
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Neumann, T., Bender, M., Michel, S., & Weikum, G. (2006). A Reproducible Benchmark for P2P Retrieval. In Proceedings of the 1st International Workshop on Performance and Evaluation of Data Management Systems, ExpDB 2006, in cooperation with ACM SIGMOD (pp. 1-8). New York, USA: ACM.

Citation link: https://hdl.handle.net/11858/00-001M-0000-000F-221F-7

Abstract

With the growing popularity of information retrieval (IR) in distributed systems and in particular P2P Web search, a huge number of protocols and prototypes have been introduced in the literature. However, nearly every paper considers a different benchmark for its experimental evaluation, rendering their mutual comparison and the quantification of performance improvements an impossible task. We present a standardized, general-purpose benchmark for P2P IR systems that finally makes this possible. We start by presenting a detailed requirement analysis for such a standardized benchmark framework that allows for reproducible and comparable experimental setups without sacrificing flexibility to suit different system models. We further suggest Wikipedia as a publicly available and all-purpose document corpus and finally introduce a simple yet flexible clustering strategy that assigns the Wikipedia articles as documents to an arbitrary number of peers. After proposing a standardized, real-world query set as the benchmark workload, we review the metrics to evaluate the benchmark results and present an example benchmark run for our fully implemented P2P Web search prototype MINERVA.