
Conference Paper

Restrictive Clustering and Metaclustering for self-organizing Document Collections

MPG Authors

Siersdorfer, Stefan
Databases and Information Systems, MPI for Informatics, Max Planck Society;


Sizov, Sergej
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Siersdorfer, S., & Sizov, S. (2004). Restrictive Clustering and Metaclustering for self-organizing Document Collections. In Proceedings of SIGIR 2004: the Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 226-233). New York, USA: ACM.


Citation link: https://hdl.handle.net/11858/00-001M-0000-000F-2B27-F
Abstract
This paper addresses the problem of automatically structuring heterogeneous document collections by using clustering methods. In contrast to traditional clustering, we study restrictive methods and ensemble-based meta methods that may decide to leave out some documents rather than assign them to inappropriate clusters with low confidence. These techniques yield higher cluster purity and better overall accuracy, and they make unsupervised self-organization more robust. Our comprehensive experimental studies on three different real-world data collections demonstrate these benefits. The proposed methods seem particularly suitable for automatically substructuring personal email folders or personal Web directories that are populated by focused crawlers, and they can be combined with supervised classification techniques.
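
The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes documents and cluster centroids are plain term-weight vectors, uses the cosine-similarity margin between the best and second-best centroid as a stand-in confidence measure, and combines several clusterings by unanimous vote. The function names, the `min_margin` parameter, and the assumption that cluster labels are already aligned across methods are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def restrictive_assign(docs, centroids, min_margin=0.2):
    """Return one cluster index per document, or None if the document is left out.

    Assumption: confidence is the gap between the best and the second-best
    centroid similarity; documents below min_margin are not assigned.
    """
    labels = []
    for doc in docs:
        sims = sorted((cosine(doc, c) for c in centroids), reverse=True)
        runner_up = sims[1] if len(sims) > 1 else 0.0
        if sims[0] - runner_up >= min_margin:
            best = max(range(len(centroids)), key=lambda i: cosine(doc, centroids[i]))
            labels.append(best)
        else:
            labels.append(None)  # abstain rather than assign with low confidence
    return labels


def meta_assign(label_lists):
    """Unanimous-vote ensemble over several labelings of the same documents.

    Assumption: cluster indices mean the same thing in every labeling.
    A document keeps its label only if no method abstained and all agree.
    """
    merged = []
    for votes in zip(*label_lists):
        first = votes[0]
        agreed = first is not None and all(v == first for v in votes)
        merged.append(first if agreed else None)
    return merged


docs = [[1, 0], [0, 1], [1, 1]]
centroids = [[1, 0], [0, 1]]
labels = restrictive_assign(docs, centroids)
print(labels)                                # [0, 1, None]
print(meta_assign([labels, [0, 0, 1]]))      # [0, None, None]
```

In this toy run the third document is equally similar to both centroids, so it is left out instead of being forced into a cluster. Raising `min_margin` trades coverage (fewer assigned documents) for purity of the clusters that remain.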