Semi-Supervised Clustering With Multiresolution Autoencoders

D. Ienco (co-first author); R. G. Pensa (co-first author)
2018-01-01

Abstract

In most real-world clustering scenarios, experts have only limited background information at their disposal, but such knowledge is valuable and may guide the analysis process. Semi-supervised clustering can be used to drive the algorithmic process with prior knowledge and to enable the discovery of clusters that meet the analyst's expectations. Usually, in the semi-supervised clustering setting, the background knowledge is converted into some kind of constraint and, subsequently, metric learning or constrained clustering is adopted to obtain the final data partition. In contrast, we propose a new semi-supervised clustering algorithm that directly exploits prior knowledge, in the form of labeled examples, avoiding the need to derive constraints. Our algorithm employs a multiresolution strategy to generate an ensemble of semi-supervised autoencoders that fit the data together with the background knowledge. The network models are then used to supply a new embedding representation on which clustering is performed. The proposed strategy is evaluated on a set of real-world benchmarks and compared with well-known state-of-the-art semi-supervised clustering methods. The experimental results highlight the benefit of directly leveraging the prior knowledge and show the quality of the representation learnt by the multiresolution scheme.
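
The pipeline outlined in the abstract can be sketched roughly as follows. This is an illustrative toy and not the authors' implementation: the single tanh hidden layer, the loss weighting, the hidden sizes (2, 4, 8), the concatenation of the embeddings, the plain k-means step, and all function names (`train_ssae`, `multiresolution_embedding`, `kmeans`) are assumptions made for the example. Each autoencoder is trained to reconstruct every point and, on the few labeled points only, to predict the known label from the hidden code.

```python
import numpy as np

def train_ssae(X, y, labeled, hidden, n_classes, epochs=200, lr=0.05, seed=0):
    """Train one semi-supervised autoencoder and return the embedding of X.

    Assumed loss: mean squared reconstruction error over all points
    plus softmax cross-entropy over the labeled points only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    W3 = rng.normal(0, 0.1, (hidden, n_classes)); b3 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y[labeled]]            # one-hot labels of labeled points
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden code (the embedding)
        R = H @ W2 + b2                          # reconstruction of the input
        logits = H[labeled] @ W3 + b3            # class scores, labeled points only
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        dR = 2.0 * (R - X) / n                   # gradient of the reconstruction loss
        dL = (P - Y) / len(labeled)              # gradient of the cross-entropy
        dH = dR @ W2.T                           # backpropagate into the hidden code
        dH[labeled] += dL @ W3.T
        dZ = dH * (1.0 - H ** 2)                 # derivative of tanh
        W2 -= lr * (H.T @ dR); b2 -= lr * dR.sum(0)
        W3 -= lr * (H[labeled].T @ dL); b3 -= lr * dL.sum(0)
        W1 -= lr * (X.T @ dZ); b1 -= lr * dZ.sum(0)
    return np.tanh(X @ W1 + b1)

def multiresolution_embedding(X, y, labeled, n_classes, sizes=(2, 4, 8)):
    """Train one autoencoder per hidden size and concatenate the embeddings."""
    parts = [train_ssae(X, y, labeled, h, n_classes, seed=i)
             for i, h in enumerate(sizes)]
    return np.hstack(parts)

def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = Z[assign == j].mean(0)
    return assign

# Toy data: two Gaussian blobs in 4 dimensions, 5 labeled examples per class.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.5, 1.0, (100, 4)), rng.normal(1.5, 1.0, (100, 4))])
y_true = np.repeat([0, 1], 100)
labeled = np.concatenate([np.arange(5), np.arange(100, 105)])
y = np.full(200, -1)                             # -1 marks an unlabeled point
y[labeled] = y_true[labeled]

Z = multiresolution_embedding(X, y, labeled, n_classes=2)
labels = kmeans(Z, k=2)
```

Here the labeled examples enter the training loss directly, so no pairwise must-link or cannot-link constraints are derived, which is the point the abstract emphasizes. The resulting `labels` could then be compared with `y_true` using any external clustering index.
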
Year: 2018
Conference: 2018 International Joint Conference on Neural Networks (IJCNN)
Location: Rio de Janeiro, Brazil
Dates: 8-13 July 2018
Published in: Proceedings of 2018 International Joint Conference on Neural Networks (IJCNN)
Publisher: IEEE
Pages: 2905-2912
ISBN: 9781509060146; 9781509060153
URL: https://ieeexplore.ieee.org/document/8489353
Keywords: semi-supervised clustering, background knowledge, autoencoders, ensemble
Files in this item:

File: 08489353.pdf
Access: restricted
Description: paper (publisher's version)
File type: publisher's PDF
Size: 2 MB
Format: Adobe PDF

File: PID5342881.pdf
Access: restricted
Description: paper (postprint)
File type: postprint (author's final version)
Size: 866.67 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1678079
Citations
  • PMC: not available
  • Scopus: 8
  • Web of Science (ISI): 1