The effect of low number of points in clustering validation via the negentropy increment
Entity
UAM. Departamento de Ingeniería Informática
Publisher
Elsevier B.V.
Date
2011-09
Citation
10.1016/j.neucom.2011.03.023
Neurocomputing 74.16 (2011): 2657-2664
ISSN
0925-2312 (print); 1872-8286 (online)
DOI
10.1016/j.neucom.2011.03.023
Funded by
This work has been funded by DGUI-CAM/UAM (Project CCG10-UAM/TIC-5864)
Editor's Version
http://dx.doi.org/10.1016/j.neucom.2011.03.023
Subjects
Artificial intelligence; Crisp clustering; Cluster validation; Negentropy increment
Note
This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, 74, 16, (2011) DOI: 10.1016/j.neucom.2011.03.023
Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009)
Rights
© 2011 Elsevier B.V. All rights reserved
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Abstract
We recently introduced the negentropy increment, a validity index for crisp clustering that quantifies the average normality of the clustering partitions using the negentropy. This index can satisfactorily deal with clusters with heterogeneous orientations, scales and densities. One of the main advantages of the index is the simplicity of its calculation, which only requires the computation of the log-determinants of the covariance matrices and the prior probabilities of each cluster. The negentropy increment provides validation results that are in general better than those from other classic cluster validity indices. However, when the number of data points in a partition region is small, the estimate of the log-determinant of the covariance matrix can be very poor. This affects the proper quantification of the index and therefore the quality of the clustering, so additional requirements such as limitations on the minimum number of points in each region are needed. Although constraints of this kind can provide good results, they need to be adjusted depending on parameters such as the dimension of the data space. In this article we investigate how the estimation of the negentropy increment of a clustering partition is affected by the presence of regions with a small number of points. We find that the error in this estimation depends on the number of points in each region, but not on the scale or orientation of their distribution, and show how to correct this error in order to obtain an unbiased estimator of the negentropy increment. We also quantify the amount of uncertainty in the estimation. As we show, both for 2D synthetic problems and multidimensional real benchmark problems, these results substantially improve the validation of clustering partitions.
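The abstract states that the index needs only the log-determinants of the covariance matrices and the cluster priors, and that the small-sample error depends on the number of points but not on scale or orientation. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes the index has the form ΔJ = ½ Σᵢ pᵢ log|Σᵢ| − ½ log|Σ₀| − Σᵢ pᵢ log pᵢ, which is not given in this record, and it uses the standard Wishart result for the expected log-determinant of a Gaussian sample covariance as the correction term; the exact correction in the paper may differ. All function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:            # shift x upward so the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def logdet_bias(n, d):
    """E[log|S|] - log|Sigma| for the unbiased sample covariance S of
    n Gaussian points in d dimensions (Wishart result, needs n > d).
    The value depends only on n and d, not on Sigma itself."""
    if n <= d:
        raise ValueError("need more points than dimensions")
    psi_sum = sum(digamma((n - j) / 2.0) for j in range(1, d + 1))
    return psi_sum - d * math.log((n - 1) / 2.0)

def sample_covariance(points):
    """Unbiased (n - 1 denominator) covariance of a list of d-dim points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
             for b in range(d)] for a in range(d)]

def logdet(matrix):
    """Log-determinant of a symmetric positive-definite matrix (Cholesky)."""
    d = len(matrix)
    chol = [[0.0] * d for _ in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total

def negentropy_increment(points, labels, correct_bias=False):
    """Assumed form of the index; lower values indicate a better partition
    under that assumption. The global term is the same for every partition
    of the same data, so it is left uncorrected here."""
    n_total, d = len(points), len(points[0])
    value = -0.5 * logdet(sample_covariance(points))
    for label in set(labels):
        region = [p for p, l in zip(points, labels) if l == label]
        prior = len(region) / n_total
        ld = logdet(sample_covariance(region))
        if correct_bias:
            ld -= logdet_bias(len(region), d)   # remove the expected error
        value += 0.5 * prior * ld - prior * math.log(prior)
    return value
```

Because `logdet_bias` takes only the region size and the dimension, the same correction applies to any region regardless of how its points are scaled or oriented. The term is negative and grows in magnitude as the region size approaches the dimension, so an uncorrected index would tend to favor regions with few points.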
Authors
Lago Fernández, Luis Fernando
Sánchez-Montañés Isla, Manuel Antonio
Corbacho Abelaira, Fernando