A memetic algorithm for clustering with cluster based feature selection

2022-8
Şener, İlyas Alper
Clustering is a well-known unsupervised learning method that aims to group similar data points and separate dissimilar ones. Data sets subject to clustering are mostly high dimensional, and their dimensions include both relevant and redundant features. Selecting the relevant features is therefore important for obtaining successful clusters. In this study, the relevant features are allowed to vary from cluster to cluster, since each cluster in a data set may be characterized by a different set of features; the problem is accordingly named clustering with cluster based feature selection. We approach the problem as center based clustering and consider three tasks: selecting the relevant features, deciding the cluster centroid locations, and assigning data points to the clusters. The problem is a combinatorial NP-hard problem, and mathematical models cannot solve large instances. Moreover, the heuristics developed in the literature produce results with high variance. A metaheuristic framework that follows the problem characteristics is therefore proposed. A memetic algorithm that integrates a genetic approach and neighborhood search is proposed for data sets with a large number of data points, and a modified version of this algorithm is developed for high dimensional data sets. The proposed algorithms have been tested on problem instances of different sizes and dimensions, using both simulated and real data sets. Experimental results show that the proposed approach obtains stable results with high accuracy and outperforms the state of the art.
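
The three tasks named in the abstract (feature selection per cluster, centroid location, and point assignment) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm or objective function: the function names `assign_points` and `update_centroids`, the boolean feature masks, and the choice to average squared differences over each cluster's selected features (so that clusters with different numbers of features stay comparable) are all assumptions made for illustration.

```python
import numpy as np

def assign_points(X, centroids, masks):
    """Assign each point to its closest cluster, where the distance to
    cluster c uses only the features selected for c (assumed scheme)."""
    n, k = X.shape[0], centroids.shape[0]
    dist = np.empty((n, k))
    for c in range(k):
        sel = masks[c]  # boolean mask; assumed to select at least one feature
        diff = X[:, sel] - centroids[c, sel]
        # Mean (not sum) of squared differences: an assumed normalisation
        dist[:, c] = (diff ** 2).mean(axis=1)
    labels = dist.argmin(axis=1)
    cost = dist[np.arange(n), labels].sum()
    return labels, cost

def update_centroids(X, labels, centroids):
    """Move each centroid to the mean of its assigned points;
    empty clusters keep their previous centroid."""
    new = centroids.copy()
    for c in range(centroids.shape[0]):
        members = X[labels == c]
        if len(members) > 0:
            new[c] = members.mean(axis=0)
    return new

# Toy data: cluster 0 is tight in features 0 and 1, cluster 1 in feature 2.
X = np.array([
    [0.0, 0.0, 5.0],
    [0.2, 0.1, -5.0],
    [10.0, 10.0, 0.1],
    [-3.0, 7.0, 0.0],
])
centroids = np.array([[0.1, 0.05, 0.0], [3.5, 8.5, 0.05]])
masks = np.array([[True, True, False], [False, False, True]])

labels, cost = assign_points(X, centroids, masks)
print(labels, cost)
```

In a memetic scheme of the kind the abstract describes, a candidate solution would encode the masks (and possibly the centroids), a genetic step would recombine candidates, and a neighborhood search would refine them, for instance by flipping single mask entries and re-running the two functions above. That pairing is a plausible reading of the abstract, not a description of the author's implementation.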

Suggestions

A Multi-objective approach to cluster ensemble selection problem
Aktaş, Dilay; Lokman, Banu; Department of Operational Research (2019)
Clustering is an unsupervised learning method that partitions a data set into groups. The aim is to assign similar points to the same cluster and dissimilar points to different clusters with respect to some notion of similarity. It is applicable to a wide range of areas such as recommender systems, anomaly detection, market research, and customer segmentation. With the advances in the computational power, a diverse set of clustering solutions can be obtained from a dataset using different clustering algorit...
AN EFFICIENT DATABASE TRANSITIVE CLOSURE ALGORITHM
Toroslu, İsmail Hakkı; HENSCHEN, L (Springer Science and Business Media LLC, 1994-05-01)
The integration of logic rules and relational databases has recently emerged as an important technique for developing knowledge management systems. An important class of logic rules utilized by these systems is the so-called transitive closure rules, the processing of which requires the computation of the transitive closure of database relations referenced by these rules. This article presents a new algorithm suitable for computing the transitive closure of very large database relations. This algorithm proc...
Mixed integer programming and heuristics approaches for clustering with cluster-based feature selection
Önen Öz, Sen; İyigün, Cem; Department of Industrial Engineering (2019)
Cluster analysis tries to figure out the hidden similarities between data points in order to place similar data points into the same group and different data points into separate groups using unlabeled data. Understanding the data becomes difficult and the power of obtaining informative clusters for an algorithm decreases as the dimensionality of the data set gets high. Identifying the relevant features of high dimensional data sets is the mostly used technique in order to increase the performance of the algorit...
A binomial noised model for cluster validation
Toledano-Kitai, Dvora; Avros, Renata; Volkovich, Zeev; Weber, Gerhard Wilhelm; Yahalom, Orly (IOS Press, 2013-01-01)
Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Normally, a clustering algorithm requires a desired number of clusters as a parameter. We consider the cluster validation problem of determining the optimal ("true") number of clusters. We adopt the stability testing approach, according to which, repeated applications of a given clustering algorithm provide similar results when the specified number of clusters is correct. To implemen...
A hybrid swarm intelligence algorithm for simultaneous feature selection and clustering
Geren, Hasan; Özdemirel, Nur Evin; Department of Industrial Engineering (2022-6-20)
In this study, we address the feature selection and clustering problems by using a hybrid swarm intelligence approach. We assume that the number of clusters is known, clusters can be of any shape and have different densities, but there are no outliers or noise. The data set may have high dimensionality and redundant features. We propose a swarm intelligence algorithm, namely ACOVNS, which is a hybridization of Ant Colony Optimization (ACO) and Variable Neighborhood Search (VNS). We utilize the ACO mechanism...
Citation Formats
İ. A. Şener, “A memetic algorithm for clustering with cluster based feature selection,” M.S. - Master of Science, Middle East Technical University, 2022.