Exploiting Index Pruning Methods for Clustering XML Collections

2010-01-01
In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) for clustering XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure of the same quality as that of the original collection, in terms of a set of evaluation metrics.
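The idea of pruning document vectors before clustering can be illustrated with a minimal sketch. The abstract does not say which pruning techniques or term weights the paper uses, so everything here is an assumption made for illustration: documents are sparse term-to-weight dictionaries with positive weights, `prune_document` is a hypothetical document-centric rule that keeps only the top-weighted fraction of each vector, and `decoupling_coefficients` follows the usual textbook definition of the C3M decoupling coefficient (the diagonal of the cover-coefficient matrix), whose sum gives the estimated number of clusters. It is not the authors' implementation.

```python
from typing import Dict, List

DocVector = Dict[str, float]  # term -> positive weight (assumed representation)


def prune_document(vec: DocVector, keep_fraction: float) -> DocVector:
    """Hypothetical document-centric pruning: keep the top-weighted terms.

    keep_fraction=0.3 corresponds to pruning roughly 70% of the entries.
    At least one term is always kept for a non-empty vector.
    """
    if not vec:
        return {}
    keep = max(1, round(len(vec) * keep_fraction))
    # Sort by descending weight; break ties by term so the result is deterministic.
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:keep])


def decoupling_coefficients(docs: List[DocVector]) -> List[float]:
    """Compute delta_i = alpha_i * sum_k d_ik^2 * beta_k for every document.

    alpha_i is the reciprocal of the i-th row sum and beta_k the reciprocal
    of the k-th column sum of the document-term matrix. Assumes every
    document has at least one positive-weight term.
    """
    column_sums: Dict[str, float] = {}
    for vec in docs:
        for term, weight in vec.items():
            column_sums[term] = column_sums.get(term, 0.0) + weight

    deltas = []
    for vec in docs:
        row_sum = sum(vec.values())
        covered = sum(w * w / column_sums[t] for t, w in vec.items())
        deltas.append(covered / row_sum)
    return deltas


def estimated_cluster_count(docs: List[DocVector]) -> float:
    """Sum of decoupling coefficients, the C3M estimate of the cluster count."""
    return sum(decoupling_coefficients(docs))


if __name__ == "__main__":
    collection = [
        {"title": 1.0, "author": 1.0},
        {"title": 1.0, "author": 1.0},
        {"price": 1.0, "stock": 1.0},
    ]
    pruned = [prune_document(d, keep_fraction=0.5) for d in collection]
    print("before pruning:", estimated_cluster_count(collection))
    print("after pruning: ", estimated_cluster_count(pruned))
```

Comparing the estimate (or any downstream clustering) before and after pruning is one simple way to check whether a given pruning level changes the structure; the paper itself reports its comparison using a set of evaluation metrics not detailed in this abstract.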

Citation Formats
İ. S. Altıngövde and O. Ulusoy, “Exploiting Index Pruning Methods for Clustering XML Collections,” 2010, vol. 6203, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/35247.