Mean shift spectral clustering using kernel entropy component analysis
AuthorAgersborg, Jørgen Andreas
Clustering is an unsupervised pattern recognition technique for finding natural groups in data, whether it is a grouping of web pages found by a search engine or segmenting satellite images into different types of ground cover. There exists a variety of different ways to perform clustering ranging from heuristics rules designed for a specific dataset to general procedures which can be applied to all datasets with varying degrees of success. The k-means algorithm is a well known example of the latter approach that can be expected to give good results when the data is easily separable. Another example of general clustering procedures are spectral clustering methods which involve a non-linear data transformation that allows them to handle complex cluster structures. They are considered to be the state of the art in the clustering literature, but are computationally demanding and unable to handle large datasets. To overcome the size limitations, this thesis uses a two-stage approach. The first stage can be seen as a preprocessing to reduce the size of the input for the spectral clustering in the second stage. The preprocessing is accomplished by using a simple clustering method on the dataset. These clusters represents a partitioning and is a more natural way of representing the dataset than randomly selecting a subset of samples. One can then adjust the number of partitions so spectral clustering methods can be used. The procedure is called Mean Shift Spectral Clustering (MSSC) as the mean shift clustering algorithm is used in the first stage. Each partition found by mean shift consists of data points that are close to the same mode in the estimated probability density function. Hence the partitions will be representative of the geometric structure of the dataset. This thesis realizes for the first time the idea of spectral clustering based on the partitions found by mean shift. The Kernel Entropy Component Analysis (KECA) spectral clustering method, a recent development in the field of spectral clustering, is used for this purpose and compared with the better known Kernel Principal Component Analysis (KPCA) method. A comprehensive collection of MATLAB functions has been developed to allow the testing of this procedure which is able to handle large and high dimensional datasets. It is able to cluster images directly with as many feature vectors as there are pixels. The experiments show how the clustering accuracy varies as a function of the primary parameters. This gives a good overall characterization of the method and what can be expected when used for unfamiliar datasets. The MSSC procedure is shown to provide good clustering results when following some basic principles for selecting parameters.
PublisherUniversitetet i Tromsø
University of Tromsø
The following license file are associated with this item: