Mean shift spectral clustering using kernel entropy component analysis

Agersborg, Jørgen Andreas

dc.contributor.advisor	Jenssen, Robert
dc.contributor.author	Agersborg, Jørgen Andreas
dc.date.accessioned	2012-11-02T13:43:49Z
dc.date.available	2012-11-02T13:43:49Z
dc.date.issued	2012-06-01
dc.description.abstract	Clustering is an unsupervised pattern recognition technique for finding natural groups in data, whether it is a grouping of web pages found by a search engine or segmenting satellite images into different types of ground cover. There exists a variety of different ways to perform clustering ranging from heuristics rules designed for a specific dataset to general procedures which can be applied to all datasets with varying degrees of success. The k-means algorithm is a well known example of the latter approach that can be expected to give good results when the data is easily separable. Another example of general clustering procedures are spectral clustering methods which involve a non-linear data transformation that allows them to handle complex cluster structures. They are considered to be the state of the art in the clustering literature, but are computationally demanding and unable to handle large datasets. To overcome the size limitations, this thesis uses a two-stage approach. The first stage can be seen as a preprocessing to reduce the size of the input for the spectral clustering in the second stage. The preprocessing is accomplished by using a simple clustering method on the dataset. These clusters represents a partitioning and is a more natural way of representing the dataset than randomly selecting a subset of samples. One can then adjust the number of partitions so spectral clustering methods can be used. The procedure is called Mean Shift Spectral Clustering (MSSC) as the mean shift clustering algorithm is used in the first stage. Each partition found by mean shift consists of data points that are close to the same mode in the estimated probability density function. Hence the partitions will be representative of the geometric structure of the dataset. This thesis realizes for the first time the idea of spectral clustering based on the partitions found by mean shift. The Kernel Entropy Component Analysis (KECA) spectral clustering method, a recent development in the field of spectral clustering, is used for this purpose and compared with the better known Kernel Principal Component Analysis (KPCA) method. A comprehensive collection of MATLAB functions has been developed to allow the testing of this procedure which is able to handle large and high dimensional datasets. It is able to cluster images directly with as many feature vectors as there are pixels. The experiments show how the clustering accuracy varies as a function of the primary parameters. This gives a good overall characterization of the method and what can be expected when used for unfamiliar datasets. The MSSC procedure is shown to provide good clustering results when following some basic principles for selecting parameters.	en
dc.identifier.uri	https://hdl.handle.net/10037/4610
dc.identifier.urn	URN:NBN:no-uit_munin_4325
dc.language.iso	eng	en
dc.publisher	Universitetet i Tromsø	en
dc.publisher	University of Tromsø	en
dc.rights.accessRights	openAccess
dc.rights.holder	Copyright 2012 The Author(s)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-sa/3.0	en_US
dc.rights	Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)	en_US
dc.subject.courseID	FYS-3921	en
dc.subject	VDP::Mathematics and natural science: 400::Information and communication science: 420::Knowledge based systems: 425	en
dc.subject	VDP::Matematikk og Naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420::Kunnskapsbaserte systemer: 425	en
dc.title	Mean shift spectral clustering using kernel entropy component analysis	en
dc.type	Master thesis	en
dc.type	Mastergradsoppgave	en

Tilhørende fil(er)

Navn:: license.txt
Størrelse:: 1.498Kb
Format:: Tekstfil

Åpne

Navn:: thesis.pdf
Størrelse:: 2.488Mb
Format:: PDF

Åpne

Denne innførselen finnes i følgende samling(er)

Mastergradsoppgaver i teknologi - anvendt fysikk [75]

Vis enkel innførsel

Med mindre det står noe annet, er denne innførselens lisens beskrevet som Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)