Vis enkel innførsel

dc.contributor.advisorJenssen, Robert
dc.contributor.authorAgersborg, Jørgen Andreas
dc.date.accessioned2012-11-02T13:43:49Z
dc.date.available2012-11-02T13:43:49Z
dc.date.issued2012-06-01
dc.description.abstractClustering is an unsupervised pattern recognition technique for finding natural groups in data, whether it is a grouping of web pages found by a search engine or segmenting satellite images into different types of ground cover. There exists a variety of different ways to perform clustering ranging from heuristics rules designed for a specific dataset to general procedures which can be applied to all datasets with varying degrees of success. The k-means algorithm is a well known example of the latter approach that can be expected to give good results when the data is easily separable. Another example of general clustering procedures are spectral clustering methods which involve a non-linear data transformation that allows them to handle complex cluster structures. They are considered to be the state of the art in the clustering literature, but are computationally demanding and unable to handle large datasets. To overcome the size limitations, this thesis uses a two-stage approach. The first stage can be seen as a preprocessing to reduce the size of the input for the spectral clustering in the second stage. The preprocessing is accomplished by using a simple clustering method on the dataset. These clusters represents a partitioning and is a more natural way of representing the dataset than randomly selecting a subset of samples. One can then adjust the number of partitions so spectral clustering methods can be used. The procedure is called Mean Shift Spectral Clustering (MSSC) as the mean shift clustering algorithm is used in the first stage. Each partition found by mean shift consists of data points that are close to the same mode in the estimated probability density function. Hence the partitions will be representative of the geometric structure of the dataset. This thesis realizes for the first time the idea of spectral clustering based on the partitions found by mean shift. The Kernel Entropy Component Analysis (KECA) spectral clustering method, a recent development in the field of spectral clustering, is used for this purpose and compared with the better known Kernel Principal Component Analysis (KPCA) method. A comprehensive collection of MATLAB functions has been developed to allow the testing of this procedure which is able to handle large and high dimensional datasets. It is able to cluster images directly with as many feature vectors as there are pixels. The experiments show how the clustering accuracy varies as a function of the primary parameters. This gives a good overall characterization of the method and what can be expected when used for unfamiliar datasets. The MSSC procedure is shown to provide good clustering results when following some basic principles for selecting parameters.en
dc.identifier.urihttps://hdl.handle.net/10037/4610
dc.identifier.urnURN:NBN:no-uit_munin_4325
dc.language.isoengen
dc.publisherUniversitetet i Tromsøen
dc.publisherUniversity of Tromsøen
dc.rights.accessRightsopenAccess
dc.rights.holderCopyright 2012 The Author(s)
dc.rights.urihttps://creativecommons.org/licenses/by-nc-sa/3.0en_US
dc.rightsAttribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)en_US
dc.subject.courseIDFYS-3921en
dc.subjectVDP::Mathematics and natural science: 400::Information and communication science: 420::Knowledge based systems: 425en
dc.subjectVDP::Matematikk og Naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420::Kunnskapsbaserte systemer: 425en
dc.titleMean shift spectral clustering using kernel entropy component analysisen
dc.typeMaster thesisen
dc.typeMastergradsoppgaveen


Tilhørende fil(er)

Thumbnail
Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel

Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Med mindre det står noe annet, er denne innførselens lisens beskrevet som Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)