
dc.contributor.advisor: Johansen, Håvard
dc.contributor.advisor: Jenssen, Robert
dc.contributor.author: Kampffmeyer, Michael Christian
dc.date.accessioned: 2015-08-26T13:11:34Z
dc.date.available: 2015-08-26T13:11:34Z
dc.date.issued: 2015-06-15
dc.description.abstract: Collaborative filtering recommender systems have become essential to many Internet services, powering, for instance, book recommendations at Amazon's e-commerce service, music recommendations at Spotify, and movie recommendations at Netflix. Matrix factorization and Restricted Boltzmann Machines (RBMs) are two popular methods for implementing recommender systems, both offering superior accuracy to common neighborhood models. Both methods also shift much of the computation from the prediction phase to the model training phase, enabling fast predictions once the model has been trained. This thesis proposes a novel approach for performing matrix factorization with the Alternating-Least-Squares with Weighted-Lambda-Regularization (ALS-WR) algorithm on CUDA (ALS-CUDA). The algorithm is implemented and evaluated in the context of recommender systems by comparing it to other commonly used approaches, namely an RBM and a stochastic gradient descent (SGD) approach. Our evaluation shows that significant speedups can be achieved by using CUDA and GPUs for training recommender systems. The ALS-CUDA algorithm implemented in this thesis achieved speedup factors of up to 175.4 over the sequential CPU ALS implementation and scales linearly with the number of CUDA threads assigned to it until the GPU's shared memory is saturated. Comparing the ALS-CUDA algorithm to CUDA implementations of the SGD and RBM algorithms shows that ALS-CUDA outperformed the RBM. For a sparse dataset, results indicate that ALS-CUDA performs slightly worse than the SGD implementation, while for a dense dataset it outperforms the SGD. In general, however, the advantage of the ALS-CUDA algorithm lies not only in its speed but also in the fact that it requires fewer parameters than the SGD. It therefore represents a viable option when some speed can be traded for algorithmic stability, or when the dataset is dense. (A minimal sketch of the ALS-WR update step is shown after the record fields below.)
dc.identifier.uri: https://hdl.handle.net/10037/7987
dc.identifier.urn: URN:NBN:no-uit_munin_7569
dc.language.iso: eng
dc.publisher: UiT Norges arktiske universitet
dc.publisher: UiT The Arctic University of Norway
dc.rights.accessRights: openAccess
dc.rights.holder: Copyright 2015 The Author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/3.0
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
dc.subject.courseID: INF-3981
dc.subject: VDP::Mathematics and natural science: 400::Mathematics: 410::Applied mathematics: 413
dc.subject: VDP::Mathematics and natural science: 400::Information and communication science: 420::Algorithms and computability theory: 422
dc.title: Parallelization of the Alternating-Least-Squares Algorithm With Weighted Regularization for Efficient GPU Execution in Recommender Systems
dc.type: Master thesis
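
For orientation, the following is a minimal NumPy sketch of the ALS-WR update step the abstract refers to, assuming the standard formulation where each user's (or item's) ridge penalty is weighted by the number of ratings in that row (or column). This is not the thesis's CUDA implementation; the function name als_wr, the parameter names, and the toy ratings matrix are illustrative assumptions. The thesis maps the independent per-user and per-item solves onto CUDA threads.

```python
import numpy as np

def als_wr(R, k=10, lam=0.05, n_iters=10, seed=0):
    """Sketch of ALS with weighted-lambda regularization:
    factor R ~ U @ V.T on the observed entries only."""
    rng = np.random.default_rng(seed)
    mask = R > 0                      # treat zeros as missing ratings
    m, n = R.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    for _ in range(n_iters):
        # Fix V; solve one independent k-by-k ridge system per user.
        # These per-user solves are the parallelizable work.
        for u in range(m):
            idx = mask[u]
            if not idx.any():
                continue
            Vu = V[idx]                                  # factors of items user u rated
            A = Vu.T @ Vu + lam * idx.sum() * np.eye(k)  # weighted-lambda term
            b = Vu.T @ R[u, idx]
            U[u] = np.linalg.solve(A, b)
        # Fix U; the per-item update is symmetric.
        for i in range(n):
            idx = mask[:, i]
            if not idx.any():
                continue
            Ui = U[idx]
            A = Ui.T @ Ui + lam * idx.sum() * np.eye(k)
            b = Ui.T @ R[idx, i]
            V[i] = np.linalg.solve(A, b)
    return U, V

# Toy usage example (0 = unrated); predictions fill in the missing cells.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])
U, V = als_wr(R, k=2, lam=0.1, n_iters=20)
print(np.round(U @ V.T, 2))
```

Note how training does the expensive work (repeated linear solves) while prediction is a single matrix product, which matches the abstract's point about shifting computation from the prediction phase to the training phase.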

