Deep Learning: From Data Extraction to Large-Scale Analysis

Voets, Mike

dc.contributor.advisor	Bongo, Lars Ailo
dc.contributor.author	Voets, Mike
dc.date.accessioned	2018-05-31T07:44:18Z
dc.date.available	2018-05-31T07:44:18Z
dc.date.issued	2018-05-15
dc.description.abstract	We aim to give insight into the development and deployment of a deep learning algorithm for automating biomedical image analysis. We anonymize sensitive data from a medical archive system, attempt to replicate and further improve published methods, and scale out our algorithm to support large-scale analyses. Our contributions are as follows. First, to anonymize and extract mammograms for the development of a breast cancer detection algorithm, we wrote a script for mammograms that reside in a data-locking, sensitive, and proprietary PACS (picture archiving and communication system). The script will be used in a larger project to extract mammograms from all screening points in Norway. Second, because this script is still in the process of being authorized by Helsenord IKT, we instead developed an algorithm for a similar screening problem in the biomedical field. To avoid reinventing the wheel, we investigated earlier work. The high-impact article in JAMA 2016; 316(22) describes a high-performance deep learning algorithm that detects diabetic retinopathy and reports an area under the receiver operating characteristic curve (AUC) of 0.99. We attempted to replicate the method. Our AUCs of 0.74 and 0.59, however, did not reach the reported results, possibly because of differences in the data or details missing from the published methodology. Third, slightly modifying the data preprocessing in the diabetic retinopathy algorithm increased the AUCs to 0.94 and 0.82. These findings emphasize the challenges of replicating deep learning methods whose source code is not published and that do not use publicly available data. Fourth, we ran benchmarks to assess the resources needed for algorithm development and automated analyses on a national (Norwegian) scale. We estimate that a breast cancer detection algorithm can be trained on 4 GPUs in less than 17 hours, a sublinear speed-up of 3.36 times compared to 1 GPU. Evaluation on inexpensive GPUs was found to complete almost instantly. 
Lastly, drawing on our experiences and lessons learned, we conclude with literature suggestions and recommendations for developing and deploying an algorithm for breast cancer detection in a large-scale screening program.	en_US
dc.identifier.uri	https://hdl.handle.net/10037/12808
dc.language.iso	eng	en_US
dc.publisher	UiT Norges arktiske universitet	en_US
dc.publisher	UiT The Arctic University of Norway	en_US
dc.rights.accessRights	openAccess	en_US
dc.rights.holder	Copyright 2018 The Author(s)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-sa/3.0	en_US
dc.rights	Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)	en_US
dc.subject.courseID	INF-3990
dc.subject	VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551	en_US
dc.subject	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551	en_US
dc.title	Deep Learning: From Data Extraction to Large-Scale Analysis	en_US
dc.type	Master thesis	en_US
dc.type	Mastergradsoppgave	en_US
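
The training benchmark quoted in the abstract (4 GPUs, less than 17 hours, a 3.36 times speed-up over 1 GPU) can be unpacked with a little arithmetic. The sketch below is illustrative and not taken from the thesis: the function names are hypothetical, and it assumes the 3.36 speed-up applies to the same training workload as the 17-hour estimate.

```python
def parallel_efficiency(speedup: float, n_gpus: int) -> float:
    """Fraction of the ideal (linear) speed-up that was achieved."""
    return speedup / n_gpus


def single_gpu_upper_bound(multi_gpu_hours: float, speedup: float) -> float:
    """Upper bound on 1-GPU training time implied by a multi-GPU upper bound."""
    return multi_gpu_hours * speedup


# Figures quoted in the abstract.
SPEEDUP = 3.36          # measured speed-up of 4 GPUs over 1 GPU
N_GPUS = 4
MAX_HOURS_4_GPUS = 17   # "less than 17 hours", so this is an upper bound

efficiency = parallel_efficiency(SPEEDUP, N_GPUS)
bound = single_gpu_upper_bound(MAX_HOURS_4_GPUS, SPEEDUP)

print(f"parallel efficiency: {efficiency:.2f}")
print(f"implied 1-GPU upper bound: {bound:.2f} hours")
```

An efficiency below 1.0 is what "sublinear" means here: 4 GPUs deliver 3.36 times the throughput of one GPU rather than 4 times. Under the stated assumption, the same training run on a single GPU would take less than roughly 57 hours.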

Associated file(s)

Name: thesis.pdf
Size: 5.591Mb
Format: PDF

Name: license.txt
Size: 1.402Kb
Format: Text file

This item appears in the following collection(s)

Master's theses in computer science [135]

Unless otherwise stated, the license for this item is described as Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)