Detecting Unhealthy Comments in Norwegian using BERT

Warholm, Joakim

dc.contributor.advisor	Myhre, Jonas Nordhaug
dc.contributor.author	Warholm, Joakim
dc.date.accessioned	2021-07-09T06:34:48Z
dc.date.available	2021-07-09T06:34:48Z
dc.date.issued	2021-05-28	en
dc.description.abstract	In this work we present a new labeled Norwegian dataset of 7078 comments for unhealthy comment detection. We use the dataset to fine-tune a BERT model and demonstrate that BERT can detect subtle forms of toxicity in Norwegian as well. We compare how the newly released Norwegian BERT models perform when fine-tuned on our dataset, and we also experiment with how English data can be used to fine-tune one of the models. We fine-tune BERT to recognize unhealthy comments in Norwegian, as well as a set of other characteristics a comment may have, such as being hostile, antagonising/insulting/trolling, dismissive, condescending, sarcastic, or an unfair generalisation. Our AUC scores exceed those reported in previous work on detecting unhealthy comments in English in all categories except dismissive.	en_US
dc.identifier.uri	https://hdl.handle.net/10037/21853
dc.language.iso	eng	en_US
dc.publisher	UiT Norges arktiske universitet	no
dc.publisher	UiT The Arctic University of Norway	en
dc.rights.holder	Copyright 2021 The Author(s)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-sa/4.0	en_US
dc.rights	Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)	en_US
dc.subject.courseID	FYS-3900
dc.subject	VDP::Mathematics and natural science: 400::Physics: 430::Electronics: 435	en_US
dc.subject	VDP::Matematikk og Naturvitenskap: 400::Fysikk: 430::Elektronikk: 435	en_US
dc.title	Detecting Unhealthy Comments in Norwegian using BERT	en_US
dc.type	Mastergradsoppgave	nor
dc.type	Master thesis	eng

File(s) in this item

Name:: thesis.pdf
Size:: 2.161Mb
Format:: PDF

Name:: license.txt
Size:: 1.093Kb
Format:: Text file

This item appears in the following collection(s)

Mastergradsoppgaver IFT [102]
