
Field | Value | Language
dc.contributor.advisor | Myhre, Jonas Nordhaug |
dc.contributor.author | Warholm, Joakim |
dc.date.accessioned | 2021-07-09T06:34:48Z |
dc.date.available | 2021-07-09T06:34:48Z |
dc.date.issued | 2021-05-28 | en
dc.description.abstract | In this work we present a new Norwegian dataset of 7078 labeled comments for unhealthy comment detection. We use the dataset to fine-tune a BERT model and show that BERT can detect subtle forms of toxicity in Norwegian as well. We compare how the newly released Norwegian BERT models perform when fine-tuned on our dataset, and we also experiment with using English data to fine-tune one of the models. We fine-tune BERT to recognize unhealthy comments in Norwegian, as well as several other attributes a comment may have: being hostile, antagonising/insulting/trolling, dismissive, condescending, sarcastic, or an unfair generalisation. Our AUC scores exceed those reported in previous work on detecting unhealthy comments in English in every category except dismissive. | en_US
dc.identifier.uri | https://hdl.handle.net/10037/21853 |
dc.language.iso | eng | en_US
dc.publisher | UiT Norges arktiske universitet | no
dc.publisher | UiT The Arctic University of Norway | en
dc.rights.holder | Copyright 2021 The Author(s) |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0 | en_US
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | en_US
dc.subject.courseID | FYS-3900 |
dc.subject | VDP::Mathematics and natural science: 400::Physics: 430::Electronics: 435 | en_US
dc.subject | VDP::Mathematics and Natural Science: 400::Physics: 430::Electronics: 435 (Norwegian-language form of the same classification) | en_US
dc.title | Detecting Unhealthy Comments in Norwegian using BERT | en_US
dc.type | Master's thesis (Norwegian: Mastergradsoppgave) | nor
dc.type | Master thesis | eng
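The abstract compares models by their AUC score in each category. Because a single comment can be unhealthy and, for example, sarcastic at the same time, this is presumably a multi-label setting in which every attribute is scored independently and gets its own ROC AUC. The sketch below is not the thesis code: it assumes the fine-tuned model yields one score per attribute per comment (for instance a sigmoid probability), and the label names, the `roc_auc` and `per_label_auc` helpers, and the toy scores are all hypothetical. AUC is computed with the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive comment receives the higher score.

```python
from typing import Dict, List, Sequence

# Hypothetical label names derived from the attributes listed in the abstract;
# the actual label set and column order in the thesis dataset may differ.
LABELS = [
    "unhealthy",
    "hostile",
    "antagonising/insulting/trolling",
    "dismissive",
    "condescending",
    "sarcastic",
    "unfair_generalisation",
]


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive example is scored
    above a random negative example (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_label_auc(
    y_true: List[List[int]], y_score: List[List[float]], labels: List[str]
) -> Dict[str, float]:
    """Compute one AUC per attribute column of a multi-label problem."""
    return {
        name: roc_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j, name in enumerate(labels)
    }


if __name__ == "__main__":
    # Toy data (hypothetical): four comments scored on the first two labels.
    gold = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
    print(per_label_auc(gold, scores, LABELS[:2]))
    # → {'unhealthy': 1.0, 'hostile': 0.75}
```

The pairwise loop is quadratic in the number of examples, which keeps the definition transparent but would be slow on thousands of comments; in practice a rank-based implementation such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.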

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).