dc.contributor.advisor: Myhre, Jonas Nordhaug
dc.contributor.author: Warholm, Joakim
dc.date.accessioned: 2021-07-09T06:34:48Z
dc.date.available: 2021-07-09T06:34:48Z
dc.date.issued: 2021-05-28 [en]
dc.description.abstract: In this work we present a new Norwegian labeled dataset of 7078 comments for unhealthy comment detection. The dataset is used to fine-tune a BERT model, and demonstrates that BERT has the ability to detect subtle forms of toxicity, also in Norwegian. We compare how the different newly released Norwegian BERT models perform when fine-tuned on our dataset, and we also experiment with how English data can be utilized to fine-tune one of the models. We fine-tune BERT to recognize unhealthy comments in Norwegian, as well as a list of other characteristics a comment may have such as being hostile, antagonising/insulting/trolling, dismissive, condescending, sarcastic, or being an unfair generalisation. Our AUC scores beat the AUC scores from previous work on detecting unhealthy comments in English on all categories, except dismissive. [en_US]
dc.identifier.uri: https://hdl.handle.net/10037/21853
dc.language.iso: eng [en_US]
dc.publisher: UiT Norges arktiske universitet [no]
dc.publisher: UiT The Arctic University of Norway [en]
dc.rights.holder: Copyright 2021 The Author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/4.0 [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) [en_US]
dc.subject.courseID: FYS-3900
dc.subject: VDP::Mathematics and natural science: 400::Physics: 430::Electronics: 435 [en_US]
dc.subject: VDP::Matematikk og Naturvitenskap: 400::Fysikk: 430::Elektronikk: 435 [en_US]
dc.title: Detecting Unhealthy Comments in Norwegian using BERT [en_US]
dc.type: Mastergradsoppgave [nor]
dc.type: Master thesis [eng]


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)