dc.contributor.advisor | Myhre, Jonas Nordhaug | |
dc.contributor.author | Warholm, Joakim | |
dc.date.accessioned | 2021-07-09T06:34:48Z | |
dc.date.available | 2021-07-09T06:34:48Z | |
dc.date.issued | 2021-05-28 | en |
dc.description.abstract | In this work we present a new labeled Norwegian dataset of 7078 comments for unhealthy comment detection. We use the dataset to fine-tune a BERT model and demonstrate that BERT can detect subtle forms of toxicity in Norwegian as well. We compare how the newly released Norwegian BERT models perform when fine-tuned on our dataset, and we also experiment with how English data can be used to fine-tune one of the models. We fine-tune BERT to recognize unhealthy comments in Norwegian, as well as a set of other characteristics a comment may have: being hostile, antagonising/insulting/trolling, dismissive, condescending, sarcastic, or an unfair generalisation. Our AUC scores exceed those reported in previous work on detecting unhealthy comments in English in every category except dismissive. | en_US |
dc.identifier.uri | https://hdl.handle.net/10037/21853 | |
dc.language.iso | eng | en_US |
dc.publisher | UiT Norges arktiske universitet | no |
dc.publisher | UiT The Arctic University of Norway | en |
dc.rights.holder | Copyright 2021 The Author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0 | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | en_US |
dc.subject.courseID | FYS-3900 | |
dc.subject | VDP::Mathematics and natural science: 400::Physics: 430::Electronics: 435 | en_US |
dc.subject | VDP::Matematikk og Naturvitenskap: 400::Fysikk: 430::Elektronikk: 435 | en_US |
dc.title | Detecting Unhealthy Comments in Norwegian using BERT | en_US |
dc.type | Mastergradsoppgave | nor |
dc.type | Master thesis | eng |