dc.contributor.author: Moshagen, Sjur Nørstebø
dc.contributor.author: Antonsen, Lene
dc.contributor.author: Wiechetek, Linda
dc.contributor.author: Trosterud, Trond
dc.date.accessioned: 2024-12-17T07:57:49Z
dc.date.available: 2024-12-17T07:57:49Z
dc.date.issued: 2024-11-13
dc.description.abstract: Most modern language technology for proofing tools, machine translation and other applications is based on machine learning. However, very few Indigenous languages have the necessary amount of texts for making tools based on this technology. When most language technology is based on large language models (LLMs), it bears the risk of most of Indigenous language online text being produced by neural text generation. The result would be that online texts cannot be trusted as a source for authentic Indigenous languages anymore. An alternative is the work done at UiT – The Arctic University of Norway during the last 20 years, based on linguistics. Sámi language tools have been made available for both industry and language communities, with open licenses. These have been widely used by translators, teachers and various software companies. The article analyzes the following four parts of language technology development: language data, language tool development, making the tools available to users, and ethical use of available language technology tools. We make extensive use of the CARE principles, and discuss the shortcomings of existing software and data licensing schemes. Finally, we introduce a 3D table to help classify language technology projects with respect to their suitability for Indigenous languages. [en_US]
dc.identifier.citation: Moshagen, Antonsen, Wiechetek, Trosterud. Indigenous language technology in the age of machine learning. Acta Borealia. 2024;41(2):102-116 [en_US]
dc.identifier.cristinID: FRIDAID 2326609
dc.identifier.doi: 10.1080/08003831.2024.2410124
dc.identifier.issn: 0800-3831
dc.identifier.issn: 1503-111X
dc.identifier.uri: https://hdl.handle.net/10037/36008
dc.language.iso: eng [en_US]
dc.publisher: Taylor & Francis [en_US]
dc.relation.journal: Acta Borealia
dc.rights.accessRights: openAccess [en_US]
dc.rights.holder: Copyright 2024 The Author(s) [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0 [en_US]
dc.rights: Attribution 4.0 International (CC BY 4.0) [en_US]
dc.title.alternative: Eamiálbmot giellateknologiija dihtoroahppan áigodagas [en_US]
dc.title: Indigenous language technology in the age of machine learning [en_US]
dc.type.version: publishedVersion [en_US]
dc.type: Journal article [en_US]
dc.type: Tidsskriftartikkel [en_US]
dc.type: Peer reviewed [en_US]



Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0)