Indigenous language technology in the age of machine learning
Permanent link
https://hdl.handle.net/10037/36008
Date
2024-11-13
Type
Journal article
Peer reviewed
Abstract
Most modern language technology for proofing tools, machine translation and other
applications is based on machine learning. However, very few Indigenous languages
have enough text to build tools based on this technology. When
most language technology is based on large language models (LLMs), there is a risk
that most online text in Indigenous languages will be produced by neural text
generation. Online texts could then no longer be trusted as a source of
authentic Indigenous language. An alternative is the linguistics-based work done at UiT –
The Arctic University of Norway over the last 20 years. Sámi
language tools have been made available for both industry and language communities,
with open licenses. These have been widely used by translators, teachers and various
software companies. The article analyzes the following four parts of language
technology development: language data, language tool development, making the tools
available to users, and ethical use of available language technology tools. We make
extensive use of the CARE principles, and discuss the shortcomings of existing software
and data licensing schemes. Finally, we introduce a 3D table to help classify language
technology projects with respect to their suitability for Indigenous languages.
Publisher
Taylor & Francis
Citation
Moshagen, Antonsen, Wiechetek, Trosterud. Indigenous language technology in the age of machine learning. Acta Borealia. 2024;41(2):102-116
Copyright 2024 The Author(s)