• The GiellaLT infrastructure: A multilingual infrastructure for rule-based NLP 

      Nørstebø Moshagen, Sjur; Pirinen, Flammie; Antonsen, Lene; Gaup, Børre; Mikkelsen, Inga Lill Sigga; Trosterud, Trond; Wiechetek, Linda; Hiovain-Asikainen, Katri (Journal article; Peer reviewed, 2023)
      This article gives an overview of the GiellaLT infrastructure and its main parts, and of how it has been and can be used to support a large number of indigenous and minority languages, from keyboards to speech technology and advanced proofing tools. A special focus is given to languages with few or non-existent digital resources, and it is shown that many tools useful to the daily digital life of ...
    • Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp 

      Wiechetek, Linda; Pirinen, Flammie; Gaup, Børre; Argese, Chiara; Omma, Thomas (Journal article; Peer reviewed, 2022-08-30)
      Machine learning is the dominant paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we uncover the myth of machine learning being cheaper ...
    • Suoidne-varra-bleahkka-mála-bihkka-senet-dielku 'hay-blood-ink-paint-tar-mustard-stain' - Should compounds be lexicalized in NLP?

      Wiechetek, Linda; Argese, Chiara; Pirinen, Tommi; Trosterud, Trond (Journal article, 2020-12-11)
      Lexicalizing compounds, in addition to treating them dynamically, is a key element in giving us idiomatic translations and detecting compound errors. We present and evaluate an e-dictionary (NDS) and a grammar checker (GramDivvun) for North Sámi. We achieve a coverage of 98% for NDS queries and of 96% for compound error detection in GramDivvun.