• Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp 

      Wiechetek, Linda; Pirinen, Flammie; Gaup, Børre; Argese, Chiara; Omma, Thomas (Journal article; Peer reviewed, 2022-08-30)
      Machine learning is the dominant paradigm in natural language processing today. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we uncover the myth of machine learning being cheaper ...
    • Suoidne-varra-bleahkka-mála-bihkka-senet-dielku 'hay-blood-ink-paint-tar-mustard-stain' - Should compounds be lexicalized in NLP?

      Wiechetek, Linda; Argese, Chiara; Pirinen, Tommi; Trosterud, Trond (Journal article, 2020-12-11)
      Lexicalizing compounds, in addition to treating them dynamically, is a key element in giving us idiomatic translations and detecting compound errors. We present and evaluate an e-dictionary (NDS) and a grammar checker (GramDivvun) for North Sámi. We achieve a coverage of 98% for NDS queries and of 96% for compound error detection in GramDivvun.
    • Using authentic texts for grammar exercises for a minority language 

      Antonsen, Lene; Argese, Chiara (Journal article; Peer reviewed, 2018-11-02)
      This paper presents an ATICALL (Authentic Text ICALL) system with automatic visual input enhancement activities for training complex inflection systems in a minority language. We have adapted the freely available VIEW system, which was designed to automatically generate activities from any web content. Our system is based on finite state transducers (FST) and Constraint Grammar, originally ...