Russian natural language processing for computer-assisted language learning: Capturing the benefits of deep morphological analysis in real-life applications
In this dissertation, I investigate practical and theoretical issues surrounding the use of natural language processing technology in the context of Russian Computer-Assisted Language-Learning, with particular emphasis on morphological analysis. In Part I, I present linguistic and practical issues surrounding the development and evaluation of two foundational technologies: a two-level morphological analyzer and a constraint grammar to contextually disambiguate homonymy in the analyzer’s output. The analyzer was specially designed for L2 learner applications, with stress annotation and rule-based morphosyntactic disambiguation, and it is competitive with state-of-the-art Russian analyzers. The constraint grammar is designed to have high recall, allowing an L2 learner application to base decisions on all possible readings, and not just the single most likely reading. The constraint grammar resolves 44% of the ambiguity output by the morphological analyzer. A voting setup combining the constraint grammar with a trigram hidden Markov model tagger demonstrates how a high-recall grammar can boost the performance of probabilistic taggers, which are better suited to capturing highly idiosyncratic facts about collocational tendencies.
In Part II, I present linguistic, theoretical, and practical issues surrounding the application of the morphological analyzer and constraint grammar to three real-life computer-assisted language-learning tasks: automatic stress annotation, automatic grammar exercise generation from authentic texts, and automatic evaluation of text readability. The automatic stress placement task is vital for Russian language-learning applications. The morphological analyzer and constraint grammar yield state-of-the-art results, resolving 42% of stress ambiguity in a corpus of running text.
To demonstrate the value of a high-recall constraint grammar, I developed Russian grammar activities for the VIEW platform, a system for providing automatic Visual Input Enhancement of Web documents. This system allows teachers and learners to automatically generate grammatical highlighting, identification activities, multiple-choice activities, and fill-in-the-blank activities, enabling them to study grammar using texts that are interesting or relevant to them. I show that the morphological analysis described above is instrumental not only for generating exercises, but also for providing adaptive feedback, a feature that typically requires encoding specific learner language features. A final test case for morphological analysis in Russian language learning is automatic readability assessment, which can help learners and teachers find texts at appropriate reading levels. I show that features based on morphology are among the most informative for L2 readability assessment.
Publisher: UiT Norges arktiske universitet
UiT The Arctic University of Norway