Big data in Russian linguistics? Another look at paucal constructions

Permanent link
https://hdl.handle.net/10037/15873
DOI
https://doi.org/10.1515/slaw-2019-0012
article.pdf (2.358Mb)
Publisher's version (PDF)
Date
2019-05-28
Type
Journal article
Peer reviewed

Author
Nesset, Tore
Abstract
With the advent of large web-based corpora, Russian linguistics steps into the era of “big data”. But how useful are large datasets in our field? What are the advantages? Which problems arise? The present study seeks to shed light on these questions based on an investigation of the Russian paucal construction in the RuTenTen corpus, a web-based corpus with more than ten billion words. The focus is on the choice between adjectives in the nominative (dve/tri/četyre starye knigi) and genitive (dve/tri/četyre staryx knigi) in paucal constructions with the numerals dve, tri or četyre and a feminine noun. Three generalizations emerge. First, the large RuTenTen dataset enables us to identify predictors that could not be explored in smaller corpora. In particular, it is shown that predicates, modifiers, prepositions and word order affect the case of the adjective. Second, we identify situations where the RuTenTen data cannot be straightforwardly reconciled with findings from earlier studies or there appear to be discrepancies between different statistical models. In such cases, further research is called for. The effect of the numeral (dve, tri vs. četyre) and verbal government are relevant examples. Third, it is shown that adjectives in the nominative have more easily learnable predictors that cover larger classes of examples and show clearer preferences for the relevant case. It is therefore suggested that nominative adjectives have the potential to outcompete adjectives in the genitive over time. Although these three generalizations are valuable additions to our knowledge of Russian paucal constructions, three problems arise. Large internet-based corpora like the RuTenTen corpus (a) are not balanced, (b) involve a certain amount of “noise”, and (c) do not provide metadata. As a consequence of this, it is argued, it may be wise to exercise some caution with regard to conclusions based on “big data”.
Description
Source at https://doi.org/10.1515/slaw-2019-0012.
Publisher
De Gruyter
Citation
Nesset, T. (2019). Big data in Russian linguistics? Another look at paucal constructions. Zeitschrift für Slawistik, 64(2), 157-174. https://doi.org/10.1515/slaw-2019-0012
Collections
  • Articles, reports and other (language and culture) [1472]
