dc.contributor.author: Nesset, Tore
dc.date.accessioned: 2019-08-08T12:11:53Z
dc.date.available: 2019-08-08T12:11:53Z
dc.date.issued: 2019-05-28
dc.description.abstract: With the advent of large web-based corpora, Russian linguistics steps into the era of “big data”. But how useful are large datasets in our field? What are the advantages? Which problems arise? The present study seeks to shed light on these questions based on an investigation of the Russian paucal construction in the RuTenTen corpus, a web-based corpus with more than ten billion words. The focus is on the choice between adjectives in the nominative (dve/tri/četyre starye knigi) and genitive (dve/tri/četyre staryx knigi) in paucal constructions with the numerals dve, tri or četyre and a feminine noun. Three generalizations emerge. First, the large RuTenTen dataset enables us to identify predictors that could not be explored in smaller corpora. In particular, it is shown that predicates, modifiers, prepositions and word-order affect the case of the adjective. Second, we identify situations where the RuTenTen data cannot be straightforwardly reconciled with findings from earlier studies or there appear to be discrepancies between different statistical models. In such cases, further research is called for. The effect of the numeral (dve, tri vs. četyre) and verbal government are relevant examples. Third, it is shown that adjectives in the nominative have more easily learnable predictors that cover larger classes of examples and show clearer preferences for the relevant case. It is therefore suggested that nominative adjectives have the potential to outcompete adjectives in the genitive over time. Although these three generalizations are valuable additions to our knowledge of Russian paucal constructions, three problems arise. Large internet-based corpora like the RuTenTen corpus (a) are not balanced, (b) involve a certain amount of “noise”, and (c) do not provide metadata. As a consequence of this, it is argued, it may be wise to exercise some caution with regard to conclusions based on “big data”. [en_US]
dc.description: Source at https://doi.org/10.1515/slaw-2019-0012. [en_US]
dc.identifier.citation: Nesset, T. (2019). Big data in Russian linguistics? Another look at paucal constructions. Zeitschrift für Slawistik, 64(2), 157-174. https://doi.org/10.1515/slaw-2019-0012 [en_US]
dc.identifier.cristinID: FRIDAID 1701141
dc.identifier.doi: https://doi.org/10.1515/slaw-2019-0012
dc.identifier.issn: 0044-3506
dc.identifier.issn: 2196-7016
dc.identifier.uri: https://hdl.handle.net/10037/15873
dc.language.iso: eng [en_US]
dc.publisher: De Gruyter [en_US]
dc.relation.journal: Zeitschrift für Slawistik
dc.rights.accessRights: openAccess [en_US]
dc.subject: VDP::Humanities: 000::Linguistics: 010::Russian language: 028 [en_US]
dc.subject: VDP::Humaniora: 000::Språkvitenskapelige fag: 010::Russisk språk: 028 [en_US]
dc.subject: Big data [en_US]
dc.subject: corpus linguistics [en_US]
dc.subject: Russian [en_US]
dc.subject: numeral [en_US]
dc.subject: paucal [en_US]
dc.title: Big data in Russian linguistics? Another look at paucal constructions [en_US]
dc.type: Journal article [en_US]
dc.type: Tidsskriftartikkel [en_US]
dc.type: Peer reviewed [en_US]

