Automatic Identification of Shared Arguments in Verbal Coordinations
Permanent link
https://hdl.handle.net/10037/7733
Date
2015
Type
Journal article
Peer reviewed
Abstract
We describe automatic conversion of the SynTagRus dependency treebank
of Russian to the PROIEL format (with the ultimate purpose of obtaining a single-format
diachronic treebank spanning more than a thousand years), focusing
on analysis of shared arguments in verbal coordinations. Whether arguments
are shared or private is not marked in the SynTagRus native format,
but the PROIEL format indicates sharing by means of secondary dependencies.
In order to recover missing information and insert secondary dependencies
into the converted SynTagRus, we create a simple guessing algorithm
based on four probabilistic features: how likely a given argument type
is to be shared; how likely an argument in a given position is to be shared;
how likely a given verb is to have a given argument; how likely a given verb
is to have a given argument frame. Boosted with a few deterministic rules and
trained on a small manually annotated sample (346 sentences), the guesser
very successfully inserts shared subjects (F-score 0.97), which results
in excellent overall performance (F-score 0.92). Non-subject arguments are
shared much more rarely, and for them the results are poorer (0.31 for objects;
0.22 for obliques). We show, however, that there are strong reasons
to believe that performance can be increased if a larger training sample
is used and the guesser gets to see enough positive examples. Apart from
describing a useful practical solution, the paper also provides quantitative
data about and offers non-trivial insights into Russian verbal coordination.
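The four probabilistic features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the features are estimated or combined, so the relative-frequency estimates, the add-one smoothing, the plain averaging, the 0.5 threshold, the position labels and all names below are assumptions made for illustration, and the deterministic rules are left out.

```python
from collections import Counter


def smoothed(hits, total):
    """Relative frequency with add-one smoothing (an assumption); unseen -> 0.5."""
    return (hits + 1) / (total + 2)


class SharedArgumentGuesser:
    """Hypothetical guesser built on the four features listed in the abstract."""

    def __init__(self, candidates, verb_frames):
        # candidates: (arg_type, position, is_shared) triples from annotated data.
        # verb_frames: (verb, frozenset of argument types), one per verb occurrence.
        self.type_hits, self.type_total = Counter(), Counter()
        self.pos_hits, self.pos_total = Counter(), Counter()
        for arg_type, position, is_shared in candidates:
            self.type_total[arg_type] += 1
            self.pos_total[position] += 1
            if is_shared:
                self.type_hits[arg_type] += 1
                self.pos_hits[position] += 1
        self.verb_total, self.verb_arg, self.verb_frame = Counter(), Counter(), Counter()
        for verb, frame in verb_frames:
            self.verb_total[verb] += 1
            self.verb_frame[(verb, frame)] += 1
            for arg in frame:
                self.verb_arg[(verb, arg)] += 1

    def features(self, verb, own_frame, arg_type, position):
        # Frame the verb would have if the candidate argument were shared with it.
        new_frame = frozenset(own_frame) | {arg_type}
        n = self.verb_total[verb]
        return (
            smoothed(self.type_hits[arg_type], self.type_total[arg_type]),  # argument type
            smoothed(self.pos_hits[position], self.pos_total[position]),    # position
            smoothed(self.verb_arg[(verb, arg_type)], n),                   # verb has argument
            smoothed(self.verb_frame[(verb, new_frame)], n),                # verb has frame
        )

    def is_shared(self, verb, own_frame, arg_type, position, threshold=0.5):
        # Averaging and thresholding are assumptions, not the paper's method.
        feats = self.features(verb, own_frame, arg_type, position)
        return sum(feats) / len(feats) >= threshold


# Toy data, invented purely for illustration.
candidates = (
    [("subj", "left", True)] * 9 + [("subj", "left", False)]
    + [("obj", "right", False)] * 9 + [("obj", "right", True)]
)
verb_frames = (
    [("chitat", frozenset({"subj", "obj"}))] * 5
    + [("spat", frozenset({"subj"}))] * 5
)
guesser = SharedArgumentGuesser(candidates, verb_frames)
print(guesser.is_shared("spat", set(), "subj", "left"))
print(guesser.is_shared("spat", {"subj"}, "obj", "right"))
```

In this toy setting a subject to the left of an intransitive verb is guessed to be shared, while an object is not, because the verb is never seen with an object or with a subject-plus-object frame.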
Citation
Kompiuternaia lingvistika i intellektual'nye tekhnologii (2015) no. 14 (21), pp. 33-43