Guidance for Citing Linguistic Data

Conzett, Philipp; De Smedt, Koenraad

dc.contributor.author	Conzett, Philipp
dc.contributor.author	De Smedt, Koenraad
dc.date.accessioned	2023-02-17T09:51:24Z
dc.date.available	2023-02-17T09:51:24Z
dc.date.issued	2022
dc.description.abstract	<p>Linguistic data, in their many forms, are a valuable asset in research and education on language. From the predigital age, the earliest data to reach us are written records carved in stone, wooden sticks, or clay tablets, or penned on papyrus, parchment, and similar materials. Early field linguists recorded samples obtained from informants and other sources in notebooks and card files. Speech was recorded on analog devices such as wax cylinders, phonograph records, and magnetic tape. Consulting such materials when they were cited in studies was usually cumbersome, but citing them was often relatively straightforward. <p>In the early digital age, materials were shipped on digital tape reels or CD-ROM, and citation consisted of references to physical media. Nowadays, most digital materials are made available online. This has clear implications for the practice of citation. Furthermore, the use of digital data in linguistics has greatly expanded in volume and variety. Primary data in the form of large digital corpora of text, audio, and video have become widely available and are often annotated at one or more linguistic levels. Other types of digital data (in the wide sense of the term) relevant to research on language include lexicons, term banks, word nets, computational grammars, translation memories, survey results, and quantitative data from experiments. Locating specific data that were used in studies would amount to looking for a needle in a haystack were it not for proper citation. Unfortunately, citation practices have not fully kept pace with new kinds of digital data and their distribution. <p>In this chapter, we sometimes use the more general term “resource” when referring to different types of digital research products, including, for instance, language models and analyzers (e.g., grammars, parsers), annotation tools, statistical code associated with certain data sets, and other digital assets. For simplicity, we often speak of data, but most guidelines for data also hold for other resources. A data set is a set of data items that is distributed as a whole, although we often use “data” and “data set” interchangeably. <p>The guidance given in this chapter is primarily targeted at authors of linguistic publications, while a secondary audience consists of academic publishers and resource providers such as repositories and archives.	en_US
dc.identifier.citation	Conzett Ph, De Smedt K J M J: Guidance for Citing Linguistic Data. In: Berez-Kroeker, McDonnell B, Koller, Collister LB (eds.). The Open Handbook of Linguistic Data Management, 2022. MIT Press, pp. 143-155	en_US
dc.identifier.cristinID	FRIDAID 1974792
dc.identifier.doi	10.7551/mitpress/12200.003.0015
dc.identifier.isbn	9780262366076
dc.identifier.uri	https://hdl.handle.net/10037/28574
dc.language.iso	eng	en_US
dc.publisher	MIT Press	en_US
dc.relation.projectID	Norges forskningsråd: 295700	en_US
dc.rights.accessRights	openAccess	en_US
dc.rights.holder	Copyright 2022 The Author(s)	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc/4.0	en_US
dc.rights	Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)	en_US
dc.title	Guidance for Citing Linguistic Data	en_US
dc.type.version	publishedVersion	en_US
dc.type	Chapter	en_US
dc.type	Book chapter	en_US

Associated file(s)

Name: article.pdf
Size: 431.5Kb
Format: PDF

This item appears in the following collection(s)

Artikler, rapporter og annet (UB) [2776]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)