INEL Dolgan Corpus

Name: INEL Dolgan Corpus
Published: 2022-11-30
License: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode

Däbritz, Chris Lasse; Kudryakova, Nina; Stapert, Eugénie

doi:10.25592/uhhfdm.11165

November 30, 2022, Dataset, Open Access

Data manager(s)

Ferger, Anne; Jettka, Daniel; Lazarenko, Elena; Lehmberg, Timm; Riaposov, Aleksandr

Researcher(s)

Wagner-Nagy, Beáta; Arhipov, Aleksandre; Däbritz, Chris Lasse; Kudryakova, Nina; Stapert, Eugénie

Corpus Citation

Däbritz, Chris Lasse; Kudryakova, Nina; Stapert, Eugénie. 2022. INEL Dolgan Corpus. Version 2.0. Publication date 2022-11-30. https://hdl.handle.net/11022/0000-0007-F9A7-4. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1.

Corpus Description

The INEL Dolgan corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware, corpus-based grammatical research on the Dolgan language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia.

The INEL Dolgan corpus is composed of texts from four sources:

1. Published folklore texts from an edited volume ("Fol'klor Dolgan", P.E. Efremov 2000)
2. Transcripts of recordings obtained from the Taymyr House of Folk Art (TDNT) in Dudinka (1970s–2000s)
3. Transcripts from the collection of Dr. Eugénie Stapert, recorded on several fieldwork trips in 2007–2010
4. Transcripts of recordings made on a fieldwork trip in 2017

The first group and parts of the third group had already been transcribed and translated; the remaining recordings were transcribed and translated within the INEL project.

Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information structure/information status.

New in release 2.0

20 glossed transcripts (2864 utterances, 19989 tokens) with 03:33:14 hours of corresponding sound
37 audio files with 10:00:36 hours of sound without glossed transcripts
Corrections of grammatical analyses and glossing according to the findings in Däbritz’s (2022) grammar, as well as cross-corpora harmonizations
Additional corpus-wide annotation of Mongolic borrowings
Additional corpus-wide annotation of existential, locative and possessive predication
Corrections in further annotations, translations and metadata

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Files (13.9 GB)

Name	MD5	Size
dolgan-2.0-documentation.pdf	ec3647edf1b70e222e25ff62482a9ff8	1.0 MB
dolgan-2.0-mp3only.zip	831f74786313326775bc0b28ffbbc0f0	2.1 GB
dolgan-2.0-noaudio.zip	6f85f1a09811f07606caf6747744eef2	40.4 MB
dolgan-2.0.zip	d5bf810b24538594f52a5899cca4e074	11.7 GB
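
The MD5 checksums in the file listing can be used to confirm that a download arrived intact. The following is a minimal Python sketch, not an official INEL tool: it assumes the files were saved under their original names in one local directory, and the helper names `md5_of_file` and `verify_downloads` are hypothetical, chosen for illustration only.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of release 2.0.
EXPECTED_MD5 = {
    "dolgan-2.0-documentation.pdf": "ec3647edf1b70e222e25ff62482a9ff8",
    "dolgan-2.0-mp3only.zip": "831f74786313326775bc0b28ffbbc0f0",
    "dolgan-2.0-noaudio.zip": "6f85f1a09811f07606caf6747744eef2",
    "dolgan-2.0.zip": "d5bf810b24538594f52a5899cca4e074",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True (checksum matches),
    False (checksum differs) or None (file not found in `directory`)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = None
        else:
            results[name] = md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the downloads are in the current working directory.
    for name, ok in verify_downloads().items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

A mismatch usually points to an interrupted or corrupted download, in which case the file should be fetched again before unpacking.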

Publication date: November 30, 2022

DOI: 10.25592/uhhfdm.11165

Keyword(s):

endangered language; indigenous language; L1 data; language contact; language documentation; INEL; folklore; narrative; monologue; annotated; morphological glossing; borrowings; code-switching; semantic roles; syntactic functions; information status; English translation; German translation; Russian translation; existential predication; locative predication; non-verbal predication

Related identifiers:

Cited by:
11022/0000-0007-F9A7-4

License (for files):

Creative Commons Attribution Non Commercial Share Alike 4.0 International

Versions

Version 2.0 10.25592/uhhfdm.11165	Nov 30, 2022
Version 1.0 10.25592/uhhfdm.9747	Aug 31, 2019

Cite all versions? You can cite all versions by using the DOI 10.25592/uhhfdm.9746. This DOI represents all versions, and will always resolve to the latest one.
