Dataset Open Access

INEL Kamas Corpus

Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta

Data manager(s)
Ferger, Anne; Jettka, Daniel; Lehmberg, Timm
Researcher(s)
Wagner-Nagy, Beáta; Arkhipov, Alexandre; Gusev, Valentin; Klooster, Tiina

Corpus Citation

Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta. 2023. “INEL Kamas Corpus.” Version 2.0. Publication date 2023-12-31. http://hdl.handle.net/11022/0000-0007-FC25-4. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1.

Corpus Description

The INEL Kamas corpus was created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware, corpus-based grammatical research on the Kamas language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia.

The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970.

Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of syntactic functions, semantic roles, Russian borrowings and code-switching. Some texts also have annotations for information status.

New in release 2.0

  • In texts from Donner’s collection, phonetic transcription according to Klumpp’s edition of Donner’s manuscripts has been added (as the stl tier)
  • Five texts which were originally split between different tapes have been merged, along with the corresponding parts of the recordings. Sentences in each resulting text are numbered consecutively
    • PKZ_196X_Alenushka_flk + PKZ_196X_Alenushka_continuation_flk > PKZ_196X_Alenushka_flk
    • End of PKZ_196X_SU0226 starting from PKZ_196X_SU0226.203 (210) + PKZ_196X_Alenushka2_continuation_flk > PKZ_196X_Alenushka2_flk
    • PKZ_196X_BlacksmithAndMerchant_flk + PKZ_196X_BlacksmithAndMerchant_cont_flk > PKZ_196X_BlacksmithAndMerchant_flk
    • PKZ_196X_Finist_flk + PKZ_196X_Finist_continuation_flk > PKZ_196X_Finist_flk
    • PKZ_196X_StupidWolf_flk + PKZ_196X_StupidWolf_continuation_flk > PKZ_196X_StupidWolf_flk
  • Some of the texts are now annotated for existential, locative and possessive predication (ExLocPoss tier, by C.L. Däbritz)
  • Numerous corrections in glosses, other annotations and transcriptions, including:
    • Fuller and more consistent transcription, glossing and annotations of borrowings
    • Vowel length is now marked in the mp tier in baːzoʔ ‘again’, büːzʼe ‘man’ and saːgər ‘black’
    • Corrections in disambiguation of polysemous or homonymous morphemes: -ziʔ "INS"/"COM", -də "LAT"/"3SG", mo- "can/become/want | мочь/стать/хотеть"
    • Possessive suffix unmarked for case: "NOM/GEN/ACC" > "POSS"
    • Glosses for personal pronouns were changed to uniform labels: "I | я" > "PRO1SG", "we | мы" > "PRO1PL", "you | ты" > "PRO2SG", "you.PL | вы" > "PRO2PL"
    • Fuller annotations of code-switching and calques (CS tier)
  • Added ELAN *.eaf as a supplementary end-user file format for all transcripts
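Scripts written against an earlier release may still expect the old pronoun glosses. The pronoun relabelling listed above can be sketched as a simple lookup table. This is only an illustration: the names `PRONOUN_GLOSS_MAP` and `normalize_gloss` are hypothetical, and it assumes that the old glosses appear in a user's data exactly as the strings quoted in the release notes.

```python
# Mapping of the old pronoun glosses to the uniform labels of release 2.0,
# copied from the release notes. The exact string form of the old glosses
# in any particular export is an assumption.
PRONOUN_GLOSS_MAP = {
    "I | я": "PRO1SG",
    "we | мы": "PRO1PL",
    "you | ты": "PRO2SG",
    "you.PL | вы": "PRO2PL",
}


def normalize_gloss(gloss: str) -> str:
    """Return the 2.0 label for an old pronoun gloss; other glosses pass through unchanged."""
    return PRONOUN_GLOSS_MAP.get(gloss.strip(), gloss)


if __name__ == "__main__":
    for old in ["I | я", "you.PL | вы", "go"]:
        print(old, "->", normalize_gloss(old))
```

Glosses that are not in the table, such as lexical glosses, are returned as they were, so the function can be applied to a whole gloss tier without filtering first.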

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements

  • Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu were provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA).

  • Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970 were provided by the Institute for the Languages of Finland archive, Helsinki (KOTUS).

  • Scanned pages from Kai Donners Kamassisches Wörterbuch (Joki 1944), containing texts collected by Kai Donner, are published online courtesy of the Finno-Ugrian Society.

  • The web-based search interface uses the Tsakorpus platform developed by Dr. Timofey Arkhangelskiy.

Files (4.7 GB)
Name, size, MD5 checksum

  • kamas-2.0-documentation.pdf, 229.2 kB, md5:be551320e8e3f9f09ff95843c8da92d8
  • kamas-2.0-mp3only.zip, 492.7 MB, md5:35631d0a5c5ecdb7f186829f5e87c6fd
  • kamas-2.0-noaudio.zip, 84.9 MB, md5:145417dbd05f5304f9fc5a487352f95c
  • kamas-2.0.zip, 4.1 GB, md5:d09850583132ebe49983c98957c3c4cd
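
The MD5 checksums listed above can be used to check that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` are hypothetical, and it assumes that the downloaded file keeps its original name.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of release 2.0.
EXPECTED_MD5 = {
    "kamas-2.0-documentation.pdf": "be551320e8e3f9f09ff95843c8da92d8",
    "kamas-2.0-mp3only.zip": "35631d0a5c5ecdb7f186829f5e87c6fd",
    "kamas-2.0-noaudio.zip": "145417dbd05f5304f9fc5a487352f95c",
    "kamas-2.0.zip": "d09850583132ebe49983c98957c3c4cd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that the 4.1 GB archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the listed checksum for its name.
    Raises KeyError if the file name is not one of the four listed files."""
    path = Path(path)
    return md5_of_file(path) == EXPECTED_MD5[path.name]


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "OK" if verify_download(name) else "MISMATCH")
```

A mismatch usually indicates an interrupted or corrupted download, in which case the file should be fetched again.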
