The International Comparable Corpus: Challenges in building multilingual spoken and written comparable corpora

Ann Čermáková; Jarmo Jantunen; Tommi Jauhiainen; John Kirk; Michal Křen; Marc Kupietz; Elaine Uí Dhonnchadha

doi:10.32714/ricl.09.01.06

Authors

Ann Čermáková

Charles University

Jarmo Jantunen

University of Jyväskylä

Tommi Jauhiainen

University of Helsinki

Abstract

This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc), which will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC is committed to the idea of re-using existing multilingual resources as much as possible, and its design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent written language and 60 percent spoken language, distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.

Keywords

ICC corpus; contrastive linguistics; comparable corpus; ICE corpus; data sustainability; copyright

Object type

journal article

Language

English [eng]

Persistent identifier

https://phaidra.univie.ac.at/o:1602684

DOI

10.32714/ricl.09.01.06

Published in

Title

Research in Corpus Linguistics

Volume

Issue

ISSN

2243-4712

Publication date

2021

Start page

End page

103

Publisher

Research in Corpus Linguistics

Version

version of record (published version)

Access

open access

License

CC BY 4.0 International
