GerCo: German Adjective-Noun Collocations Dataset
Description
The dataset contains 4732 adjective-noun pairs extracted from the DWDS corpora [1] with the application Wortprofil [2]. Two experts annotated every phrase as either a collocation or a non-collocation. One of the annotators further classified the non-collocations as free phrases, idioms, named entities, or terms.
If you use this dataset for research purposes, please cite the following paper:
- Yana Strakatova, Neele Falk, Isabel Fuhrmann, Daniela Rossmann, Erhard Hinrichs. All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German. 2019.
References:
[1]: DWDS – Digitales Wörterbuch der deutschen Sprache. Das Wortauskunftssystem zur deutschen Sprache in Geschichte und Gegenwart, hrsg. v. d. Berlin-Brandenburgischen Akademie der Wissenschaften.
[2]: DWDS-Wortprofil, erstellt durch das Digitale Wörterbuch der deutschen Sprache.
Abstract (German)
Gegenstand des Projektes ist die lexikalisch-semantische Modellierung und Beschreibung unterschiedlicher Aspekte von Kollokationen. Die wesentlichen Forschungsfragen in diesem Projekt lauten: a) auf welche Weise und in welchem theoretischen Rahmen die lexikalisch-semantische Gruppierung von Kollokanten zu einer Kollokationsbasis gruppiert und beschrieben werden können; b) ob und in welcher Weise Kollokanten, die mehreren Kollokationsbasen auf einer paradigmatischen Achse gemeinsam sind, zu einer generalisierten Beschreibung dieser Kollokationsbasen beitragen können. Im Rahmen dieses Projekts werden die Meaning-Text Theorie von Igor Mel'čuk, besonders die Lexikalischen Funktionen, und die Theorie des Generativen Lexikons von James Pustejovsky, besonders die Qualiarollen, als gemeinsamer Rahmen zusammengeführt. Es wird untersucht, ob eine angemessene Modellierung oben beschriebener Aspekte von Kollokationen mit einer Synthese der beiden genannten theoretischen Ansätze möglich ist. Ein zentrales Ergebnis des Projektes wird eine Handreichung für die Modellierung von Kollokationen mit den aus dem übergreifenden theoretischen Rahmen abgeleiteten Beschreibungsmitteln sein.
Abstract (English)
The project aims to model and describe lexical-semantic properties of collocations. The main research questions of the project are: a) how and within which theoretical framework the lexical-semantic grouping of collocations can be performed and described; b) whether and how it is possible to group collocators by their semantic similarity and to compare and represent (sets of) collocations based on the semantic relatedness of their collocational bases. The research draws on the "Meaning-Text Theory" elaborated by Igor Mel'čuk, which provides the concept of "Lexical Function", and on the theory of the "Generative Lexicon" by James Pustejovsky, which provides the concept of "qualia structure". The project will investigate whether a synthesis of the two theoretical frameworks can adequately model the described aspects of collocations. A central outcome of the project will be a set of guidelines for modeling collocations based on the resulting specification of the cross-theoretical framework.
Files
MD5 checksum | Size |
---|---|
ee46d0efa35f8e49ccb1fb3c4027a9d8 | 275.6 KiB |
133cb42265d6d76a2a1f822e2c8a5d4d | 61.8 KiB |
4da30d2dbeb2a3766e5bd515ed91f0cc | 45.7 KiB |
107c9178c2b95ec1b349dca03530d984 | 59.0 KiB |
4c39be3a3246ebae18e166f6b486513f | 130.5 KiB |
31cd8693eeaef24080d270ff247267d4 | 129.7 KiB |
511aa886d8f722f8855dadafbb76e072 | 128.9 KiB |
c5a9fc1af7920052940ce0a27266af10 | 264.7 KiB |
885c37f74798d30362f97ab6aa9801d6 | 131.3 KiB |
c525fccf089dd995664b4a19aef2f387 | 262.8 KiB |
6213f29ba052896d53ca3c44d1695df4 | 129.8 KiB |
d6e41f81d8226afa0f13280976135cab | 265.5 KiB |
41e2fc72fcd8af92bde77940eb26b68a | 132.5 KiB |
e1fb9d98bdf7004a70df15ab8e0512c9 | 235.6 KiB |
e7b9d0a684cfde5db5a89c21be3f13b2 | 1.2 MiB |
82141d8d94397a383a9fc44cc1966f69 | 92.7 KiB |
374050eb57dc40c0b459c204df6b52e1 | 256.7 KiB |
b959b635d812f9602d406c38b11caf2c | 1.1 MiB |
59cded7fda02cc8092285fdb3a159789 | 1.1 MiB |
8f51116dfd101eab5e983872eadbdd68 | 1.1 MiB |
cc4ed0b92fa21d7cd45b333508c9e0c2 | 1.1 MiB |
fd9d4300df5aae2a4b85007f9d9b78c8 | 1.1 MiB |
9a15a7b830e708a54a05adf92a370f62 | 62.4 KiB |
a63ea88997c40e6c738dcf7fb91446f3 | 62.5 KiB |
cc6d74310da48f33b370f4e68656ec11 | 60.8 KiB |
3531498a39f0c8f74ac861ef8337ac1b | 63.7 KiB |
f3a25e72ea94372a60fbf5836b8a2bca | 4.7 KiB |
b6f4e78e422d5d6bb76513c1066781f4 | 125.0 KiB |
8095a5ad9f473606fa57d9bd5fdbbcdc | 131.8 KiB |
64e7c945e343b9b2457fff493ebb0b97 | 124.1 KiB |
c757d3f734080e07eebe1d8230b04853 | 118.4 KiB |
6aad48f8252d29d62a7ffd730012972f | 112.9 KiB |
7740c1be832d9d1d0bfb7bb3b603b77f | 128.3 KiB |
2ae1a5e381b6ac7c00ec80f0f15c0ed6 | 553.6 KiB |
40b429aab475755c13d554e1c6279fbb | 552.6 KiB |
c054737a858b7ceb9295f8ddf57de692 | 547.8 KiB |
a170310f2897adb81b93fec8ca49a915 | 559.6 KiB |
6250ae5891a862918c74a40830aa5281 | 565.0 KiB |
347edb364b3fffa09ff425e659e3918f | 553.2 KiB |
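The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch, assuming Python 3 and only the standard library; the helper names `md5_of_file` and `matches_checksum` are illustrative and not part of the dataset, and the file names themselves are not given in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("downloaded_file", "md5:ee46d0efa35f8e49ccb1fb3c4027a9d8")` returns `True` only if the local file (the path here is a placeholder) has exactly that digest. MD5 is suitable for detecting transfer corruption, not for guarding against deliberate tampering.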