Bestgen, Yves
[UCL]
In textual statistics, as in natural language processing and corpus linguistics, the study of sequences of contiguous words that co-occur more often than expected by chance is a major topic of interest. For pairs of words, Fisher’s exact test is becoming the reference index for identifying such sequences. This study proposes a generalization of this index to trigrams and longer sequences using a Monte Carlo procedure. The results of an initial evaluation suggest that this approach could complement other indices, but also that it has a major drawback: a large number of trigrams receive the maximum collocation score.
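As an illustration of the two ideas named in the abstract, the following Python sketch computes a one-sided Fisher exact p-value for a bigram from its 2x2 contingency table, then estimates a p-value for a longer sequence by Monte Carlo. The function names, the token-shuffling null model, and the add-one p-value estimate are assumptions made for this example; the abstract does not specify the procedure actually used in the paper.

```python
import random
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), for a 2x2 table.

    a: count of (w1, w2)        b: count of (w1, not w2)
    c: count of (not w1, w2)    d: count of (not w1, not w2)
    """
    n_total = a + b + c + d
    row1 = a + b  # bigrams whose first word is w1
    col1 = a + c  # bigrams whose second word is w2
    denom = comb(n_total, row1)
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(
        comb(col1, k) * comb(n_total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def count_ngram(tokens, ngram):
    """Count contiguous occurrences of `ngram` (a tuple) in `tokens`."""
    n = len(ngram)
    return sum(
        1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == ngram
    )


def monte_carlo_p(tokens, ngram, n_perm=199, seed=0):
    """Monte Carlo p-value under an ASSUMED null model: random token shuffles.

    Returns the add-one estimate (hits + 1) / (n_perm + 1), where a hit is a
    shuffled corpus containing the n-gram at least as often as the original.
    """
    rng = random.Random(seed)
    observed = count_ngram(tokens, ngram)
    shuffled = list(tokens)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_ngram(shuffled, ngram) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    corpus = ["a", "b", "c"] * 20 + ["x", "y"] * 10
    print(fisher_right_tail(20, 0, 0, 79))
    print(monte_carlo_p(corpus, ("a", "b", "c")))
```

Under this assumed scheme the smallest attainable p-value is 1/(n_perm + 1), so every sequence that no shuffle reproduces ties at the same top score. This may be related to the drawback noted in the abstract, but that connection is a guess rather than something the abstract states.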
Bibliographic reference
Bestgen, Yves. Extraction automatique de collocations : Peut-on étendre le test exact de Fisher à des séquences de plus de 2 mots ? JADT 2014 (Inalco, Paris, 3-6 June 2014). In: Actes de JADT 2014, 2014, p. 79-90
Permanent URL
http://hdl.handle.net/2078.1/162121