Word length and the principle of least effort: language as an evolving, efficient code for information transfer
Date
02/07/2018
Author
Kanwal, Jasmeen Kaur
Abstract
In 1935 the linguist George Kingsley Zipf made a now classic observation about the
relationship between a word’s length and its frequency: the more frequent a word is,
the shorter it tends to be. He claimed that this “Law of Abbreviation” is a universal
structural property of language. The Law of Abbreviation has since been documented
in a wide range of human languages, and extended to animal communication systems
and even computer programming languages. Zipf hypothesised that this universal
design feature arises as a result of individuals optimising form-meaning mappings
under competing pressures to communicate accurately but also efficiently—his famous
Principle of Least Effort.
In this thesis, I present a novel set of studies which provide direct experimental evidence
for this explanatory hypothesis. Using a miniature artificial language learning
paradigm, I show in Chapter 2 that language users optimise form-meaning mappings
in line with the Law of Abbreviation only when pressures for accuracy and efficiency
both operate during a communicative task. These results are robust across different
methods of data collection: one version of the experiment was run in the lab, and
another was run online, using a novel method I developed that allows participants
to engage in dyadic interaction through a web-based interface.
In Chapter 3, I address the growing body of work suggesting that a word’s predictability
in context may be an even stronger determinant of its length than its frequency
alone. For instance, Piantadosi et al. (2011) show that, in synchronic corpora
across many languages, shorter words have a lower average surprisal (i.e., tend to
appear in more predictive contexts) than longer words. We hypothesise that the same
communicative pressures posited by the Principle of Least Effort, when acting on
speakers in situations where context manipulates the information content of words,
can give rise to these lexical distributions. Adapting the methodology developed in
Chapter 2, I show that participants use shorter words in more predictive contexts only
when subject to the competing pressures for accurate and efficient communication. In
a second experiment, I show that participants are more likely to use shorter words for
meanings with a lower average surprisal. These results suggest that communicative
pressures acting on individuals during language use can lead to the re-mapping of a
lexicon to align with “Uniform Information Density”, the principle that information
content ought to be evenly spread across an utterance, such that shorter linguistic
units carry less information than longer ones.
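For concreteness, the measure at issue can be stated explicitly. Following Piantadosi et al. (2011), the surprisal of a word w in a context c, and its average surprisal over the contexts in which it occurs, are

\[
S(w \mid c) = -\log_2 P(w \mid c),
\qquad
\bar{S}(w) = \frac{1}{|C_w|} \sum_{c \in C_w} -\log_2 P(w \mid c),
\]

where C_w is the set of contexts containing w (the notation here is mine, for illustration). Under this formulation, the Law of Abbreviation predicts that word length tracks frequency, while the surprisal-based account predicts that length tracks \(\bar{S}(w)\), a word’s average information content in context.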
Over generations, linguistic behaviour such as that observed in the experiments
reported here may bring entire lexicons into alignment with the Law of Abbreviation
and Uniform Information Density. For this to happen, a diachronic process which
leads to permanent lexical change is necessary. However, crucial evidence for this
process—decreasing word length as a result of increasing frequency over time—has
never before been systematically documented in natural language. In Chapter 4,
I conduct the first large-scale diachronic corpus study investigating the relationship
between word length and frequency over time, using the Google Books Ngrams corpus
and three different word lists covering both English and French. Focusing on words
which have both long and short variants (e.g., info/information), I show that the
frequency of a word lemma may influence the rate at which the shorter variant gains
in popularity. This suggests that the lexicon as a whole may indeed be gradually
evolving towards greater efficiency.
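To make the key quantity in this study concrete, here is a minimal sketch in Python of the measure described above: the share of a lemma’s tokens realised as the short variant in each year, from which a rate of change can be estimated. The input format (per-year counts for each variant, as could be extracted from the Google Books Ngrams corpus) and all names are illustrative assumptions, not the thesis’s actual analysis pipeline.

from typing import Dict

def short_variant_share(
    short_counts: Dict[int, int],  # year -> token count for the short form (e.g. "info")
    long_counts: Dict[int, int],   # year -> token count for the long form (e.g. "information")
) -> Dict[int, float]:
    """Proportion of the lemma's tokens realised as the short variant, per year."""
    shares = {}
    for year in sorted(set(short_counts) | set(long_counts)):
        n_short = short_counts.get(year, 0)
        n_long = long_counts.get(year, 0)
        if n_short + n_long > 0:
            shares[year] = n_short / (n_short + n_long)
    return shares

# Made-up counts illustrating a short form gaining in popularity over time:
print(short_variant_share({1960: 10, 2000: 500}, {1960: 990, 2000: 1500}))
# {1960: 0.01, 2000: 0.25}

Comparing how quickly this share rises for high- versus low-frequency lemmas is the kind of comparison reported in Chapter 4.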
Taken together, the behavioural and corpus-based evidence presented in this thesis
supports the hypothesis that communicative pressures acting on language users are at
least partially responsible for the frequency-length and surprisal-length relationships
found universally across lexicons. More generally, the approach taken in this thesis
promotes a view of language as, among other things, an evolving, efficient code for
information transfer.