Modeling Allophonic Rule Learning with Distributional and Phonetic Factors

Warren, Mariah

Date

28/11/2012

Item status

Restricted Access

Abstract

A fundamental task faced by infants during language acquisition is acquiring the phonological structure of the native language, including the abstract phonemic categories and the phonological rules relating the underlying categories to phonetic surface forms. Allophones are phonetic variants of phonemes which appear in select, non-overlapping phonetic contexts. Acquiring allophonic rules is necessary for infants to fully construct their phonological inventory. Statistical learning, including a sensitivity to the distribution of phonetic segments and their contexts, may play a role in infant acquisition of allophones. Two computational models of allophonic rule induction were explored. First, a statistical algorithm by Peperkamp et al. (2006) was re-implemented. It detects allophones by searching for segments in complementary distribution using the Kullback-Leibler measure. The model was tested with an artificial language and a corpus of phonemically transcribed speech, and linguistic filters were used to discard falsely identified allophones. The model successfully detected allophones in the presence of noise and with varying numbers of rules and contexts, but did not scale well. A novel phonetic distance filter was applied with superior results. Second, a novel model was developed which represents segments as vectors in a context space and identifies allophonic pairs based on contextual non-overlap, segment frequency, and sonority. The model detected 5 of 7 allophones with very high accuracy from a corpus of phonemically transcribed speech. The results from both models indicate that both distributional and phonetic information are required for allophonic rule learning, and that segment frequency, corpus size, and context size interact to affect the performance of both models. Finally, the limitations and assumptions of the models are discussed.
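The complementary-distribution search described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that a segment's context is simply the following segment, that add-one smoothing is used, and that a symmetrised Kullback-Leibler score is computed between two segments' context distributions. The function names `context_distributions` and `symmetric_kl` and the toy word list are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_distributions(words, smoothing=1.0):
    """Estimate P(following segment | segment) with additive smoothing.

    Assumption: the context of a segment is the segment that follows it,
    with "#" standing for the word boundary.
    """
    counts = defaultdict(Counter)
    contexts = set()
    for word in words:
        padded = list(word) + ["#"]
        for seg, nxt in zip(padded, padded[1:]):
            counts[seg][nxt] += 1
            contexts.add(nxt)
    dists = {}
    for seg, ctr in counts.items():
        total = sum(ctr.values()) + smoothing * len(contexts)
        dists[seg] = {c: (ctr[c] + smoothing) / total for c in contexts}
    return dists


def symmetric_kl(p, q):
    """Symmetrised KL score: KL(p||q) + KL(q||p) over a shared context set."""
    return sum((p[c] - q[c]) * math.log(p[c] / q[c]) for c in p)


# Hypothetical toy language: "T" stands for a variant of "t" that occurs
# only before "i", while "t" never occurs before "i".
words = ["tak", "tok", "Tik", "kat", "kit", "Tika", "toka", "kota", "kiTi"]
dists = context_distributions(words)

score_allophones = symmetric_kl(dists["t"], dists["T"])
score_phonemes = symmetric_kl(dists["t"], dists["k"])
print(score_allophones > score_phonemes)
```

In this toy data the pair in complementary distribution ("t", "T") receives a higher score than the pair with overlapping contexts ("t", "k"). A high score alone does not show that two segments are allophones of one phoneme, which is why the abstract reports using linguistic filters to discard falsely identified pairs.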

URI

http://hdl.handle.net/1842/8496

Collections

Linguistics and English Language Masters thesis collection