Abstract :
[en] This paper proposes a new prosody annotation protocol specific to live sports commentaries. Two levels of annotation are defined with HMM-based speech synthesis in view. Local labels are assigned to all syllables and refer to accentual phenomena. Global labels classify sequences of words into five distinct subgenres, defined in terms of valence and arousal. The objective of the study is to provide a set of labels both related to a specific function and characterized by a distinct acoustic realization. The consideration of these constraints should allow for an automatic prediction of the labels both from the text or from the speech signal. Reasonable inter-annotator scores are achieved for both annotation levels. A prosodic analysis of all labels also shows that they can usually be distinguished by specific acoustic realizations. The integration of this new annotation protocol within HMM-based speech synthesis shows promising results.
Scopus citations®
without self-citations
1