A New Prosody Annotation Protocol for Live Sports Commentaries

Brognaux, Sandrine; Picart, Benjamin; Drugman, Thomas

Paper published in a journal (Scientific congresses and symposiums)

2013

Permalink
https://hdl.handle.net/20.500.12907/41581

Files

Full Text

interspeech2013_proso_sbbptd.pdf

Author postprint (612.09 kB)

All documents in ORBi UMONS are protected by a user license.

Details

Keywords :

[en] Expressive speech synthesis; [en] Prosody; [en] Sports Commentaries

Abstract :

[en] This paper proposes a new prosody annotation protocol specific to live sports commentaries. Two levels of annotation are defined, with HMM-based speech synthesis in mind. Local labels are assigned to all syllables and refer to accentual phenomena. Global labels classify sequences of words into five distinct subgenres, defined in terms of valence and arousal. The objective of the study is to provide a set of labels, each related to a specific function and characterized by a distinct acoustic realization. Taking these constraints into account should allow the labels to be predicted automatically, either from the text or from the speech signal. Reasonable inter-annotator agreement scores are achieved for both annotation levels. A prosodic analysis of all labels also shows that they can usually be distinguished by specific acoustic realizations. The integration of this new annotation protocol within HMM-based speech synthesis shows promising results.
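
The abstract reports inter-annotator scores without naming the measure. A common choice for nominal labels such as these is Cohen's kappa (Cohen, 1960), which corrects the raw agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below assumes that measure and two annotators labelling the same sequence of syllables; the label names "A" (accented) and "N" (unaccented) are hypothetical and are not the protocol's actual label set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators assigning nominal labels to the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty sequence of items")
    n = len(labels_a)
    # Observed agreement p_o: share of items that received the same label from both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of the two annotators' label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one and the same label everywhere.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical syllable-level labels: "A" = accented, "N" = unaccented.
    annotator_1 = ["A", "N", "N", "A", "N", "N"]
    annotator_2 = ["A", "N", "A", "A", "N", "N"]
    print(round(cohen_kappa(annotator_1, annotator_2), 3))  # prints 0.667
```

With these example lists, the observed agreement is 5/6 and the chance agreement is 0.5, which gives kappa = 2/3. The same function could be applied to the global level (one subgenre label per word sequence), provided both annotators segment the commentary into the same word sequences.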

Disciplines :

Electrical & electronics engineering

Author, co-author :

Brognaux, Sandrine

Picart, Benjamin ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Drugman, Thomas ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Language :

English

Title :

A New Prosody Annotation Protocol for Live Sports Commentaries

Publication date :

24 May 2013

Event name :

Interspeech 2013

Event place :

Lyon, France

Event date :

2013

Research unit :

F105 - Information, Signal et Intelligence artificielle

Research institute :

R450 - Institut NUMEDIART pour les Technologies des Arts Numériques

Available on ORBi UMONS :

since 23 January 2014

Bibliography

N. Campbell, "Conversational speech synthesis and the need for some laughter," IEEE Transactions on Audio, Speech and Language Processing, vol. 14(4), pp. 1171-1179, 2006.
J. Trouvain, "Between excitement and triumph - live football commentaries in radio vs. TV," in 17th International Congress of Phonetic Sciences (ICPhS XVII), 2011.
F. Kern, "Speaking dramatically: The prosody of live radio commentary of football matches," in Prosody in Interaction. John Benjamins, 2010, pp. 217-237.
S. Audrit, T. Psir, A. Auchlin, and J.-P. Goldman, "Sport in the media: A contrasted study of three sport live media reports with semi-automatic tools," in Speech Prosody, 2012.
J. Trouvain and W. Barry, "The prosody of excitement in horse race commentaries," in ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, 2000, pp. 86-91.
N. Obin, V. Dellwo, A. Lacheret, and X. Rodet, "Expectations for discourse genre identification," in Interspeech, 2010.
R. Ogden, "We speak prosodies and we listen to them," in Symposium on Prosody and Interaction, 2001.
J.-P. Goldman, "Prosodyn: A graphical representation of macroprosody for phonostylistic ambiance change detection," in Speech Prosody, 2012.
K. Silverman, M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. Pierrehumbert, and J. Hirschberg, "ToBI: A standard for labeling English prosody," in International Conference on Spoken Language Processing (ICSLP), 1992, pp. 867-870.
P. Mertens, "L'intonation du français. De la description linguistique à la reconnaissance automatique," Ph.D. dissertation, Univ. Leuven (Belgium), 1987.
J.-P. Goldman, "EasyAlign: An automatic phonetic alignment tool under Praat," in Interspeech, 2011, pp. 3233-3236.
S. Brognaux, S. Roekhaut, T. Drugman, and R. Beaufort, "Train & Align: A new online tool for automatic phonetic alignments," in IEEE Workshop on Spoken Language Technology, 2012.
V. Colotte and R. Beaufort, "Linguistic features weighting for a text-to-speech system without prosody model," in Interspeech, 2005, pp. 2549-2552.
A. Di Cristo, "Vers une modélisation de l'accentuation du français : deuxième partie," Journal of French Language Studies, vol. 10, pp. 27-44, 2000.
T. Drugman, J. Kane, and C. Gobl, "Resonator-based creaky voice detection," in Interspeech, 2012.
A. Mehrabian and J. A. Russell, An Approach to Environmental Psychology. MIT Press, 1974.
P. Boersma and D. Weenink. (2009, May) Praat: doing phonetics by computer (version 5.1.05) [computer program]. [Online]. Available: http://www.praat.org.
J.-P. Goldman, A. Auchlin, S. Roekhaut, A. C. Simon, and M. Avanzi, "Prominence perception and accent detection in French: A corpus-based account," in Speech Prosody, 2010.
J. Cohen, "A coefficient of agreement for nominal scales," Educational and Psychological Measurement, vol. 20(1), pp. 37-46, 1960.
P. Mertens, "The Prosogram: Semi-automatic transcription of prosody based on a tonal perception model," in Speech Prosody, 2004.
G. Peeters, "A large set of audio features for sound description (similarity and classification) in the CUIDADO project," 2003.
T. Drugman and A. Alwan, "Joint robust voicing detection and pitch estimation based on residual harmonics," in Interspeech, 2011.
A. Lacheret-Dujour and F. Beaugendre, La prosodie du français. Paris: CNRS Éditions, 1999.
A. Seguinot, "L'accent d'insistance en français standard," Studia Phonetica, vol. 12, 1976.
J.-P. Goldman, M. Avanzi, A. Lacheret-Dujour, A. C. Simon, and A. Auchlin, "A methodology for the automatic detection of perceived prominent syllables in spoken French," in Interspeech, 2007, pp. 98-101.
J.-P. Goldman, M. Avanzi, A. Auchlin, and A. C. Simon, "A continuous prominence score based on acoustic features," in Interspeech, 2012.
H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51(11), pp. 1039-1064, 2009.
HMM-based Speech Synthesis System (HTS). [Online]. Available: http://hts.sp.nitech.ac.jp.
K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis - A unified approach to speech spectral estimation," in International Conference on Spoken Language Processing (ICSLP), 1994, pp. 1043-1046.
T. Drugman and T. Dutoit, "The deterministic plus stochastic model of the residual signal and its applications," IEEE Transactions on Audio, Speech and Language Processing, vol. 20(3), pp. 968-981, 2012.