Head motion synthesis: evaluation and a template motion approach
Date
27/06/2016
Author
Braude, David Adam
Abstract
The use of conversational agents has increased across the world. From providing automated
support for companies to acting as virtual psychologists, they have moved from
an academic curiosity to an application with real-world relevance. While many researchers
have focused on the content of the dialogue and on synthetic speech to give the
agents a voice, animating these characters has more recently become a topic of interest.
Character animation technology is also of use in the film and video game industries,
where animating characters without paying for expensive manual labour
would greatly reduce costs.
When animating characters there are many aspects to consider, for example the way
they walk. However, to truly assist with communication, automated animation needs to
reproduce the body language used when speaking. In particular, conversational agents
are often only an animation of the upper parts of the body, so head motion is one of
the keys to a believable agent. While certain linguistic features are obvious, such as
nodding to indicate agreement, research has shown that head motion also aids understanding
of speech. Additionally, head motion often carries emotional cues, prosodic
information, and other paralinguistic information.
In this thesis we present our research into synthesising head motion using only
recorded speech as input. During this research we collected a large dataset of head
motion synchronised with speech, examined evaluation methodology, and developed a
synthesis system.
Our dataset is one of the larger ones available. From it we present some statistics
about head motion in general, including differences between read speech and
storytelling speech, and differences between speakers. From this we are able to draw some
conclusions as to what type of source data will be the most interesting in head motion
research, and whether speaker-dependent models are needed for synthesis.
In our examination of head motion evaluation methodology we introduce Forced Canonical
Correlation Analysis (FCCA). FCCA shows the difference between head-motion-shaped
noise and motion capture better than the standard objective evaluation methods
used in the literature. We have shown that for subjective testing it is best practice to
use a variation of MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA)
testing, adapted for head motion. Through experimentation we have developed
guidelines for implementing the test, including constraints on its length.
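As a point of reference for the objective measures discussed above, plain canonical correlation analysis (CCA) between two feature streams can be sketched as follows. This is ordinary CCA, not the FCCA variant introduced in the thesis, whose definition is not given in this abstract. The function name, the frames-by-features array layout, and the QR/SVD formulation are assumptions chosen for illustration.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between two feature streams.

    Illustrative sketch of standard CCA (hypothetical helper, not the
    thesis's FCCA). x and y are arrays of shape (frames, features) and
    are assumed to have full column rank and more frames than features.
    """
    # Remove the per-feature mean so correlations are not driven by offsets.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)

    # Orthonormal bases for the column spaces of the two centred streams.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)

    # The singular values of Qx^T Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)

    # Guard against values marginally outside [0, 1] from rounding.
    return np.clip(s, 0.0, 1.0)
```

Values near 1 indicate that some linear combination of one stream closely tracks a linear combination of the other, for example speech features against head rotation angles. Values near 0 indicate no linear relationship.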
Finally, we present a new system for head motion synthesis. We make use of simple
templates of motion, automatically extracted from source data, that are warped to
suit the speech features. Our system uses clustering to pick the small motion units,
and a combined HMM- and GMM-based approach for determining the values of the warping
parameters at synthesis time. This results in highly natural-looking motion that
outperforms other state-of-the-art systems. Our system requires minimal human intervention
and produces believable motion. The key innovations were the new methods
for segmenting head motion and the creation of a process similar to language modelling for
synthesising head motion.
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International