Multistream Articulatory Feature-Based Models for Visual Speech Recognition
Author(s)
Glass, James R.; Saenko, Ekaterina; Livescu, Karen; Darrell, Trevor J.
Download
Saenko-2009-Multistream Articulatory Feature-Based Models for Visual Speech Recognition.pdf (837.4 KB)
Terms of use
Publisher Policy: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DBN)-based models consisting of multiple sequences of hidden states, each corresponding to an articulatory feature (AF) such as lip opening (LO) or lip rounding (LR). A bank of discriminative articulatory feature classifiers provides input to the DBN, in the form of either virtual evidence (VE) (scaled likelihoods) or raw classifier margin outputs. We present experiments on two tasks, a medium-vocabulary word-ranking task and a small-vocabulary phrase recognition task. We show that articulatory feature-based models outperform baseline models, and we study several aspects of the models, such as the effects of allowing articulatory asynchrony, of using dictionary-based versus whole-word models, and of incorporating classifier outputs via virtual evidence versus alternative observation models.
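The abstract notes that classifier outputs enter the DBN as virtual evidence in the form of scaled likelihoods. As a rough illustrative sketch (not the paper's actual implementation), a scaled likelihood can be computed by dividing each frame's classifier posterior by the class prior; the function name and toy values below are hypothetical.

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, priors):
    """Convert per-frame classifier posteriors p(state | obs) into scaled
    likelihoods p(obs | state) (up to a constant) by dividing out the class
    prior p(state). Shapes: posteriors (T, K), priors (K,).

    Hypothetical helper for illustration; not from the paper.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    priors = np.asarray(priors, dtype=float)
    return posteriors / priors  # broadcasts priors across frames

# Toy example: 2 frames of a 3-valued articulatory feature,
# e.g. lip opening (LO) with values {closed, narrow, wide}.
post = [[0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6]]
priors = [0.5, 0.3, 0.2]
ve = posteriors_to_scaled_likelihoods(post, priors)
```

Values of `ve` can then be attached to the corresponding hidden-state stream of the DBN as virtual evidence, weighting each state hypothesis by how strongly the discriminative classifier supports it relative to chance.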
Date issued
2009-09
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publisher
Institute of Electrical and Electronics Engineers
Citation
Saenko, K. et al. “Multistream Articulatory Feature-Based Models for Visual Speech Recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence 31.9 (2009): 1700–1707. ©2009 IEEE.
Version: Final published version
Other identifiers
INSPEC Accession Number: 10773214
ISSN
0162-8828