Paper published in a journal (Scientific congresses and symposiums)
Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
Pironkov, Gueorgui; Dupont, Stéphane; Dutoit, Thierry
2016
 

Files


Full Text
pironkov2016spk_ivecMTL.pdf
Publisher postprint (254.79 kB)




Details



Abstract :
[en] Overfitting is a common problem in automatic speech recognition, and its impact is especially severe when the amount of training data is limited. To address this problem, this article investigates acoustic modeling through Multi-Task Learning with two speaker-related auxiliary tasks. Multi-Task Learning is a regularization method that aims to improve the network's generalization ability by training a single model to solve several different but related tasks. In this article, two auxiliary tasks are examined jointly. First, we consider speaker classification as an auxiliary task, training the acoustic model to recognize the speaker or to find the closest speaker in the training set. Second, the acoustic model is also trained to extract i-vectors from the standard acoustic features. I-vectors are used effectively in the speaker identification community to characterize a speaker and their acoustic environment. The core idea behind these auxiliary tasks is to give the network additional inter-speaker awareness and thus reduce overfitting. We investigate this Multi-Task Learning setup on the TIMIT database, with acoustic modeling performed by a Recurrent Neural Network with Long Short-Term Memory cells.
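The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss functions or task weights, so everything here is an assumption made for illustration: cross-entropy for the main acoustic task and for speaker classification, mean squared error for i-vector regression, and the hypothetical weights `speaker_weight` and `ivector_weight`. The sketch only shows how three per-frame losses might be combined into one training signal; it does not model the LSTM network itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target_index])


def mean_squared_error(predicted, target):
    """Average squared difference between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def multitask_loss(phone_logits, phone_target,
                   speaker_logits, speaker_target,
                   ivector_pred, ivector_target,
                   speaker_weight=0.1, ivector_weight=0.1):
    """Main-task loss plus two weighted auxiliary losses.

    The loss types and the default weights are assumptions for this
    sketch, not values taken from the paper.
    """
    main = cross_entropy(phone_logits, phone_target)          # acoustic modeling
    speaker = cross_entropy(speaker_logits, speaker_target)   # auxiliary task 1
    ivector = mean_squared_error(ivector_pred, ivector_target)  # auxiliary task 2
    return main + speaker_weight * speaker + ivector_weight * ivector


if __name__ == "__main__":
    loss = multitask_loss(
        phone_logits=[0.0, 0.0], phone_target=0,
        speaker_logits=[0.0, 0.0, 0.0, 0.0], speaker_target=1,
        ivector_pred=[1.0, 2.0], ivector_target=[1.0, 4.0],
        speaker_weight=0.5, ivector_weight=0.25,
    )
    print(loss)
```

Setting both weights to zero reduces the objective to the main-task loss alone, which matches the usual view of the auxiliary tasks as regularizers used during training rather than outputs needed at recognition time.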
Disciplines :
Electrical & electronics engineering
Library & information sciences
Author, co-author :
Pironkov, Gueorgui ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle
Dupont, Stéphane ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle
Dutoit, Thierry ; Université de Mons > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle
Language :
English
Title :
Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
Publication date :
04 December 2016
Event name :
International Conference on Pattern Recognition
Event place :
Cancun, Mexico
Event date :
2016
Research unit :
F105 - Information, Signal et Intelligence artificielle
Research institute :
R300 - Institut de Recherche en Technologies de l'Information et Sciences de l'Informatique
R450 - Institut NUMEDIART pour les Technologies des Arts Numériques
Available on ORBi UMONS :
since 16 January 2017

Statistics


Number of views
1 (0 by UMONS)
Number of downloads
6 (0 by UMONS)

Scopus citations®
7
Scopus citations® without self-citations
3
