A Real-Time and Automatic Ultrasound-Enhanced Multimodal Second Language Training System: A Deep Learning Approach

Date

2020-05-08

Publisher

Université d'Ottawa / University of Ottawa

Abstract

Pronunciation plays a critical role in communicative competence, especially for second language learners. Despite renewed awareness of the importance of articulation, handling the pronunciation needs of language learners remains a challenge for instructors. Pedagogical tools for pronunciation teaching and learning are relatively scarce, and traditional instruction such as listening and repeating is often inefficient. Recently, electronic visual feedback (EVF) systems (e.g., medical ultrasound imaging) have been exploited in new approaches that can be effectively incorporated into a range of teaching and learning contexts. Evaluations of ultrasound-enhanced methods for pronunciation training, such as multimodal methods, have indicated that visualizing the articulatory system as biofeedback to language learners might improve the efficiency of articulation learning. Despite the recent successful use of multimodal techniques for pronunciation training, manual work and human intervention are still unavoidable at many stages of those systems. Furthermore, recognizing the tongue shape in noisy, low-contrast ultrasound images is a challenging task, especially for non-expert users in real-time applications. In addition, our user study revealed that users could not comfortably perceive the placement of their tongue inside the mouth just by watching pre-recorded videos. Machine learning is a subset of Artificial Intelligence (AI) in which machines learn from experience and acquire skills without human involvement. Inspired by the functionality of the human brain, deep artificial neural networks learn to perform a task by training repeatedly on large amounts of data. In recent years, deep learning-based methods have emerged as the dominant paradigm in many computer vision tasks.
Deep learning methods are powerful at automatically learning new tasks and, unlike traditional image processing methods, can cope with challenges such as object occlusion, transformation variance, and background artifacts. In this dissertation, we implemented a guided language pronunciation training system that benefits from the strengths of deep learning techniques. Our modular system aims to provide a fully automatic, real-time language pronunciation training tool using ultrasound-enhanced augmented reality. Qualitative and quantitative assessments indicate that our system performs exceptionally well in terms of flexibility, generalization, robustness, and autonomy, outperforming previous techniques. Using our ultrasound-enhanced system, a language learner can observe her/his tongue movements during real-time speech, automatically superimposed on her/his face.
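The final step described above, showing the learner's tongue movements superimposed on the face, can be illustrated with a simple alpha blend. The following is a minimal, hypothetical sketch and not the thesis's actual method: it assumes grayscale frames stored as nested lists, an ultrasound frame already resized to the target region, and a placement (`top`, `left`) supplied by some upstream face-tracking step. The function `overlay_ultrasound` and its parameters are names introduced here for illustration only.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def overlay_ultrasound(face: Image, ultrasound: Image, top: int, left: int,
                       alpha: float = 0.5) -> Image:
    """Alpha-blend an ultrasound frame onto a face frame at (top, left).

    Assumption: (top, left) comes from an upstream face/mouth localization
    step, and both frames are grayscale at compatible scales.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    out = [row[:] for row in face]  # copy so the input face frame is untouched
    for i, us_row in enumerate(ultrasound):
        y = top + i
        if y < 0 or y >= len(out):
            continue  # skip ultrasound rows that fall outside the face frame
        for j, us_val in enumerate(us_row):
            x = left + j
            if x < 0 or x >= len(out[y]):
                continue  # skip ultrasound columns outside the face frame
            # Weighted sum: alpha controls how strongly the ultrasound shows through.
            out[y][x] = (1 - alpha) * out[y][x] + alpha * us_val
    return out


if __name__ == "__main__":
    face = [[100.0] * 4 for _ in range(4)]
    us = [[200.0, 200.0], [200.0, 200.0]]
    blended = overlay_ultrasound(face, us, top=1, left=1, alpha=0.5)
    print(blended[1][1])  # 0.5 * 100 + 0.5 * 200
```

A real-time system would apply such a blend to every video frame; the choice of `alpha` trades off how visible the face remains against how clearly the tongue image shows through.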

Keywords

Image Processing, Machine Learning, Deep Learning, Ultrasound Image Analysis, Tongue Imaging, Ultrasound Tongue Contour Tracking and Extraction, Automatic Object Tracking, Image Segmentation, Convolutional Networks, Computer Vision, Real-time Image Processing, Ultrasound-Enhanced Pronunciation Training, Pattern Recognition, Data Mining
