Comparative effectiveness of language modeling algorithms on acoustic level error samples

Wirsz, Steven

Masters Thesis

Various language models are used by speech recognition programs to improve the accuracy of converting sound to text. Because acoustical interpretation has significant flaws that are unlikely to be resolved anytime soon, the language model, as a secondary step of analysis, remains very influential in determining the overall accuracy of the speech recognition process. The earliest speech recognition programs used tri-gram language models, later refined to n-grams and hidden Markov models, to improve accuracy. The latest advances in speech recognition have been made by de-emphasizing the n-gram model in favor of deep learning and deep neural network models. The goal of this research is to show whether placing more weight on language modeling, as demonstrated with simple tri-gram language modeling applied after the neural network model, can significantly improve the accuracy of voice dictation. To examine this, experiments were performed to classify voice dictation errors into several categorical types, and then to apply language models trained on different language sources, from very general to very specific. Error sentences were created using Dragon NaturallySpeaking 12.5 by logging the errors that occurred during dictation of sample English language corpora of different types. These language files were then analyzed by the older bi-gram and tri-gram language models to determine which ones produced the greatest statistical difference between incorrect and correct sentences. An analysis of the mistakes in the output of Dragon NaturallySpeaking 12.5 shows that tri-gram modeling favors correct sentences over the error sentences.
Without access to the alternative choices rejected by Dragon NaturallySpeaking, no conclusion can be drawn about the degree to which tri-gram modeling might introduce new errors, but test results show that utilizing a simple tri-gram language model in addition to the neural network and language model analysis already being performed would significantly reduce the number of false positives.
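
The comparison described in the abstract, scoring a correct sentence and its dictation-error counterpart under a tri-gram model, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names (`tokenize`, `train_trigram`, `log_prob`), the add-one smoothing, and the three-sentence toy corpus are hypothetical and are not taken from the thesis, which may use different smoothing, tokenization, and training data.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Lowercase whitespace tokens, padded with two start markers and one end marker.
    return ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]


def train_trigram(corpus):
    # Count each trigram and its two-word context over the training sentences.
    trigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        for i in range(2, len(tokens)):
            trigrams[tuple(tokens[i - 2:i + 1])] += 1
            contexts[tuple(tokens[i - 2:i])] += 1
    return trigrams, contexts, vocab


def log_prob(sentence, model):
    # Sum of add-one smoothed trigram log-probabilities (assumed smoothing scheme).
    trigrams, contexts, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen words
    tokens = tokenize(sentence)
    total = 0.0
    for i in range(2, len(tokens)):
        tri = tuple(tokens[i - 2:i + 1])
        ctx = tuple(tokens[i - 2:i])
        total += math.log((trigrams[tri] + 1) / (contexts[ctx] + v))
    return total


# Hypothetical toy training corpus; a real experiment would use far more text.
toy_corpus = [
    "the patient was given a new prescription",
    "the doctor wrote a new prescription",
    "the patient was seen by the doctor",
]
model = train_trigram(toy_corpus)

correct = "the patient was given a new prescription"
error = "the patient was given a knew prescription"  # homophone-style dictation error

score_correct = log_prob(correct, model)
score_error = log_prob(error, model)
print(score_correct > score_error)  # the model prefers the correct sentence here
```

In this toy setting the error sentence scores lower because the trigrams containing the substituted word never occur in the training text. A model trained on text closer to the dictated material would be expected to separate the two sentences differently than one trained on general text, which is the kind of contrast the abstract describes.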

Date

7/5/2016

Resource Type

Masters Thesis

Creator

Wirsz, Steven

Advisor

Barnes, George Michael

Campus

Northridge

Department

Computer Science

Publisher

California State University, Northridge

Degree Level

Masters

Degree Name

M.S.

Date Copyright

2016

Date Submitted

2016-05

Date Accessioned

2016-07-05T22:05:03Z

Handle

http://hdl.handle.net/10211.3/173361

Made available in DSpace on 2016-07-05T22:05:03Z (GMT). No. of bitstreams: 1 Wirsz-Steven-thesis-2016.pdf: 1949723 bytes, checksum: bb3210508a6953ade9b820cfe249bca2 (MD5) Previous issue date: 2016-07-05
Submitted by Graduate Studies (gradstudies@csun.edu) on 2016-07-05T22:05:03Z No. of bitstreams: 1 Wirsz-Steven-thesis-2016.pdf: 1949723 bytes, checksum: bb3210508a6953ade9b820cfe249bca2 (MD5)

Language

English

Statement of Responsibility

by Steven Wirsz

Notes

California State University, Northridge. Department of Computer Science.
Includes bibliographical references (pages 57-58)

Title	Date Uploaded	Visibility
Wirsz-Steven-thesis-2016.pdf	2020-09-18	Public
