Speech processing using digital MEMS microphones
Date
28/11/2013
Author
Zwyssig, Erich Paul
Abstract
The last few years have seen the start of a unique change in microphones for consumer
devices such as smartphones or tablets. Almost all analogue capacitive microphones
are being replaced by digital silicon microphones or MEMS microphones.
MEMS microphones perform differently to conventional analogue microphones. Their
greatest disadvantage is significantly increased self-noise or decreased SNR, while
their most significant benefits are ease of design and manufacturing and improved sensitivity
matching.
This thesis presents research on speech processing, comparing conventional analogue
microphones with the newly available digital MEMS microphones. Specifically, voice
activity detection, speaker diarisation (who spoke when), speech separation and speech
recognition are examined in detail.
To carry out this research, different microphone arrays were built using digital
MEMS microphones, and corpora were recorded to test existing algorithms and devise
new ones. Some corpora that were created for the purpose of this research will be
released to the public in 2013.
It was found that the most commonly used VAD algorithm in current state-of-the-art
diarisation systems is not the best-performing one: MLP-based voice activity
detection consistently outperforms the more frequently used GMM-HMM-based VAD
schemes. In addition, an algorithm was derived that can determine the number of active
speakers in a meeting recording given audio data from a microphone array of known
geometry, leading to improved diarisation results.
Finally, speech separation experiments were carried out using different post-filtering
algorithms, matching or exceeding current state-of-the-art results.
The performance of the algorithms and methods presented in this thesis was verified
by comparing their output using speech recognition tools and simple MLLR adaptation,
and the results are presented as word error rates, an easily comprehensible scale.
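For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a recogniser hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration; this is not the scoring tool used in the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenisation is a plain whitespace split,
    with no text normalisation of the kind real scoring tools apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, a word error rate computed this way can exceed 1.0 when the hypothesis is much longer than the reference.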
To summarise, using speech recognition and speech separation experiments, this thesis
demonstrates that the significantly reduced SNR of the MEMS microphone can be
compensated for with well-established adaptation techniques such as MLLR. MEMS
microphones do not affect voice activity detection and speaker diarisation performance.