Dysarthric Speech Recognition Using Pseudo-Labeling, Self-Supervised Feature Learning, and a Joint Multi-Task Learning Approach

Takashima, Ryoichi; Sawa, Yuya; Aihara, Ryo; Takiguchi, Tetsuya; Imai, Yoshie

https://hdl.handle.net/20.500.14094/0100488375

Available files

File	Format	Size	Views	Description
0100488375 (fulltext)	pdf	1.19 MB	18

Metadata

Metadata ID	0100488375
Access rights	open access
Publication type	Version of Record
Title	Dysarthric Speech Recognition Using Pseudo-Labeling, Self-Supervised Feature Learning, and a Joint Multi-Task Learning Approach
Authors	Takashima, Ryoichi ; Sawa, Yuya ; Aihara, Ryo ; Takiguchi, Tetsuya ; Imai, Yoshie
Author ID: A2510; Researcher ID: 1000050846102; KUID: https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=df7a61d0afafcfc6520e17560c007669; Author name: Takashima, Ryoichi (髙島, 遼一; タカシマ, リョウイチ); Affiliation: Research Center for Urban Safety and Security
Author name: Sawa, Yuya
Author name: Aihara, Ryo
Author ID: A1279; Researcher ID: 1000040397815; ORCID: 0000-0001-5005-7679; KUID: https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=b3ec2a1710d8267b520e17560c007669; Author name: Takiguchi, Tetsuya (滝口, 哲也; タキグチ, テツヤ); Affiliation: Research Center for Urban Safety and Security
Author name: Imai, Yoshie
Journal title	IEEE Access
Volume (issue)	12
Pages	36990-36999
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Publication date	2024-03-07
Release date	2024-04-02
Abstract	In this paper, we investigate the use of the spontaneous speech of dysarthric people for training an automatic speech recognition (ASR) model for them. Although the spontaneous speech of dysarthric people can be collected relatively easily compared to script-reading speech, which is obtained by having them read a prepared script, labeling the spontaneous speech of dysarthric people is very difficult and costly. For training an ASR model using unlabeled speech data, pseudo-labeling and self-supervised feature learning have been studied as effective approaches; however, the effectiveness of these approaches has not been clear when they are applied to the unlabeled dysarthric speech. In addition, pseudo-labeling may not be effective since the pseudo-labels of dysarthric speech include many errors and are not reliable. In this paper, we evaluate the above two approaches for the dysarthric speech recognition, and we propose a multi-task learning approach, which combines these approaches to train an ASR model that is robust against the errors in the pseudo-labels. Experimental results using Japanese and English datasets demonstrated that all approaches are effective, but among them, the proposed multi-task learning approach showed the best performance.
Keywords	Speech recognition
	dysarthria
	pseudo-labeling
	self-supervised feature learning
Category	Research Center for Urban Safety and Security
Category	Journal article
Rights	© 2024 The Authors.
Rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

Resource type	journal article
Language	English
eISSN	2169-3536
Related information	DOI https://doi.org/10.1109/ACCESS.2024.3374874
