RE.PUBLIC@POLIMI Research publications of the Politecnico di Milano

ITAcotron 2: Transfering English Speech Synthesis Architectures and Speech Features to Italian

Anna Favaro; Licia Sbattella; Roberto Tedesco; Vincenzo Scotti

2021-01-01

Abstract

End-to-end deep learning models have significantly advanced many Natural Language Processing (NLP) tasks. However, most of these models are trained for high-resource languages (such as English), and their behaviour is rarely studied in other languages because resources are scarce. Transfer learning is a common way to cope with such situations. In this work, we explored the cross-language transferability of a Text-to-Speech (TTS) architecture and the re-usability of the surrounding components that complete a speech synthesis pipeline. To do so, we fine-tuned an English version of the Tacotron 2 TTS, with speaker conditioning, to Italian (hence ITAcotron 2). The human evaluation, carried out on 70 subjects, showed that the language adaptation was indeed successful.

Publication year: 2021

Book title: Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)

ISBN (International Standard Book Number): 978-1-955917-18-6

Keywords: Italian; Tacotron 2; Natural Language Processing; Text-to-Speech; Conditioned Generation; Voiceprint

Appears in types: 04.1 Conference proceedings contribution

Files in this item:

File	Size	Format
ITAcotron 2 Transfering English Speech Synthesis.pdf (open access; Post-Print: draft or Author’s Accepted Manuscript, AAM)	457.61 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1187307
