ITAcotron 2: Transfering English Speech Synthesis Architectures and Speech Features to Italian

Licia Sbattella; Roberto Tedesco; Vincenzo Scotti
2021-01-01

Abstract

End-to-end deep learning models have significantly advanced many Natural Language Processing (NLP) tasks. However, most of these models are trained for languages with abundant resources (such as English), and their behaviour in other languages is rarely studied because of resource scarcity. In such situations, transfer learning is common practice. In this work, we explore the cross-language transferability of a Text-to-Speech (TTS) architecture and the reusability of the surrounding components that complete a speech synthesis pipeline. To do so, we fine-tuned an English version of the Tacotron 2 TTS, with speaker conditioning, to Italian (hence ITAcotron 2). A human evaluation conducted with 70 subjects showed that the language adaptation was indeed successful.

Year: 2021
Published in: Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)
ISBN: 978-1-955917-18-6
Keywords: Italian; Tacotron 2; Natural Language Processing; Text-to-Speech; Conditioned Generation; Voiceprint
Files in this item:
ITAcotron 2 Transfering English Speech Synthesis.pdf (open access; Post-Print, i.e. Draft or Author's Accepted Manuscript (AAM); Adobe PDF, 457.61 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1187307
Citazioni
  • Scopus 4