Sequence-to-Sequence Emotional Voice Conversion With Strength Control

Cited 13 time in webofscience Cited 0 time in scopus
  • Hit : 287
  • Download : 0
This paper proposes an improved emotional voice conversion (EVC) method with emotional strength and duration controllability. EVC methods without duration mapping generate emotional speech with identical duration to that of the neutral input speech. In reality, even the same sentences would have different speeds and rhythms depending on the emotions. To solve this, the proposed method adopts a sequence-to-sequence network with an attention module that enables the network to learn attention in the neutral input sequence should be focused on which part of the emotional output sequence. Besides, to capture the multi-attribute aspects of emotional variations, an emotion encoder is designed for transforming acoustic features into emotion embedding vectors. By aggregating the emotion embedding vectors for each emotion, a representative vector for the target emotion is obtained and weighted to reflect emotion strength. By introducing a speaker encoder, the proposed method can preserve speaker identity even after the emotion conversion. Objective and subjective evaluation results confirm that the proposed method is superior to other previous works. Especially, in emotion strength control, we achieve in getting successful results.
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Issue Date
2021-03
Language
English
Article Type
Article
Citation

IEEE ACCESS, v.9, pp.42674 - 42687

ISSN
2169-3536
DOI
10.1109/ACCESS.2021.3065460
URI
http://hdl.handle.net/10203/282296
Appears in Collection
EE-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡ Click to see webofscience_button
⊙ Cited 13 items in WoS Click to see citing articles in records_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0