This paper introduces a new unsupervised and score-informed
method for the segmentation of singing voice into syllables. The
main idea of the proposed method is to detect the syllable onset
on a probability density function by incorporating a priori syllable
duration derived from the score. Firstly, intensity profiles
are used to exploit the characteristics of singing voice depending
on the Mel-frequency regions. Then, the syllable onset probability
density function is obtained by selecting candidates over
the intensity profiles and weighting them to emphasize
the onset regions. Finally, the syllable duration distribution
shaped by the score is incorporated into Viterbi decoding to determine
the optimal sequence of onset time positions. The proposed
method outperforms conventional methods for syllable segmentation
on a jingju (also known as Peking or Beijing opera) a
cappella dataset. An analysis is conducted on precision errors to
provide direction for future improvement.
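
The final decoding step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decode_onsets`, the Gaussian form of the duration prior, the `rel_std` parameter, and the convention that the first syllable starts at frame 0 are all assumptions made for illustration. Given a per-frame onset probability and the score-derived syllable durations (in frames), a Viterbi-style dynamic program picks the onset sequence that jointly maximizes onset probability and agreement with the expected durations.

```python
import numpy as np

def decode_onsets(onset_prob, expected_durs, rel_std=0.3):
    """Hypothetical score-informed onset decoding (illustrative only).

    onset_prob    : per-frame onset probability, length T
    expected_durs : expected syllable durations in frames (from the score)
    rel_std       : assumed std of the duration prior, relative to its mean
    Returns one onset frame per syllable.
    """
    onset_prob = np.asarray(onset_prob, dtype=float)
    T, N = len(onset_prob), len(expected_durs)
    log_p = np.log(onset_prob + 1e-12)

    def log_dur(d, mu):
        # Assumed Gaussian log-prior on syllable duration (up to a constant).
        sigma = rel_std * mu
        return -0.5 * ((d - mu) / sigma) ** 2 - np.log(sigma)

    # best[k, t]: best log-score with the k-th syllable onset at frame t.
    best = np.full((N, T), -np.inf)
    back = np.zeros((N, T), dtype=int)
    best[0, 0] = 0.0  # assumption: the first syllable starts at frame 0

    for k in range(1, N):
        for t in range(k, T):
            prev = np.arange(k - 1, t)  # admissible previous onset frames
            scores = best[k - 1, prev] + log_dur(t - prev, expected_durs[k - 1])
            j = int(np.argmax(scores))
            best[k, t] = scores[j] + log_p[t]
            back[k, t] = prev[j]

    # The last syllable is assumed to end at frame T.
    final = best[N - 1] + log_dur(T - np.arange(T), expected_durs[N - 1])
    t = int(np.argmax(final))

    # Backtrack from the last onset to the first.
    onsets = [t]
    for k in range(N - 1, 0, -1):
        t = int(back[k, t])
        onsets.append(t)
    return onsets[::-1]
```

For example, with a 40-frame onset probability that peaks at frames 0, 10, and 25 and expected durations of 10, 15, and 15 frames, the decoder selects those three peaks. The duration prior matters most when the onset probability has spurious peaks: a candidate far from the score-implied position is penalized even if its onset probability is high.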