
VocaListener2: A Singing Synthesis System Able to Mimic a User's Singing in Terms of Voice Timbre Changes As Well As Pitch and Dynamics

This project was proposed and carried out by Tomoyasu Nakano and Masataka Goto.

Abstract:

This paper presents VocaListener2, a singing synthesis system that automatically synthesizes a singing voice by mimicking the timbre changes of a user's singing voice. The system extends our previous VocaListener system, which handles only pitch and dynamics. Most previous techniques for manipulating voice timbre have focused on voice conversion and voice morphing, and they cannot handle timbre changes during singing. To develop VocaListener2, we constructed a voice timbre space from various singing voices that were synchronized in pitch, dynamics, and phoneme by using VocaListener. Within this space, timbre changes can be reflected in the synthesized singing voice. The system was evaluated by the Euclidean distance in this space between an estimated result and the ground truth under closed and open conditions.
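As a rough illustration of the evaluation measure mentioned in the abstract, the sketch below computes the Euclidean distance between an estimated point and a ground-truth point in a timbre space, and averages it over time-aligned frames. The function names, the per-frame averaging, and the 3-dimensional coordinates are assumptions made for this example; they are not taken from the paper and do not reproduce its experimental setup.

```python
import math


def timbre_distance(estimated, ground_truth):
    """Euclidean distance between two points in a timbre space."""
    if len(estimated) != len(ground_truth):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((e - g) ** 2 for e, g in zip(estimated, ground_truth)))


def mean_trajectory_distance(estimated_frames, ground_truth_frames):
    """Average per-frame distance between two time-aligned trajectories.

    Averaging over frames is an assumption of this sketch, not a detail
    stated on this page.
    """
    if len(estimated_frames) != len(ground_truth_frames):
        raise ValueError("trajectories must have the same number of frames")
    if not estimated_frames:
        raise ValueError("trajectories must not be empty")
    distances = [
        timbre_distance(e, g)
        for e, g in zip(estimated_frames, ground_truth_frames)
    ]
    return sum(distances) / len(distances)


# Hypothetical 3-dimensional coordinates, made up for illustration only.
estimated = [(0.0, 0.0, 0.0), (1.0, 2.0, 2.0)]
ground_truth = [(3.0, 4.0, 0.0), (1.0, 0.0, 2.0)]

error = mean_trajectory_distance(estimated, ground_truth)
print(error)
```

A smaller value means the estimated timbre trajectory lies closer to the ground truth in the space.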

Figure 1. Overview of VocaListener2

Demonstrations:

VocaListener2 Demo

[VocaListener2 Demonstration: synthesized results]

This video presents examples of synthesized singing voices,
comparing the proposed VocaListener2 with the previous VocaListener1.

Song: Tairyo Bune (RWC-MDB-G-2001 No.91)
Singing synthesis software: Hatsune Miku (Vocaloid2) and MIKU Append (Vocaloid2)
(YouTube)

VocaListener2 Demo 02

[VocaListener2 Demonstration: constructed voice timbre space]

This video shows the 3-dimensional voice timbre space automatically constructed by VocaListener2
and plays back 7 different voices while indicating the location of each voice in this space.

Song: Tairyo Bune (RWC-MDB-G-2001 No.91)
Singing synthesis software: Hatsune Miku (Vocaloid2) and MIKU Append (Vocaloid2)
(YouTube)

Acknowledgments:

This research utilized the RWC Music Database "RWC-MDB-G-2001" (Music Genre). In our current implementation, VocaListener2 uses commercial singing synthesis software based on Yamaha's Vocaloid or Vocaloid2 technology.

This research was supported in part by CrestMuse, CREST, JST.

References:

Masataka Goto, Tomoyasu Nakano, Shuuji Kajita, Yosuke Matsusaka, Shin'ichiro Nakaoka, and Kazuhito Yokoi:
"VocaListener and VocaWatcher: Imitating a Human Singer by Using Signal Processing",
In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012),
pp.5393-5396, March 2012.
[PDF]

Tomoyasu Nakano and Masataka Goto:
"VocaListener2: A Singing Synthesis System Able to Mimic a User's Singing in Terms of Voice Timbre Changes As Well As Pitch and Dynamics",
In Proceedings of the 36th International Conference on Acoustics, Speech and Signal Processing (ICASSP2011),
pp.453-456, May 2011.
[Paper PDF]

Tomoyasu Nakano and Masataka Goto