This paper presents a singing synthesis system, VocaListener2, that can automatically synthesize a singing voice by mimicking the timbre changes of a user's singing voice. The system extends our previous VocaListener system, which handled only pitch and dynamics. Most previous techniques for manipulating voice timbre have focused on voice conversion and voice morphing, and they cannot handle timbre changes that occur during singing. To develop VocaListener2, we constructed a voice timbre space from various singing voices that VocaListener had synchronized in pitch, dynamics, and phonemes. Within this space, the timbre changes of the user's singing can be reflected in the synthesized singing voice. We evaluated the system by measuring the Euclidean distance in this space between the estimated result and the ground truth under closed and open conditions.
This video compares examples of singing voices synthesized by the proposed VocaListener2 with those synthesized by the previous VocaListener1.
This video shows the three-dimensional voice timbre space automatically constructed by VocaListener2 and plays back seven different voices while indicating the location of each in this space.
This research utilized the RWC Music Database "RWC-MDB-G-2001" (Music Genre). In our current implementation, VocaListener2 uses commercial singing synthesis software based on Yamaha's Vocaloid or Vocaloid2 technology.
This research was supported in part by CrestMuse, CREST, JST.