Research Demonstration for ISMIR 2004

The PDF files of this paper and slides are available at this link.

Speech-Recognition Interfaces for Music Information Retrieval: ``Speech Completion'' and ``Speech Spotter''

Masataka Goto†, Katunobu Itou‡, Koji Kitayama††, and Tetsunori Kobayashi††

† National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki 305-8568, Japan
‡ Nagoya University, Aichi 464-8603, Japan
†† Waseda University, Tokyo 169-8555, Japan

Paper abstract

This paper describes music information retrieval (MIR) systems featuring automatic speech recognition. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. We propose two different speech-recognition interfaces for MIR, speech completion and speech spotter, and describe two MIR-based hands-free jukebox systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. The first is a music-retrieval system with the speech-completion interface that is suitable for music stores and car driving situations. When a user can remember only part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface which can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music-playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Our experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces.

Video Clips

Demonstration of Music-Retrieval System with the Speech-Completion Interface
In this video, a user can retrieve a musical piece or a list of musical pieces by an artist even if the user can remember only part of the name of the piece or artist.
(11,506,124 bytes, 1 min 3 sec, MPEG-1 file)

(Short excerpt version: 3,058,384 bytes, 16 sec, MPEG-1 file)

[Video caption]
Forward Speech Completion: Music retrieval by uttering part of artist's name

Michael- (Michael, uh...)

(*) A pop-up window containing completion candidates appears.

Jackson

(*) A pop-up window containing a list of musical pieces appears.

No. 1

(*) The first song is highlighted and played back.

Forward Speech Completion: Music retrieval by uttering part of musical-piece title

The Way- (The Way, er...)

(*) A pop-up window containing completion candidates appears.

No. 1

(*) The song of the selected title is played back.

Backward Speech Completion: Music retrieval by uttering part of artist's name

Something- (wildcard keyword)

(*) A pop-up window with colorful flying decorations appears.

Jackson

(*) A pop-up window containing completion candidates appears.

No. 1

(*) A pop-up window containing a list of musical pieces appears.

No. 3

(*) The third song is highlighted and played back.

This demonstration featured RWC-MDB-G-2001 Nos. 10, 24, and 26 from the RWC Music Database (Music Genre).

Demonstration of Music Playback System with the Speech-Spotter Interface
In this video, a user can listen to background music by uttering the name of a musical piece or artist while talking to another person. The video shows that users can share music playback on the telephone as if they were talking in the same room with background music.
(12,793,620 bytes, 1 min 10 sec, MPEG-1 file)

(Short excerpt version: 3,760,232 bytes, 20 sec, MPEG-1 file)

[Video caption]
B calls A on the telephone.

A:
Yes...

B:
Hello?

A:
Uh..., what's up?

B:
Thanks for all your help last time.

A:
No problem. How have you been since?

B:
Whew! I've been super busy writing that paper... I'm beat.

(Several minutes later)

A:
Uh..., that reminds me, the song called ``Fly Away'' that we heard at that place, wasn't that good?

B:
Oh, what song was that?

A:
Shall we try listening to it?

B:
What? We can hear it now?

A:
Sure. This is a phone with a music-playback system. We can listen to that song like this... Er..., ``Fly Away''!

(*) The system plays the song of that name on both of their handsets.

B:
Wow, amazing! You can listen to a song by just saying its name! Um..., this is a good song.

A:
That's right!

(**) In this caption, underlining indicates that the pitch of the underlined words is intentionally raised.

This demonstration featured RWC-MDB-P-2001 No. 28 from the RWC Music Database (Popular Music).
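
The caption note above says that command words are uttered with intentionally raised pitch. As a rough illustration only, the following sketch treats an utterance as a voice command when its average pitch is well above the speaker's usual pitch. The function name, the F0 track format (one value in Hz per frame, 0 for unvoiced frames), and the threshold ratio of 1.3 are all assumptions made for this example; the page does not describe the actual spotting method.

```python
from statistics import mean


def is_voice_command(f0_track_hz, baseline_f0_hz, ratio_threshold=1.3):
    """Guess whether an utterance is a command from its pitch alone.

    f0_track_hz: per-frame pitch estimates in Hz (0 marks unvoiced frames).
    baseline_f0_hz: the speaker's typical conversational pitch in Hz.
    ratio_threshold: hypothetical ratio above which pitch counts as raised.
    """
    voiced = [f0 for f0 in f0_track_hz if f0 > 0]
    if not voiced:
        # Nothing voiced, so there is no pitch evidence at all.
        return False
    return mean(voiced) >= ratio_threshold * baseline_f0_hz


# Hypothetical pitch tracks for a speaker whose usual pitch is about 120 Hz.
normal_talk = [110, 120, 0, 125]
raised_pitch = [0, 180, 190, 200]

print(is_voice_command(normal_talk, baseline_f0_hz=120))
print(is_voice_command(raised_pitch, baseline_f0_hz=120))
```

A check like this would let ordinary conversation pass through untouched, which is the property the telephone demonstration relies on.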

Screen Snapshots of Music-Retrieval System with the Speech-Completion Interface

Forward Speech Completion

A user who does not remember the last part of a name can invoke this completion by uttering the first part while intentionally lengthening its last syllable (making a filled pause).

[Entering the phrase ``maikeru jakuson'' (``Michael Jackson'') when its last part (``jakuson'') is uncertain.]

Uttering ``maikeru--.''
A pop-up window containing completion candidates appears.
Uttering ``No. 2.''
The second candidate is highlighted and bounces.
The selected candidate ``maikeru jakuson'' is confirmed as the recognition result.
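
The candidate lookup in the steps above can be pictured as word-level prefix matching. The sketch below is a minimal illustration under that assumption, not the system's actual recognizer-based method; the function name, the vocabulary entries, and the candidate limit are hypothetical.

```python
def forward_complete(fragment, vocabulary, max_candidates=5):
    """Return names whose leading words match the uttered fragment."""
    words = fragment.lower().split()
    if not words:
        return []
    candidates = []
    for name in vocabulary:
        name_words = name.lower().split()
        # Keep names that begin with the fragment and continue beyond it.
        if len(name_words) > len(words) and name_words[:len(words)] == words:
            candidates.append(name)
    return candidates[:max_candidates]


# Hypothetical vocabulary of romanized names, for illustration only.
VOCABULARY = ["maikeru boruton", "maikeru jakuson", "janetto jakuson", "za uei"]

candidates = forward_complete("maikeru", VOCABULARY)
for number, name in enumerate(candidates, start=1):
    print(f"No. {number}: {name}")
```

With this made-up vocabulary, ``maikeru jakuson'' is listed as the second candidate, so uttering ``No. 2'' would select it.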

Backward Speech Completion

A user who does not remember the first part of a name can invoke this completion by uttering the last part after intentionally lengthening the last syllable of a predefined special keyword --- called the wildcard keyword.

[Entering the phrase ``maikeru jakuson'' (``Michael Jackson'') when its first part (``maikeru'') is uncertain.]

Uttering ``nantoka--.'' (wildcard keyword)
A pop-up window with colorful flying decorations appears.
Uttering ``jakuson.''
A window containing completion candidates appears.
Uttering ``No. 1.''
The first candidate ``maikeru jakuson'' is confirmed as the recognition result.
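
Backward completion can be pictured as the mirror image: once the wildcard keyword has been heard, the next fragment is matched against the end of each name. The sketch below assumes simple word-level suffix matching; only the keyword ``nantoka'' comes from this page, while the function names and vocabulary are hypothetical.

```python
WILDCARD_KEYWORD = "nantoka"  # the wildcard keyword shown in the steps above


def backward_complete(fragment, vocabulary, max_candidates=5):
    """Return names whose trailing words match the uttered fragment."""
    words = fragment.lower().split()
    if not words:
        return []
    candidates = []
    for name in vocabulary:
        name_words = name.lower().split()
        # Keep names that end with the fragment and have something before it.
        if len(name_words) > len(words) and name_words[-len(words):] == words:
            candidates.append(name)
    return candidates[:max_candidates]


def complete_after_wildcard(first_utterance, second_utterance, vocabulary):
    """Run backward completion only if the wildcard keyword came first."""
    if first_utterance.lower() != WILDCARD_KEYWORD:
        return []
    return backward_complete(second_utterance, vocabulary)


# Hypothetical vocabulary of romanized names, for illustration only.
VOCABULARY = ["maikeru jakuson", "janetto jakuson", "maikeru boruton"]

for number, name in enumerate(
    complete_after_wildcard("nantoka", "jakuson", VOCABULARY), start=1
):
    print(f"No. {number}: {name}")
```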

Music Playback

After the artist's name is identified by either the forward or backward speech completion, the system shows a numbered list of titles for the specified artist in a music database, and a user can select an appropriate title by uttering either the title or its number. When the musical piece is identified, the system plays back its sound file.

[Playing back a musical piece by the artist ``maikeru jakuson'' (``Michael Jackson''), whose name was determined through the speech-completion interface.]

Continued from the above figures.
A pop-up window containing a list of musical pieces appears.
Uttering ``No. 1.''
The first musical piece is highlighted and played back.
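
Selecting a piece by uttering either its title or its number, as described in this section, could be handled along the following lines. This is a sketch under assumptions: the recognized utterance is taken to arrive as plain text, the ``No. N'' pattern is modeled on the captions above, and the function name and placeholder titles are hypothetical.

```python
import re


def select_title(utterance, titles):
    """Pick a title from a numbered list by number ("No. 3") or by name."""
    text = utterance.strip().lower()
    match = re.fullmatch(r"no\.?\s*(\d+)", text)
    if match:
        index = int(match.group(1)) - 1  # the list shown to the user is 1-based
        return titles[index] if 0 <= index < len(titles) else None
    for title in titles:
        if title.lower() == text:
            return title
    return None  # nothing matched; the system would wait for another utterance


# Placeholder titles, for illustration only.
TITLES = ["Song A", "Song B", "Song C"]

print(select_title("No. 1", TITLES))
print(select_title("song c", TITLES))
```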

Acknowledgments:

This research utilized the RWC Music Database "RWC-MDB-P-2001" (Popular Music) and "RWC-MDB-G-2001" (Music Genre).

Back to:

Masataka Goto's Home Page

Masataka GOTO <m.goto [at] aist.go.jp>

All pages are copyrighted by the author. Unauthorized reproduction is strictly prohibited.
