Speech completion is a novel speech-interface function that helps a user enter a word or phrase by completing (filling in the rest of) a phrase fragment uttered by the user. Although completion is widely used in text-based interfaces, there have been no reports of completion being effectively applied to speech. By uttering a filled pause, a user can effortlessly invoke the speech-completion function, which helps the user recall uncertain phrases and saves effort when the input phrase is long. When a user hesitates partway through a phrase by lengthening a vowel (that is, by uttering a filled pause), our system immediately displays completion candidates whose beginnings acoustically resemble the uttered fragment, so that the user can select the correct one. Experiments with a system that included a filled-pause detector and a speech recognizer capable of listing candidates confirmed the effectiveness of speech completion.
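The candidate lookup described above can be illustrated with a small sketch. The actual system ranks candidates with the speech recognizer's acoustic scores; here, as a stand-in assumption, acoustic resemblance is approximated by the edit distance between a romanized fragment and the best-matching prefix of each vocabulary entry. The vocabulary, the function names, and the `max_distance` and `max_candidates` parameters are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical vocabulary of romanized Japanese-style names (not from the actual system).
VOCAB = [
    "maikeru jakuson",
    "maikeru shenkaa",
    "maiku oorudofiirudo",
    "madonna",
    "jakuson faibu",
]


def prefix_edit_distance(fragment: str, entry: str) -> int:
    """Smallest Levenshtein distance between `fragment` and any prefix of `entry`."""
    # prev[j] holds the distance between the fragment processed so far and entry[:j].
    prev = list(range(len(entry) + 1))
    for i in range(1, len(fragment) + 1):
        cur = [i] + [0] * len(entry)
        for j in range(1, len(entry) + 1):
            cost = 0 if fragment[i - 1] == entry[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a fragment character
                         cur[j - 1] + 1,      # insert an entry character
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    # The final row compares the whole fragment with every prefix entry[:j].
    return min(prev)


def forward_candidates(fragment: str, vocabulary: List[str],
                       max_candidates: int = 5, max_distance: int = 2) -> List[str]:
    """Entries whose beginnings resemble `fragment`, closest first."""
    scored: List[Tuple[int, str]] = [
        (prefix_edit_distance(fragment, entry), entry) for entry in vocabulary
    ]
    kept = [pair for pair in scored if pair[0] <= max_distance]
    kept.sort(key=lambda pair: pair[0])  # stable sort: ties keep vocabulary order
    return [entry for _, entry in kept[:max_candidates]]


print(forward_candidates("maikeru", VOCAB))
# → ['maikeru jakuson', 'maikeru shenkaa', 'maiku oorudofiirudo']
```

Because the distance is taken against the best prefix, a fragment such as ``maikeru'' matches long entries without being penalized for the part the user has not said yet.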
Video Clips
Demonstration of Speech Completion
This video shows the graphical output of the implemented speech-completion system. In it, a user enters the names of musicians and songs in the Japanese style: when a foreign name such as ``Michael Jackson'' is written or pronounced in Japanese, it becomes ``maikeru jakuson.'' Speech completion for this name can therefore be invoked in the Japanese style by saying ``maikeru--.''
Video Clip of Speech Completion (MPEG-1 file): "ICSLP2002demo.mpg" (17,578,736 bytes)
Video Clip of Speech Completion, Short Version (MPEG-1 file): "ICSLP2002demo_short.mpg" (6,760,516 bytes)
Video Clip of Speech Completion, Japanese Version (MPEG-1 file): "ICSLP2002demo_jp.mpg" (6,583,892 bytes)
Demonstration of Filled-Pause Detector
This video shows a graphical representation of the output of our filled-pause detector. The red box indicates the detected filled-pause period, and the rotating object also grows larger during filled pauses.
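This page does not describe how the detector works internally beyond the idea that a filled pause is a lengthened vowel. As a rough offline sketch under that assumption, a filled pause can be approximated as a run of voiced frames in which both the fundamental frequency (F0) and the spectral envelope stay nearly constant for some minimum duration. The per-frame inputs, the thresholds, and the 25-frame minimum (250 ms at an assumed 10 ms frame shift) are hypothetical, and the real detector works in real time rather than over a finished recording.

```python
import math
from typing import List, Sequence, Tuple


def detect_filled_pauses(f0: Sequence[float],
                         spectra: Sequence[Sequence[float]],
                         max_f0_change: float = 0.02,
                         max_spec_change: float = 1.0,
                         min_frames: int = 25) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, that look like lengthened vowels.

    f0[t] is the pitch of frame t in Hz (0 for unvoiced frames);
    spectra[t] is a feature vector describing the spectral envelope of frame t.
    All thresholds are illustrative assumptions.
    """
    n = len(f0)
    # stable[t] is True when frame t barely differs from frame t - 1.
    stable = [False] * n
    for t in range(1, n):
        if f0[t] <= 0 or f0[t - 1] <= 0:
            continue  # unvoiced frames cannot be part of a vowel
        f0_change = abs(f0[t] - f0[t - 1]) / f0[t - 1]        # relative pitch change
        spec_change = math.dist(spectra[t], spectra[t - 1])  # Euclidean distance
        stable[t] = f0_change <= max_f0_change and spec_change <= max_spec_change

    pauses: List[Tuple[int, int]] = []
    run_start = None  # first frame t with stable[t] in the current run
    for t, is_stable in enumerate(stable + [False]):  # sentinel closes a final run
        if is_stable and run_start is None:
            run_start = t
        elif not is_stable and run_start is not None:
            segment_start = run_start - 1  # include the frame the run is anchored to
            if t - segment_start >= min_frames:
                pauses.append((segment_start, t))
            run_start = None
    return pauses


# Synthetic input: 20 frames of gliding pitch, a 40-frame held vowel, then silence.
demo_f0 = [100.0 + 5 * t for t in range(20)] + [120.0] * 40 + [0.0] * 10
demo_spectra = ([[2.0 * t, 0.0] for t in range(20)]
                + [[50.0, 3.0]] * 40 + [[0.0, 0.0]] * 10)
print(detect_filled_pauses(demo_f0, demo_spectra))  # → [(20, 60)]
```

Requiring both pitch and spectrum to stay steady is what separates a held vowel from ordinary speech, where at least one of them keeps moving from frame to frame.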
Application of Speech Completion [Music Jukebox System]
We have also developed a speech-capable music jukebox system as an application of our speech-completion system. The jukebox plays back a song whose title is determined through speech recognition with the speech-completion function. The system has been demonstrated at exhibitions in Japan and has received considerable press coverage.
Snapshots of Japanese exhibitions
Screen Snapshots
Forward Speech Completion
A user who does not remember the last part of a word or phrase can invoke this completion by uttering the first part while intentionally lengthening its last syllable (making a filled pause).
[Entering the phrase ``maikeru jakuson'' (``Michael Jackson'') when its last part (``jakuson'') is uncertain.]
Uttering ``maikeru--.''
A pop-up window containing completion candidates appears.
Uttering ``No. 2.''
The second candidate is highlighted and bounces.
The selected candidate ``maikeru jakuson'' is accepted as the recognition result.
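The final step, choosing a candidate by its number, amounts to mapping an utterance such as ``No. 2'' to a position in the displayed list. The sketch below assumes the recognizer returns that utterance as plain text in the form shown on this page; the function name, the accepted spellings, and the pop-up contents are hypothetical.

```python
import re
from typing import List, Optional

# Accepts "No. 2", "No 2", "no. 2." and similar; the exact forms are an assumption.
_SELECTION = re.compile(r"\s*no\.?\s*(\d+)\.?\s*", re.IGNORECASE)


def select_candidate(utterance: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate chosen by a "No. N" utterance, or None if there is none."""
    match = _SELECTION.fullmatch(utterance)
    if match is None:
        return None  # not a selection command; treat as ordinary speech input
    number = int(match.group(1))  # on-screen numbering starts at 1
    if 1 <= number <= len(candidates):
        return candidates[number - 1]
    return None  # number outside the displayed list


# Hypothetical pop-up contents after "maikeru--." (order chosen for illustration).
popup = ["maikeru shenkaa", "maikeru jakuson", "maiku oorudofiirudo"]
print(select_candidate("No. 2.", popup))  # → maikeru jakuson
```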
Backward Speech Completion
A user who does not remember the first part of a word or phrase can invoke this completion by first uttering a predefined special keyword, called the wildcard keyword, while intentionally lengthening its last syllable, and then uttering the last part.
[Entering the phrase ``maikeru jakuson'' (``Michael Jackson'') when its first part (``maikeru'') is uncertain.]
Uttering ``nantoka--.'' (the wildcard keyword; ``nantoka'' roughly means ``something'' in Japanese)
A pop-up window with colorful flying decorations appears.
Uttering ``jakuson.''
A window containing completion candidates appears.
Uttering ``No. 1.''
The first candidate ``maikeru jakuson'' is accepted as the recognition result.
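Backward completion can be sketched by mirroring the forward case: once the wildcard keyword has been spoken with a filled pause, the next fragment is compared with the endings of vocabulary entries instead of their beginnings. As a stand-in assumption, edit distance on romanized strings replaces the recognizer's acoustic scoring, and the vocabulary, function names, and thresholds are hypothetical. A trailing ``--'' in a transcript is assumed to mark the lengthened syllable, following this page's notation.

```python
from typing import List

WILDCARD_KEYWORD = "nantoka"  # the wildcard keyword used in the example above

# Hypothetical vocabulary of romanized Japanese-style names.
VOCAB = [
    "maikeru jakuson",
    "jakuson faibu",
    "janetto jakuson",
    "maikeru shenkaa",
    "madonna",
]


def is_wildcard_request(transcript: str) -> bool:
    """True if the transcript is the wildcard keyword followed by a filled pause ("--")."""
    text = transcript.strip().rstrip(".").strip()
    return text.endswith("--") and text.rstrip("-") == WILDCARD_KEYWORD


def prefix_edit_distance(fragment: str, entry: str) -> int:
    """Smallest Levenshtein distance between `fragment` and any prefix of `entry`."""
    prev = list(range(len(entry) + 1))
    for i in range(1, len(fragment) + 1):
        cur = [i] + [0] * len(entry)
        for j in range(1, len(entry) + 1):
            cost = 0 if fragment[i - 1] == entry[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return min(prev)


def suffix_edit_distance(fragment: str, entry: str) -> int:
    """Distance to the best-matching ending of `entry` (prefix distance on reversed strings)."""
    return prefix_edit_distance(fragment[::-1], entry[::-1])


def backward_candidates(fragment: str, vocabulary: List[str],
                        max_candidates: int = 5, max_distance: int = 2) -> List[str]:
    """Entries whose endings resemble `fragment`, closest first."""
    scored = [(suffix_edit_distance(fragment, entry), entry) for entry in vocabulary]
    kept = [pair for pair in scored if pair[0] <= max_distance]
    kept.sort(key=lambda pair: pair[0])  # stable sort: ties keep vocabulary order
    return [entry for _, entry in kept[:max_candidates]]


if is_wildcard_request("nantoka--."):
    print(backward_candidates("jakuson", VOCAB))
    # → ['maikeru jakuson', 'janetto jakuson']
```

Reversing both strings turns "resembles the ending" into "resembles the beginning", so the same prefix-distance routine serves both completion directions.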
References:
Masataka Goto, Katunobu Itou, and Satoru Hayamizu: ``Speech Completion: On-demand Completion Assistance Using Filled Pauses for Speech Input Interfaces,'' Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP-2002), pp.1489-1492, September 2002.
Masataka Goto, Katunobu Itou, Tomoyosi Akiba, and Satoru Hayamizu: ``Speech Completion: New Speech Interface with On-demand Completion Assistance,'' Proceedings of HCI International 2001, Vol.1, pp.198-202, August 2001.
Implementation
The speech-completion system runs on workstations and personal computers and has been ported to the following operating systems:
SGI Irix 6.5 (Octane, Octane 2)
Linux 2.2/2.4 (Pentium 3/4, Alpha)
Microsoft Windows 95/98/2000/XP (Pentium 3/4)
Newspaper, Television, and Magazine Reports
Our speech-completion system was reported in the morning edition of the Japanese daily newspaper "Nihon Keizai Shimbun" on September 21, 2001.
Our speech-completion system was reported in the Japanese daily newspaper "Nikkan Kogyo Shimbun" on September 24, 2001.
Our speech-completion system was reported on the Japanese television program "WORLD BUSINESS SATELLITE" of Television TOKYO. The program was broadcast from 23:00 to 23:45 on September 25, 2001.