Speech Completion: New Speech Interface with On-demand Completion Assistance Using Filled Pauses



This project was proposed and researched by Masataka Goto and Katunobu Itou.

Japanese version is here.
Speech completion is a novel speech interface function that helps a user enter a word or phrase by completing (filling in the rest of) a phrase fragment uttered by the user. Although completion is widely used in text-based interfaces, there have been no reports of it being applied effectively to speech. By using a filled pause as the trigger, we enable a user to invoke the speech-completion function effortlessly; the function helps the user recall uncertain phrases and saves labor when the input phrase is long. When a user hesitates by lengthening a vowel (that is, utters a filled pause) partway through a phrase, our system immediately displays completion candidates whose beginnings acoustically resemble the uttered fragment, so that the user can select the correct one. Experiments with a system that included a filled-pause detector and a speech recognizer capable of listing candidates confirmed the effectiveness of speech completion.


Video Clips

exhibition 1 exhibition 2 exhibition 3 exhibition 4
Snapshots of Japanese exhibitions


Screen Snapshots

Forward Speech Completion
A user who does not remember the last part of a word or phrase can invoke this completion by uttering the first part while intentionally lengthening its last syllable (making a filled pause).

[Entering the phrase ``maikeru jakuson'' (``Michael Jackson'') when its last part (``jakuson'') is uncertain.]
  1. Uttering ``maikeru--.''
  2. A pop-up window containing completion candidates appears.
    Speech Completion Snapshot
  3. Uttering ``No. 2.''
  4. The second candidate is highlighted and bounces.
    Speech Completion Snapshot
  5. The selected candidate ``maikeru jakuson'' is determined as the recognition result.
    Speech Completion Snapshot
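
The forward-completion steps above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the actual system matches candidates by acoustic resemblance inside a speech recognizer, whereas this sketch substitutes plain string-prefix matching on romanized text. The vocabulary, the function names (`forward_candidates`, `select`), and the use of trailing hyphens to mark the lengthened vowel are all hypothetical.

```python
# Hypothetical toy vocabulary of romanized phrases (not from the original system).
VOCABULARY = [
    "maikeru joodan",
    "maikeru jakuson",
    "maikeru boruton",
    "maraia kyarii",
]


def forward_candidates(fragment, vocabulary, limit=5):
    """Return up to `limit` entries whose beginning matches the uttered fragment.

    Assumption: a filled pause is written as trailing hyphens ("maikeru--"),
    and text-prefix matching stands in for acoustic matching.
    """
    prefix = fragment.rstrip("-").strip()  # drop the lengthened-vowel marker
    return [entry for entry in vocabulary if entry.startswith(prefix)][:limit]


def select(candidates, number):
    """Pick a candidate by its 1-based number, as when the user says 'No. 2'."""
    if not 1 <= number <= len(candidates):
        raise ValueError(f"no candidate numbered {number}")
    return candidates[number - 1]


candidates = forward_candidates("maikeru--", VOCABULARY)
result = select(candidates, 2)
```

With this toy vocabulary, the three entries beginning with "maikeru" are listed in vocabulary order, and selecting No. 2 yields "maikeru jakuson", mirroring the walkthrough.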

Backward Speech Completion
A user who does not remember the first part of a word or phrase can invoke this completion by intentionally lengthening the last syllable of a predefined special keyword, called the wildcard keyword, and then uttering the last part of the phrase.

[Entering the phrase ``maikeru jakuson'' (``Michael Jackson'') when its first part (``maikeru'') is uncertain.]
  1. Uttering ``nantoka--.'' (wildcard keyword)
  2. A pop-up window with colorful flying decorations appears.
    Speech Completion Snapshot
  3. Uttering ``jakuson.''
  4. A window containing completion candidates appears.
    Speech Completion Snapshot
  5. Uttering ``No. 1.''
  6. The first candidate ``maikeru jakuson'' is determined as the recognition result.
    Speech Completion Snapshot
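
The backward-completion steps above can be sketched similarly. Again, this is a rough illustration rather than the system's method: string-suffix matching on romanized text stands in for the recognizer's acoustic matching, and the vocabulary, the function names (`is_wildcard`, `backward_candidates`), and the hyphen notation for a filled pause are assumptions made for this example. Only the keyword "nantoka" is taken from the walkthrough.

```python
WILDCARD_KEYWORD = "nantoka"  # the wildcard keyword used in the walkthrough

# Hypothetical toy vocabulary of romanized phrases (not from the original system).
VOCABULARY = [
    "maikeru jakuson",
    "janetto jakuson",
    "maikeru joodan",
]


def is_wildcard(utterance):
    """True if the utterance is the wildcard keyword ending in a filled pause.

    Assumption: a filled pause is written as trailing hyphens ("nantoka--").
    """
    text = utterance.rstrip(".").strip()
    return text.endswith("-") and text.rstrip("-") == WILDCARD_KEYWORD


def backward_candidates(last_part, vocabulary, limit=5):
    """Return up to `limit` entries whose ending matches the uttered last part."""
    suffix = last_part.rstrip(".").strip()
    return [entry for entry in vocabulary if entry.endswith(suffix)][:limit]


utterances = ["nantoka--.", "jakuson."]
candidates = []
if is_wildcard(utterances[0]):
    candidates = backward_candidates(utterances[1], VOCABULARY)
```

Here the wildcard keyword arms the backward search, and the following utterance "jakuson" selects the entries ending in that fragment; picking No. 1 from the resulting list corresponds to "maikeru jakuson".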

References:

  1. Masataka Goto, Katunobu Itou, and Satoru Hayamizu: Speech Completion: On-demand Completion Assistance Using Filled Pauses for Speech Input Interfaces, Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP-2002), pp.1489-1492, September 2002.
  2. Masataka Goto, Katunobu Itou, Tomoyosi Akiba, and Satoru Hayamizu: Speech Completion: New Speech Interface with On-demand Completion Assistance, Proceedings of HCI International 2001, Vol.1, pp.198-202, August 2001.

Implementation

The speech-completion system runs on a workstation or a personal computer and has been ported to the following operating systems:

Reports in Newspapers, on Television, and in Magazines




Masataka GOTO <m.goto [at] aist.go.jp>

All pages are copyrighted by the author. Unauthorized reproduction is strictly prohibited.
last update: September 24, 2002