Shoulder tapping illusion by vibrator in HyperMirror


Osamu MORIKAWA
National Institute of Bioscience and Human-Technology
1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
e-mail: morikawa@nibh.go.jp


1. Introduction

In existing video-mediated communication environments, local and remote users communicate via a CRT or a screen. In addition to a visual image of the remote users, the screen displays their surroundings; likewise, the remote users' screen displays the local users and their surroundings. However, the two parties remain in different spaces, and the display is the boundary that divides their worlds.

Face-to-face communication, on the other hand, consists not merely of spoken words but also of nonverbal information such as gaze and gestures (according to one study, such information makes up 60% to 90% of a message [1]). Some nonverbal information can be understood on its own, while other nonverbal information takes its meaning from the speaker's position relative to surrounding objects. Consider a gesture in which someone looks at a clock hanging on a wall: it has meaning only if one knows where the clock is positioned relative to the speaker (Fig.1). Since the speaker's space and the listener's space are not shared in existing video communication environments, such nonverbal information is not properly communicated.

Yet because existing video communication systems present voice and visual images in a form similar to face-to-face communication, users do not notice how that information has been transformed, and they approach video communication with the same expectations as face-to-face communication. As a result, users find video communication difficult to use and tire of it easily. It also makes remote partners feel far away.

Much research on video systems has been directed toward closing the gap between video and face-to-face communication, striving to make video communication equivalent to face-to-face communication. The present research instead aims at a video communication system that works well on the premise that users accept such differences. The authors accordingly propose a new video communication system, called HyperMirror, in which one's own image is displayed on the screen (Fig.2 and Fig.3) [2], [3], [4], [5].

Fig.1 Some nonverbal information is not properly communicated by videophones.

Fig.2 An example of a HyperMirror conversation.

2. HyperMirror Creates a Feeling of Being in the Same Room

HyperMirror enables participants to communicate using the same "video wall", making it "What I See Is What You See" (WISIWYS). Users at multiple communication sites sit in front of the system, and their images are shown as mirror reflections on a single screen, so that they appear to be in the same room regardless of physical location. HyperMirror thus puts communicators and viewers all in the same space: users can tap a partner's shoulder or pat his/her head.

Furthermore, objects on the screen are displayed at a position relative to one's own image, regardless of where the object is physically located, which means that objects can be used in the process of communication (Fig.4 and Fig.5). For example, if one user points to an object on the screen, all users can identify the object pointed to, no matter where that object physically exists. This enables a speaker's gestures to be understood by all participants.

Because of HyperMirror's WISIWYS feature, a speaker can see from the listener's viewpoint when a misunderstanding arises, for example from inappropriate pointing (Fig.4-left). Speakers soon learn how to point within HyperMirror (Fig.4-center), much as people accept the telephone as a medium different from face-to-face conversation. On the telephone, it is obvious that images are not transmitted. When speakers want to talk about something happening in front of them, they put the situation into words and describe it to the partner; no one talks on the phone assuming that the partner can see what they see.

Fig.3 An example of HyperMirror system.

Fig.4 An advantage of WISIWYS. He sees that she has not properly understood his ordinary pointing (left). He discovers HyperMirror pointing (center), and then he can point properly (right).

3. Attracting Attention

In daily life, people present in the same room are not always carrying on a conversation. Indeed, they may each be concentrating on their own work. When a conversation is started among people in the same room, the parties first mutually verify their state of activity, and then find some opportunity to initiate the conversation. There may also be cases in which one party wants to interrupt another's work to resume a conversation. HyperMirror communication provides an environment in which all participants feel as if they are in the same room, thus making them want to act as if they are in the same room (Fig.5).

Ordinarily, if two people are seated apart, one may glance at the other to attract attention. If this is not enough, the person may wave; if that fails, the person may call out. If they are seated within arm's length of each other, or standing and free to move, the person may approach, speak to the other, and tap the other's shoulder. HyperMirror users can act similarly, but these actions do not work as they do face-to-face.

People are usually quite sensitive to a glance from others, even in peripheral vision. They also easily notice a wave or an approaching person through ancillary information, such as the movement of air or shadows, even when concentrating on something else. In HyperMirror, glances are not noticed as they are face-to-face, and this ancillary information is not transmitted (Fig.6).

A user who stops looking at the screen effectively leaves the HyperMirror environment, and the relative positions on the screen that served as a shared reference for conversation disappear. Attention-attracting actions also become less effective, because the physical space outside HyperMirror, rather than the screen, now occupies the user's peripheral vision. Even if the partner speaks, the voice from the loudspeaker carries no information about direction or distance, so the user must identify the speaker by the tone of voice and then confirm on the screen. Moreover, when several listeners are present, the sound must carry additional information, such as the name of the person addressed. Otherwise, even when someone else is being called, the HyperMirror user must look at the screen every time to tell whether the call was meant for him or her.

The authors therefore attempted to give HyperMirror a means of attracting attention corresponding to a shoulder tap.

Fig.5 An example of shoulder tapping. She will move away from the tapping man as soon as she understands what is happening on the screen.

4. Tapping Someone on the Shoulder

Consider a case in which one person approaches another and taps her/him on the shoulder. Here, the person who taps is necessarily very close to the person being tapped. The person tapped receives not only the tactile sensation of being tapped but also other information, such as the sense that someone is approaching, the flow of air that accompanies the tapping action, and the sound of the tap itself. All of this is ancillary information necessarily generated by the action of tapping, and we understand what the tap means by making effective use of it (Fig.7). Furthermore, the meaning of a shoulder tap in communication is determined by the context of the communication at that time.

Because the artificial world of a remote communication system is not subject to physical constraints, ancillary information is not generated there unless it is intentionally added. Since humans rely on ancillary information to understand what is happening, the artificial world of remote communication is difficult and unpleasant to perceive unless such information is added.

Fig.6 A person being tapped on the screen does not notice it when s/he is not watching the screen. Simultaneous tactile or auditory information is needed to attract his/her attention.

In a situation where more than one person may be trying to attract attention, as in cooperative work, the person attracting attention must be identifiable. In real space, physical constraints guarantee that the person tapping is right next to the person being tapped, so identifying the tapper is easy and poses no problem. In a remote communication system with an added shoulder-tapping haptic device, however, no such physical constraint exists, so the person tapping is not automatically easy to identify. Even if the person tapped can sense the tap, that information cannot be used effectively in communication unless the person doing the tapping can be identified.

Accordingly, to use shoulder tapping effectively in a remote communication system, presenting haptic information alone is not sufficient. To identify the person tapping, it is essential to present, at the same time, information equivalent to the ancillary information that physical constraints generate in the real world.

Fig.7 People easily notice someone approaching through ancillary information caused by the movement, such as the movement of the air and the sound of footsteps.

5. HyperMirror With Shoulder Vibrators

Each participant wears a vibrator on each shoulder (Fig.8). The HyperMirror system recognizes a shoulder-tapping action on the screen and operates the vibrator on the shoulder that was tapped, thereby stimulating the person tapped. Like the ancillary information generated by physical constraints in the real world, the constraints of this mechanism are easy for people to understand.

Fig.8 The tactile displays on the shoulders
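The recognition step described above is not specified in detail in this report. The following Python sketch is a hypothetical illustration only: it assumes that image recognition already provides the screen position of a tapping hand and of each user's left and right shoulders in the composited mirror image, and it drives the vibrator on the nearest shoulder of another user if the hand comes within a chosen radius. The names `Point`, `ShoulderPair`, `select_vibrator`, and `TAP_RADIUS`, and the radius value, are assumptions and not part of HyperMirror.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple

# Hypothetical threshold in screen pixels: how close a hand must come to a
# shoulder in the composited mirror image to count as a tap.
TAP_RADIUS = 40.0


@dataclass
class Point:
    x: float
    y: float

    def dist(self, other: "Point") -> float:
        """Euclidean distance on the screen."""
        return hypot(self.x - other.x, self.y - other.y)


@dataclass
class ShoulderPair:
    user: str    # whose image these shoulders belong to
    left: Point  # the user's own left shoulder, as located on the screen
    right: Point


def select_vibrator(hand: Point, tapper: str,
                    users: List[ShoulderPair]) -> Optional[Tuple[str, str]]:
    """Return (user, 'left' or 'right') for the vibrator to drive, or None.

    Shoulders belonging to the tapper are ignored, and the nearest
    remaining shoulder within TAP_RADIUS is chosen.
    """
    best: Optional[Tuple[str, str]] = None
    best_d = TAP_RADIUS
    for u in users:
        if u.user == tapper:
            continue  # a user does not tap his or her own shoulder here
        for side, pos in (("left", u.left), ("right", u.right)):
            d = hand.dist(pos)
            if d <= best_d:
                best, best_d = (u.user, side), d
    return best
```

For example, with user A's shoulders at (100, 200) and (160, 200) and user B's at (400, 210) and (460, 210), a hand of user A at (455, 205) selects B's right shoulder. Choosing the shoulder nearest the hand keeps the vibrated side consistent with where the tapper appears on the screen, which matters because subjects found a mismatch between the stimulated shoulder and the tapper's on-screen position unnatural (7.6).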

6. Experiment

The authors conducted an experiment under the conditions below to verify the effectiveness of a shoulder-tapping stimulus for attracting attention in video communication, and to verify whether users accept the constrained shoulder-tapping stimulus.

Two experimenters called out to the two subjects at random (Fig.9). The subject who was called identified the calling experimenter and said that experimenter's name. The experimenter then instructed the subject to point to an object.

The subjects' main task was proofreading a document. To eliminate the influence of the content of the work, each subject was assigned a document in a different subject area. Visual (V), auditory (A), and tactile (T) information were prepared as attention-attracting actions, and the experiment was carried out under seven conditions: V, A, T, V+A, V+T, A+T, and V+A+T. After the experiment, each subject subjectively rated each condition on the following: degree of concentration on the main task, ease of attracting attention, degree of interference, and preference. We also asked the subjects to comment freely on their observations.
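The seven conditions are exactly the non-empty combinations of the three modalities, since each modality is either present or absent: 2^3 - 1 = 7. A short Python sketch enumerating them, with the function name `conditions` chosen here for illustration:

```python
from itertools import combinations
from typing import List, Sequence

MODALITIES = ("V", "A", "T")  # visual, auditory, tactile


def conditions(modalities: Sequence[str] = MODALITIES) -> List[str]:
    """All non-empty combinations of the modalities, smallest first."""
    out = []
    for k in range(1, len(modalities) + 1):
        for combo in combinations(modalities, k):
            out.append("+".join(combo))
    return out


print(conditions())
# ['V', 'A', 'T', 'V+A', 'V+T', 'A+T', 'V+A+T']
```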

The ten subjects, two men and eight women, were 30 to 59 years of age. The experiment was performed in pairs of subjects who were acquainted with each other.

The shoulder-tapping action was carried out via HyperMirror. Each attention-attracting action consisted of two taps on the shoulder. After the subject identified the experimenter, an instruction was given.

In normal operation, the haptic display device is triggered by the HyperMirror system's image recognition. In the present experiment, however, an experiment assistant provided the auditory and haptic stimuli in synchronization with the image of the experimenter tapping the subject's shoulder.

Fig.9 Experimental setup. Two experimenters attract the attention of two subjects at random.

7. Results of Experiment

7.1 Work Efficiency of Main Task

Scores were first standardized to eliminate differences among individuals, and then standardized again to eliminate differences in the difficulty of each document.
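The exact standardization procedure is not given. The Python sketch below assumes z-scores computed first within each subject and then within each document, mapped onto a scale on which 50 is the overall mean, as in Fig.10. The spread of 10 points per standard deviation, the record fields (`subject`, `document`, `raw`), and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List


def zscore_by(records: List[Dict], key: str, value: str) -> List[float]:
    """Z-scores of r[value], standardized within groups sharing r[key]."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}
    out = []
    for r in records:
        m, s = stats[r[key]]
        # A group with no spread carries no information: map it to 0.
        out.append((r[value] - m) / s if s > 0 else 0.0)
    return out


def standardize_twice(records: List[Dict]) -> List[Dict]:
    # Stage 1: remove individual differences (within each subject).
    z1 = zscore_by(records, "subject", "raw")
    staged = [dict(r, z1=z) for r, z in zip(records, z1)]
    # Stage 2: remove differences in task difficulty (within each document).
    z2 = zscore_by(staged, "document", "z1")
    # Assumed output scale: mean 50, 10 points per standard deviation.
    return [dict(r, score=50 + 10 * z) for r, z in zip(staged, z2)]
```

Because the second stage centers every document on zero, the scores average exactly 50 overall, which matches the reading of 50 as the total average in Fig.10.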

Compared with the six other conditions, the V condition (visual only) showed a significant decline in the efficiency of the main task. There were no significant differences among the other conditions (Fig.10).

Fig.10 Work efficiency. A score of 50 corresponds to the overall average.

Only the visual condition (V) differs significantly from the other six conditions (p < 0.05).

7.2 Degree of Concentration on Main Task, and Related Impressions

Under the V condition, the subjects had to keep their attention constantly on the screen and reported that they could not concentrate on the main task. Under the other conditions, the subjects were interrupted by sound or vibration, and so could concentrate on the main task while dividing their attention passively. Interestingly, one subject who found numerous corrections in the document reported being unable to concentrate on the main task under the A condition (audio only) as well. With vibration alone, subjects knew that they were being called, but identifying the caller was troublesome.

7.3 Ease of Attracting Attention

Under the T condition (tactile only) with a short vibration, subjects were not confident about whether they had been called. Under the A condition, the voice was sometimes too quiet for subjects to hear. Under the V+T and V+A conditions, there were also cases of uncertainty about who had been called; however, subjects could regain confidence by taking a quick look at the screen, which they rated favorably. When subjects mistakenly thought they had been called, the visual image was extremely useful for confirming that they had not.

7.4 Degree of Interference With Main Task

Under the V condition, attracting the attention of others did not directly interfere with the main task; however, there was interference because the subjects had to pay attention to the screen constantly. Under the A condition, some subjects were concerned that the volume was too low for them to make out the name, although the volume used in the present experiment did not interfere with the main task. Under the A, T, and A+T conditions (no visual image), subjects sometimes looked at the screen when they had not been called and lost time until they realized this.

7.5 Ease of Familiarity

Although the A and V conditions were tolerated, the T condition felt unnatural, since the subjects did not know by whom they were being tapped. Under the T and A+T conditions, subjects reported that it felt unnatural when the calling person was at a distance or on the side opposite the vibrated shoulder. In such cases, subjects felt that no vibration at all would have been better.

Because this was an experiment, subjects knew immediately that they were being called, even when the vibration was short. In actual use, however, a user might vaguely feel a short vibration without immediately realizing that he or she was being called. We therefore consider it better to use a vibration long enough for the user to make a clear determination.

Subjects also reported that being tapped on the shoulder by an unfamiliar person felt unpleasant; in such cases, voice alone is preferable.

7.6 Sense of Reality of Vibration

When the duration of vibration was increased, the sensation of vibration became stronger, but the sense of being tapped on the shoulder decreased. It also felt unnatural when the vibration and the image were even slightly out of sync. Vibration was effective in attracting attention, but it felt unnatural if the shoulder that received the stimulus did not match the on-screen position of the person giving it. The timing between the voice and the vibration did not present much of a problem as long as the two occurred at roughly the same time.

7.7 Sense of Security

In addition to the attention-attracting function of shoulder tapping, another effect can be expected from this system, although it could not be verified in the experiment. Even in face-to-face communication, shoulder tapping is not frequently observed and, depending on the situation, may not be observed at all. However, a substantial difference between existing video communication and face-to-face communication has been that video communication did not allow one to attract attention by shoulder tapping. The difference lies in whether communication can be conducted with the sense of security that comes from knowing one can attract attention by shoulder tapping when needed. In existing video communication, which lacks an attention-attracting function, there is an unspoken rule that one must always pay attention to the screen, or a sense that some self-regulating mechanism is at work. Further, when we want the remote user to speak up in a louder voice, we ourselves tend to choose a mode of communication that evokes that behavior as much as possible.

It is still unclear whether it is more comfortable to attract someone's attention by tapping them on the shoulder or by calling out their name in a loud voice. However, we believe that the sense of security engendered by continuous communication with another person, even while that person is concentrating on the work at hand and not paying attention to the screen, has a significant impact on cooperative work.

8. Discussion

8.1 Utility of Visual Images in Shoulder Tapping

When we receive some kind of stimulus, we search for its cause. If a shoulder tap in HyperMirror communication is accompanied by a visual image, we infer a causal relationship: we feel the vibration on our shoulder because another user on the screen tapped the shoulder of our own image. Although this causal relationship is of course not in accord with the physical world, it is easy to understand because it does not deviate much from the process of understanding we use in daily life.

Vibration is also used as a signal in portable telephones; however, its meaning there differs from that of the T condition (tactile only) in HyperMirror. With a portable telephone, it is obvious that the caller sending the signal is not visible, so we do not look for him or her. Vibration alone is sufficient.

In HyperMirror communication, however, the caller who sends the signal is obviously present on the screen, so we look for him or her there. If the person on the screen who sent the vibration is not seen acting in a way that accounts for it, no causal relationship is established, which creates an extremely unnatural feeling. In contrast, once a causal relationship between the caller's behavior and the vibration is found, that relationship does not have to be in strict accordance with the framework of the physical world. Indeed, under the conditions with visual images, subjects did not feel much unnaturalness, and perceived the tap and the vibration to be causally connected, even when the person who initiated the shoulder tap on the screen was quite far from the position of the vibrator.

8.2 Differences in the Meaning of Shoulder Tapping

Shoulder tapping is not commonly observed in face-to-face communication among Japanese people, at least in the area where the authors live. Further, even among those who do tap others' shoulders, only a few do so to attract attention. This is because, in face-to-face communication, we easily notice someone who wants to speak to us just from a glance, a wave of the hand, or an approach, thanks to our peripheral vision and the movement of air and shadows. Even when those actions are not sufficient, we can nearly always attract someone's attention by calling out to him or her.

Fig.11 A user distinguishing passing by from approaching on the HyperMirror screen.

On the other hand, a glance in HyperMirror communication is not transmitted as it is in face-to-face communication, and when a person moves, the attendant movement of air and shadows is not transmitted either. Further, the approach of another person on the HyperMirror screen does not attract attention as it does face-to-face, because the user must judge whether the other person is approaching or just passing by. We observed that some users tried to resolve this ambiguity when passing by another user on the screen by saying something like "excuse me" and walking with a quicker step while bending slightly at the waist (Fig.11).

In face-to-face communication, because others are physically close, we can perceive even their slightest movement through peripheral vision without looking directly at them, and the movement of air and shadows also helps us detect their actions. In HyperMirror communication, by contrast, the movement of another person cannot be perceived unless continuous attention is given to the screen.

Tapping someone's shoulder is only one of many ways to attract attention in face-to-face communication, but it is one of only a few ways to attract the attention of a HyperMirror user who is not looking at the screen (Fig.12). Thus, although the physical action is the same, shoulder tapping means different things in the two modes of communication: in HyperMirror communication, it functions as a useful signal for attracting attention.

Fig.12 The meaning of shoulder tapping in HyperMirror differs from that in face-to-face communication.

8.3 Tactile Illusion

Another aim of the present experiment was to verify the hypothesis that, although people appear to make judgments and act on information from a single source, they actually make appropriate judgments and act by combining information from various sources. If so, even a stimulus whose meaning cannot be clearly discerned from a single source can be used effectively in communication: when confronted with such ambiguous information, we reach a consistent interpretation by combining it with other information presented simultaneously. In the present experiment, we predicted that a vibration too short to be clearly discriminated would create the illusion of a shoulder-tapping stimulus. Unfortunately, subjects did not report experiencing the illusion of real shoulder tapping. However, the high affinity subjects showed for short-duration stimuli supported the hypothesis.
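The cue-combination hypothesis above can be illustrated with a standard naive-Bayes update in odds form. This is a textbook model swapped in for illustration, not the authors' analysis, and it assumes the cues are conditionally independent; the prior and likelihood ratios below are made-up numbers, not data from the experiment. It shows how a cue that is weak on its own, such as a short vibration, can support a confident interpretation when combined with a simultaneous visual cue.

```python
def posterior(prior: float, *likelihood_ratios: float) -> float:
    """Probability of being called after combining independent cues.

    prior: prior probability that the user is being called (0 < prior < 1).
    likelihood_ratios: P(cue | called) / P(cue | not called) for each cue.
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # independent cues multiply the odds
    return odds / (1.0 + odds)


# Hypothetical values for illustration only.
PRIOR = 0.1
SHORT_VIBRATION_LR = 3.0  # a brief buzz: weak evidence on its own
TAP_ON_SCREEN_LR = 20.0   # a partner's hand reaching one's own image

alone = posterior(PRIOR, SHORT_VIBRATION_LR)
combined = posterior(PRIOR, SHORT_VIBRATION_LR, TAP_ON_SCREEN_LR)
print(f"vibration alone: {alone:.2f}, with visual cue: {combined:.2f}")
```

With these invented values, the vibration alone raises the probability of being called from 0.10 to 0.25, whereas the vibration together with the on-screen tap raises it to about 0.87.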

9. Conclusion

This report examined the use of the tactile sense as an attention-attracting signal in HyperMirror communication. The tactile sense alone is sufficient in a framework where a single signal is sent, as with a portable telephone. When using the tactile sense in HyperMirror communication, however, we considered an organic relationship between the tactile sense and visual images to be necessary. Accordingly, we considered a scenario in which shoulder tapping serves as one touch-based method of attracting attention in conversation. We attached a vibrator to each shoulder of each subject and measured the effect of visual images and voice when a shoulder-tapping signal was sent.

There is a remarkable difference in ease of understanding and acceptance between a completely new rule and a rule that extends an existing rule of daily life. The latter is readily accepted by most people without consideration of its logic. Conversely, adding rules that are not readily accepted risks degrading the social order of communication rather than supporting the spread of HyperMirror communication.

The present experiment demonstrated that shoulder tapping with a vibrator in HyperMirror communication can potentially be understood as an extension of the rules of daily life when visual images are added. Furthermore, the experiment showed that, once a causal relationship is found between the vibration and another person's action, that relationship does not have to be strictly in accord with the framework of the physical world. Indeed, subjects did not feel much unnaturalness even when the person who initiated the shoulder tap on the screen was quite far from the position of the vibrator, and they perceived the two as causally connected. The presence of sound was not observed to affect the sense of unnaturalness. This also supports the view that users understand the situation as an extension of the rules of daily life.

In addition, the meaning of shoulder tapping in HyperMirror communication differs from that in daily life, where there are many ways to attract attention. Consequently, shoulder tapping is an important function in HyperMirror communication, providing a means of attracting attention that can be used comfortably.

REFERENCES

1. Birdwhistell, R.L., Kinesics and Context: Essays on Body Motion Communication, Univ. of Pennsylvania Press (1970)

2. Morikawa, O. and Maesako, T., HyperMirror: a Video-Mediated Communication System, CHI'97 Extended Abstracts, pp.317-318 (1997)

3. Morikawa, O., Effects of displaying the reflected image of users (in Japanese), Proc. of the 14th Annual Meeting of the Japanese Cognitive Science Society, pp.240-245 (1997)

4. Morikawa, O. and Maesako, T., HyperMirror: Toward Pleasant-to-use Video Mediated Communication System, CSCW'98, pp.149-15 (1998)

5. Morikawa, O., Effect of adding self image reflection on reality of video partner (in Japanese), Human Interface Society, Vol.1, No.1, pp.61-68 (1999)