Journal of Deaf Studies and Deaf Education Advance Access originally published online on July 6, 2005
The Journal of Deaf Studies and Deaf Education 2005 10(4):390-401; doi:10.1093/deafed/eni037

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

Empirical Articles

Perception of Sign Language and Its Application to Visual Communications for Deaf People

Laura J. Muir and Iain E. G. Richardson

Image Communication Technology Group, The Robert Gordon University, Aberdeen, United Kingdom

Correspondence should be sent to Laura J. Muir or Iain E. G. Richardson, Image Communication Technology Group, The Robert Gordon University, Schoolhill, Aberdeen, United Kingdom (e-mail: l.muir@rgu.ac.uk or i.g.richardson@rgu.ac.uk).

Received May 18, 2004; revised November 1, 2004; accepted March 14, 2005

Video communication systems for deaf people are limited in terms of quality and performance. Analysis of visual attention mechanisms for sign language may enable optimization of video coding systems for deaf users. Eye-movement tracking experiments were conducted with profoundly deaf volunteers while watching sign language video clips. Deaf people are found to fixate mostly on the facial region of the signer to pick up small detailed movements associated with facial expression and mouth shapes. Lower resolution, peripheral vision is used to process information from larger, rapid movements of the signer in the video clips. A coding scheme that gives priority to the face of the signer may be applied to improve perception of video quality for sign language communication.


    Introduction
Visual perception is the process of acquiring knowledge about environmental objects and events by extracting information from the light they emit or reflect (Palmer, 2002). How we "see" remains an active research challenge for vision scientists and specialists. Understanding the detection, recognition, and interpretation of visual information could have a tremendous impact on how we present and use visual information and on the design of information systems. The challenge is to understand how visual information can be presented so that its use can be optimized for the observer.

Of all the senses, vision is relied on most heavily for sensory input about the environment (Hendee & Wells, 1997). This is particularly true for deaf people who rely on visual communication of information using sign language and/or lip reading. The aim of this research is to investigate how deaf people see sign language. The rationale for this is that an understanding of how deaf people observe sign language could enable video communication systems, for example, video conferencing, to be optimized.

In this study, we examine the influence of sign language video content on the attention mechanisms of deaf viewers and the implications for design of video communications systems for deaf people. We review the quality requirements for sign language video communication and what is known about scene perception and the eye gaze of deaf people observing sign language. An experiment is presented, using eye tracking, to investigate how deaf people perceive sign language video and this is discussed in the context of improving sign language video communication quality.


    British Sign Language and Video Quality Requirements
Sign language is a complex combination of facial expressions, mouth/lip shapes, hand and body movements, and finger spelling. Visual communication of information between deaf people during freely expressed sign language conversation is detailed and rapid. Movements of the hands during a period of finger spelling can be observed to be blurred even when captured at 25 frames per second. The International Telecommunication Union, Telecommunication Standardisation Sector (ITU-T) draft profile (ITU-T SG16, 1998) details the quality requirements for sign language video communication including a minimum of common intermediate format (CIF) resolution (i.e., 352 x 288 displayed pixels) and frame rate of at least 25 frames per second. Visual perception of sign language video requires sufficient spatial and temporal resolution to capture the detailed movements of the signer. Reasonable visual quality and frame rates can be obtained using current video compression coding standards such as H.263 (ITU-T H.263, 1998) at high bit rates. At bit rates below 200 kilobits per second (kb/s), real-time video communication is characterized by low frame rates, small picture sizes, and/or poor picture quality (Richardson, 2003). Deaf people using videophones have to make modifications to try to overcome these problems, for example, using slow exaggerated movements. This can prove to be tiring and frustrating to the user and limits the usefulness of video technology to the Deaf Community. Even the improved video compression efficiency of the new H.264 coding standard (ITU-T H.264, 2003) may not be acceptable for accurate sign language communication at low bit rates.
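To give a sense of scale for the bit rates mentioned above, the following sketch estimates the uncompressed bit rate of CIF video at 25 frames per second and the compression ratio needed to fit a 200 kb/s channel. The 4:2:0 chroma sampling and 8 bits per sample are assumptions made for illustration; they are not stated in the text.

```python
# Illustrative arithmetic only. The sampling format and bit depth below are
# assumptions, not values given in the article.
CIF_WIDTH, CIF_HEIGHT = 352, 288   # CIF resolution (from the text)
FRAME_RATE = 25                    # frames per second (from the text)
BITS_PER_SAMPLE = 8                # assumed
CHROMA_FACTOR = 1.5                # assumed 4:2:0: luma plus two quarter-size chroma planes


def raw_bit_rate(width, height, fps,
                 bits_per_sample=BITS_PER_SAMPLE, chroma_factor=CHROMA_FACTOR):
    """Return the uncompressed video bit rate in bits per second."""
    return width * height * chroma_factor * bits_per_sample * fps


def required_compression_ratio(raw_bps, channel_bps):
    """Return how many times the raw stream must be reduced to fit the channel."""
    return raw_bps / channel_bps


raw = raw_bit_rate(CIF_WIDTH, CIF_HEIGHT, FRAME_RATE)
ratio = required_compression_ratio(raw, 200_000)
print(f"raw: {raw / 1e6:.1f} Mb/s, ratio needed at 200 kb/s: {ratio:.0f}:1")
```

Under these assumptions the raw stream is roughly 30 Mb/s, so a 200 kb/s channel calls for compression of about 150:1, which helps explain the loss of frame rate and picture quality described above.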

Deaf people are enthusiastic about the use of technology for personal communication at a distance but frustrated by the current poor performance at low bit rates characterized by poor picture quality and jerky movements (McCaul, 1997). There is therefore a requirement to optimize video communication systems for deaf users and this motivates the study of perceptual behavior of deaf people described in this paper.


    Visual Perception of Sign Language
This section reviews what is known about human visual processing and the perception of sign language by deaf people.

It is generally accepted that the examination of a visual stimulus involves parallel preattentive processing (first glimpse to give a global impression of the stimulus) and focal attention (Palmer, 2002). Focal attention involves serial scanning of an image using eye movements. Information is processed in detail from the foveal area of the eye (which is ~2.5° of visual angle around the center of the visual field) and in reduced detail from the larger peripheral area around the fovea. Movements of the head and eye direct the foveal region of high visual acuity to visually sample selected areas of the stimulus.

A saccade is a rapid eye movement, which is used to visually scan the scene and bring different areas to fall on the fovea (Cumming, 1978). A saccade requires approximately 150–200 ms for planning and execution and reaches an angular velocity of up to 900°/s. Fixations occur between saccades, during which the eye dwells on an object for a variable period of time. The average duration of a fixation is 300 ms (Palmer, 2002).

Perceiving a realistic visual scene generally requires a sequence of many different fixations (Findlay & Gilchrist, 2003). Foveal information is clear and fully chromatic whereas peripheral information is blurry and weak in color to a degree depending on the distance from the fovea. In order to obtain high-resolution information about the spatial and/or chromatic attributes, the visual scene must be explored using eye movements to place different information in the fovea at different times.

The process of saccadic exploration of complex images was investigated using crude equipment by Yarbus (1967). He recorded fixations and saccades observed while viewing objects and scenes. By superimposing eye movements on the stimulus picture, he was able to determine which parts of the image observers found most informative. He observed that the way in which the eyes explored a complex image depended on the task. More sophisticated eye-movement tracking equipment has allowed researchers to determine the specific sequence of fixations that observers execute when exploring a visual stimulus/scene.

Voluntary eye movements are the main instruments of selective attention. Attention may be directed globally (to the whole scene), to a selected object or set of objects, to a specific part of an object, or to a property of an object (e.g., color). Mack and Rock (1998) proposed that attention is required for conscious perception of anything at all. Accurate measurement of where an observer is looking is not always a measure of attention (Shepherd, Findlay, & Hockey, 1986). The human response to a visual stimulus depends on many factors but is ultimately task-specific (Findlay & Gilchrist, 2003; Gale, 1997), that is, how we see depends on the task being performed.

Land, Mennie, and Rusted (1999) used eye tracking to investigate eye movements during active tasks, including driving, table tennis, piano playing, and tea making. The results demonstrated that gaze is directed to the points of the scene where most information can be extracted and that the eye anticipates movement rather than follows it. The cognitive aspect of the task was demonstrated to have an important effect on viewing behavior.

The task of sending and receiving sign language signals was explored by Siple (1978). Siple proposed that, because sign language is received and initially processed by the visual system, the rules for forming signs would be expected to be constrained by the limits of that system. She observed that subjects viewing sign language look at the face of the signer, with small excursions around the face. This behavior demonstrates the importance of the face in giving clues to the meaning of gestures. Her paper studied the development of the sign system to maximize the information that the eye can gather. In sign language production, small detailed motions were observed to occur in and around the face and upper body region where the receiver (looking at the signer's face) can observe gestures in high acuity. Large, less detailed gestures are produced in the peripheral region of view and therefore observed by the receiver at low visual acuity. These large motions tend to be in the vertical and horizontal axes where acuity is greater than for other orientations. Siple also described the use of redundancy to further maximize the information that can be conveyed in the peripheral region of view. The conclusion of the study by Siple was that efficient communication of sign language between deaf people has developed within the constraints of the Human Visual System.

The nature of a task is fundamental to viewing behavior. Eye movements studied in parallel with an articulated theory of cognitive activity for the task in question can provide useful information about visual perception (Vivianni, 1990). Our research investigates the eye movements of deaf people receiving sign language and proposes how video communication systems may be optimized to take account of what is known about the production and observation of signs within human visual limits. We postulate that a deaf person observing sign language is carrying out a specific task that produces a characteristic and consistent pattern of visual attention response that can be exploited to optimize video communication systems such as video telephony and video conferencing systems.


    Video Communication of Sign Language—Previous Research
Previous work on video communication of sign language has been limited by not addressing temporal and spatial quality requirements and visual perception mechanisms and by a lack of consultation and testing with deaf people. The effect of frame rate and spatial resolution on speech reading (mouth/lip shapes), finger spelling, and gestures was investigated by Woelders, Frowein, Nielsen, Questa, and Sandini (1997). They demonstrated that frame rate had a significant effect on the communication of mouth shapes in particular. Video coding schemes have been developed to give priority to designated regions of interest in video communications (Eleftheriadis & Jacquin, 1995; Schumeyer, Heredia, & Barner, 1997). Saxe and Foulds (2002) developed a method of identifying and segmenting the face and hands based on skin color. The segmented regions were given priority in the coding scheme. The video compression algorithm assumed that the hands must be transmitted at the same (temporal or spatial) quality as the face and produced distorted images that were not subject to quality testing by the target end user.

Geisler and Perry (1998) demonstrated the potential to exploit the decrease in spatial resolution of the Human Visual System away from the point of gaze using a foveated imaging approach. None of the research available in the literature prior to 2002 considers the perceptual responses of deaf people watching sign language. Our initial gaze tracking experiments with eight deaf volunteers (Muir & Richardson, 2002; Muir, Richardson, & Leaper, 2003) established that sign language users exhibit a consistent characteristic eye movement response to sign language video. The results of these experiments, confirmed independently by Agrafiotis et al. (2003), support the theory by Siple (1978) that deaf people perceive the face in high visual resolution and that hand gestures are viewed in peripheral, lower resolution, vision.
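The idea of giving coding priority to the point of gaze can be illustrated with a small sketch that assigns a coarser quantizer to macroblocks that lie further from the signer's face. This is not the scheme of any work cited above: the function name, the ring width, the step size, the cap, and the face position are all hypothetical values chosen for illustration.

```python
import math

MB_SIZE = 16  # macroblock size in pixels, as used in H.263 and H.264


def qp_offset_map(width, height, face_center,
                  step_per_ring=2, ring_px=48, max_offset=12):
    """Return one quantizer offset per macroblock (hypothetical scheme).

    Macroblocks near face_center get offset 0 (finest quantization); the
    offset grows by step_per_ring for every ring_px pixels of distance,
    capped at max_offset. All parameter values are illustrative assumptions.
    """
    cols = math.ceil(width / MB_SIZE)
    rows = math.ceil(height / MB_SIZE)
    face_x, face_y = face_center
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Distance from the macroblock centre to the face centre.
            center_x = c * MB_SIZE + MB_SIZE / 2
            center_y = r * MB_SIZE + MB_SIZE / 2
            distance = math.hypot(center_x - face_x, center_y - face_y)
            row.append(min(max_offset, step_per_ring * int(distance // ring_px)))
        grid.append(row)
    return grid


# CIF frame with an assumed face position in the upper middle of the picture.
offsets = qp_offset_map(352, 288, face_center=(176, 72))
```

For a CIF frame this produces an 18 x 22 grid in which the macroblocks around the assumed face position keep the finest quantization and the bottom corners receive the coarsest. A practical encoder would also need to locate the face and respect its rate control, neither of which is modelled here.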


    Experimental Design and Rationale
The investigation presented in this paper builds on our previous work (Muir et al., 2003) and extends the study to include more participants, a wider range of sign language video material, and more detailed analysis of the factors that influence the attention of deaf people watching sign language video.

Eye tracking was used to explore the visual response of deaf participants to video stories that were selected to include a wide range of fine and gross sign language movements and gestures. Eye tracking is a fast and accurate method of capturing and processing gaze data. It permits investigation of on-line processing of full-screen video images and does not disrupt normal viewing.

In the experiment, profoundly deaf adult participants, who used British Sign Language (BSL) as their first language, observed three short video clips with the task of understanding the signed stories in each clip. Eye movement data were captured and compared for each subject and clip.

Experiments were conducted under controlled conditions in a room with 100% artificial, overhead lighting. The subject was positioned at a comfortable viewing distance (four to six times the screen height) from the monitor.
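The viewing distance can be combined with the foveal extent of about 2.5° quoted earlier to estimate how much of the screen falls within high-acuity vision at any one time. The sketch below treats 2.5° as the full angle subtended by the fovea, which is an interpretation rather than something the text states, and expresses the result as a fraction of screen height.

```python
import math

# Foveal extent quoted in the text; treating it as a full angle is an assumption.
FOVEA_DEG = 2.5


def foveal_extent(viewing_distance, fovea_deg=FOVEA_DEG):
    """Size on the screen covered by the fovea, in the units of viewing_distance."""
    return 2.0 * viewing_distance * math.tan(math.radians(fovea_deg / 2.0))


# Viewing distances of four and six screen heights, as described in the text.
for multiple in (4, 6):
    fraction = foveal_extent(multiple)
    print(f"{multiple} x screen height: fovea spans {fraction:.1%} of the screen height")
```

Under this interpretation the fovea covers roughly a sixth to a quarter of the screen height, so only a small part of the signer (for example the face) can be seen in full detail during a single fixation.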

Each subject was given instruction in BSL through a qualified interpreter. All communications with the subject were in BSL, the subjects' first language; no printed instructions or feedback forms in English were used.

The results of the eye-movement tracking experiments were analyzed, for each participant, by playing back the video clip and plotting the recorded (x, y) eye position coordinates on each video frame. The gaze points were examined frame by frame with respect to the designated areas of the video image. The selected areas were as follows: upper and lower face, hands, fingers, upper body, lower body, background, and object (a camera on a tripod in Video Clip 1). These were chosen so that the researcher could identify the most important regions of the scene for sign language communication. The distinction between upper and lower face was made to determine if the region around the eyes (upper face) or around the mouth (lower face) was more significant for understanding the sign language. The distinction between hands and fingers was made to test whether wide movements of the hands and detailed movement of the fingers (e.g., during finger spelling) were followed by the viewer. The upper body area was defined as the area below the chin and above the waist of the signer, and the lower body was defined as the area below the waist.

A fixation was recorded as a gaze of duration of 0.02 s or more (Palmer, 2002). In cases where the regions overlap (e.g., when the hands were over the face region), the sequence of eye movements before and after this occurrence was observed to estimate which region was being followed by the eye.

The data were analyzed in two ways. First, the total fixation time on each of the designated regions was recorded to determine which region was most important to the viewer. Each subject's fixation time was expressed as a percentage of the total viewing time for each clip to allow comparison between viewers and to compare the results for each video clip. Secondly, a timeline was produced, for each subject, which recorded the location of fixations during each of the videos with respect to video content. The data were examined on a frame-by-frame basis and the gaze point noted with respect to the sign language action in the video. Figure 1 includes an extract from the timeline for Video Clip 2. It shows the gaze locations of each of the 10 subjects for the first 5 s of the video clip. The gestures are noted along the top row of the timeline and color-coded to match the colors used to represent the designated regions of the image. Sample frames from the video clip are included in Figure 1 to illustrate the video content.
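The first analysis described above (total and percentage fixation time per region) can be sketched as follows. This is a simplified illustration, not the procedure actually used: the regions here are fixed hypothetical rectangles, whereas the analysis in the paper was done by inspecting each frame, and overlapping regions are not handled. The 60 Hz sample period follows the recording rate given in the Procedure, and the 0.02 s threshold is the one stated in the text.

```python
from itertools import groupby

SAMPLE_PERIOD = 1 / 60   # seconds per gaze sample (60 Hz recording rate from the text)
MIN_FIXATION = 0.02      # fixation threshold in seconds (from the text)

# Hypothetical rectangles (left, top, right, bottom) in pixels, for illustration only.
REGIONS = {
    "upper face": (140, 40, 212, 80),
    "lower face": (140, 80, 212, 120),
    "upper body": (100, 120, 252, 288),
}


def region_of(x, y, regions=REGIONS):
    """Return the name of the first region containing (x, y), else 'background'."""
    for name, (left, top, right, bottom) in regions.items():
        if left <= x < right and top <= y < bottom:
            return name
    return "background"


def fixation_times(gaze, sample_period=SAMPLE_PERIOD, min_fixation=MIN_FIXATION):
    """Sum, per region, the duration of gaze runs that last at least min_fixation."""
    totals = {}
    labels = [region_of(x, y) for x, y in gaze]
    for name, run in groupby(labels):
        duration = len(list(run)) * sample_period
        if duration >= min_fixation:
            totals[name] = totals.get(name, 0.0) + duration
    return totals


def as_percentages(totals, viewing_time):
    """Express each region's fixation time as a percentage of the viewing time."""
    return {name: 100.0 * t / viewing_time for name, t in totals.items()}


# Made-up gaze samples: six on the upper face, three on the lower face, one on the body.
gaze = [(176, 60)] * 6 + [(176, 100)] * 3 + [(176, 200)] * 1
totals = fixation_times(gaze)
percentages = as_percentages(totals, len(gaze) * SAMPLE_PERIOD)
```

In this made-up example the single sample on the upper body lasts less than 0.02 s and is therefore not counted as a fixation, so the percentages for the remaining regions do not sum to 100.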



Figure 1  An extract from the timeline produced to record the location of fixations of each subject (1–10) with respect to the content of Video Clip 2. The sign language contents (gestures) are described in the top row of the timeline, illustrated with sample frames from the video clip, and color-coded to match the colors used to represent designated regions of the video image in the legend.

 

    Method
Subjects
Eye-movement tracking experiments were conducted with 17 profoundly deaf-from-birth volunteers from the Aberdeen Deaf Social and Sports Club (ADSSC). For each subject, BSL was their first language and English was their second language. For this reason, all communications were in BSL, aided by a local BSL interpreter who was known to the participants.

Seven of the participants were excluded from the experiment, as we were unable to obtain consistent accurate tracking of their eye movements during calibration. This was mainly due to head movements. Of the 10 subjects proceeding to the experiment, seven were male and three were female, and ages ranged from 30 to 82 years.

Apparatus
Eye movements were captured by a ViewPoint eye tracker from Arrington Research Ltd. (Cambridge, UK) incorporating an infrared light source and camera mounted on a clamp with a nose bridge and chin rest for comfortable and secure positioning of the subject's head. The infrared light source illuminates the eye and provides reflection from the smooth cornea. The camera captures the video signal, reflected light from the eye, which is digitized by a video capture device in the personal computer (PC). Image segmentation algorithms are applied to the digitized image to locate the dark pupil of the eye. Eye-position signals are transformed to produce eye-movement coordinates. Data gathered from a calibration routine, before the test begins, are used to calculate the point of regard.

Video clips were displayed to the viewer on a 17 in. monitor (Monitor A) with true color, 32-bit display connected to a DELL Pentium IV PC with PCI Video Capture Card installed. A second monitor (Monitor B) was connected to the PC (not visible to the subject) for the researcher to control and monitor the experiment.

Materials
The sign language video material for the experiment was captured at 25 frames per second on a Sony VX200E Digital Video camera, under controlled artificial lighting in the university video recording studio, using two profoundly deaf volunteers. The volunteers were from the same geographical area, the northeast of Scotland, and used the same version of BSL as the subjects participating in the experiment. It is worth noting that BSL has regional variations analogous to speech dialects. The signers in the video related short stories from their own experience using their own natural style and expression of signing. Three video clips were selected to ensure that the test material contained a wide range of sign language movements, expressions, and gestures (including finger spelling) as described later.

The first clip (22.08 s) displays a close view of the signer (from the waist upwards). The signer used facial expression, lip movement, and gestures but limited body movement around the scene that also included a camera and ventilation shaft. These background objects were included to test whether they would prove to be a distraction for the viewer. The story told in this clip is of the signer's experience of communication between deaf and hearing members of her family. An English translation of her story is "A long time ago, when I was young, I would ask my mother what everyone was saying. Now when my children speak with no voice, my mother asks me what they are saying. I remind her that she wouldn't tell me what was being said until they were finished and so she will just have to wait too. She realises this now."

In the second clip (27.20 s), the signer (same as in Video Clip 1) is at a greater distance from the camera and seen above knee height. The signer used facial expression, lip movement, wide gestures, and detailed finger spelling but limited body movement around the scene that had no distracting objects. The story told in this clip is of the signer's experience as a child at school learning to use her voice. The English translation of her story is "When I was at school, a long time ago, when I was small, my speech was hopeless. They tried to teach me to speak but it just went over my head. The teacher said it was a bit of a problem. She said some people are good but you are not good, your speaking is not good. So I had to lie down on the floor and say ‘Ah’. She put a darning needle in my mouth to make the ‘A’ sound. My heart was throbbing."

In the third clip (46.64 s), the signer used facial expression, lip movement, finger spelling, wide gestures, and movement around the scene to tell the story of his experience on holiday. The English translation of his story is "We have been to America, three times, and also to Spain. We met deaf people in America. The sign language was different but we could catch certain things by gesturing and so on. Things like ‘walking’, ‘hot’, ‘drinking’, and ‘good’ we could communicate, and also by writing things down. When I was a boy, I played football and so I could make conversation about that. The language was different, it was interesting."

Procedure
The eye tracker camera was set up so that the video image of the subject's pupil (dominant eye where appropriate) was in the center of the control display window in Monitor B. The tracking system was adjusted in set-up mode (temporal resolution = 30 Hz and internal processing = 340 x 240) so that the threshold area of the dark pupil of the eye and the white corneal reflection was obtained in the search area. The scan density was adjusted to obtain the minimum number of points that would correctly locate the dark pupil for maximum possible accuracy. Following the set-up stage, the equipment was calibrated for the individual subject to obtain coefficients for internal mathematical mapping. Calibration was performed at temporal resolution of 30 Hz and internal processing of 640 x 480 to obtain the highest possible degree of accuracy. The subject was instructed to foveate on each of the 16 calibration points on Monitor A until it disappeared from the screen and to avoid anticipating the next point. The researcher controlled and monitored the calibration routine on Monitor B, checking the success of calibration and re-presenting the stimuli as required. Once calibrated, the video stimuli (three videos separated by further calibration markers) were presented full screen to the subject on Monitor A. Eye movements were processed at temporal resolution of 60 Hz and internal processing of 340 x 240 and monitored by the researcher on Monitor B. The (x, y) coordinates of the captured gaze data were saved to a unique data file for each subject.

The total time for the experiment with an individual participant was approximately 20 min. At the end of the experiment, subjects were asked if there was anything in the sign language video that could not be understood, was not clear, or needed to be repeated. The rationale for an open-ended question unrelated to the video content was that the researchers wished to test ease of relaxed, natural sign language communication to the subject rather than test comprehension, which might have influenced the way the video clips were regarded.


    Results
All subjects reported ease of sign language communication with no requests for clarification or repetition. Conversations with the subjects after the experiment, through the BSL interpreter, demonstrated understanding of, and interest in, the content of the video clips used.

Fixation on Regions of Importance
The total fixation time (seconds) in separate designated regions of the video image was recorded for each of the 10 subjects. Total fixation times for each subject vary depending on the number of saccades during viewing. The total and percentage fixation times for each subject, for each of the test video sequences, are given for each region of the video image in Tables 1, 2, and 3. The tables also show the average total and percentage time spent looking at each of the designated image regions.


Table 1  Total and percentage fixation times on different regions of Video Clip 1

 

Table 2  Total and percentage fixation times on different regions of Video Clip 2

 

Table 3  Total and percentage fixation times on different regions of Video Clip 3

 
The average percentage fixation times are plotted in Figure 2 to allow comparison of the results obtained for the three video clips used in the experiment.



Figure 2  Average percentage fixation time on each of the designated regions of the sign language video image in Video Clips 1–3.

 
The results for Video Clip 1 (Table 1) demonstrate that, on average, most of the time was spent looking at the face (88.31%) and in particular the upper face region (72.22%) of the video. Subjects 1, 4, 5, 7, and 9 (shown in bold typeface in Table 1) displayed a very similar pattern of viewing times and looked almost exclusively at the upper face during this video clip (96.22% with low standard deviation). Subjects 6 and 8 exhibited behavior similar to this group in terms of the time spent looking at the face although their gaze fell on the lower face more than the rest of the group (21.09% and 10.34%, respectively). The average fixation time on the lower body of the signer and the background object (camera) was less than the threshold time for a fixation. The subjects spent an average of 0.5 s (2.54%) of the total viewing time looking at the hands. An average of 1.88 s (8.77%) of the total viewing time was spent looking at the upper body region.

The results for Video Clip 2 (Table 2) show that most of the fixation time (82.05%) was on the face region. Subjects 5, 6, and 7 exhibited similar behavior to that for Video Clip 1 (average of 90.42% of fixation time on the upper face), shown in boldface type in Table 2. Subjects 1, 4, 8, 9, and 10 spent more time (an average of 55.09% fixation time) looking at the lower face region. Subjects 2 and 3 showed a similar viewing pattern to that shown for Video Clip 1, that is, fixating more on the upper body region.

Results for Video Clip 3 are shown in Table 3 (data for Subject 7 are excluded as he was the signer in the video clip). The average time spent looking at the face in this test was 60.38% of the fixation time. More of the fixation time, 36.64% on average, was spent looking at the upper body region that includes the area just below the face and the chest of the signer. Three of the subjects (Subjects 2, 4, and 10) spent most of their fixation time on the upper body (in contrast to the behavior of Subjects 4 and 10 during Video Clips 1 and 2).

Figure 2, which plots the average fixation times on the designated areas for each of the clips, shows similar curves (patterns of viewing behavior) but in varying proportions. The difference in the pattern of results obtained for the three clips is explored further to determine if the data could have come from the same population (null hypothesis) or if the difference between at least two of the data sets is statistically significant.

Statistical Comparison of Viewing Behavior for Three Video Clips
A nonparametric Friedman test was conducted to determine whether there was a statistically significant difference, at the 5% significance level, in the percentage fixation times on the specified regions of each of the test video clips by the subjects in the sample. A nonparametric test is applied to ordinal or interval data, is distribution free, and tests whether population locations differ (Keller & Warrack, 2003). The eye location data are interval data (percentage fixation times) and are not normally distributed. The null hypothesis for the test is that the data for all three clips could have come from the same population and are not significantly different.

The Friedman test ranks the results (percentage fixation times) for the subjects for each video clip and uses chi-square distributions to determine whether at least two of the data sets differ. The SPSS output is shown in Tables 4 and 5:


Table 4  Friedman test ranking of percentage fixation times for each video clip

 

Table 5  Friedman test statistics

 
The test gives a significance value (p) of .097, which is greater than the chosen level of significance (.05), so the null hypothesis cannot be rejected. The Friedman test indicates that, for this sample, there was no statistically significant difference in the viewing behavior of subjects for the three different types of video sequence used in the experiments.
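The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the SPSS analysis used in the study: the function names and the data are hypothetical, no tie correction is applied, and the p-value uses the fact that the chi-square survival function with 2 degrees of freedom (three clips) is exp(-Q/2).

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_three_conditions(rows):
    """Friedman statistic for rows of three related measurements (one row per subject).

    Returns (mean rank per condition, chi-square statistic Q, p-value).
    No tie correction is applied in this sketch.
    """
    n, k = len(rows), 3
    if n == 0 or any(len(row) != k for row in rows):
        raise ValueError("expected at least one row, each with exactly three values")
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    p = math.exp(-q / 2.0)  # chi-square survival function for k - 1 = 2 degrees of freedom
    return [r / n for r in rank_sums], q, p


# Hypothetical percentage fixation times for four subjects on clips 1, 2 and 3.
example_rows = [
    [80.0, 70.0, 60.0],
    [85.0, 75.0, 65.0],
    [90.0, 60.0, 70.0],
    [70.0, 80.0, 50.0],
]
mean_ranks, q_stat, p_value = friedman_three_conditions(example_rows)
print(mean_ranks, q_stat, p_value)
```

The null hypothesis is retained when the returned p-value exceeds the chosen significance level (.05 in the analysis above). With tied values or more than three conditions, a statistics package would normally be used instead of this sketch.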

Fixation in Relation to Video Content
Further examination of the raw data was conducted to explore the motivating factors for eye movements. The sequence of fixations for each subject was examined with respect to the sign language content of each video clip. A timeline (similar to the example shown in Figure 1) was produced for each subject.
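As an illustration of this kind of analysis, the sketch below computes percentage fixation times per region and lists fixations away from the face from a sequence of fixation records. The record layout (region, start time, end time), the region names, and the sample values are assumptions made for the example and do not reproduce the data format of the eye tracker used in the experiments.

```python
from collections import defaultdict


def percentage_fixation_time(fixations):
    """Return {region: percentage of total fixation time} for (region, start_s, end_s) records."""
    totals = defaultdict(float)
    for region, start, end in fixations:
        if end < start:
            raise ValueError("fixation ends before it starts")
        totals[region] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {region: 100.0 * t / grand_total for region, t in totals.items()}


def excursions(fixations, home_regions):
    """Return the fixations that fall outside the given home regions, in time order."""
    return [f for f in sorted(fixations, key=lambda f: f[1]) if f[0] not in home_regions]


# Hypothetical timeline for one subject (times in seconds).
timeline = [
    ("upper_face", 0.0, 2.0),
    ("hands", 2.0, 2.5),
    ("upper_face", 2.5, 4.0),
    ("upper_body", 4.0, 5.0),
]
shares = percentage_fixation_time(timeline)
away = excursions(timeline, {"upper_face", "lower_face"})
print(shares)
print(away)
```

The start and end times of each excursion can then be compared with what the signer is doing in the video at that moment.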

For the first video clip, the short excursions to the hands exhibited by Subjects 2, 3, 8, 9, and 10 were found to be associated with movement of the hands near to (to one side of) the face region in the sign language video, possibly because the hands were close enough to "draw" the eyes away from the face but still allow the face to be seen at high resolution. Two of the subjects (Subjects 2 and 3) spent a greater percentage of their total fixation time looking at the upper body region (30.35% and 51.78%, respectively) that included the area just below the face. Examination of the timeline suggested that the gaze of these subjects was closer to the location of the hands than that of the other participants.

Motivating factors taking gaze away from the face in Video Clip 2 were investigated by examining the timelines for each subject. Gaze away from the face (mostly to the upper body region) occurred during pauses in sign language and when gestures and movements were located in the lower body region of the signer. None of the subjects followed the hands or fingers during the periods of finger spelling in the video. Gaze was found to be in the (upper or lower) face region during finger spelling in all cases.

Examination of the timelines for Video Clip 3 indicated that factors influencing gaze in the upper body region were large gestures (in the lower body region of the signer) and movement of the signer around the scene, particularly towards the end of the clip.

The results imply that the face is the center of attention for a deaf person observing sign language, particularly for sequences where the signer uses a range of gestures and finger spelling but without wide ranging body movements (Video Clips 1 and 2). Gaze is mostly in the upper face region for Video Clip 1 (in which there is a closer view of the signer) and more time is spent on the lower face in Video Clip 2 (in which the signer is further away from the camera, makes wider gestures, and uses more detailed finger spelling). Hand gestures close to the face, expansive gestures in the lower body region of the signer, and movement of the signer around the video scene were found to act as "drivers" (motivating factors), taking the subject's gaze away from the face region.


    Discussion
 
The aim of this investigation was to explore how profoundly deaf people view sign language video content and the application of this to the design of video communication systems.

In the introduction to this paper, we identified the influence of the task and the nature of the sign language material on gaze patterns. The work of Siple (1978) was important for understanding the relationship between the Human Visual System and the development and production of sign language. What is seen in clear foveal vision and the information that can be gathered from peripheral vision can be used to guide the development of systems (sign systems or video systems) that work optimally within the limitations of human vision.

Our eye-movement tracking experiment was designed to test the responses of deaf viewers to a wide range of sign language movements and gestures and to investigate viewing patterns that might be exploited in the design of optimized video communication systems. Our results demonstrate that the most important region of the sign language video image is the face of the signer. This is particularly evident in the results obtained for Video Clip 1 where the signer is closer to the camera than in the other video clips. Fixations are mainly on the upper face region with no visual excursions to the distracter objects in the background. Gaze is more on the lower face region for Video Clip 2 where the signer is further from the camera and the face region is therefore smaller. Participants were found not to follow the movements of the hands or detailed movements of the fingers during periods of finger spelling, suggesting that manual sign information was observed in peripheral (lower resolution) vision. Short excursions to the hands were noted only when the hands of the signer were close to the face. The hands were close enough for the face to remain in foveal (high resolution) vision. The wider, more rapid gestures and movements of the signer in Video Clip 3 seemed to cause gaze to fall more on the upper body region of the signer for some viewers. There was no statistically significant difference in the patterns of viewing behavior across the three videos tested, as determined by the Friedman test. This leads us to conclude that the same viewing strategies are applied by viewers to different aspects of sign language video regardless of the background, distance of the signer from the camera, and movement of the signer around the scene.

These findings are supported by vision theory and published research, in particular, the previously mentioned work of Siple in relation to sign language. Human perception of motion is an important factor that may influence the way deaf people view sign language. It has been demonstrated that temporal properties of vision are similar across the human visual field (Virsu, Rovamo, Laurenen, & Nasenen, 1982). As discussed earlier in this paper, the same is not true for spatial vision. Foveal vision (corresponding to a visual angle of 2.5° from the point of fixation) is an area of acute vision. It is the most spatially sensitive part of the visual field, providing high-resolution vision.
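As a rough worked example of what the 2.5° figure means on a display, the sketch below converts a visual angle measured from the point of fixation into a radius on the screen. The 50 cm viewing distance and the function name are assumptions chosen for illustration and are not taken from the experiments.

```python
import math


def foveal_radius(viewing_distance, angle_deg=2.5):
    """Radius on the screen covered by a visual angle measured from the point of fixation.

    The result is in the same unit as viewing_distance.
    """
    return viewing_distance * math.tan(math.radians(angle_deg))


# Assumed viewing distance of 50 cm (hypothetical).
radius_cm = foveal_radius(50.0)
print(round(radius_cm, 2))
```

Under this assumption the region of acute vision is a circle of a little over 2 cm radius around the fixation point, which helps explain why hands close to the face can remain in high-resolution vision while wider gestures cannot.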

Extrafoveal, or peripheral, low-resolution vision has been shown to have an important role to play in the perception of motion. A study of "eccentricity dependence" of motion perception by Baker and Braddick (1985) concluded that peripheral vision is superior for processing visual motion. They studied the ability of subjects to report the direction of apparent motion when an array of random dots was displaced in relation to retinal eccentricity factors. They found that peripheral vision is specialized for motion perception. They also established that the range of velocities that can be processed increases greatly in peripheral vision, whereas in central foveal vision, only a very restricted range of velocities could stimulate a vision response.

From our results, detailed spatial vision of the face region was found to be important for comprehension of sign language. Assuming that the hands of the signer play a significant part in sign language communication, it must be the case that they are observed in peripheral vision when they are not close enough to the face to be captured by the fovea of the eye. Peripheral vision was found to be adequate for the gross and rapid sign language movements of the hands and body that occurred away from the face region of the signer in our experiment.

We conclude from this that a deaf viewer fixates mostly on the facial region of the signer to pick up small detailed movements, associated with facial expression and lip shapes, that are known to convey important sign language information to the receiver. Small movements of the hands in front of or near to the face can be observed in the foveated region of view, but more detailed movement near to the face was found to draw the eyes of some subjects away from the face for a short time. During this time, the face was still close enough to be seen in high visual acuity. A deaf person uses peripheral vision to process information from larger, rapid movements of the signer. Fixation on the upper body region (including the area below the face) by some subjects may have occurred to permit a range of smaller movements to be processed at the edge of the foveal area while still keeping the lower part of the face in high-resolution foveal vision.

These results have a number of implications for visual communication systems. A deaf person requires high spatial resolution in the face region of the signer while temporal resolution is maintained across the entire video scene. This indicates that there is scope for prioritized transmission of sign language video, for example, by coding different parts of the scene with varying image quality. It may be possible to reduce the quality of the peripheral region, including body and hands (when away from the face), in a coded video sequence while maintaining perceived video quality. For example, popular video coding standards such as MPEG-4 Visual, H.263, and H.264 achieve compression by a process of motion compensated prediction followed by transform coding, quantization, and entropy coding (Richardson, 2003). The coding process is "lossy," that is, there is some loss of quality in the decoded video sequence. A large quantizer step size produces high compression and poor decoded quality and vice versa. Prioritized coding of sign language video could be achieved by reducing the quantizer step size in the face region and increasing the step size further away from the face, resulting in higher compression of the regions that are perceived in peripheral vision. Extending this priority region to just below the face could accommodate viewers who need a larger region for detecting small movements, while maintaining detail for oral sign language signals. The region of clarity for small slow movements could be set for the individual user to allow customization, as it is clear from the results that content is not always viewed in precisely the same way. Video compression, optimized in this way to meet the needs of the user, would improve perceived video quality at low bit rates, that is, less than 200 kb/s. Standard systems with bit rates of 256 kb/s that currently give "good quality" quarter-screen (CIF) video could be optimized to provide good full-screen DVD quality video images.
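The idea of a quantizer that grows coarser with distance from the face can be sketched as a per-macroblock map. This is an illustrative sketch only, not the coding scheme evaluated by the authors: the function name, the quantizer values, the linear ramp, and the face box are all hypothetical, and a real encoder would expose its own interface for region-dependent quantization. The 22 x 18 grid corresponds to a CIF frame (352 x 288 pixels) divided into 16 x 16 macroblocks.

```python
def quantizer_map(cols, rows, face_box, q_face=24, q_max=40, step_per_block=2):
    """Return a rows x cols grid of quantizer values, one per macroblock.

    face_box is (left, top, right, bottom) in macroblock units, inclusive.
    Blocks inside the box get q_face (finest); the value rises by
    step_per_block for each block of distance from the box, capped at q_max.
    Larger values mean coarser quantization and higher compression.
    """
    left, top, right, bottom = face_box
    grid = []
    for y in range(rows):
        row = []
        for x in range(cols):
            dx = max(left - x, 0, x - right)   # horizontal distance to the box
            dy = max(top - y, 0, y - bottom)   # vertical distance to the box
            distance = max(dx, dy)
            row.append(min(q_face + step_per_block * distance, q_max))
        grid.append(row)
    return grid


# Hypothetical face region in a CIF frame of 22 x 18 macroblocks.
qmap = quantizer_map(22, 18, face_box=(9, 3, 12, 6))
for row in qmap:
    print(" ".join(f"{q:2d}" for q in row))
```

Extending the priority region to just below the face, or adapting it to an individual user, amounts to passing a taller `face_box` or a smaller `step_per_block` in this sketch.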

Further work has been conducted to quantify the relative requirements for image quality in the regions of a coded sign language video sequence. Tests have been conducted to determine the effect of selective coding of sign language video content on perception of quality by deaf people (Muir, Richardson, & Hamilton, 2005). A part of this work includes the development of a suitable method of measuring subjective quality because standardized methods, such as ITU-T P.910 (1999), may not be appropriate for the task-specific nature of sign language video quality assessment.

The findings presented in this paper demonstrate the potential to exploit the viewing behavior of deaf people in the design or adaptation of video communication systems for this user group. Selective prioritization of important regions of the video image may enable more efficient transmission and improve the perceived quality of sign language video content by deaf people.


    Acknowledgments
 
The authors would like to acknowledge the help and support of Jim Hunter who acted as BSL interpreter and organized volunteers for the experiments. Special thanks to Edith Ewen and the deaf people at the Aberdeen Deaf Social and Sports Club for their continued interest and support and for taking part in the eye-movement tracking experiments.


    References
 

    Agrafiotis, D., Canagarajah, N., Bull, D., Dye, M., Twyford, H. E., Kyle, J. G., et al. (2003). Optimized sign language video coding based on eye-tracking analysis. Proceedings of Visual Communications and Image Processing, July, University of Italian Switzerland, Lugano, Switzerland.

    Baker, C. L., & Braddick, O. J. (1985). Eccentricity-dependent scaling of the limits of short-range motion perception. Vision Research, 25, 803–812.

    Cumming, G. D. (1978). Eye movements and visual perception. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception (pp. 221–255). Massachusetts: Academic Press.

    Eleftheriadis, A., & Jacquin, A. (1995). Automatic face location, detection and tracking for model-assisted coding of video teleconferencing sequences at low bit rates. Signal Processing: Image Communication, 7 (3), 231–248.

    Findlay, J. M., & Gilchrist, I. D. (2003). Active vision: The psychology of looking and seeing. Oxford: Oxford University Press.

    Gale, A. G. (1997). Human response to visual stimuli. In W. R. Hendee & P. N. T. Wells (Eds.), The perception of visual information. New York: Springer-Verlag.

    Geisler, W. S., & Perry, J. S. (1998). A real-time foveated multi-resolution system for low bandwidth video communication. SPIE Proceedings, Vol. 3299.

    Hendee, W. R., & Wells, P. N. T. (Eds.). (1997). The perception of visual information (2nd ed.). New York: Springer-Verlag.

    ISO/IEC 14496-10 and ITU-T Rec. H.264. (2003). Advanced video coding. Geneva: ITU-T.

    ITU-T Rec. H.263. (1998). Video coding for low bit rate communication. Geneva: ITU-T.

    ITU-T Rec. P.910. (1999). Subjective video quality assessment methods for multimedia applications. Geneva: ITU-T.

    ITU-T SG16. (1998). Draft Application profile: Sign language and lip reading real time conversation usage of low bit rate video communication. Geneva: ITU-T.

    Keller, G., & Warrack, B. (2003). Statistics for management and economics (pp. 591–594). Thomson Learning.

    Land, M. F., Mennie, N., & Rusted, J. (1999). The roles of vision and eye movements in the control of activities of everyday living. Perception, 28, 1311–1328.

    Mack, A., & Rock, I. (1998). Inattentional blindness. Massachusetts: MIT Press.

    McCaul, T. (1997). Video-based telecommunications technology and the deaf community. Report of Australian Communication Exchange. Queensland: Australian Communication Exchange Ltd.

    Muir, L. J., & Richardson, I. E. G. (2002). Video telephony for the deaf: Analysis and development of an optimised video compression product. Proceedings of the ACM Multimedia Conference, December, Juan Les Pins.

    Muir, L. J., Richardson, I. E. G., & Hamilton, K. (2005). Visual perception of content-prioritised sign language video quality. Proceedings of Visual Information Engineering, University of Glasgow, Glasgow.

    Muir, L. J., Richardson, I. E. G., & Leaper, S. (2003). Gaze tracking and its application to video coding. Proceedings of the International Picture Coding Symposium, April, Saint-Malo.

    Palmer, S. E. (2002). Vision science: Photons to phenomenology. Cambridge, MA: MIT Press.

    Richardson, I. E. G. (2003). H.264 and MPEG-4 video compression. Chichester: Wiley.

    Saxe, D. M., & Foulds, R. A. (2002). Robust region of interest coding for improved sign language telecommunication. IEEE Transactions on Information Technology in Biomedicine, 6 (4), 310–316.

    Schumeyer, R., Heredia, E., & Barner, K. (1997). Region of interest priority coding for sign language videoconferencing. Proceedings of the IEEE Workshop on Multimedia Signal Processing, June, Princeton.

    Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and spatial attention. Quarterly Journal of Experimental Psychology, 38A, 475–491.

    Siple, P. (1978). Visual constraints for sign language communication. Sign Language Studies, 19, 95–110.

    Virsu, V., Rovamo, J., Laurenen, P., & Nasenen, R. (1982). Temporal contrast sensitivity and the cortical magnification factor. Vision Research, 22, 1211–1217.

    Vivianni, P. (1990). Eye movements in visual search. Cognitive, perceptual and motor control aspects. In E. Kowler (Ed.), Eye movements and their role in visual and cognitive processes (pp. 353–393). Amsterdam: Elsevier.

    Woelders, W. W., Frowein, H. W., Nielsen, J., Questa, P., & Sandini, G. (1997). New developments in low-bit rate videotelephony for people who are deaf. Journal of Speech, Language and Hearing Research, 40, 1425–1433.

    Yarbus, A. L. (1967). Eye movements and vision. In S. E. Palmer (Ed.), Vision science: Photons to phenomenology. Massachusetts: MIT Press.





