TY - GEN
T1 - Video and audio data integration for conferencing
AU - Pappas, Thrasyvoulos N.
AU - Hinds, Raynard O.
PY - 1995/1/1
Y1 - 1995/1/1
N2 - In videoconferencing applications the perceived quality of the video signal is affected by the presence of an audio signal (speech). To achieve high compression rates, video coders must compromise image quality in terms of spatial resolution, grayscale resolution, and frame rate, and may introduce various kinds of artifacts. We consider tradeoffs in grayscale resolution and frame rate, and use subjective evaluations to assess the perceived quality of the video signal in the presence of speech. In particular, we explore the importance of lip synchronization. In our experiment we used an original grayscale sequence at QCIF resolution, 30 frames/sec, and 256 gray levels. We compared the 256-level sequence at different frame rates with a two-level version of the sequence at 30 frames/sec. The viewing distance was 20 image heights, or roughly two feet from an SGI workstation. We used uncoded speech. To obtain the two-level sequence we used an adaptive clustering algorithm for segmentation of video sequences. The binary sketches it creates move smoothly and preserve the main characteristics of the face, so that it is easily recognizable. More importantly, the rendering of lip and eye movements is very accurate. The test results indicate that when the frame rate of the full grayscale sequence is low (less than 5 frames/sec), most observers prefer the two-level sequence.
AB - In videoconferencing applications the perceived quality of the video signal is affected by the presence of an audio signal (speech). To achieve high compression rates, video coders must compromise image quality in terms of spatial resolution, grayscale resolution, and frame rate, and may introduce various kinds of artifacts. We consider tradeoffs in grayscale resolution and frame rate, and use subjective evaluations to assess the perceived quality of the video signal in the presence of speech. In particular, we explore the importance of lip synchronization. In our experiment we used an original grayscale sequence at QCIF resolution, 30 frames/sec, and 256 gray levels. We compared the 256-level sequence at different frame rates with a two-level version of the sequence at 30 frames/sec. The viewing distance was 20 image heights, or roughly two feet from an SGI workstation. We used uncoded speech. To obtain the two-level sequence we used an adaptive clustering algorithm for segmentation of video sequences. The binary sketches it creates move smoothly and preserve the main characteristics of the face, so that it is easily recognizable. More importantly, the rendering of lip and eye movements is very accurate. The test results indicate that when the frame rate of the full grayscale sequence is low (less than 5 frames/sec), most observers prefer the two-level sequence.
UR - http://www.scopus.com/inward/record.url?scp=0029229381&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0029229381&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0029229381
SN - 0819417580
T3 - Proceedings of SPIE - The International Society for Optical Engineering
SP - 120
EP - 127
BT - Proceedings of SPIE - The International Society for Optical Engineering
PB - Society of Photo-Optical Instrumentation Engineers
T2 - Human Vision, Visual Processing, and Digital Display VI
Y2 - 6 February 1995 through 8 February 1995
ER -