TY - GEN
T1 - Predicting human gaze using low-level saliency combined with face detection
AU - Cerf, Moran
AU - Harel, Jonathan
AU - Einhäuser, Wolfgang
AU - Koch, Christof
PY - 2008
Y1 - 2008
N2 - Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model's predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses.
AB - Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model's predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses.
UR - http://www.scopus.com/inward/record.url?scp=85138484183&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138484183&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85138484183
SN - 160560352X
SN - 9781605603520
T3 - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
BT - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
PB - Curran Associates Inc.
T2 - 21st Annual Conference on Neural Information Processing Systems, NIPS 2007
Y2 - 3 December 2007 through 6 December 2007
ER -