Visual Attention

What is attention? Why would I even ask the question, when attention has been so well studied over the past century? It has been viewed as early selection (Broadbent 1958), using attenuator theory (Treisman 1964), as a late selection process (Norman 1968; Deutsch and Deutsch 1963), as a two-part process, pre-attentive followed by attentive processing (Neisser 1967), as a result of neural synchrony (Milner 1974), using the metaphor of a spotlight (Shulman et al. 1979), within Feature Integration Theory (Treisman and Gelade 1980), as an object-based phenomenon (Duncan 1984), as a shrink-wrap phenomenon (Moran and Desimone 1985), using the zoom lens metaphor (Eriksen and St. James 1986), as a pre-motor theory subserving eye movements (Rizzolatti et al. 1987), as Guided Search (Wolfe et al. 1989), as Biased Competition (Desimone and Duncan 1995), as feature similarity gain (Treue and Martinez-Trujillo 1999), and more. But which is right? None of them? All of them? Many have said that the enormous mass of literature on attention is a bit difficult to interpret (e.g. Sutherland 1998).

Helmholtz first demonstrated that covert fixations - shifts of attention over a scene without eye movements - were real (Helmholtz 1896). Still, eye movements are the most obvious external manifestation of a change of visual attention. One may ask, of the many kinds of eye movements:

-saccades (voluntary jump-like movements),
-vestibulo-ocular reflex (movements that stabilize the visual image on the retina by causing compensatory changes in eye position as the head moves),
-nystagmus (compensatory eye movements can reach the limits of the orbit and must then be reset by a primitive saccade),
-optokinetic nystagmus (movements that stabilize gaze during sustained, low-frequency rotations at constant velocity),
-smooth pursuit (voluntary tracking of moving stimuli),
-vergence (coordinated movement of both eyes, converging for objects moving towards and diverging for objects moving away from the eyes), and
-torsion (coordinated rotation of the eyes around the optical axis, dependent on head tilt and eye elevation),

which are part of an attentional system, and how are they coordinated? Several clearly are. It is not enough to move gaze to a point on an image, which is where so much work focuses. When we change gaze, we do so to fixate on something in the world. Fixation requires a saccade to the location in retinal coordinates, but also vergence and torsional movements so that the object is fixated in 3D: it must be viewed so that the image is in focus in each eye and also in focus binocularly (within Panum’s fusional area, both horizontally as it is classically defined, but also vertically - see Howard 1982 and Tyler 1991). It is also not difficult to argue for smooth pursuit movements being part of an attention system.

Is attention a solely bottom-up (data-driven, or exogenous) process, as many seem to think? What is the role of the salience of the visual stimuli observed (see Wolfe 1998)? Just about everything someone may have studied can be considered a feature or can capture attention. Wolfe presents the kinds of features that humans can detect ‘efficiently’ and thus might be considered salient within an image: color, orientation, curvature, texture, scale, vernier offset, size, spatial frequency, motion, shape, onset/offset, pictorial depth cues, and stereoscopic depth. For most of these, subjects can ‘select’ features or feature values to attend to in advance. Data-driven saliency has played a key role in many models of attention, most prominently those of Koch & Ullman (1985) and Itti et al. (1998).
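To make the idea of data-driven saliency concrete, here is a minimal sketch in Python. It is not the Koch & Ullman or Itti et al. model; it keeps only the general idea that a location is salient when it differs from its surround in some feature channel. Every name in it (box_blur, center_surround, normalize, saliency_map, most_salient), the single pair of scales, and the plain averaging of channels are my assumptions for illustration. The published models use multi-scale pyramids and a more elaborate normalization.

```python
def box_blur(img, radius):
    """Mean of each pixel's (2*radius+1)^2 neighbourhood, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def center_surround(feature, center_radius=0, surround_radius=2):
    """Absolute difference between a fine (center) and a coarse (surround) average."""
    c = box_blur(feature, center_radius)
    s = box_blur(feature, surround_radius)
    return [[abs(cv - sv) for cv, sv in zip(cr, sr)] for cr, sr in zip(c, s)]


def normalize(m):
    """Rescale a map to [0, 1]; a flat map becomes all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0] * len(m[0]) for _ in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def saliency_map(feature_maps):
    """Average the normalized center-surround maps of all feature channels."""
    maps = [normalize(center_surround(f)) for f in feature_maps]
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]


def most_salient(sal):
    """Return the (row, col) of the largest saliency value."""
    cells = ((y, x) for y in range(len(sal)) for x in range(len(sal[0])))
    return max(cells, key=lambda p: sal[p[0]][p[1]])


# Toy input (assumed): a uniform intensity channel with one bright pixel,
# plus a completely flat second channel that contributes nothing.
intensity = [[0.2] * 7 for _ in range(7)]
intensity[3][4] = 1.0
orientation = [[0.5] * 7 for _ in range(7)]

sal = saliency_map([intensity, orientation])
peak = most_salient(sal)
print(peak)  # location of the odd-one-out pixel
```

In this toy input only the bright pixel differs from its surround, so it is the location a purely bottom-up scheme would choose first. Nothing in the sketch knows about the task, which is exactly the limitation the next paragraph takes up.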

However, Yarbus’ classic work (1967) showed how task requirements affected fixation scan paths for an image. In fact, this was an excellent extension of the basic Posner cueing paradigm - the cue does affect perception - that has played such a large role in experimental work (Posner et al. 1978). Posner (1980) suggested how the overt and covert attentional fixations may be related by proposing that attention had three major functions: i) providing the ability to process high priority signals or alerting; ii) permitting orienting and overt foveation of a stimulus; and, iii) allowing search to detect targets in cluttered scenes. This is the Sequential Attention Model: eye movements are necessarily preceded by covert attentional fixations. Other views have also appeared. Klein put forth another hypothesis (Klein 1980), advocating the Independence Model: covert and overt attention are independent and co-occur because they are driven by the same visual input. Finally, the aforementioned Pre-motor Theory of Attention also has an opinion: covert attention is the result of activity of the motor system that prepares eye saccades, and thus attention is a by-product of the motor system (Rizzolatti et al. 1987). However, as Klein writes (Klein 2004), the evidence points to three conclusions: that overt orienting is preceded by covert orienting; that overt and covert orienting are exogenously (by external stimuli) activated by similar stimulus conditions; and, that endogenous (due to internal activity) covert orienting of attention is not mediated by endogenously generated saccadic programming.

Within all of these different viewpoints and considerations, the only real constant - something that everyone seems to believe and thus the only logical substitute for James’ original statement – is that attentional phenomena seem to be due to inherent limits in processing capacity in the brain (Tsotsos 1990). But if we seek an explanation of attentional processing, even this does not constrain the possible solutions. Even if we all agree that there is a processing limit, what is its nature? How does it lead to the mechanisms in the brain that produce the phenomena observed experimentally?

Perhaps the bulk of all perceptual research has focused on how the brain decomposes the visual signal into manageable components. Individual neurons are selective for oriented bars, for binocular disparity, for speed of translational motion, for color opponency, and so on. We know that individual neurons also exist that are tuned to particular faces, or other known objects. But how can we deal with unknown scenes and objects? The neural decomposition of a visual scene gives the brain many, many pieces of information about a scene. It is, in effect, a Humpty-Dumpty-like problem – we see how the visual image is decomposed, but how is it re-assembled into percepts that we can use to guide our day-to-day lives? It is here that the combinatorial explosion has its greatest impact. As seen in Tsotsos (1990), there are some general approximations and optimizations that are possible to help deal with the combinatorics that apply to the whole visual system; but they are fixed. Attention, in my view, is the dynamic mechanism that completes the attack on combinatorics: attention is a set of mechanisms that help optimize the search processes inherent in moment-to-moment perception and cognition. What are those mechanisms? Several have already been mentioned, and the full group can be classified as being of three main types with several specializations within each:

Selection
-spatio-temporal region of interest
-world/task/object/event model
-gaze/viewpoint
-best interpretation/response

Restriction
-task relevant search space pruning
-location cues
-fixation points
-search depth control

Suppression
-spatial/feature surround inhibition
-inhibition of return
-suppress task-irrelevant computations
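As a rough illustration of how the three types might interact, the sketch below combines one specialization of each: restriction as a task mask that prunes the search space, selection as a winner-take-all choice of the strongest remaining location, and suppression as inhibition of return around each winner. It is a toy under stated assumptions, not an implementation of any published model. The function attend, its parameters, the hard (all-or-nothing) inhibition, and the example values are all hypothetical.

```python
def attend(saliency, task_mask, n_fixations, ior_radius=1):
    """Return a sequence of attended (row, col) locations.

    Restriction: only locations where task_mask is True are searched.
    Selection:   the strongest remaining location wins (winner-take-all).
    Suppression: the winner and its neighbours within ior_radius are
                 removed from further competition (inhibition of return).
    """
    h, w = len(saliency), len(saliency[0])
    # Copy the map, marking task-irrelevant locations as unavailable (None).
    active = [[saliency[y][x] if task_mask[y][x] else None for x in range(w)]
              for y in range(h)]
    fixations = []
    for _ in range(n_fixations):
        candidates = [(active[y][x], y, x)
                      for y in range(h) for x in range(w)
                      if active[y][x] is not None]
        if not candidates:
            break  # nothing left to attend to
        # Ties in value are broken by the larger (row, col); an arbitrary choice.
        _, wy, wx = max(candidates)
        fixations.append((wy, wx))
        for y in range(max(0, wy - ior_radius), min(h, wy + ior_radius + 1)):
            for x in range(max(0, wx - ior_radius), min(w, wx + ior_radius + 1)):
                active[y][x] = None
    return fixations


# Hypothetical 4x5 saliency map with three conspicuous locations.
saliency = [
    [0.1, 0.9, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.8],
    [0.7, 0.1, 0.1, 0.1, 0.1],
]
everywhere = [[True] * 5 for _ in range(4)]
# Assumed task knowledge: the target cannot be in the top row.
lower_rows = [[y >= 1] * 5 for y in range(4)]

print(attend(saliency, everywhere, 1))   # [(0, 1)]
print(attend(saliency, lower_rows, 2))   # [(2, 4), (3, 0)]
```

Without task knowledge the most conspicuous location wins. With the mask, the same input yields a different scan path, and inhibition of return keeps the second fixation from landing on the first winner again.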

The goal of our research is to demonstrate how these mechanisms may be realized, how they interact and how they contribute to visual perception.


References

•Broadbent, D. (1958). Perception and communication, Pergamon Press, NY.
•Desimone, R., Duncan, J. (1995). Neural Mechanisms of Selective Attention, Annual Review of Neuroscience 18, p193 - 222.
•Deutsch, J., Deutsch, D. (1963). Attention: Some theoretical considerations, Psych. Review 70, 80-90.
•Duncan, J. (1984). Selective attention and the organization of visual information, J. Exp. Psychol. Gen. 113 (4), 501–517.
•Helmholtz, H. von (1896/1989). Physiological Optics (1896 - 2nd German Edition, translated by M. Mackeben, from Nakayama and Mackeben, Vision Research 29:11, 1631 - 1647, 1989)
•Howard, I. (1982). Human Visual Orientation. John Wiley and Sons Ltd., Chichester, NY.
•Itti, L., Koch, C., Niebur, E. (1998). A model for saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Analysis and Machine Intelligence 20, 1254-1259.
•Klein, R., (1980). Does oculomotor readiness mediate cognitive control of visual attention?, in R. Nickerson (ed.), Attention and Performance , Vol. 8, p259-276, New York, Academic Press.
•Klein, R.M. (2004). On the Control of Visual Orienting, in Cognitive Neuroscience of Attention, ed. by M.I. Posner, p29-44, The Guilford Press, New York London.
•Koch, C., Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry, Human Neurobiology 4, 219–227.
•Moran, J., Desimone, R. (1985). Selective Attention Gates Visual Processing in the Extrastriate Cortex, Science 229, 782-784.
•Norman, D. (1968). Toward a theory of memory and attention, Psych. Review 75, 522-536.
•Posner, M. I. (1980). ‘Orienting of Attention’, Quarterly Journal of Experimental Psychology 32, 1, 3–25.
•Posner, M. I., Nissen, M., Ogden, W., (1978). Attended and unattended processing modes: The role of set for spatial locations, in Pick & Saltzmann, eds., Modes of Perceiving and Processing Information, 137-158, Hillsdale, NJ: Erlbaum.
•Rizzolatti, G., Riggio, L., Dascola, I., Umilta, C. (1987). Reorienting attention across the horizontal and vertical meridians: evidence in favor of a premotor theory of attention, Neuropsychologia 25, 31–40.
•Shulman, G.L., Remington, R., McLean, J.P. (1979). Moving Attention through Visual Space, J. Experimental Psychology 92, p428-431.
•Sutherland, S. (1998). Book Reviews, Nature 392 (26), p350.
•Treisman, A. (1964). The effect of irrelevant material on the efficiency of selective listening, American J. Psychology 77 533-546.
•Treisman, A., Gelade, G. (1980). A feature integration theory of attention, Cognitive Psychology 12, p97-136.
•Treue, S., Martinez-Trujillo, J. (1999). Feature-based attention influences motion processing gain in macaque visual cortex, Nature 399 (6736), 575–579.
•Tsotsos, J.K. (1990). Analyzing Vision at the Complexity Level, Behavioral and Brain Sciences 13-3, p423 - 445.
•Tyler, C.W. (1991). The horopter and binocular fusion. In D. Regan, editor, Binocular Vision, pages 19-37. CRC Press, Boca Raton, FL.
•Wolfe, J., Cave, K., Franzel, S. (1989). Guided search: an alternative to the feature integration model for visual search, J. Exp. Psychol. Hum. Percept. Perform. 15, 419–433.
•Wolfe, J. (1998). Visual Search, in Attention (ed. Pashler, H.), 13–74, University College London, London.
•Yarbus, A.L. (1967). Eye Movements and Vision. New York: Plenum.