How is ST Distinct from Other Models?


There are several dimensions along which we can compare models. Five of these are shown here. Read the Selective Tuning page first to best appreciate what follows.

ST does not use a Single Saliency Map

The view that attention is a set of mechanisms, as shown on the Visual Attention page, is the main feature distinguishing this model from others, particularly those that descend directly from the original 1985 saliency map model of Koch and Ullman, which was itself an attempt to realize the Feature Integration Theory presented in 1980 by Treisman and Gelade. The original saliency map model dealt only with the selection of a single spatial point of interest, that of maximum conspicuity. Its first implementation and simulation was due to Pete Sandon (1989). Later, Laurent Itti developed a new and expanded implementation (Itti et al. 1998), to which he and his group have since added task bias, motion, and other elements. Throughout this development, the original form of the saliency model has remained intact: it does not deal with attentive effects in any other representation of visual processing. The Selective Tuning view of attention includes (and in fact predicted, in Tsotsos, 1990; Figures 7 and 8, p. 440) effects due to attentive processing throughout the visual processing network. The statement there is even stronger: attention controls the sub-network that feeds neurons at the top of the visual processing hierarchy. Recently, Gilbert and Sigman (2007) have suggested exactly the same.

ST uses Recurrent Tracing of Connections to achieve Localization

The idea of tracing back connections in a top-down fashion was present, in part, in the NeoCognitron model of Fukushima (1986) and was suggested even earlier by Milner (1974). It also appeared later in the Reverse Hierarchy Model of Ahissar and Hochstein (1997). Within the Selective Tuning model, it was first described in Tsotsos (1991, 1993), with accompanying details and proofs in Tsotsos et al. (1995). Only NeoCognitron and Selective Tuning provide realizations, and the two differ in all details. Fukushima's model included a maximum detector at the top layer to select the highest responding cell; all other cells were set to their rest state. Only the afferent paths to this cell are facilitated, through efferent signals from the cell. In contrast, neural inhibition is the only action of ST, with no facilitation. The NeoCognitron competitive mechanism is lateral inhibition at the highest and intermediate levels that finds the strongest single neurons, thus assuming that all scales are represented explicitly, while ST finds regions of neurons, removing this unrealistic assumption. For ST, units losing the competition at the top are left alone and not affected at all; ST's inhibition acts only within the afferent sets of winning units. This now broadly supported prediction of a space-limited suppressive surround firmly distinguishes the two approaches and places ST ahead of NeoCognitron in terms of biological plausibility. Finally, Fukushima assumes that so-called grandmother cells populate the top layer, whereas ST makes no such assumption. Overall, the NeoCognitron model and its enhancements cannot scale and would suffer from representational and search combinatorics (Tsotsos, 1990).
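The top-down tracing described above can be illustrated with a toy sketch. This is not the published ST implementation: the one-dimensional pyramid, the summing units, the fixed `fan_in`, and the function names `feedforward` and `trace_back` are all assumptions made for illustration. For brevity the sketch keeps a single winner per afferent set, whereas ST selects regions of neurons.

```python
import numpy as np

def feedforward(stimulus, fan_in=2, n_levels=3):
    """Build a toy pyramid: each unit sums a contiguous afferent set of `fan_in` units."""
    layers = [np.asarray(stimulus, dtype=float)]
    for _ in range(n_levels - 1):
        below = layers[-1]
        layers.append(below.reshape(-1, fan_in).sum(axis=1))
    return layers

def trace_back(layers, fan_in=2):
    """Select a winner at the top, then trace it down level by level.

    Gates start at 1 (pass) everywhere. Only losers inside the current
    winner's afferent set are inhibited (gate set to 0). Units that lose
    the top-level competition, and everything outside the traced afferent
    sets, are left untouched. There is no facilitation: no gate exceeds 1.
    """
    gates = [np.ones_like(layer) for layer in layers]
    winner = int(np.argmax(layers[-1]))           # competition at the top layer
    path = [winner]
    for level in range(len(layers) - 2, -1, -1):  # walk down toward the input
        start = winner * fan_in
        afferents = layers[level][start:start + fan_in]
        local_winner = start + int(np.argmax(afferents))
        gates[level][start:start + fan_in] = 0.0  # inhibit the afferent set...
        gates[level][local_winner] = 1.0          # ...except its winner
        winner = local_winner
        path.append(winner)
    return path, gates

# Hypothetical input: the strongest item sits at position 5.
layers = feedforward([0, 1, 0, 0, 0, 5, 2, 0])
path, gates = trace_back(layers)
print(path)      # winner index at each level, top to bottom: [1, 2, 5]
print(gates[0])  # only position 4, the loser in the traced afferent set, is inhibited
```

The last entry of `path` is the localized input position. The gates show the contrast drawn in the text: inhibition is confined to the afferent sets of winning units, and nothing is facilitated.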

ST uses a Recurrent Maximum Decision Function

The use of a top-down max operation is in direct opposition to the feedforward max operation used in many current models of object recognition with claims of biological plausibility (Riesenhuber and Poggio 1999, and its many derivatives). The experimental evidence against a feedforward max operation is overwhelming. The majority of studies that have examined responses to two non-overlapping stimuli in the classic receptive field have found that the firing rate evoked by the pair is typically lower than the response to the preferred of the two presented alone, which is inconsistent with a max rule (Miller et al., 1993; Reynolds et al., 1999; Missal et al., 1999; Recanzone et al., 1997; Reynolds and Desimone, 1998; Chelazzi et al., 1998; Rolls and Tovee, 1995; Zoccolan et al., 2005). Additional studies have found that the response to the preferred stimulus changes when it is presented along with other stimuli, a pattern also inconsistent with a feedforward max operation (Sheinberg and Logothetis, 2001; Rolls et al., 2003). A theoretical argument may also be made against a feedforward max using the equivalence conditions between relaxation labeling processes and max selection (Zucker et al., 1981). They prove that relaxation labeling (a formal, general form of lateral interaction among units) is equivalent to max selection only under two conditions: that values are ordered and that there is no process that might change that ordering. Considering the role of lateral processes in vision (Ben-Shahar et al., 2003), the latter condition is never true; lateral interactions are ubiquitous in visual processing in the brain. Time course also matters. It has been observed that most V1 response increases due to lateral interactions seem to occur in the latter parts of the response profile. This hints that lateral interaction takes extra time to take effect, with V1 responses continuing until about 300 ms after stimulus onset (Kapadia et al., 1995), well after the first feedforward traversal has completed. Recent papers by Roelfsema (2006) and DiLollo (in press) also stress the importance of recurrence in vision. Recurrence was part of ST from the earliest papers.
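The logic of the experimental argument can be made concrete with a small sketch. A feedforward max rule predicts that the response to a pair equals the response to the preferred stimulus alone, so a pair response that falls below it counts against the rule. The function names are hypothetical, the firing rates are made-up illustrative numbers rather than data from the cited studies, and the simple average is included only as one possible alternative for comparison.

```python
def max_rule_prediction(r_preferred, r_other):
    """Pair response predicted by a max rule: the larger single-stimulus response."""
    return max(r_preferred, r_other)

def average_rule_prediction(r_preferred, r_other):
    """Pair response predicted by a simple average (illustrative alternative only)."""
    return 0.5 * (r_preferred + r_other)

def closer_rule(r_preferred, r_other, r_pair):
    """Report which of the two predictions lies closer to the measured pair response."""
    err_max = abs(r_pair - max_rule_prediction(r_preferred, r_other))
    err_avg = abs(r_pair - average_rule_prediction(r_preferred, r_other))
    return "max" if err_max <= err_avg else "average"

# Hypothetical rates in spikes/s: preferred alone, non-preferred alone, both together.
print(closer_rule(40.0, 10.0, 27.0))  # pair response well below the preferred-alone rate
```

With these illustrative numbers the pair response sits 13 spikes/s below the max-rule prediction and 2 spikes/s above the average, which is the qualitative pattern the cited studies report.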

ST Employs Surround Suppression in the Spatial and Feature Domains to improve Signal/Noise in Neural Response

Koch & Ullman’s (1985) saliency map features a prediction that locations near the focus of attention are facilitated for the next choice of focus; the model includes a proximity effect that would favor shifts to neighbouring locations (see page 224 of their paper, the section titled “Proximity Preference”). In fact, Christof Koch and Francis Crick went to great pains to show that an attentive suppressive surround was not biologically feasible, going through the options of attentive enhancement, suppression, and their combination, and explicitly rejecting the suppressive penumbra idea (Crick and Koch 1990; p. 959). Selective Tuning differs. Consider the following. Neurons seem to have a preferred tuning; that is, they are selective for certain kinds of stimuli. A preferred stimulus within a neuron's receptive field is the ‘signal’ it seeks, while whatever else may lie within the receptive field is not of interest; it can be considered ‘noise’. The overall response of the neuron is clearly a function of both the signal and the noise. The response to the signal would be maximized if the noise were suppressed; a Gaussian-shaped inhibition on the surround was described in Tsotsos (1990, pp. 439-440). This stood as a prediction for years but now enjoys a great deal of support. See a list of supporting experiments for the suppressive surround at Predictions. This attentive suppression also has an interesting side-effect: an apparent enhancement of neural response, because the ‘noise’ is removed (this later became the foundation of the biased competition model of attention; Desimone & Duncan 1995). Thus, ST predicts that there is no active process of neural enhancement.
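The signal/noise argument can be sketched numerically. Everything specific here is an assumption for illustration: the response function (a rectified sum of signed input drives), the gate profile (suppression strongest next to the attended location and falling off as a Gaussian with distance), the parameters `sigma` and `depth`, and the drive values. The point is only that suppressing the surround raises the response without applying any gain to the signal itself.

```python
import numpy as np

def surround_gate(positions, attended, sigma=1.5, depth=0.9):
    """Multiplicative gate per input: 1 at the attended location, suppressed nearby.

    Assumed profile: suppression of strength `depth` next to the attended
    location, decaying as a Gaussian of width `sigma` with distance.
    """
    d = np.abs(np.asarray(positions, dtype=float) - attended)
    gate = 1.0 - depth * np.exp(-d**2 / (2.0 * sigma**2))
    gate[d == 0] = 1.0  # the attended input passes unchanged: no enhancement
    return gate

def response(drive, gate):
    """Toy neuron: rectified sum of gated drives (positive = signal, negative = noise)."""
    return max(0.0, float(np.sum(np.asarray(drive, dtype=float) * gate)))

positions = np.arange(7)
# Hypothetical drives: preferred stimulus at position 3, non-preferred items around it.
drive = np.array([-0.5, -1.0, -2.0, 10.0, -2.0, -1.0, -0.5])

unattended = response(drive, np.ones(7))
attended = response(drive, surround_gate(positions, attended=3))
signal_alone = response(np.where(drive > 0, drive, 0.0), np.ones(7))
print(unattended, attended, signal_alone)
```

The attended response rises toward the signal-alone response while the gate on the signal stays at exactly 1, so the apparent enhancement comes entirely from removing the noise.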

ST is Not Just Biased Competition

The Biased Competition (BC) model of visual attention has provided inspiration for many other studies and models (Desimone and Duncan 1995; Reynolds et al. 1999). It is based on an equation that shows how an attentional bias may affect the response of a neuron that receives input from excitatory and inhibitory neuron pools. As a result, it shows how a receptive field might shrink-wrap a stimulus when it is attended. It also provides a good fit to the firing patterns of V4 neurons in macaque. Does ST have anything beyond this to offer? ST is not a single-neuron theory; BC is, as evidenced by the only formal description of it, in Reynolds et al. (1999). BC does not provide a method for how a particular neuron may be selected, nor for how attentional influence might arrive there. The mathematics of the model assumes that the excitatory and inhibitory neuron pools are statistically independent, which is clearly untrue. ST provides a network-based mechanism, and shows how selection is accomplished and how the selection is communicated through the network. In BC, attention is just a multiplier; in ST, attention is a broad set of mechanisms, connecting more closely to the range of observations made by so many studies over the years. Moreover, if one looks at point one of the list half-way down page 441 of Tsotsos (1990), one sees that the enhanced response effect due to attention in a single neuron, which is the cornerstone of the BC model, is described as part of the original ST model.
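To make "attention is just a multiplier" concrete, here is a minimal single-neuron sketch. It assumes the commonly quoted steady-state form of the Reynolds et al. (1999) equation, R = B·E / (E + I + A), where E and I are the weighted excitatory and inhibitory inputs, B is the maximum response, and A is a decay constant; consult the paper for the exact formulation. The function name, the weights, and the bias factor of 5 are illustrative assumptions.

```python
def bc_response(x, w_exc, w_inh, bias=None, B=1.0, A=0.2):
    """Steady-state response of one neuron under an assumed biased-competition form.

    x      : stimulus strengths, one per stimulus in the receptive field
    w_exc  : excitatory weight of each stimulus onto the neuron
    w_inh  : inhibitory weight of each stimulus onto the neuron
    bias   : attentional multiplier per stimulus (1.0 means unattended)
    """
    if bias is None:
        bias = [1.0] * len(x)
    E = sum(b * xi * we for b, xi, we in zip(bias, x, w_exc))
    I = sum(b * xi * wi for b, xi, wi in zip(bias, x, w_inh))
    return B * E / (E + I + A)

# Hypothetical neuron: stimulus 0 is preferred, stimulus 1 is non-preferred.
x, w_exc, w_inh = [1.0, 1.0], [1.0, 0.2], [0.2, 1.0]

pair = bc_response(x, w_exc, w_inh)
attend_preferred = bc_response(x, w_exc, w_inh, bias=[5.0, 1.0])
attend_other = bc_response(x, w_exc, w_inh, bias=[1.0, 5.0])
print(attend_other, pair, attend_preferred)
```

Attention enters only through `bias`, a per-stimulus multiplier on the input weights. Nothing in the sketch says how the neuron or the attended stimulus is chosen, or how the bias reaches the neuron, which is the gap the text attributes to BC.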

ST is Distinct from other Models and has made many Testable Predictions

Thus, the Selective Tuning model, whether for visual attention or more broadly for vision, is distinct in important ways from other models, both past and present. A final body of support for ST comes from its predictive power. Selective Tuning offers many predictions about biological vision (as opposed to explanations for already discovered phenomena). A list of predictions, the papers in which they appeared, and supporting experimental evidence where available is provided at Predictions.


  1. Ahissar, M., Hochstein, S., 1997. Task difficulty and the specificity of perceptual learning. Nature 387, 401–406.

  2. Ben-Shahar, O., Huggins, P., Izo, T., Zucker, S.W., 2003. Cortical connections and early visual function: intra- and inter-columnar processing. J. Physiol. (Paris) 97 (2), 191–208.

  3. Crick, F., Koch, C., 1990. Some Reflections on Visual Awareness. In: Cold Spring Harbor Symposia on Quantitative Biology, vol. LV. Cold Spring Harbor Laboratory Press, pp. 953–962.

  4. Desimone, R., Duncan, J., 1995. Neural Mechanisms of Selective Attention. Annual Review of Neuroscience 18, 193–222.

  5. DiLollo, V., in press. Iterative Reentrant Processing: A Conceptual Framework for Perception and Cognition. In: Coltheart, V. (Ed.), Tutorials in Visual Cognition. Psychology Press, N.Y.

  6. Fukushima, K., 1986. A neural network model for selective attention in visual pattern recognition. Biol. Cybern. 55 (1), 5–15.

  7. Gilbert, C.D., Sigman, M., 2007. Brain States: Top-Down Influences in Sensory Processing. Neuron 54, 677–696.

  8. Itti, L., Koch, C., Niebur, E., 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20 (11), 1254–1259.

  9. Kapadia, M., Ito, M., Gilbert, C., Westheimer, G., 1995. Improvement in Visual Sensitivity by Changes in Local Context: Parallel Studies in Human Observers and in V1 of Alert Monkeys. Neuron 15, 843–856.

  10. Koch, C., Ullman, S., 1985. Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology 4, 219–227.

  11. Miller, E.K., Gochin, P.M., Gross, C.G., 1993. Suppression of visual responses of neurons in inferior temporal cortex of the awake macaque by addition of a second stimulus, Brain Res. 616 (1–2), 25–29.

  12. Milner, P.M., 1974. A model for visual shape recognition. Psychol. Rev. 81 (6), 521–535.

  13. Missal, M., Vogels, R., Li, C-Y., Orban, G., 1999. Shape Interactions in Macaque Inferior Temporal Neurons. J. Neurophysiol. 82 (1), 131–142.

  14. Recanzone, G., Wurtz, R., Schwarz, U., 1997. Responses of MT and MST Neurons to One and Two Moving Objects in the Receptive Field. J. Neurophysiol. 78 (6), 2904–2915.

  15. Riesenhuber, M., Poggio, T., 1999. Hierarchical models of object recognition in cortex. Nature Neuroscience 2 (11), 1020–1025.

  16. Reynolds, J., Chelazzi, L., Desimone, R., 1999. Competitive Mechanisms Subserve Attention in Macaque Areas V2 and V4. J. Neurosci. 19 (5), 1736–1753.

  17. Rolls, E., Aggelopoulos, N., Zheng, F., 2003. The Receptive Fields of Inferior Temporal Cortex Neurons in Natural Scenes. J. Neurosci. 23 (1), 339–348.

  18. Rolls, E., Tovee, M., 1995. The responses of single neurons in the temporal visual cortical areas of the macaque when more than one stimulus is present in the receptive field. Exp. Brain Res. 103 (3), 409–420.

  19. Sandon, P., 1989. Simulating visual attention. Journal of Cognitive Neuroscience 2 (3), 213–231.

  20. Sheinberg, D.L., Logothetis, N.K., 2001. Noticing familiar objects in real world scenes: the role of temporal cortical neurons in natural vision. J. Neurosci. 21 (4), 1340–1350.

  21. Treisman, A.M., Gelade, G., 1980. A feature-integration theory of attention. Cogn. Psychol. 12 (1), 97–136.

  22. Tsotsos, J.K., 1990. A complexity level analysis of vision. Behav. Brain Sci. 13, 423–455.

  23. Tsotsos, J.K., 1991. Localizing Stimuli in a Sensory Field Using an Inhibitory Attentional Beam. RBCV-TR-91-37, October 1991.

  24. Tsotsos, J.K., 1993. An inhibitory beam for attentional selection. In: Harris, L., Jenkin, M. (Eds.), Spatial Vision in Humans and Robots. Cambridge Univ. Press, pp. 313–331.

  25. Tsotsos, J.K., Culhane, S., Wai, W., Lai, Y., Davis, N., Nuflo, F., 1995. Modeling visual attention via selective tuning. Artif. Intell. 78 (1–2), 507–547.

  26. Zoccolan, D., Cox, D., DiCarlo, J., 2005. Multiple Object Response Normalization in Monkey Inferotemporal Cortex. J. Neurosci. 25 (36), 8150–8164.

  27. Zucker, S.W., Leclerc, Y., Mohammed, J., 1981. Continuous relaxation and local maxima selection: conditions for equivalence (in complex speech and vision understanding systems). IEEE Trans. Pattern Anal. Mach. Intell. PAMI-3, 117–127.