Abstract
Introduction. Visual saliency refers to the perceptual quality of a location in a visual scene, a quality that manifests itself subjectively in the location's attractiveness to the observer and objectively in the probability that attention will shift to it or that the eyes will fixate it. Saliency arises from the integration of visual feature maps and is modulated by several central mechanisms. It is important to distinguish the terms saliency and conspicuity, which are not synonymous in a theoretical context. This review is the first to combine the results of computer modeling of visual saliency with a detailed discussion of the theoretical background for building such models. A. M. Treisman's feature integration theory is examined, together with its advantages and limitations; this theory paved the way for the three-level model of visual attention developed by C. Koch and S. Ullman. According to that model, focal attention is governed by a "winner-takes-all" mechanism operating over a saliency map that encodes the attractiveness of each fragment of the visual scene. The original theory did not specify how the saliency map is formed, and this question remains the focus of research based on computer modeling.
Results and Discussion. The results of studies on modeling visual saliency are reviewed. In particular, the early computational model by L. Itti, C. Koch, and E. Niebur, which laid the foundation for many subsequent developments, is described in detail. Approaches that preceded the advent of modern high-performance neural networks are examined, and a range of contemporary models based on deep learning is presented together with their characteristic properties. This is the first comprehensive review of saliency models published in Russian. Researchers have developed several models of practical utility, and the paper discusses their potential for real-world application.
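To make the scheme concrete, the following is a minimal Python sketch (assuming NumPy and SciPy) of the pipeline outlined above: feature channels are reduced to center-surround contrast maps, normalized, summed into a single saliency map, and scanned by a winner-takes-all rule with inhibition of return. This is an illustration of the general Koch–Ullman / Itti–Koch–Niebur scheme, not the published implementation; the channel set, the difference-of-Gaussians scales, and the inhibition radius are simplifying assumptions (the original model uses multi-scale image pyramids and Gabor orientation filters).

    # Illustrative sketch of a saliency-map pipeline with winner-takes-all
    # selection; all parameter values below are assumptions, not the
    # original model's settings.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def center_surround(feature, sigma_c=2.0, sigma_s=8.0):
        """Approximate center-surround contrast as a difference of Gaussians."""
        return np.abs(gaussian_filter(feature, sigma_c)
                      - gaussian_filter(feature, sigma_s))

    def normalize(m):
        """Rescale a conspicuity map to [0, 1] so channels are comparable."""
        lo, hi = m.min(), m.max()
        return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)

    def saliency_map(rgb):
        """Integrate intensity and two color-opponency feature maps."""
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        channels = [
            (r + g + b) / 3.0,   # intensity
            r - g,               # red-green opponency
            b - (r + g) / 2.0,   # blue-yellow opponency
        ]                        # orientation channels omitted for brevity
        return normalize(sum(normalize(center_surround(c)) for c in channels))

    def scanpath(saliency, n_fixations=5, ior_sigma=10.0):
        """Winner-takes-all: repeatedly pick the most salient location,
        then suppress its neighborhood (inhibition of return)."""
        s = saliency.copy()
        yy, xx = np.mgrid[0:s.shape[0], 0:s.shape[1]]
        fixations = []
        for _ in range(n_fixations):
            y, x = np.unravel_index(np.argmax(s), s.shape)
            fixations.append((int(y), int(x)))
            s *= 1.0 - np.exp(-((yy - y) ** 2 + (xx - x) ** 2)
                              / (2 * ior_sigma ** 2))
        return fixations

    if __name__ == "__main__":
        image = np.random.rand(128, 128, 3)  # stand-in for a real scene
        print(scanpath(saliency_map(image)))

In the original model, the normalization operator additionally promotes maps containing a few strong peaks over maps with many comparable peaks; the simple min-max rescaling above is only a placeholder for that step.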
References
Ангельгардт, А. Н., Макаров, И. М., & Горбунова, Е. С. (2021). Роль уровня категории при решении задачи гибридного зрительного поиска. Вопросы психологии, 2, 148–158. https://www.elibrary.ru/item.asp?id=46548586
Величковский, Б. М. (2006). Когнитивная наука. Основы психологии познания (Т. 1). Смысл, Издательский центр «Академия».
Воронин, И. А., Захаров, И. М., Табуева, А. О., & Мерзон, Л. А. (2020). Диффузная модель принятия решения: оценка скорости и точности ответов в задачах выбора из двух альтернатив в исследованиях когнитивных процессов и способностей. Теоретическая и экспериментальная психология, 13(2), 6–24. https://www.elibrary.ru/item.asp?id=48627007
Горбунова, Е. С. (2023). Механизмы построения репрезентации в категориальном поиске: роль внимания и рабочей памяти. Российский психологический журнал, 20(3), 116–130. https://doi.org/10.21702/rpj.2023.3.6
Гусев, А. Н., & Уточкин, И. С. (2012). Влияние вероятности подсказки на эффективность пространственной локализации зрительного стимула. Известия Иркутского государственного университета. Серия: Психология, 1(1), 34–39. https://elibrary.ru/item.asp?id=18040183
Кочурко, В. А., Мадани, К., Сабуран, К., Головко, В. А., & Кочурко, П. А. (2015). Обнаружение объектов системами компьютерного зрения: подход на основе визуальной салиентности. Доклады Белорусского государственного университета информатики и радиоэлектроники, 91(5), 47–53. https://www.elibrary.ru/item.asp?id=29737620
Крускоп, А. С., Лунякова, Е. Г., Дубровский, В. Е., & Гарусев, А. В. (2023). Особенности движений глаз в задаче зрительного поиска в зависимости от вербализуемости и симметричности стимулов. Вестник Московского Университета. Серия 14: Психология, 46(4), 88–111. https://doi.org/10.11621/LPJ-23-40
Мартынова, О. В., & Балаев, В. В. (2015). Возрастные изменения в функциональной связанности сетей состояния покоя. Психология. Журнал Высшей школы экономики, 12(4), 33–47. https://psy-journal.hse.ru/2015-12-4/167444789.html
Мински, М., & Пейперт, С. (1971). Персептроны. Мир.
Подладчикова, Л. Н., Самарин, А. И., Шапошников, Д. Г., Колтунова, Т. И., Петрушан, М. В., & Ломакина, О. В. (2017). Современные представления о механизмах зрительного внимания. Южный федеральный университет. https://www.elibrary.ru/item.asp?id=32259454
Рожкова, Г. И., Белокопытов, А. В., & Иомдина, Е. Н. (2019). Современные представления о специфике периферического зрения человека. Сенсорные системы, 33(4), 305–330. https://doi.org/10.1134/S0235009219040073
Сапронов, Ф. А., & Горбунова, Е. С. (2025). Сравнение сгенерированных ИИ стимулов и фото: исследование зрительного поиска. Вестник Московского Университета. Серия 14: Психология, 48(2), 109–131. https://doi.org/10.11621/LPJ-25-14
Сапронов, Ф. А., Макаров, И. М., & Горбунова, Е. С. (2023). Категоризация в гибридном поиске: исследование с использованием регистрации движений глаз. Экспериментальная психология, 16(3), 121–138. https://doi.org/10.17759/exppsy.2023160308
Сеченов, И. М. (1942). Рефлексы головного мозга. Издательство АН СССР.
Уточкин, И. С., & Фаликман, М. В. (2006). Торможение возврата внимания. Часть 1. Виды и свойства. Психологический журнал, 27(3), 42–48. https://elibrary.ru/item.asp?id=9212401
Фаликман, М. В. (2015). Структура и динамика зрительного внимания при решении перцептивных задач: конструктивно-деятельностный подход: дис. ... докт. психол. наук: 19.00.01 [МГУ имени М. В. Ломоносова]. https://www.elibrary.ru/item.asp?id=54440859
Фаликман, М. В., Уточкин, И. С., Марков, Ю. А., & Тюрина, Н. А. (2019). Нисходящая регуляция зрительного поиска: есть ли она у детей? 513–517. https://www.elibrary.ru/item.asp?id=40843935
Хохлова, Т. В. (2012). Современные представления о зрении млекопитающих. Журнал общей биологии, 73(6), 418–434. https://elibrary.ru/item.asp?id=18121671
Шевель, Т. М., & Фаликман, М. В. (2022). «Подсказка взглядом» как ключ к механизмам совместного внимания:основные результаты исследований. Культурно-историческая психология, 18(1), 6–16. https://doi.org/10.17759/chp.2022180101
Шолле, Ф. (2023). Глубокое обучение на Python (2-е изд.). Питер.
Ярбус, А. Л. (1965). Роль движений глаз в процессе зрения (Н. Д. Нюберг, ред.). Наука.
Arun, N. T., Gaw, N., Singh, P., Chang, K., Hoebel, K. V., Patel, J., Gidwani, M., & Kalpathy-Cramer, J. (2020, May 29). Assessing the validity of saliency maps for abnormality localization in medical imaging. http://arxiv.org/abs/2006.00063
Ballard, D. H., Hayhoe, M. M., & Pelz, J. B. (1995). Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience, 7(1), 66–80. https://doi.org/10.1162/jocn.1995.7.1.66
Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20(4), 723–742. https://doi.org/10.1017/s0140525x97001611
Baluja, S., & Pomerleau, D. (1994). Using a Saliency Map for Active Spatial Selective Attention: Implementation & Initial Results. Proc. Advances in Neural Information Processing Systems, 451–458.
Bashinski, H. S., & Bacharach, V. R. (1980). Enhancement of perceptual sensitivity as the result of selectively attending to spatial locations. Perception & Psychophysics, 28(3), 241–248. https://doi.org/10.3758/bf03204380
Bergen, J. R., & Julesz, B. (1983–29C.E.). Parallel versus serial processing in rapid pattern discrimination. Nature, 303(5919), 696–698. https://doi.org/10.1038/303696a0
Borji, A. (2019, May 24). Saliency Prediction in the Deep Learning Era: Successes, Limitations, and Future Challenges. http://arxiv.org/abs/1810.03716
Borji, A. (2021). Saliency Prediction in the Deep Learning Era: Successes and Limitations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2), 679–700. https://doi.org/10.1109/TPAMI.2019.2935715
Borji, A., & Itti, L. (2013). State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 185–207. https://doi.org/10.1109/TPAMI.2012.89
Bruce, N. D. B., & Tsotsos, J. K. (2005). Saliency Based on Information Maximization. NIPS’05: Proceedings of the 18th International Conference on Neural Information Processing Systems, 155–162.
Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., & Durand, F. (2017, April 6). What do different evaluation metrics tell us about saliency models? http://arxiv.org/abs/1604.03605
Campbell, F. W., & Robson, J. G. (1968). Application of Fourier analysis to the visibility of gratings. The Journal of Physiology, 197(3), 551–566. https://doi.org/10.1113/jphysiol.1968.sp008574
Cannon, M. W., & Fullenkamp, S. C. (1996). A model for inhibitory lateral interaction effects in perceived contrast. Vision Research, 36(8), 1115–1125. https://doi.org/10.1016/0042-6989(95)00180-8
Dahou Djilali, Y. A., McGuinness, K., & O’Connor, N. (2024). Learning Saliency From Fixations. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 382–392. https://doi.org/10.1109/WACV57701.2024.00045
Ding, G., İmamoğlu, N., Caglayan, A., Murakawa, M., & Nakamura, R. (2022). SalFBNet: Learning pseudo-saliency distribution via feedback convolutional networks. Image and Vision Computing, 120, 104395. https://doi.org/10.1016/j.imavis.2022.104395
Droste, R., Jiao, J., & Noble, J. A. (2020). Unified Image and Video Saliency Modeling. Computer Vision – ECCV 2020, Lecture Notes in Computer Science, 12350, 419–435. https://doi.org/10.1007/978-3-030-58558-7_25
Engel, F. L. (1971). Visual conspicuity, directed attention and retinal locus. Vision Research, 11(6), 563–576. https://doi.org/10.1016/0042-6989(71)90077-0
Engel, F. L. (1974). Visual conspicuity and selective background interference in eccentric vision. Vision Research, 14(7), 459–471. https://doi.org/10.1016/0042-6989(74)90034-0
Engel, S., Zhang, X., & Wandell, B. (1997). Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature, 388(6637), 68–71. https://doi.org/10.1038/40398
Eriksen, C. W., & Hoffman, J. E. (1972). Temporal and spatial characteristics of selective encoding from visual displays. Perception & Psychophysics, 12(2), 201–204. https://doi.org/10.3758/bf03212870
Feldman, J. A. (1982). Dynamic connections in neural networks. Biological Cybernetics, 46(1), 27–39. https://doi.org/10.1007/BF00335349
Geiger, G., & Lettvin, J. Y. (1986). Enhancing the Perception of Form in Peripheral Vision. Perception, 15(2), 119–130. https://doi.org/10.1068/p150119
Gitman, Y., Erofeev, M., Vatolin, D., Bolshakov, A., & Fedorov, A. (2014). Semiautomatic visual-attention modeling and its application to video compression. 2014 IEEE International Conference on Image Processing (ICIP), 1105–1109. https://doi.org/10.1109/ICIP.2014.7025220
Hayhoe, M., & Ballard, D. (2005). Eye movements in natural behavior. Trends in Cognitive Sciences, 9(4), 188–194. https://doi.org/10.1016/j.tics.2005.02.009
He, K., Zhang, X., Ren, S., & Sun, J. (2015, December 10). Deep Residual Learning for Image Recognition. https://doi.org/10.48550/arXiv.1512.03385
Helmholtz, H. von. (1896). Handbuch der Physiologischen Optik (Zweite umgearbeitete Auflage). Verlag von Leopold Voss.
Huang, G., Liu, Z., Maaten, L. van der, & Weinberger, K. Q. (2018, January 28). Densely Connected Convolutional Networks. https://doi.org/10.48550/arXiv.1608.06993
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259. https://doi.org/10.1109/34.730558
Jampani, V., Ujjwal, Sivaswamy, J., & Vaidya, V. (2012). Assessment of computational visual attention models on medical images. Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP ’12, 1–8. https://doi.org/10.1145/2425333.2425413
Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans look. 2009 IEEE 12th International Conference on Computer Vision, 2106–2113. https://doi.org/10.1109/ICCV.2009.5459462
Julesz, B. (1984). A brief outline of the texton theory of human vision. Trends in Neurosciences, 7(2), 41–45. https://doi.org/10.1016/s0166-2236(84)80275-1
Julesz, B. (1986). Texton gradients: The texton theory revisited. Biological Cybernetics, 54(4-5), 245–251. https://doi.org/10.1007/BF00318420
Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4(4), 219–227.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25. https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
Kümmerer, M., Bethge, M., & Wallis, T. S. A. (2022). DeepGaze III: Modeling free-viewing human scanpaths with deep learning. Journal of Vision, 22(5), 7. https://doi.org/10.1167/jov.22.5.7
Kümmerer, M., Theis, L., & Bethge, M. (2015, April 9). Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet. http://arxiv.org/abs/1411.1045
Kümmerer, M., Wallis, T. S. A., & Bethge, M. (2022). Using the DeepGaze III model to decompose spatial and dynamic contributions to fixation placement over time. Journal of Vision, 22(14), 3964. https://doi.org/10.1167/jov.22.14.3964
Kümmerer, M., Wallis, T. S. A., Gatys, L. A., & Bethge, M. (2017). Understanding low- and high-level contributions to fixation prediction. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 4789–4798.
Linardos, A., Kümmerer, M., Press, O., & Bethge, M. (2021). DeepGaze IIE: Calibrated Prediction in and Out-of-Domain for State-of-the-Art Saliency Modeling. Proceedings of the IEEE/CVF International Conference on Computer Vision, 12919–12928. https://openaccess.thecvf.com/content/ICCV2021/html/Linardos_DeepGaze_IIE_Calibrated_Prediction_in_and_Out-of-Domain_for_State-of-the-Art_Saliency_ICCV_2021_paper.html
Lou, J., Lin, H., Marshall, D., Saupe, D., & Liu, H. (2022). TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing, 494, 455–467. https://doi.org/10.1016/j.neucom.2022.04.080
Lyudvichenko, V., Erofeev, M., Gitman, Y., & Vatolin, D. (2017). A semiautomatic saliency model and its application to video compression. 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), 403–410. https://doi.org/10.1109/ICCP.2017.8117038
Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., & Guérin-Dugué, A. (2009). Modelling Spatio-Temporal Saliency to Predict Gaze Direction for Short Videos. International Journal of Computer Vision, 82(3), 231–243. https://doi.org/10.1007/s11263-009-0215-3
McCallum, R. (1996). Reinforcement learning with selective perception and hidden state: PhD thesis. University of Rochester.
Medioni, G., & Mordohai, P. (2005). Saliency in Computer Vision. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neurobiology of Attention (pp. 583–585). Academic Press. https://doi.org/10.1016/B978-012375731-9/50099-9
Milanese, R. (1993). Detecting Salient Regions in an Image: From Biological Evidence to Computer Implementation: PhD thesis. University of Geneva.
Posner, M. I. (1980). Orienting of attention. The Quarterly Journal of Experimental Psychology, 32(1), 3–25. https://doi.org/10.1080/00335558008248231
Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and Performance (Vol. 10, pp. 531–556). Erlbaum.
Posner, M. I., Cohen, Y., & Rafal, R. D. (1982). Neural systems control of spatial orienting. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 298(1089), 187–198. https://doi.org/10.1098/rstb.1982.0081
Rao, R. P. N., Zelinsky, G. J., Hayhoe, M. M., & Ballard, D. H. (2002). Eye movements in iconic visual search. Vision Research, 42(11), 1447–1463. https://doi.org/10.1016/s0042-6989(02)00040-8
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108. https://doi.org/10.1037/0033-295x.85.2.59
Rayner, K. (2009). The 35th Sir Frederick Bartlett Lecture: Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8), 1457–1506. https://doi.org/10.1080/17470210902816461
Remington, R., & Pierce, L. (1984). Moving attention: Evidence for time-invariant shifts of visual selective attention. Perception & Psychophysics, 35(4), 393–399. https://doi.org/10.3758/bf03206344
Saarinen, J., & Julesz, B. (1991). The speed of attentional shifts in the visual field. Proceedings of the National Academy of Sciences of the United States of America, 88(5), 1812–1814. https://doi.org/10.1073/pnas.88.5.1812
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10(1), 38–45. https://doi.org/10.1016/j.tics.2005.11.008
Simonyan, K., & Zisserman, A. (2015, April 10). Very Deep Convolutional Networks for Large-Scale Image Recognition. http://arxiv.org/abs/1409.1556
Sun, X., Houssin, R., Renaud, J., & Gardoni, M. (2019). A review of methodologies for integrating human factors and ergonomics in engineering design. International Journal of Production Research, 57(15-16), 4961–4976. https://doi.org/10.1080/00207543.2018.1492161
Tan, M., & Le, Q. V. (2020, September 11). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. https://doi.org/10.48550/arXiv.1905.11946
Theeuwes, J. (2013). Feature-based attention: It is all bottom-up priming. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1628), 20130055. https://doi.org/10.1098/rstb.2013.0055
Treisman, A. M. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology. Human Perception and Performance, 8(2), 194–214. https://doi.org/10.1037//0096-1523.8.2.194
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136. https://doi.org/10.1016/0010-0285(80)90005-5
Tsotsos, J. K., Culhane, S. M., Kei Wai, W. Y., Lai, Y., Davis, N., & Nuflo, F. (1995). Modeling visual attention via selective tuning. Artificial Intelligence, 78(1–2), 507–545. https://doi.org/10.1016/0004-3702(95)00025-9
Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327–352. https://doi.org/10.1037/0033-295x.84.4.327
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023, August 2). Attention Is All You Need. http://arxiv.org/abs/1706.03762
Wang, W., Shen, J., Xie, J., Cheng, M.-M., Ling, H., & Borji, A. (2021). Revisiting Video Saliency Prediction in the Deep Learning Era. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 220–237. https://doi.org/10.1109/TPAMI.2019.2924417
Wilson, H. R., & Bergen, J. R. (1979). A four mechanism model for threshold spatial vision. Vision Research, 19(1), 19–32. https://doi.org/10.1016/0042-6989(79)90117-2
Wolfe, J. M. (2012). Saved by a Log: How Do Humans Perform Hybrid Visual and Memory Search? Psychological Science, 23(7), 698–703. https://doi.org/10.1177/0956797612443968
Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060–1092. https://doi.org/10.3758/s13423-020-01859-9
Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology. Human Perception and Performance, 15(3), 419–433. https://doi.org/10.1037//0096-1523.15.3.419

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2025 Russian Psychological Journal