Visual Saliency: From Theoretical Assumptions to Modern High-Performance Models

Keywords

visual system
attention
eye movements
visual search
saliency
eye tracking
computer vision
modeling

Abstract

Introduction. Visual saliency refers to the perceptual quality of a location in a visual scene that manifests itself subjectively in its attractiveness to the observer and objectively in the probability of an attention shift or eye fixation toward it. This quality arises from the integration of visual feature maps and is modulated by several central mechanisms. It is important to distinguish between the terms saliency and conspicuity; in a theoretical context, they are not the same. This review, for the first time, combines the results of computer modeling of visual saliency with a detailed discussion of the theoretical background behind such models. The feature integration theory proposed by A. M. Treisman is examined, along with its advantages and limitations, which paved the way for the three-level model of visual attention developed by C. Koch and S. Ullman. According to this model, focal attention is governed by a “winner-takes-all” mechanism operating over a saliency map that encodes the attractiveness of each fragment of the visual scene. The original theory did not specify how the saliency map is formed, and this question remains the focus of research using computer modeling.

Results and Discussion. The results of studies on modeling visual saliency are reviewed. In particular, the early computational model by L. Itti, C. Koch, and E. Niebur, which laid the foundation for many subsequent developments, is described in detail. Approaches to modeling that preceded modern high-performance neural networks are examined, and a range of contemporary models based on deep learning is presented, together with their characteristic properties. This is the first comprehensive review of saliency models published in Russian. Researchers have developed several models of practical utility, and the paper discusses their potential for real-world application.
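The pipeline the abstract describes — feature maps combined into a saliency map, over which a “winner-takes-all” mechanism selects successive fixation points — can be sketched in a few lines of NumPy. This is an illustrative toy under stated assumptions, not the Itti–Koch–Niebur implementation: it uses a single intensity channel, box blurs in place of Gaussian pyramids, and a crude suppression of the winner's neighborhood as a stand-in for inhibition of return; all function names and parameter values here are hypothetical.

```python
import numpy as np

def box_blur(img, k):
    # separable box filter: a cheap stand-in for Gaussian smoothing
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    tmp = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, tmp)

def saliency_map(img):
    # center-surround responses at two surround scales,
    # each normalized to [0, 1] and averaged into a master map
    maps = []
    for k in (9, 17):
        cs = np.abs(img - box_blur(img, k))
        rng = cs.max() - cs.min()
        if rng > 0:
            cs = (cs - cs.min()) / rng
        maps.append(cs)
    return sum(maps) / len(maps)

def winner_take_all(sal, n_fixations=3, inhibition_radius=6):
    # pick successive maxima, zeroing each winner's neighborhood
    # (a crude inhibition of return)
    sal = sal.copy()
    ys, xs = np.mgrid[:sal.shape[0], :sal.shape[1]]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        sal[(ys - y) ** 2 + (xs - x) ** 2 <= inhibition_radius ** 2] = 0
    return fixations

img = np.zeros((64, 64))
img[20:24, 40:44] = 1.0   # a bright "pop-out" patch on a dark field
sal = saliency_map(img)
fix = winner_take_all(sal, n_fixations=1)[0]  # first fixation lands on the patch
```

The point of the toy is the division of labor the theory posits: feature extraction and normalization produce the map, while a separate, purely spatial competition (the argmax plus suppression loop) generates the fixation sequence.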

https://doi.org/10.21702/rpj.2025.3.11

References

Ангельгардт, А. Н., Макаров, И. М., & Горбунова, Е. С. (2021). Роль уровня категории при решении задачи гибридного зрительного поиска. Вопросы психологии, 2, 148–158. https://www.elibrary.ru/item.asp?id=46548586

Величковский, Б. М. (2006). Когнитивная наука. Основы психологии познания (Т. 1). Смысл, Издательский центр «Академия».

Воронин, И. А., Захаров, И. М., Табуева, А. О., & Мерзон, Л. А. (2020). Диффузная модель принятия решения: оценка скорости и точности ответов в задачах выбора из двух альтернатив в исследованиях когнитивных процессов и способностей. Теоретическая и экспериментальная психология, 13(2), 6–24. https://www.elibrary.ru/item.asp?id=48627007

Горбунова, Е. С. (2023). Механизмы построения репрезентации в категориальном поиске: роль внимания и рабочей памяти. Российский психологический журнал, 20(3), 116–130. https://doi.org/10.21702/rpj.2023.3.6

Гусев, А. Н., & Уточкин, И. С. (2012). Влияние вероятности подсказки на эффективность пространственной локализации зрительного стимула. Известия Иркутского государственного университета. Серия: Психология, 1(1), 34–39. https://elibrary.ru/item.asp?id=18040183

Кочурко, В. А., Мадани, К., Сабуран, К., Головко, В. А., & Кочурко, П. А. (2015). Обнаружение объектов системами компьютерного зрения: подход на основе визуальной салиентности. Доклады Белорусского государственного университета информатики и радиоэлектроники, 91(5), 47–53. https://www.elibrary.ru/item.asp?id=29737620

Крускоп, А. С., Лунякова, Е. Г., Дубровский, В. Е., & Гарусев, А. В. (2023). Особенности движений глаз в задаче зрительного поиска в зависимости от вербализуемости и симметричности стимулов. Вестник Московского Университета. Серия 14: Психология, 46(4), 88–111. https://doi.org/10.11621/LPJ-23-40

Мартынова, О. В., & Балаев, В. В. (2015). Возрастные изменения в функциональной связанности сетей состояния покоя. Психология. Журнал Высшей школы экономики, 12(4), 33–47. https://psy-journal.hse.ru/2015-12-4/167444789.html

Мински, М., & Пейперт, С. (1971). Персептроны. Мир.

Подладчикова, Л. Н., Самарин, А. И., Шапошников, Д. Г., Колтунова, Т. И., Петрушан, М. В., & Ломакина, О. В. (2017). Современные представления о механизмах зрительного внимания. Южный федеральный университет. https://www.elibrary.ru/item.asp?id=32259454

Рожкова, Г. И., Белокопытов, А. В., & Иомдина, Е. Н. (2019). Современные представления о специфике периферического зрения человека. Сенсорные системы, 33(4), 305–330. https://doi.org/10.1134/S0235009219040073

Сапронов, Ф. А., & Горбунова, Е. С. (2025). Сравнение сгенерированных ИИ стимулов и фото: исследование зрительного поиска. Вестник Московского Университета. Серия 14: Психология, 48(2), 109–131. https://doi.org/10.11621/LPJ-25-14

Сапронов, Ф. А., Макаров, И. М., & Горбунова, Е. С. (2023). Категоризация в гибридном поиске: исследование с использованием регистрации движений глаз. Экспериментальная психология, 16(3), 121–138. https://doi.org/10.17759/exppsy.2023160308

Сеченов, И. М. (1942). Рефлексы головного мозга. Издательство АН СССР.

Уточкин, И. С., & Фаликман, М. В. (2006). Торможение возврата внимания. Часть 1. Виды и свойства. Психологический журнал, 27(3), 42–48. https://elibrary.ru/item.asp?id=9212401

Фаликман, М. В. (2015). Структура и динамика зрительного внимания при решении перцептивных задач: конструктивно-деятельностный подход: дис. ... докт. психол. наук: 19.00.01 [МГУ имени М. В. Ломоносова]. https://www.elibrary.ru/item.asp?id=54440859

Фаликман, М. В., Уточкин, И. С., Марков, Ю. А., & Тюрина, Н. А. (2019). Нисходящая регуляция зрительного поиска: есть ли она у детей? 513–517. https://www.elibrary.ru/item.asp?id=40843935

Хохлова, Т. В. (2012). Современные представления о зрении млекопитающих. Журнал общей биологии, 73(6), 418–434. https://elibrary.ru/item.asp?id=18121671

Шевель, Т. М., & Фаликман, М. В. (2022). «Подсказка взглядом» как ключ к механизмам совместного внимания: основные результаты исследований. Культурно-историческая психология, 18(1), 6–16. https://doi.org/10.17759/chp.2022180101

Шолле, Ф. (2023). Глубокое обучение на Python (2nd ed.). Питер.

Ярбус, А. Л. (1965). Роль движений глаз в процессе зрения (Н. Д. Нюберг, Ed.). Наука.

Arun, N. T., Gaw, N., Singh, P., Chang, K., Hoebel, K. V., Patel, J., Gidwani, M., & Kalpathy-Cramer, J. (2020, May 29). Assessing the validity of saliency maps for abnormality localization in medical imaging. http://arxiv.org/abs/2006.00063

Ballard, D. H., Hayhoe, M. M., & Pelz, J. B. (1995). Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience, 7(1), 66–80. https://doi.org/10.1162/jocn.1995.7.1.66

Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20(4), 723–742. https://doi.org/10.1017/s0140525x97001611

Baluja, S., & Pomerleau, D. (1994). Using a Saliency Map for Active Spatial Selective Attention: Implementation & Initial Results. Proc. Advances in Neural Information Processing Systems, 451–458.

Bashinski, H. S., & Bacharach, V. R. (1980). Enhancement of perceptual sensitivity as the result of selectively attending to spatial locations. Perception & Psychophysics, 28(3), 241–248. https://doi.org/10.3758/bf03204380

Bergen, J. R., & Julesz, B. (1983). Parallel versus serial processing in rapid pattern discrimination. Nature, 303(5919), 696–698. https://doi.org/10.1038/303696a0

Borji, A. (2021). Saliency Prediction in the Deep Learning Era: Successes and Limitations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2), 679–700. https://doi.org/10.1109/TPAMI.2019.2935715

Borji, A. (2019, May 24). Saliency Prediction in the Deep Learning Era: Successes, Limitations, and Future Challenges. http://arxiv.org/abs/1810.03716

Borji, A., & Itti, L. (2013). State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 185–207. https://doi.org/10.1109/TPAMI.2012.89

Bruce, N. D. B., & Tsotsos, J. K. (2005). Saliency Based on Information Maximization. NIPS’05: Proceedings of the 18th International Conference on Neural Information Processing Systems, 155–162.

Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., & Durand, F. (2017, April 6). What do different evaluation metrics tell us about saliency models? http://arxiv.org/abs/1604.03605

Campbell, F. W., & Robson, J. G. (1968). Application of Fourier analysis to the visibility of gratings. The Journal of Physiology, 197(3), 551–566. https://doi.org/10.1113/jphysiol.1968.sp008574

Cannon, M. W., & Fullenkamp, S. C. (1996). A model for inhibitory lateral interaction effects in perceived contrast. Vision Research, 36(8), 1115–1125. https://doi.org/10.1016/0042-6989(95)00180-8

Dahou Djilali, Y. A., McGuinness, K., & O’Connor, N. (2024). Learning Saliency From Fixations. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 382–392. https://doi.org/10.1109/WACV57701.2024.00045

Ding, G., İmamoğlu, N., Caglayan, A., Murakawa, M., & Nakamura, R. (2022). SalFBNet: Learning pseudo-saliency distribution via feedback convolutional networks. Image and Vision Computing, 120, 104395. https://doi.org/10.1016/j.imavis.2022.104395

Droste, R., Jiao, J., & Noble, J. A. (2020). Unified Image and Video Saliency Modeling. arXiv:2003.05477 [Cs], 12350, 419–435. https://doi.org/10.1007/978-3-030-58558-7_25

Engel, F. L. (1971). Visual conspicuity, directed attention and retinal locus. Vision Research, 11(6), 563–576. https://doi.org/10.1016/0042-6989(71)90077-0

Engel, F. L. (1974). Visual conspicuity and selective background interference in eccentric vision. Vision Research, 14(7), 459–471. https://doi.org/10.1016/0042-6989(74)90034-0

Engel, S., Zhang, X., & Wandell, B. (1997). Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature, 388(6637), 68–71. https://doi.org/10.1038/40398

Eriksen, C. W., & Hoffman, J. E. (1972). Temporal and spatial characteristics of selective encoding from visual displays. Perception & Psychophysics, 12(2), 201–204. https://doi.org/10.3758/bf03212870

Feldman, J. A. (1982). Dynamic connections in neural networks. Biological Cybernetics, 46(1), 27–39. https://doi.org/10.1007/BF00335349

Geiger, G., & Lettvin, J. Y. (1986). Enhancing the Perception of Form in Peripheral Vision. Perception, 15(2), 119–130. https://doi.org/10.1068/p150119

Gitman, Y., Erofeev, M., Vatolin, D., Bolshakov, A., & Fedorov, A. (2014). Semiautomatic visual-attention modeling and its application to video compression. 2014 IEEE International Conference on Image Processing (ICIP), 1105–1109. https://doi.org/10.1109/ICIP.2014.7025220

Hayhoe, M., & Ballard, D. (2005). Eye movements in natural behavior. Trends in Cognitive Sciences, 9(4), 188–194. https://doi.org/10.1016/j.tics.2005.02.009

He, K., Zhang, X., Ren, S., & Sun, J. (2015, December 10). Deep Residual Learning for Image Recognition. https://doi.org/10.48550/arXiv.1512.03385

Helmholtz, H. von. (1896). Handbuch der Physiologischen Optik (Zweite umgearbeitete Auflage). Verlag von Leopold Voss.

Huang, G., Liu, Z., Maaten, L. van der, & Weinberger, K. Q. (2018, January 28). Densely Connected Convolutional Networks. https://doi.org/10.48550/arXiv.1608.06993

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259. https://doi.org/10.1109/34.730558

Jampani, V., Ujjwal, Sivaswamy, J., & Vaidya, V. (2012). Assessment of computational visual attention models on medical images. Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP ’12, 1–8. https://doi.org/10.1145/2425333.2425413

Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans look. 2009 IEEE 12th International Conference on Computer Vision, 2106–2113. https://doi.org/10.1109/ICCV.2009.5459462

Julesz, B. (1986). Texton gradients: The texton theory revisited. Biological Cybernetics, 54(4-5), 245–251. https://doi.org/10.1007/BF00318420

Julesz, B. (1984). A brief outline of the texton theory of human vision. Trends in Neurosciences, 7(2), 41–45. https://doi.org/10.1016/s0166-2236(84)80275-1

Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4(4), 219–227.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25. https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html

Kümmerer, M., Bethge, M., & Wallis, T. S. A. (2022). DeepGaze III: Modeling free-viewing human scanpaths with deep learning. Journal of Vision, 22(5), 7. https://doi.org/10.1167/jov.22.5.7

Kümmerer, M., Theis, L., & Bethge, M. (2015, April 9). Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet. http://arxiv.org/abs/1411.1045

Kümmerer, M., Wallis, T. S. A., & Bethge, M. (2022). Using the DeepGaze III model to decompose spatial and dynamic contributions to fixation placement over time. Journal of Vision, 22(14), 3964. https://doi.org/10.1167/jov.22.14.3964

Kümmerer, M., Wallis, T. S. A., Gatys, L. A., & Bethge, M. (2017). Understanding low- and high-level contributions to fixation prediction. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 4789–4798.

Linardos, A., Kümmerer, M., Press, O., & Bethge, M. (2021). DeepGaze IIE: Calibrated Prediction in and Out-of-Domain for State-of-the-Art Saliency Modeling. Proceedings of the IEEE/CVF International Conference on Computer Vision, 12919–12928. https://openaccess.thecvf.com/content/ICCV2021/html/Linardos_DeepGaze_IIE_Calibrated_Prediction_in_and_Out-of-Domain_for_State-of-the-Art_Saliency_ICCV_2021_paper.html

Lou, J., Lin, H., Marshall, D., Saupe, D., & Liu, H. (2022). TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing, 494, 455–467. https://doi.org/10.1016/j.neucom.2022.04.080

Lyudvichenko, V., Erofeev, M., Gitman, Y., & Vatolin, D. (2017). A semiautomatic saliency model and its application to video compression. 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), 403–410. https://doi.org/10.1109/ICCP.2017.8117038

Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., & Guérin-Dugué, A. (2009). Modelling Spatio-Temporal Saliency to Predict Gaze Direction for Short Videos. International Journal of Computer Vision, 82(3), 231–243. https://doi.org/10.1007/s11263-009-0215-3

McCallum, R. (1996). Reinforcement learning with selective perception and hidden state: PhD thesis. University of Rochester.

Medioni, G., & Mordohai, P. (2005). Saliency in Computer Vision. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neurobiology of Attention (pp. 583–585). Academic Press. https://doi.org/10.1016/B978-012375731-9/50099-9

Milanese, R. (1993). Detecting Salient Regions in an Image: From Biological Evidence to Computer Implementation: PhD thesis. Univ. Geneva.

Posner, M. I. (1980). Orienting of attention. The Quarterly Journal of Experimental Psychology, 32(1), 3–25. https://doi.org/10.1080/00335558008248231

Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and Performance (Vol. 10, pp. 531–556). Erlbaum.

Posner, M. I., Cohen, Y., & Rafal, R. D. (1982). Neural systems control of spatial orienting. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 298(1089), 187–198. https://doi.org/10.1098/rstb.1982.0081

Rao, R. P. N., Zelinsky, G. J., Hayhoe, M. M., & Ballard, D. H. (2002). Eye movements in iconic visual search. Vision Research, 42(11), 1447–1463. https://doi.org/10.1016/s0042-6989(02)00040-8

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108. https://doi.org/10.1037/0033-295x.85.2.59

Rayner, K. (2009). The 35th Sir Frederick Bartlett Lecture: Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8), 1457–1506. https://doi.org/10.1080/17470210902816461

Remington, R., & Pierce, L. (1984). Moving attention: Evidence for time-invariant shifts of visual selective attention. Perception & Psychophysics, 35(4), 393–399. https://doi.org/10.3758/bf03206344

Saarinen, J., & Julesz, B. (1991). The speed of attentional shifts in the visual field. Proceedings of the National Academy of Sciences of the United States of America, 88(5), 1812–1814. https://doi.org/10.1073/pnas.88.5.1812

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10(1), 38–45. https://doi.org/10.1016/j.tics.2005.11.008

Simonyan, K., & Zisserman, A. (2015, April 10). Very Deep Convolutional Networks for Large-Scale Image Recognition. http://arxiv.org/abs/1409.1556

Sun, X., Houssin, R., Renaud, J., & Gardoni, M. (2019). A review of methodologies for integrating human factors and ergonomics in engineering design. International Journal of Production Research, 57(15-16), 4961–4976. https://doi.org/10.1080/00207543.2018.1492161

Tan, M., & Le, Q. V. (2020, September 11). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. https://doi.org/10.48550/arXiv.1905.11946

Theeuwes, J. (2013). Feature-based attention: It is all bottom-up priming. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1628), 20130055. https://doi.org/10.1098/rstb.2013.0055

Treisman, A. M. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology. Human Perception and Performance, 8(2), 194–214. https://doi.org/10.1037//0096-1523.8.2.194

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136. https://doi.org/10.1016/0010-0285(80)90005-5

Tsotsos, J. K., Culhane, S. M., Kei Wai, W. Y., Lai, Y., Davis, N., & Nuflo, F. (1995). Modeling visual attention via selective tuning. Artificial Intelligence, 78(1–2), 507–545. https://doi.org/10.1016/0004-3702(95)00025-9

Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327–352. https://doi.org/10.1037/0033-295x.84.4.327

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023, August 2). Attention Is All You Need. http://arxiv.org/abs/1706.03762

Wang, W., Shen, J., Xie, J., Cheng, M.-M., Ling, H., & Borji, A. (2021). Revisiting Video Saliency Prediction in the Deep Learning Era. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 220–237. https://doi.org/10.1109/TPAMI.2019.2924417

Wilson, H. R., & Bergen, J. R. (1979). A four mechanism model for threshold spatial vision. Vision Research, 19(1), 19–32. https://doi.org/10.1016/0042-6989(79)90117-2

Wolfe, J. M. (2012). Saved by a Log: How Do Humans Perform Hybrid Visual and Memory Search? Psychological Science, 23(7), 698–703. https://doi.org/10.1177/0956797612443968

Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060–1092. https://doi.org/10.3758/s13423-020-01859-9

Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology. Human Perception and Performance, 15(3), 419–433. https://doi.org/10.1037//0096-1523.15.3.419


This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2025 Russian Psychological Journal