Visual Saliency: From Theoretical Assumptions to Modern High-Performance Models

Keywords

visual system
attention
eye movements
visual search
saliency
eye tracking
computer vision
modeling

Abstract

Introduction. Visual saliency refers to the perceptual quality of a location in a visual scene that manifests itself subjectively in its attractiveness to the observer and objectively in the probability of an attention shift or eye fixation toward it. This quality arises from the integration of visual feature maps and is modulated by several central mechanisms. It is important to distinguish between the terms saliency and conspicuity; in a theoretical context, they are not the same. This review, for the first time, combines the results of computer modeling of visual saliency with a detailed discussion of the theoretical background behind such models. The feature integration theory proposed by A. M. Treisman is examined, along with its advantages and limitations, which paved the way for the three-level model of visual attention developed by C. Koch and S. Ullman. According to this model, focal attention is governed by a “winner-takes-all” mechanism operating over a saliency map that encodes the attractiveness of each fragment of the visual scene. The original theory did not specify how the saliency map is formed, and this question remains the focus of research using computer modeling.

Results and Discussion. The results of studies on modeling visual saliency are reviewed. In particular, the early computational model by L. Itti, C. Koch, and E. Niebur, which laid the foundation for many subsequent developments, is described in detail. Approaches to modeling that preceded modern high-performance neural networks are examined, and a range of contemporary models based on deep learning is presented, together with their characteristic properties. This is the first comprehensive review of saliency models published in Russian. Researchers have developed several models of practical utility, and the paper discusses their potential for real-world application.
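The pipeline the abstract describes — feature maps combined into a saliency map, over which a “winner-takes-all” mechanism selects successive fixation points — can be sketched in a few lines of NumPy. This is an illustrative toy under stated assumptions, not the Itti–Koch–Niebur implementation: it uses a single intensity channel, box blurs in place of Gaussian pyramids, and a crude suppression of the winner's neighborhood as a stand-in for inhibition of return; all function names and parameter values here are hypothetical.

```python
import numpy as np

def box_blur(img, k):
    # separable box filter: a cheap stand-in for Gaussian smoothing
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    tmp = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, tmp)

def saliency_map(img):
    # center-surround responses at two surround scales,
    # each normalized to [0, 1] and averaged into a master map
    maps = []
    for k in (9, 17):
        cs = np.abs(img - box_blur(img, k))
        rng = cs.max() - cs.min()
        if rng > 0:
            cs = (cs - cs.min()) / rng
        maps.append(cs)
    return sum(maps) / len(maps)

def winner_take_all(sal, n_fixations=3, inhibition_radius=6):
    # pick successive maxima, zeroing each winner's neighborhood
    # (a crude inhibition of return)
    sal = sal.copy()
    ys, xs = np.mgrid[:sal.shape[0], :sal.shape[1]]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        sal[(ys - y) ** 2 + (xs - x) ** 2 <= inhibition_radius ** 2] = 0
    return fixations

img = np.zeros((64, 64))
img[20:24, 40:44] = 1.0   # a bright "pop-out" patch on a dark field
sal = saliency_map(img)
fix = winner_take_all(sal, n_fixations=1)[0]  # first fixation lands on the patch
```

The point of the toy is the division of labor the theory posits: feature extraction and normalization produce the map, while a separate, purely spatial competition (the argmax plus suppression loop) generates the fixation sequence.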

https://doi.org/10.21702/rpj.2025.3.11

References

Ангельгардт, А. Н., Макаров, И. М., & Горбунова, Е. С. (2021). Роль уровня категории при решении задачи гибридного зрительного поиска. Вопросы психологии, 2, 148–158. https://www.elibrary.ru/item.asp?id=46548586

Величковский, Б. М. (2006). Когнитивная наука. Основы психологии познания (Т. 1). Смысл, Издательский центр «Академия».

Воронин, И. А., Захаров, И. М., Табуева, А. О., & Мерзон, Л. А. (2020). Диффузная модель принятия решения: оценка скорости и точности ответов в задачах выбора из двух альтернатив в исследованиях когнитивных процессов и способностей. Теоретическая и экспериментальная психология, 13(2), 6–24. https://www.elibrary.ru/item.asp?id=48627007

Горбунова, Е. С. (2023). Механизмы построения репрезентации в категориальном поиске: роль внимания и рабочей памяти. Российский психологический журнал, 20(3), 116–130. https://doi.org/10.21702/rpj.2023.3.6

Гусев, А. Н., & Уточкин, И. С. (2012). Влияние вероятности подсказки на эффективность пространственной локализации зрительного стимула. Известия Иркутского государственного университета. Серия: Психология, 1(1), 34–39. https://elibrary.ru/item.asp?id=18040183

Кочурко, В. А., Мадани, К., Сабуран, К., Головко, В. А., & Кочурко, П. А. (2015). Обнаружение объектов системами компьютерного зрения: подход на основе визуальной салиентности. Доклады Белорусского государственного университета информатики и радиоэлектроники, 91(5), 47–53. https://www.elibrary.ru/item.asp?id=29737620

Крускоп, А. С., Лунякова, Е. Г., Дубровский, В. Е., & Гарусев, А. В. (2023). Особенности движений глаз в задаче зрительного поиска в зависимости от вербализуемости и симметричности стимулов. Вестник Московского Университета. Серия 14: Психология, 46(4), 88–111. https://doi.org/10.11621/LPJ-23-40

Мартынова, О. В., & Балаев, В. В. (2015). Возрастные изменения в функциональной связанности сетей состояния покоя. Психология. Журнал Высшей школы экономики, 12(4), 33–47. https://psy-journal.hse.ru/2015-12-4/167444789.html

Мински, М., & Пейперт, С. (1971). Персептроны. Мир.

Подладчикова, Л. Н., Самарин, А. И., Шапошников, Д. Г., Колтунова, Т. И., Петрушан, М. В., & Ломакина, О. В. (2017). Современные представления о механизмах зрительного внимания. Южный федеральный университет. https://www.elibrary.ru/item.asp?id=32259454

Рожкова, Г. И., Белокопытов, А. В., & Иомдина, Е. Н. (2019). Современные представления о специфике периферического зрения человека. Сенсорные системы, 33(4), 305–330. https://doi.org/10.1134/S0235009219040073

Сапронов, Ф. А., & Горбунова, Е. С. (2025). Сравнение сгенерированных ИИ стимулов и фото: исследование зрительного поиска. Вестник Московского Университета. Серия 14: Психология, 48(2), 109–131. https://doi.org/10.11621/LPJ-25-14

Сапронов, Ф. А., Макаров, И. М., & Горбунова, Е. С. (2023). Категоризация в гибридном поиске: исследование с использованием регистрации движений глаз. Экспериментальная психология, 16(3), 121–138. https://doi.org/10.17759/exppsy.2023160308

Сеченов, И. М. (1942). Рефлексы головного мозга. Издательство АН СССР.

Уточкин, И. С., & Фаликман, М. В. (2006). Торможение возврата внимания. Часть 1. Виды и свойства. Психологический журнал, 27(3), 42–48. https://elibrary.ru/item.asp?id=9212401

Фаликман, М. В. (2015). Структура и динамика зрительного внимания при решении перцептивных задач: конструктивно-деятельностный подход: дис. ... докт. психол. наук: 19.00.01 [МГУ имени М. В. Ломоносова]. https://www.elibrary.ru/item.asp?id=54440859

Фаликман, М. В., Уточкин, И. С., Марков, Ю. А., & Тюрина, Н. А. (2019). Нисходящая регуляция зрительного поиска: есть ли она у детей? 513–517. https://www.elibrary.ru/item.asp?id=40843935

Хохлова, Т. В. (2012). Современные представления о зрении млекопитающих. Журнал общей биологии, 73(6), 418–434. https://elibrary.ru/item.asp?id=18121671

Шевель, Т. М., & Фаликман, М. В. (2022). «Подсказка взглядом» как ключ к механизмам совместного внимания: основные результаты исследований. Культурно-историческая психология, 18(1), 6–16. https://doi.org/10.17759/chp.2022180101

Шолле, Ф. (2023). Глубокое обучение на Python (2nd ed.). Питер.

Ярбус, А. Л. (1965). Роль движений глаз в процессе зрения (Н. Д. Нюберг, Ed.). Наука.

Arun, N. T., Gaw, N., Singh, P., Chang, K., Hoebel, K. V., Patel, J., Gidwani, M., & Kalpathy-Cramer, J. (2020, May 29). Assessing the validity of saliency maps for abnormality localization in medical imaging. http://arxiv.org/abs/2006.00063

Ballard, D. H., Hayhoe, M. M., & Pelz, J. B. (1995). Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience, 7(1), 66–80. https://doi.org/10.1162/jocn.1995.7.1.66

Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20(4), 723–742. https://doi.org/10.1017/s0140525x97001611

Baluja, S., & Pomerleau, D. (1994). Using a Saliency Map for Active Spatial Selective Attention: Implementation & Initial Results. Proc. Advances in Neural Information Processing Systems, 451–458.

Bashinski, H. S., & Bacharach, V. R. (1980). Enhancement of perceptual sensitivity as the result of selectively attending to spatial locations. Perception & Psychophysics, 28(3), 241–248. https://doi.org/10.3758/bf03204380

Bergen, J. R., & Julesz, B. (1983). Parallel versus serial processing in rapid pattern discrimination. Nature, 303(5919), 696–698. https://doi.org/10.1038/303696a0

Borji, A. (2021). Saliency Prediction in the Deep Learning Era: Successes and Limitations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2), 679–700. https://doi.org/10.1109/TPAMI.2019.2935715

Borji, A. (2019, May 24). Saliency Prediction in the Deep Learning Era: Successes, Limitations, and Future Challenges. http://arxiv.org/abs/1810.03716

Borji, A., & Itti, L. (2013). State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 185–207. https://doi.org/10.1109/TPAMI.2012.89

Bruce, N. D. B., & Tsotsos, J. K. (2005). Saliency Based on Information Maximization. NIPS’05: Proceedings of the 18th International Conference on Neural Information Processing Systems, 155–162.

Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., & Durand, F. (2017, April 6). What do different evaluation metrics tell us about saliency models? http://arxiv.org/abs/1604.03605

Campbell, F. W., & Robson, J. G. (1968). Application of Fourier analysis to the visibility of gratings. The Journal of Physiology, 197(3), 551–566. https://doi.org/10.1113/jphysiol.1968.sp008574

Cannon, M. W., & Fullenkamp, S. C. (1996). A model for inhibitory lateral interaction effects in perceived contrast. Vision Research, 36(8), 1115–1125. https://doi.org/10.1016/0042-6989(95)00180-8

Dahou Djilali, Y. A., McGuinness, K., & O’Connor, N. (2024). Learning Saliency From Fixations. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 382–392. https://doi.org/10.1109/WACV57701.2024.00045

Ding, G., İmamoğlu, N., Caglayan, A., Murakawa, M., & Nakamura, R. (2022). SalFBNet: Learning pseudo-saliency distribution via feedback convolutional networks. Image and Vision Computing, 120, 104395. https://doi.org/10.1016/j.imavis.2022.104395

Droste, R., Jiao, J., & Noble, J. A. (2020). Unified Image and Video Saliency Modeling. arXiv:2003.05477 [Cs], 12350, 419–435. https://doi.org/10.1007/978-3-030-58558-7_25

Engel, F. L. (1971). Visual conspicuity, directed attention and retinal locus. Vision Research, 11(6), 563–576. https://doi.org/10.1016/0042-6989(71)90077-0

Engel, F. L. (1974). Visual conspicuity and selective background interference in eccentric vision. Vision Research, 14(7), 459–471. https://doi.org/10.1016/0042-6989(74)90034-0

Engel, S., Zhang, X., & Wandell, B. (1997). Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature, 388(6637), 68–71. https://doi.org/10.1038/40398

Eriksen, C. W., & Hoffman, J. E. (1972). Temporal and spatial characteristics of selective encoding from visual displays. Perception & Psychophysics, 12(2), 201–204. https://doi.org/10.3758/bf03212870

Feldman, J. A. (1982). Dynamic connections in neural networks. Biological Cybernetics, 46(1), 27–39. https://doi.org/10.1007/BF00335349

Geiger, G., & Lettvin, J. Y. (1986). Enhancing the Perception of Form in Peripheral Vision. Perception, 15(2), 119–130. https://doi.org/10.1068/p150119

Gitman, Y., Erofeev, M., Vatolin, D., Bolshakov, A., & Fedorov, A. (2014). Semiautomatic visual-attention modeling and its application to video compression. 2014 IEEE International Conference on Image Processing (ICIP), 1105–1109. https://doi.org/10.1109/ICIP.2014.7025220

Hayhoe, M., & Ballard, D. (2005). Eye movements in natural behavior. Trends in Cognitive Sciences, 9(4), 188–194. https://doi.org/10.1016/j.tics.2005.02.009

He, K., Zhang, X., Ren, S., & Sun, J. (2015, December 10). Deep Residual Learning for Image Recognition. https://doi.org/10.48550/arXiv.1512.03385

Helmholtz, H. von. (1896). Handbuch der Physiologischen Optik (Zweite umgearbeitete Auflage). Verlag von Leopold Voss.

Huang, G., Liu, Z., Maaten, L. van der, & Weinberger, K. Q. (2018, January 28). Densely Connected Convolutional Networks. https://doi.org/10.48550/arXiv.1608.06993

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259. https://doi.org/10.1109/34.730558

Jampani, V., Ujjwal, Sivaswamy, J., & Vaidya, V. (2012). Assessment of computational visual attention models on medical images. Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP ’12, 1–8. https://doi.org/10.1145/2425333.2425413

Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans look. 2009 IEEE 12th International Conference on Computer Vision, 2106–2113. https://doi.org/10.1109/ICCV.2009.5459462

Julesz, B. (1986). Texton gradients: The texton theory revisited. Biological Cybernetics, 54(4-5), 245–251. https://doi.org/10.1007/BF00318420

Julesz, B. (1984). A brief outline of the texton theory of human vision. Trends in Neurosciences, 7(2), 41–45. https://doi.org/10.1016/s0166-2236(84)80275-1

Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4(4), 219–227.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25. https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html

Kümmerer, M., Bethge, M., & Wallis, T. S. A. (2022). DeepGaze III: Modeling free-viewing human scanpaths with deep learning. Journal of Vision, 22(5), 7. https://doi.org/10.1167/jov.22.5.7

Kümmerer, M., Theis, L., & Bethge, M. (2015, April 9). Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet. http://arxiv.org/abs/1411.1045

Kümmerer, M., Wallis, T. S. A., & Bethge, M. (2022). Using the DeepGaze III model to decompose spatial and dynamic contributions to fixation placement over time. Journal of Vision, 22(14), 3964. https://doi.org/10.1167/jov.22.14.3964

Kümmerer, M., Wallis, T. S. A., Gatys, L. A., & Bethge, M. (2017). Understanding low- and high-level contributions to fixation prediction. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 4789–4798.

Linardos, A., Kümmerer, M., Press, O., & Bethge, M. (2021). DeepGaze IIE: Calibrated Prediction in and Out-of-Domain for State-of-the-Art Saliency Modeling. Proceedings of the IEEE/CVF International Conference on Computer Vision, 12919–12928. https://openaccess.thecvf.com/content/ICCV2021/html/Linardos_DeepGaze_IIE_Calibrated_Prediction_in_and_Out-of-Domain_for_State-of-the-Art_Saliency_ICCV_2021_paper.html

Lou, J., Lin, H., Marshall, D., Saupe, D., & Liu, H. (2022). TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing, 494, 455–467. https://doi.org/10.1016/j.neucom.2022.04.080

Lyudvichenko, V., Erofeev, M., Gitman, Y., & Vatolin, D. (2017). A semiautomatic saliency model and its application to video compression. 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), 403–410. https://doi.org/10.1109/ICCP.2017.8117038

Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., & Guérin-Dugué, A. (2009). Modelling Spatio-Temporal Saliency to Predict Gaze Direction for Short Videos. International Journal of Computer Vision, 82(3), 231–243. https://doi.org/10.1007/s11263-009-0215-3

McCallum, R. (1996). Reinforcement learning with selective perception and hidden state: PhD thesis. University of Rochester.

Medioni, G., & Mordohai, P. (2005). Saliency in Computer Vision. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neurobiology of Attention (pp. 583–585). Academic Press. https://doi.org/10.1016/B978-012375731-9/50099-9

Milanese, R. (1993). Detecting Salient Regions in an Image: From Biological Evidence to Computer Implementation: PhD thesis. Univ. Geneva.

Posner, M. I. (1980). Orienting of attention. The Quarterly Journal of Experimental Psychology, 32(1), 3–25. https://doi.org/10.1080/00335558008248231

Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and Performance (Vol. 10, pp. 531–556). Erlbaum.

Posner, M. I., Cohen, Y., & Rafal, R. D. (1982). Neural systems control of spatial orienting. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 298(1089), 187–198. https://doi.org/10.1098/rstb.1982.0081

Rao, R. P. N., Zelinsky, G. J., Hayhoe, M. M., & Ballard, D. H. (2002). Eye movements in iconic visual search. Vision Research, 42(11), 1447–1463. https://doi.org/10.1016/s0042-6989(02)00040-8

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108. https://doi.org/10.1037/0033-295x.85.2.59

Rayner, K. (2009). The 35th Sir Frederick Bartlett Lecture: Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8), 1457–1506. https://doi.org/10.1080/17470210902816461

Remington, R., & Pierce, L. (1984). Moving attention: Evidence for time-invariant shifts of visual selective attention. Perception & Psychophysics, 35(4), 393–399. https://doi.org/10.3758/bf03206344

Saarinen, J., & Julesz, B. (1991). The speed of attentional shifts in the visual field. Proceedings of the National Academy of Sciences of the United States of America, 88(5), 1812–1814. https://doi.org/10.1073/pnas.88.5.1812

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10(1), 38–45. https://doi.org/10.1016/j.tics.2005.11.008

Simonyan, K., & Zisserman, A. (2015, April 10). Very Deep Convolutional Networks for Large-Scale Image Recognition. http://arxiv.org/abs/1409.1556

Sun, X., Houssin, R., Renaud, J., & Gardoni, M. (2019). A review of methodologies for integrating human factors and ergonomics in engineering design. International Journal of Production Research, 57(15-16), 4961–4976. https://doi.org/10.1080/00207543.2018.1492161

Tan, M., & Le, Q. V. (2020, September 11). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. https://doi.org/10.48550/arXiv.1905.11946

Theeuwes, J. (2013). Feature-based attention: It is all bottom-up priming. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1628), 20130055. https://doi.org/10.1098/rstb.2013.0055

Treisman, A. M. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology. Human Perception and Performance, 8(2), 194–214. https://doi.org/10.1037//0096-1523.8.2.194

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136. https://doi.org/10.1016/0010-0285(80)90005-5

Tsotsos, J. K., Culhane, S. M., Kei Wai, W. Y., Lai, Y., Davis, N., & Nuflo, F. (1995). Modeling visual attention via selective tuning. Artificial Intelligence, 78(1–2), 507–545. https://doi.org/10.1016/0004-3702(95)00025-9

Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327–352. https://doi.org/10.1037/0033-295x.84.4.327

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023, August 2). Attention Is All You Need. http://arxiv.org/abs/1706.03762

Wang, W., Shen, J., Xie, J., Cheng, M.-M., Ling, H., & Borji, A. (2021). Revisiting Video Saliency Prediction in the Deep Learning Era. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 220–237. https://doi.org/10.1109/TPAMI.2019.2924417

Wilson, H. R., & Bergen, J. R. (1979). A four mechanism model for threshold spatial vision. Vision Research, 19(1), 19–32. https://doi.org/10.1016/0042-6989(79)90117-2

Wolfe, J. M. (2012). Saved by a Log: How Do Humans Perform Hybrid Visual and Memory Search? Psychological Science, 23(7), 698–703. https://doi.org/10.1177/0956797612443968

Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060–1092. https://doi.org/10.3758/s13423-020-01859-9

Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology. Human Perception and Performance, 15(3), 419–433. https://doi.org/10.1037//0096-1523.15.3.419


This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2025 Russian Psychological Journal