Exploring Deep Learning for Event-Based Saliency Prediction with a Transformer-Based Model

arXiv cs.CV, May 25, 2026

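SEST (Swin Event-based Saliency Transformer) is a Transformer-based saliency prediction model designed for event camera data. The work addresses two obstacles in this area, the lack of large-scale event saliency datasets and the lack of a strong baseline, and offers an effective baseline for applying event cameras to the modeling of human visual attention.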

arXiv:2605.23790v1

Abstract: Saliency prediction has been extensively studied in RGB images and videos as a computational model of human visual attention. In contrast, predicting saliency from event-based data remains largely unexplored, despite the biological inspiration and favorable sensing properties of event cameras. Two obstacles have held this direction back: the absence of large-scale event saliency datasets, and the lack of a strong baseline. In this paper, we introduce SEST (Swin Event-based Saliency Transformer), a transformer-based model for saliency prediction f