GT-SVJ: Generative-Transformer-Based Self-Supervised Video Judge for Efficient Video Reward Modeling

arXiv cs.CV, May 25, 2026

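Proposes the Generative-Transformer-based Self-Supervised Video Judge (GT-SVJ), which repurposes video generative models as reward models in place of vision-language models that struggle to capture temporal dynamics, with the goal of better human-preference alignment and higher video generation quality.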

arXiv:2602.05202v2 Announce Type: replace Abstract: Aligning video generative models with human preferences remains challenging: current approaches rely on Vision-Language Models (VLMs) for reward modeling, but these models struggle to capture subtle temporal dynamics. We propose a fundamentally different approach: repurposing video generative models, which are inherently designed to model temporal structure, as reward models. We present the Generative-Transformer-based Self-Supervised Video Judge (GT-SVJ), a novel evaluation model that transforms state-of-the-art video generation models i