GT-SVJ: Generative-Transformer-Based Self-Supervised Video Judge for Efficient Video Reward Modeling
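Proposes the Generative-Transformer-based Self-Supervised Video Judge (GT-SVJ), which repurposes video generative models as reward models in place of vision-language models that struggle to capture temporal dynamics, with the aim of better human preference alignment and higher video generation quality.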
arXiv:2602.05202v2 Announce Type: replace Abstract: Aligning video generative models with human preferences remains challenging: current approaches rely on Vision-Language Models (VLMs) for reward modeling, but these models struggle to capture subtle temporal dynamics. We propose a fundamentally different approach: repurposing video generative models, which are inherently designed to model temporal structure, as reward models. We present the Generative-Transformer-based Self-Supervised Video Judge (GT-SVJ), a novel evaluation model that transforms state-of-the-art video generation models i