EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation

arXiv cs.CV, May 25, 2026

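Evaluation has become a bottleneck for video generation models. Most benchmarks check only "whether it is right" (basic prompt following) and neglect "whether it is good" (cinematic quality, acting, and aesthetics). As the community turns to reinforcement learning (RL) and agentic workflows to reach professional-grade cinematic synthesis, the study argues that evaluation must shift toward judging professional cinematic quality. Methodological highlight: evaluation should cover aesthetic and acting dimensions more comprehensively. Practical significance: it moves the assessment of generated video from "right or wrong" to "good or bad".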

arXiv:2605.23271v1 Announce Type: new Abstract: The rapid evolution of generative video foundation models has propelled the field toward professional-grade cinematic synthesis. To achieve such demanding quality, the community transitions towards Reinforcement Learning (RL) and agentic workflows. However, reliable evaluation has emerged as a critical bottleneck. Existing benchmarks predominantly evaluate "whether it is right" (basic prompt-following) while fundamentally neglecting "whether it is good" (cinematic quality, acting, and aesthetics). Furthermore, current automated metrics lack t