World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
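World-R1 aligns video generation with 3D constraints through reinforcement learning (Flow-GRPO), without costly architectural modifications. It optimizes the model using a pure text dataset and significantly improves geometric consistency, offering a new path toward scalable, low-compute video world simulation.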
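As a rough illustration of the group-relative idea that GRPO-style methods rely on, the sketch below normalizes rewards within a group of samples generated for the same prompt. It is not the World-R1 implementation: the function name `group_relative_advantages`, the reward values, the notion of a scalar "3D-consistency score", and the choice of population standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Each sample's advantage is its reward minus the group mean, divided by
    the group standard deviation (population std is an assumption here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical 3D-consistency scores for four videos sampled from one text prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

# Samples scoring above the group mean get a positive advantage and would be
# reinforced; samples below the mean get a negative one.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because advantages are computed relative to the other samples in the same group, no separate value network is needed; only a scalar reward per generated sample is required.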
arXiv:2604.24764v3 Announce Type: replace Abstract: Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns video generation with 3D constraints through reinforcement learning. To facilitate this alignment, we introduce a specialized pure text dataset tailored for world simulation. Utilizing Flow-GRPO, we optimize the model using feedb