GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation
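GEM-4D is a geometry-grounded video world model. It injects dense 4D correspondence supervision, distilled from a pretrained geometry model, into the generative backbone. This addresses the failure of existing models to preserve point-level motion consistency, so that generated videos are physically grounded and can support reliable action execution in settings such as robot manipulation.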
arXiv:2605.22882v1 Announce Type: new Abstract: Video world models can generate realistic futures from a single instruction, but they often fail to preserve consistent point-level motion over time. As a result, the generated videos appear plausible, yet lack the physical grounding required for reliable action execution, such as robot manipulation. We present GEM-4D, a geometry-grounded video world model that resolves this limitation by injecting dense 4D correspondence supervision, distilled from a pretrained geometry foundation model, into the video generative backbone during training. This s