Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors

arXiv cs.CV, May 25, 2026

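This work proposes Imagine2Real, a zero-shot humanoid-object interaction framework. It targets two problems that arise when video generative priors are used to work around scarce 3D data: representation misalignment and retargeting complexity. Key finding: flexible, geometry-free interaction is achievable without geometric priors (such as CAD models) and without intensive morphing. Core of the method: it builds on video generative priors and bypasses explicit 3D models. Practical significance: it reduces dependence on high-fidelity data and improves the adaptability and efficiency of robot interaction.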

arXiv:2605.22272v2. Announce Type: replace-cross.

Abstract: Whole-body Humanoid-Object Interaction (HOI) is bottlenecked by the scarcity of high-fidelity 3D data. While video generative priors offer a promising alternative, existing methods suffer from Representation Misalignment due to their reliance on geometric priors (e.g., explicit CAD models), and Retargeting Complexity arising from intensive morphing and morphological mismatch. We propose Imagine2Real, a zero-shot HOI framework for flexible, geometry-free interaction. To resolve misalignment, we formulate robot and objec