GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation
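Proposes a V-4D-A framework that reasons about actions directly from motion-aware 4D representations, addressing the inaccuracy of conventional methods in complex, dynamic manipulation scenes. Its main strength is the use of temporal information to improve scene understanding, which raises the precision of robotic grasping and manipulation.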
arXiv:2506.14135v5 Announce Type: replace-cross Abstract: Accurate scene perception is critical for vision-based robotic manipulation. Existing approaches typically follow either a Vision-to-Action (V-A) paradigm, predicting actions directly from visual inputs, or a Vision-to-3D-to-Action (V-3D-A) paradigm, leveraging intermediate 3D representations. However, these methods often struggle with action inaccuracies due to the complexity and dynamic nature of manipulation scenes. In this paper, we adopt a V-4D-A framework that enables direct action reasoning from motion-aware 4D representations vi