VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation
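Summary: This work proposes the VGAS framework to address execution failures caused by geometric ambiguity when VLA models are adapted from only a few samples. Taking a "generation-selection" view, the VLA policy first generates multiple action candidates, and a trained value model then evaluates them and selects the best one, significantly improving the reliability of adaptation to new tasks. The method is lightweight and practical.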
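The generate-then-select loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `select_action_chunk`, `make_toy_sampler`, and `toy_value` are hypothetical, the sampler is a noisy stand-in for a fine-tuned VLA policy, and the value function is hand-written rather than learned. Only the control flow (sample several action chunks, score each, keep the highest-scoring one) is meant to mirror the idea.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a real system would use images, language
# instructions, and multi-dimensional robot actions.
Observation = Dict[str, float]
ActionChunk = List[List[float]]  # horizon x action_dim


def select_action_chunk(
    observation: Observation,
    sample_chunk: Callable[[Observation], ActionChunk],
    value_fn: Callable[[Observation, ActionChunk], float],
    num_candidates: int = 8,
) -> Tuple[ActionChunk, List[float]]:
    """Sample candidate chunks, score each, and return the best one with all scores."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    # Generation step: draw several candidate chunks from the policy.
    candidates = [sample_chunk(observation) for _ in range(num_candidates)]
    # Selection step: score every candidate and keep the argmax.
    scores = [value_fn(observation, chunk) for chunk in candidates]
    best_index = max(range(num_candidates), key=lambda i: scores[i])
    return candidates[best_index], scores


def make_toy_sampler(seed: int, horizon: int = 4, noise: float = 0.5):
    """Assumed toy policy: 1-D actions scattered around a target position."""
    rng = random.Random(seed)

    def sample_chunk(observation: Observation) -> ActionChunk:
        return [[observation["target"] + rng.gauss(0.0, noise)] for _ in range(horizon)]

    return sample_chunk


def toy_value(observation: Observation, chunk: ActionChunk) -> float:
    """Assumed toy value: negative mean squared distance to the target."""
    target = observation["target"]
    return -sum((step[0] - target) ** 2 for step in chunk) / len(chunk)
```

Calling `select_action_chunk({"target": 1.0}, make_toy_sampler(0), toy_value)` returns the candidate chunk with the highest toy score together with the scores of all candidates. In this toy setting the "near-miss" candidates differ only by small offsets, which is the kind of fine geometric difference a selector has to separate.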
arXiv:2602.07399v2 (announce type: replace-cross). Abstract: Vision-Language-Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonstrations remains unreliable. Although fine-tuned VLA policies often produce semantically plausible trajectories, failures frequently arise from unresolved geometric ambiguities, where near-miss actions lead to divergent execution outcomes under limited supervision. We study few-shot VLA adaptation from a generation-selection perspective and propose a novel framework VGAS (Value-