VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation

arXiv cs.CV, May 25, 2026

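Proposes the VGAS framework to address execution failures caused by geometric ambiguity when VLA models are adapted from only a few demonstrations. Taking a "generation-selection" view, the VLA policy first generates multiple action candidates, and a trained value model then scores them and selects the best one, which significantly improves the reliability of adaptation to new tasks. The method is lightweight and practical.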
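The generate-then-select step described above can be sketched as best-of-N selection. This is only an illustrative sketch, not the paper's implementation: the names `select_action_chunk`, `sample_chunk`, `value_fn`, `toy_policy`, and `toy_value` are hypothetical, the toy policy and value function are stand-ins, and the summary does not say how the value model is trained or how many candidates are drawn.

```python
import random
from typing import Callable, List

# An action chunk is modeled here as a short sequence of low-level actions.
ActionChunk = List[List[float]]


def select_action_chunk(
    sample_chunk: Callable[[], ActionChunk],
    value_fn: Callable[[ActionChunk], float],
    num_candidates: int = 8,
) -> ActionChunk:
    """Draw num_candidates chunks and return the one the value function scores highest."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [sample_chunk() for _ in range(num_candidates)]
    return max(candidates, key=value_fn)


# Toy stand-ins (assumptions for illustration only): a noisy 1-D "policy" and a
# value function that prefers chunks whose final action is closest to a target.
TARGET = 0.5


def toy_policy(rng: random.Random) -> ActionChunk:
    return [[rng.gauss(TARGET, 0.2)] for _ in range(4)]


def toy_value(chunk: ActionChunk) -> float:
    return -abs(chunk[-1][0] - TARGET)


rng = random.Random(0)
best = select_action_chunk(lambda: toy_policy(rng), toy_value, num_candidates=8)
```

In this sketch the policy and the value model are decoupled: the policy only needs to propose plausible chunks, while the value function resolves which near-identical candidate to execute.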

arXiv:2602.07399v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonstrations remains unreliable. While fine-tuned VLA policies often produce semantically plausible trajectories, failures often arise from unresolved geometric ambiguities, where near-miss actions lead to divergent execution outcomes under limited supervision. We study few-shot VLA adaptation from a generation-selection perspective and propose a novel framework VGAS (Value-