B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation
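This work addresses complex referring segmentation using methods that pair large vision-language models with segmentation decoders. Reinforcement learning improves reasoning ability, but the authors note that trainable modules such as segmentation decoders are typically... (source text truncated).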
arXiv:2605.23500v1 Announce Type: new
Abstract: Segmentation is a fundamental task in computer vision, underpinning pixel-level scene understanding and serving as a cornerstone for applications ranging from autonomous perception to medical image analysis. For complex referring segmentation, recent methods pair large vision-language models with segmentation decoders: the former analyzes the image and prompt, while the latter predicts the target mask. Although reinforcement learning improves reasoning-intensive vision-language systems, trainable tools such as segmentation decoders are typically