VISD: Enhancing Video Reasoning via Structured Self-Distillation
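Summary: This arXiv paper proposes a structured self-distillation method that supervises reasoning paths with diagnostic feedback, targeting two problems in training video large language models: sparse sequence-level rewards and fine-grained credit assignment. The method combines structured diagnostic feedback with reinforcement learning to make complex spatio-temporal reasoning more efficient to learn. Practical significance: better model performance on long-horizon temporal reasoning.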
arXiv:2605.06094v4. Abstract: Training VideoLLMs for complex reasoning remains challenging due to sparse sequence-level rewards and the lack of fine-grained credit assignment over long, temporally grounded reasoning trajectories. While reinforcement learning with verifiable rewards (RLVR) provides reliable supervision, it fails to capture token-level contributions, leading to inefficient learning. Conversely, existing self-distillation methods offer dense supervision but lack structure and diagnostic specificity, and often interact unstably with reinforcement learning. In