CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models
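Video prediction is viewed as a path toward general-purpose world models, but it remains uncertain whether such models actually learn causal structure. The authors propose CRONOS, an intervention-based benchmark that alters visual inputs such as scene context, viewpoint, object appearance, and object category, then evaluates whether a model's predictions remain counterfactually physically consistent. The approach aims to distinguish models that understand physical laws from models that merely exploit superficial visual correlations, which matters for progress toward generalizable world models.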
arXiv:2605.23699v1 Announce Type: new Abstract: Video prediction is increasingly viewed as a path toward generalizable world models, yet it remains unclear whether these systems learn underlying causal structure or merely exploit superficial visual correlations for future prediction. We introduce CRONOS, an intervention-based benchmark designed to evaluate counterfactual physical consistency: whether a model's predictions of physical events respond appropriately to controlled changes in the visual input, such as variations of scene context, viewpoint, object appearance, and object category. Bu