The TIME Machine: On The Power of Motion for Efficient Perception

arXiv cs.CV, May 25, 2026

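This paper observes that video representation learning has advanced through large-scale training and language-contrastive learning, yet it faces two problems: prohibitive cost, and a concept space limited to what text descriptions cover. As a result, models still fall short. Methodologically, the paper stresses the limits of the current reliance on language-contrastive training; its practical implication is that future work should explore more efficient video learning strategies that do not depend on language.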

arXiv:2605.23045v1

Abstract: Video representation learning has seen tremendous progress in recent years. This has been driven by many factors, including the scale of training and the success of visual models trained contrastively with language. While these factors have pushed the boundaries of what video models can do, they also introduce their own set of limitations: first, scaling video models can reach prohibitive costs; second, learning from language restricts the range of concepts that can be learned to those in captions. As a result, video models still struggle with