Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models
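Modern visual generative models acquire rich knowledge through large-scale training, but existing representations (pixels, latents, or tokens) cannot directly exploit that knowledge for compact storage or reuse. This paper proposes a new framework: a signal is encoded as a function parametrized by low-rank adaptations attached to a frozen visual generative model. For example, an 81-frame video can be represented as such an implicit function. The approach draws on the generative model's prior knowledge to store and reuse visual signals more efficiently.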
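To make "low-rank adaptations attached to a frozen model" concrete, here is a minimal sketch of the general low-rank adaptation idea on a single linear layer. It is not the paper's implementation: the class `LoRALinear`, the helper `lora_params`, the 4x4 identity weight, and the rank are all hypothetical, chosen only for illustration. The frozen weight `W` is never modified; only the two small factors `A` and `B` would need to be stored per signal.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_params(d_out, d_in, rank):
    """Count the stored values for one rank-`rank` adapter on a d_out x d_in layer."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Toy layer (hypothetical): frozen weight W plus a low-rank update B @ A."""

    def __init__(self, weight, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never modified by the adapter
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero init: adapter starts as a no-op

    def effective_weight(self):
        """Return W + B @ A without touching the frozen weight."""
        delta = matmul(self.B, self.A)
        return [[w + d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to a vector x (list of floats)."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


# Assumed toy setup: a 4x4 identity matrix stands in for one frozen layer.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(W, rank=1)
x = [1.0, 2.0, 3.0, 4.0]

out_before = layer.forward(x)   # adapter is a no-op while B is zero
layer.B[0][0] = 1.0             # stand-in for a training step that updates B
out_after = layer.forward(x)    # only the first output coordinate changes

print(out_before, out_after)
print(lora_params(4, 4, 1), "adapter values vs", 4 * 4, "values in the full matrix")
```

In this toy setting the adapter holds `rank * (d_in + d_out)` values rather than `d_in * d_out`, which is why storing only the adapter can be compact when the rank is small relative to the layer size. How the adapters are actually fitted to a video, and at which layers of the diffusion model they are attached, is not specified in the text above.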
arXiv:2603.07615v3 Announce Type: replace-cross Abstract: Modern visual generative models acquire rich visual knowledge through large-scale training, yet existing visual representations (such as pixels, latents, or tokens) remain external to the model and cannot directly exploit this knowledge for compact storage or reuse. In this work, we introduce a new visual representation framework that encodes a signal as a function, which is parametrized by low-rank adaptations attached to a frozen visual generative model. Such implicit representations of visual signals, e.g., an 81-frame video