Benchmarking and Enhancing VLM for Compressed Image Understanding

arXiv cs.CV, May 25, 2026

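This study builds the first comprehensive benchmark for evaluating how well vision-language models (VLMs) understand low-bitrate compressed images, covering multiple image codecs and a diverse set of tasks. The paper notes that existing VLMs mainly handle high-bitrate compressed images, while their ability to interpret low-bitrate compressed images remains unexplored; the benchmark is intended as a reference for related applications.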

arXiv:2512.20901v2 (announce type: replace). Abstract: With the rapid development of Vision-Language Models (VLMs) and the growing demand for their applications, efficient compression of image inputs has become increasingly important. Existing VLMs predominantly digest and understand high-bitrate compressed images, while their ability to interpret low-bitrate compressed images has yet to be explored. In this paper, we introduce the first comprehensive benchmark to evaluate the ability of VLMs on compressed images, covering widely used image codecs and a diverse set of tasks,