CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

arXiv cs.CV, May 25, 2026

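CVSearch proposes a training-free adaptive framework that dynamically schedules visual search strategies to balance coverage and efficiency in high-resolution (HR) image perception. It combines the efficiency of expert-assisted search with the thoroughness of scan-based search, addressing both blind spots and computational redundancy. According to the summary, this significantly improves the HR image understanding of multimodal large language models and suits fine-grained visual analysis scenarios such as medical imaging.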
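The coverage-versus-efficiency trade-off described above can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions, not the paper's method: the excerpt does not specify how CVSearch schedules its strategies, so the fallback rule, the confidence threshold, and every name here (`Proposal`, `scan_tiles`, `schedule_search`) are hypothetical. The sketch tries expert proposals first and falls back to an exhaustive tile scan only when no proposal is confident enough.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


@dataclass
class Proposal:
    """A candidate region from a visual expert (hypothetical structure)."""
    box: Box
    score: float  # assumed relevance score in [0, 1]


def scan_tiles(width: int, height: int, tile: int) -> List[Box]:
    """Exhaustive non-overlapping tiling: full coverage, but many crops."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def schedule_search(
    width: int,
    height: int,
    expert: Callable[[], List[Proposal]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
    tile: int = 512,         # assumed tile size, not taken from the paper
) -> Tuple[str, List[Box]]:
    """Return the chosen strategy name and the crops to inspect.

    Expert-assisted search is cheap but can miss the target (a blind spot);
    the scan is the fallback that guarantees coverage at a higher cost.
    """
    confident = [p.box for p in expert() if p.score >= threshold]
    if confident:
        return "expert", confident
    return "scan", scan_tiles(width, height, tile)
```

For a 1024 x 768 image, an expert that returns no proposals triggers the scan branch, which yields four 512-pixel tiles, while a single confident proposal keeps the search to one crop. A real scheduler would presumably use richer signals than a fixed score threshold; the fixed rule is used here only to keep the example short.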

arXiv:2605.23655v1 (announce type: new)

Abstract: High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a promising solution, existing methods struggle with the trade-off between coverage and efficiency. Visual expert-assisted search is efficient but prone to blind spots when proposals fail, whereas scan-based search guarantees coverage at the cost of computational redundancy and semantic fragmentation. To address this dilemma, we introduce CVSearch, a training-free adaptive framework that dynamically schedules se