CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception
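CVSearch is a training-free adaptive framework that dynamically schedules visual search strategies to balance coverage and efficiency in high-resolution (HR) image perception. It combines the efficiency of expert-assisted search with the completeness of scan-based search, addressing both blind spots and computational redundancy. The summary reports that this substantially improves HR image understanding in multimodal large language models (MLLMs), and positions the approach for fine-grained visual analysis tasks such as medical imaging.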
arXiv:2605.23655v1 Announce Type: new
Abstract: High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a promising solution, existing methods struggle with the trade-off between coverage and efficiency. Visual expert-assisted search is efficient but prone to blind spots when proposals fail, whereas scan-based search guarantees coverage at the cost of computational redundancy and semantic fragmentation. To address this dilemma, we introduce CVSearch, a training-free adaptive framework that dynamically schedules se…
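The coverage/efficiency trade-off described above can be illustrated with a small sketch. This is not CVSearch's scheduling rule, which the abstract does not give before it is cut off. It assumes a hypothetical `expert_propose` callable that returns candidate regions with confidence scores, a hypothetical confidence `threshold`, and an illustrative 336-pixel tile size. When no proposal is confident, the sketch falls back to exhaustive sliding-window tiling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class Proposal:
    box: Box
    score: float  # hypothetical expert confidence in [0, 1]


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Window start offsets so that windows of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:  # add a final window flush with the far edge
        starts.append(length - tile)
    return starts


def scan_tiles(width: int, height: int, tile: int = 336, stride: int = 336) -> List[Box]:
    """Scan-based search: windows that jointly cover every pixel of the image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]


def schedule_search(
    width: int,
    height: int,
    expert_propose: Callable[[], List[Proposal]],
    threshold: float = 0.5,
) -> Tuple[str, List[Box]]:
    """Hypothetical adaptive scheduler: use confident expert proposals when
    available, otherwise fall back to full-coverage scanning (no blind spots)."""
    confident = [p.box for p in expert_propose() if p.score >= threshold]
    if confident:
        return "expert", confident
    return "scan", scan_tiles(width, height)


if __name__ == "__main__":
    # A confident proposal: only one region needs to be examined.
    mode, boxes = schedule_search(3840, 2160, lambda: [Proposal((100, 200, 400, 500), 0.9)])
    print(mode, len(boxes))  # expert 1
    # No usable proposal: fall back to scanning the whole 4K frame.
    mode, boxes = schedule_search(3840, 2160, lambda: [])
    print(mode, len(boxes))  # scan 84
```

With no usable proposal, a 3840×2160 image falls back to 84 windows (12 columns × 7 rows) of at most 336×336 pixels. This mirrors the redundancy the abstract attributes to scan-based search, while a single confident proposal replaces all of them at the risk of missing anything the expert overlooked.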