MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

arXiv cs.CV, May 25, 2026

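Vision-language models (VLMs) are vulnerable to adaptive adversarial attacks, and existing defenses often fail against them. The paper proposes MirrorCheck, a model-agnostic detection framework: a text-to-image model reconstructs an image from the caption that the target model outputs, and the feature embeddings are then compared for semantic consistency. This effectively exposes malicious samples and improves the robustness of multimodal defenses.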
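The detection loop summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `mirror_check`, `caption_fn`, `regenerate_fn`, `embed_fn`, and `threshold` are hypothetical, and both the cosine-similarity score and the fixed cutoff are assumptions, since the summary says only that feature embeddings are compared for semantic consistency.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class CheckResult:
    caption: str       # caption produced by the target model
    similarity: float  # consistency score between the two embeddings
    flagged: bool      # True if the input looks adversarial


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mirror_check(
    image: Any,
    caption_fn: Callable[[Any], str],               # hypothetical: target VLM captioner
    regenerate_fn: Callable[[str], Any],            # hypothetical: text-to-image model
    embed_fn: Callable[[Any], Sequence[float]],     # hypothetical: image feature encoder
    threshold: float = 0.7,                         # assumed cutoff, to be tuned on held-out data
) -> CheckResult:
    """Flag an input whose regenerated image is semantically far from the input."""
    # 1. Ask the (possibly attacked) target model to describe the input.
    caption = caption_fn(image)
    # 2. Regenerate an image from that caption alone.
    regenerated = regenerate_fn(caption)
    # 3. Compare the input and the regenerated image in feature space.
    score = cosine_similarity(embed_fn(image), embed_fn(regenerated))
    # 4. Low consistency suggests the caption does not match the visible content.
    return CheckResult(caption=caption, similarity=score, flagged=score < threshold)
```

The three callables are passed in rather than hard-coded so that the sketch stays model-agnostic, in keeping with the framework's description. In practice each would wrap a real captioning model, a text-to-image generator, and an image encoder of the reader's choosing.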

arXiv:2406.09250v3 (announce type: replace)

Abstract: Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically designed to bypass existing defenses. To address this vulnerability, we propose MirrorCheck, a robust and model-agnostic detection framework that operates effectively in both unimodal and multimodal settings. MirrorCheck leverages Text-to-Image (T2I) models to regenerate visual content from captions produced by the target model and assesses semantic consistency by comparing feature-space embeddings between…