MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
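Vision-language models (VLMs) are vulnerable to adaptive adversarial attacks, and existing defenses often fail against them. MirrorCheck is a model-agnostic detection framework. It uses a text-to-image model to reconstruct an image from the caption produced by the target model, then compares the feature embeddings of the input and the reconstruction for semantic consistency. This exposes malicious inputs and improves the robustness of multimodal defenses.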
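The detection loop summarized above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. `caption_fn`, `t2i_fn` and `encode_fn` are placeholder callables standing in for the target VLM, a text-to-image model and an image encoder. Cosine similarity as the consistency measure and the 0.8 threshold are both assumptions. The toy stubs return hand-picked vectors so the control flow runs without any real model.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mirror_check(
    image,
    caption_fn: Callable,    # hypothetical: target VLM, image -> caption
    t2i_fn: Callable,        # hypothetical: text-to-image model, caption -> image
    encode_fn: Callable,     # hypothetical: image encoder, image -> embedding
    threshold: float = 0.8,  # assumed decision threshold, not from the paper
) -> Tuple[bool, float]:
    """Return (is_flagged, similarity) for a single input image."""
    caption = caption_fn(image)          # 1. caption the (possibly attacked) input
    regenerated = t2i_fn(caption)        # 2. regenerate an image from that caption
    score = cosine_similarity(           # 3. compare both images in feature space
        encode_fn(image), encode_fn(regenerated)
    )
    return score < threshold, score      # low agreement -> likely adversarial


# --- Toy stubs (hypothetical data, for illustration only) ---
TOY_CAPTIONS = {"cat_photo": "a cat", "cat_photo_perturbed": "a truck"}
TOY_RENDERS = {"a cat": "cat_render", "a truck": "truck_render"}
TOY_EMBEDDINGS = {
    "cat_photo": [1.0, 0.0],
    "cat_photo_perturbed": [0.98, 0.05],  # small perturbation: still looks like a cat
    "cat_render": [0.95, 0.1],
    "truck_render": [0.0, 1.0],
}


def toy_caption(image: str) -> str:
    return TOY_CAPTIONS[image]


def toy_t2i(caption: str) -> str:
    return TOY_RENDERS[caption]


def toy_encode(image: str) -> Vector:
    return TOY_EMBEDDINGS[image]


if __name__ == "__main__":
    for img in ("cat_photo", "cat_photo_perturbed"):
        flagged, score = mirror_check(img, toy_caption, toy_t2i, toy_encode)
        print(f"{img}: flagged={flagged}, similarity={score:.3f}")
```

The same encoder embeds both the input and the regenerated image, so the score reflects semantic agreement rather than pixel-level similarity. An attack that hijacks the caption ("a truck") yields a reconstruction far from the input in feature space. Any real threshold would need to be calibrated on clean data.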
arXiv:2406.09250v3 (Announce Type: replace)

Abstract: Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically designed to bypass existing defenses. To address this vulnerability, we propose MirrorCheck, a robust and model-agnostic detection framework that operates effectively in both unimodal and multimodal settings. MirrorCheck leverages Text-to-Image (T2I) models to regenerate visual content from captions produced by the target model and assesses semantic consistency by comparing feature-space embeddings between …