Multimodal Distribution Matching for Vision-Language Dataset Distillation
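Proposes Multimodal Distribution Matching (MDM), a geometry-aware framework that efficiently compresses vision-language training sets while preserving representation quality and cross-modal alignment under limited compute and memory. It addresses two weaknesses of existing methods: heavy computational cost and neglect of the correlations between modalities.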
arXiv:2605.23482v1 Announce Type: new Abstract: Dataset distillation compresses large training sets into compact synthetic datasets while preserving downstream performance. As modern systems increasingly operate on paired vision-language inputs, multimodal distillation must preserve representation quality and cross-modal alignment under tight compute and memory budgets, yet prior methods often require heavy computation and overlook the correlations between modalities. To address this, we present Multimodal Distribution Matching (MDM), a geometry-aware framework for efficient and generalizable multimodal distillat