VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset
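VINS-120K is the first large-scale dataset for instruction-based ultra-high-resolution (≥4K) image editing. It contains 120K triplets of instruction, input image, and edited image. A rigorous multi-stage pipeline ensures visual quality, instruction alignment, and aesthetics. The dataset addresses two obstacles, the difficulty of modeling high-frequency texture detail and the lack of high-quality data, and provides an important benchmark for UHR image editing.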
arXiv:2605.23518v1

Abstract: Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge of modeling high-frequency texture details. We introduce VINS-120K, the first large-scale dataset for instruction-based UHR image editing, comprising 120K carefully curated triplets of instruction, input image, and edited image. Each image exceeds 4K resolution ($\geq$4096 $\times$ 4096) and is filtered through a rigorous multi-stage pipeline to ensure visual quality, instruction alignment, and aesthetics.
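To make the triplet format and the resolution criterion concrete, here is a minimal Python sketch. It assumes that "$\geq$4096 $\times$ 4096" means both width and height must be at least 4096 pixels. The `EditTriplet` record, its field names, and the `run_pipeline` helper are hypothetical illustrations, not the dataset's actual schema or the authors' pipeline. Only the resolution check is modeled. The visual-quality, instruction-alignment, and aesthetics filters would be additional stages whose criteria the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Minimum side length taken from the abstract (>= 4096 x 4096).
MIN_SIDE = 4096


@dataclass(frozen=True)
class EditTriplet:
    """One instruction-based editing sample (hypothetical field names)."""
    instruction: str
    input_size: Tuple[int, int]   # (width, height) of the input image
    edited_size: Tuple[int, int]  # (width, height) of the edited image


def meets_uhr(size: Tuple[int, int], min_side: int = MIN_SIDE) -> bool:
    """Return True if both width and height are at least min_side pixels."""
    width, height = size
    return width >= min_side and height >= min_side


# A filter stage takes a triplet and returns True to keep it.
Stage = Callable[[EditTriplet], bool]


def resolution_stage(t: EditTriplet) -> bool:
    # Both the input and the edited image must satisfy the UHR criterion.
    return meets_uhr(t.input_size) and meets_uhr(t.edited_size)


def run_pipeline(triplets: Iterable[EditTriplet], stages: List[Stage]) -> List[EditTriplet]:
    """Keep only the triplets that pass every stage, applied in order."""
    kept = list(triplets)
    for stage in stages:
        kept = [t for t in kept if stage(t)]
    return kept


# Usage: a 3840 x 2160 ("consumer 4K") pair fails the >= 4096 x 4096 check.
samples = [
    EditTriplet("make the sky sunset orange", (4096, 4096), (4096, 4096)),
    EditTriplet("remove the car", (3840, 2160), (3840, 2160)),
]
kept = run_pipeline(samples, [resolution_stage])
print(len(kept))  # 1
```

Representing each filter as a plain callable keeps the stages composable. A quality, alignment, or aesthetics scorer could be appended to the `stages` list without changing `run_pipeline`, but which scorers and thresholds the authors actually use is not stated here.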