Weierstrass Positional Encoding for Vision Transformers

arXiv cs.CV, May 25, 2026

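Vision Transformers (ViTs) commonly use learnable one-dimensional positional encodings, which disrupt the two-dimensional spatial structure of images. Such encodings lack geometric constraints and do not keep Euclidean distance and index distance monotonically related. This paper proposes a periodicity-based Weierstrass positional encoding that enforces the spatial proximity prior and strengthens the model's ability to capture local image structure, which helps improve performance on tasks that depend on spatial information, such as medical imaging.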
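The mismatch between index distance and Euclidean distance can be seen with a small sketch. It assumes row-major patch flattening and a 14x14 patch grid (for example, a 224x224 image with 16x16 patches); the grid size and the helper names `patch_coords` and `euclidean` are illustrative choices, not taken from the paper.

```python
import math

GRID_W = 14  # assumed patch-grid width, for illustration only


def patch_coords(index, grid_w):
    """Map a row-major flattened patch index back to (row, col)."""
    return divmod(index, grid_w)


def euclidean(i, j, grid_w):
    """Euclidean distance between two patches on the 2D grid."""
    r1, c1 = patch_coords(i, grid_w)
    r2, c2 = patch_coords(j, grid_w)
    return math.hypot(r1 - r2, c1 - c2)


# Vertically adjacent patches: close in space, far apart in index.
a, b = 0, GRID_W
print("index distance:", abs(a - b), "euclidean:", euclidean(a, b, GRID_W))

# Last patch of row 0 and first patch of row 1: adjacent in index, far in space.
c, d = GRID_W - 1, GRID_W
print("index distance:", abs(c - d), "euclidean:", euclidean(c, d, GRID_W))
```

The first pair has the larger index distance but the smaller Euclidean distance, so ordering patches by index distance does not preserve their ordering by spatial distance.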

arXiv:2605.23719v1 Announce Type: new

Abstract: Vision Transformers have achieved remarkable success in computer vision, but their common use of learnable one-dimensional positional encodings weakens the inherent two-dimensional spatial structure of images after patch flattening. Existing positional encodings often lack geometric constraints and do not preserve a monotonic relationship between Euclidean spatial distances and sequential index distances, limiting ViTs' ability to exploit spatial proximity priors. Motivated by the usefulness of periodicity in positional encoding, we propose Weier