Weierstrass Positional Encoding for Vision Transformers
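Vision Transformers (ViTs) commonly use learnable one-dimensional positional encodings. These encodings break the two-dimensional spatial structure of the image, lack geometric constraints, and do not keep Euclidean distance and sequence-index distance monotonically related. This paper proposes a periodicity-based Weierstrass positional encoding that enforces a spatial-proximity prior and strengthens the model's ability to capture local image structure. This helps improve performance on tasks that depend on spatial information, such as medical imaging.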
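The loss of monotonicity can be checked on a toy grid. The sketch below is only an illustration. It assumes row-major flattening of an H x W patch grid, which is the usual ViT convention, and plain Euclidean distance between (row, col) patch coordinates. The helper names (`flat_index`, `index_distance`, `spatial_distance`, `count_violations`) are hypothetical and do not come from the paper.

```python
import itertools
import math

# Hypothetical toy setup: an H x W grid of image patches flattened in row-major
# order, as a standard ViT does before adding 1D position embeddings.

def flat_index(patch, width):
    """Sequence index a (row, col) patch receives after row-major flattening."""
    row, col = patch
    return row * width + col

def index_distance(p, q, width):
    """Distance between two patches measured along the flattened sequence."""
    return abs(flat_index(p, width) - flat_index(q, width))

def spatial_distance(p, q):
    """Euclidean distance between two patches on the 2D grid (Python 3.8+)."""
    return math.dist(p, q)

def count_violations(height, width):
    """Count ordered pairs (A, B) of patch pairs where A is strictly closer than B
    in sequence index but strictly farther apart in 2D space."""
    patches = [(r, c) for r in range(height) for c in range(width)]
    pairs = list(itertools.combinations(patches, 2))
    violations = 0
    for p1, q1 in pairs:
        for p2, q2 in pairs:
            if (index_distance(p1, q1, width) < index_distance(p2, q2, width)
                    and spatial_distance(p1, q1) > spatial_distance(p2, q2)):
                violations += 1
    return violations

if __name__ == "__main__":
    # End of row 0 and start of row 1: adjacent in the sequence, far apart in the image.
    print(index_distance((0, 3), (1, 0), 4), round(spatial_distance((0, 3), (1, 0)), 2))  # 1 3.16
    # Vertically adjacent patches: neighbours in the image, 4 steps apart in the sequence.
    print(index_distance((0, 0), (1, 0), 4), spatial_distance((0, 0), (1, 0)))  # 4 1.0
    print(count_violations(1, 8))  # 0: within a single row both distances coincide
    print(count_violations(4, 4) > 0)  # True
```

Within a single row the two distances are identical. The ordering breaks only once the sequence wraps from one row to the next. As a result, a learnable 1D embedding has no built-in cue that patches (0, 0) and (1, 0) are spatial neighbours.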
arXiv:2605.23719v1 Abstract: Vision Transformers have achieved remarkable success in computer vision, but their common use of learnable one-dimensional positional encodings weakens the inherent two-dimensional spatial structure of images after patch flattening. Existing positional encodings often lack geometric constraints and do not preserve a monotonic relationship between Euclidean spatial distances and sequential index distances, limiting ViTs' ability to exploit spatial proximity priors. Motivated by the usefulness of periodicity in positional encoding, we propose Weier
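The abstract breaks off at the method's name, so the paper's actual construction is not available here. For reference, the classical Weierstrass function is W(x) = sum over n >= 0 of a^n cos(b^n pi x), with 0 < a < 1, b a positive odd integer, and ab > 1 + 3 pi / 2. The sketch below evaluates a truncated version of that series. It is not the authors' encoding, and the parameter values and the name `weierstrass_partial_sum` are illustrative assumptions. Because every b^n is an integer, each term, and therefore the partial sum, repeats with period 2 in x. Each term is also a cosine whose frequency grows geometrically with n, which resembles the frequency ladder of standard sinusoidal positional encodings. The truncated abstract does not say whether the paper relies on either property.

```python
import math

# Arbitrary illustrative parameters (assumed, not taken from the paper). They satisfy
# the classical Weierstrass conditions: 0 < a < 1, b a positive odd integer,
# and a * b > 1 + 3 * pi / 2 (here 0.9 * 7 = 6.3 > about 5.71).
A = 0.9
B = 7
N_TERMS = 8

def weierstrass_partial_sum(x, a=A, b=B, n_terms=N_TERMS):
    """Truncated Weierstrass series: sum over n < n_terms of a**n * cos(b**n * pi * x)."""
    return sum(a ** n * math.cos(b ** n * math.pi * x) for n in range(n_terms))

if __name__ == "__main__":
    assert 0 < A < 1 and B % 2 == 1 and A * B > 1 + 3 * math.pi / 2
    for x in (0.0, 0.1, 0.25):
        # Integer b**n makes every term, and hence the sum, periodic with period 2.
        print(f"W({x}) = {weierstrass_partial_sum(x):+.6f}   "
              f"W({x + 2}) = {weierstrass_partial_sum(x + 2):+.6f}")
```

Any finite partial sum is smooth. The nowhere-differentiable behaviour of the classical function appears only in the infinite series.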