DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation
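Diffusion transformers rely on spatiotemporal 3D full attention for high-quality video generation, which is computationally expensive. Block sparse attention cuts this cost by focusing computation on important regions, but attention maps in DiTs exhibit dynamic, fine-grained sparsity, so existing block sparse methods lose significant quality at high sparsity ratios.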
arXiv:2605.23445v1 Announce Type: new Abstract: Diffusion transformers have achieved remarkable success in high-quality video generation, yet their reliance on spatiotemporal 3D full attention incurs prohibitive computational cost due to the quadratic complexity of attention. Block sparse attention is a common approach to mitigate this by focusing computation on important regions. However, attention maps in DiTs exhibit inherently dynamic and fine-grained sparsity, which causes existing block sparse attention methods to degrade significantly in quality, especially at high sparsity ratios. In t
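
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of generic top-k block sparse attention. It is not the DFSAttn method, which the truncated abstract does not describe. The function names (`mean_pool`, `dot`, `block_sparse_attention`), the mean-pooled block scoring rule, and the `block` and `keep` parameters are all assumptions chosen for illustration.

```python
import math

def mean_pool(x, block):
    """Average consecutive groups of `block` row vectors."""
    d = len(x[0])
    return [[sum(row[j] for row in x[i:i + block]) / block for j in range(d)]
            for i in range(0, len(x), block)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def block_sparse_attention(q, k, v, block=2, keep=1):
    """Generic top-k block sparse attention (illustrative; not DFSAttn).

    q, k, v are lists of row vectors. Each query block attends only to the
    `keep` key blocks with the highest coarse (mean-pooled) score.
    """
    n, d = len(q), len(q[0])
    assert n % block == 0, "sequence length must be a multiple of block"
    qb, kb = mean_pool(q, block), mean_pool(k, block)
    out, kept = [], []
    for bi, qmean in enumerate(qb):
        # Rank key blocks by a coarse score and keep the best `keep`.
        ranked = sorted(range(len(kb)), key=lambda bj: -dot(qmean, kb[bj]))
        chosen = sorted(ranked[:keep])
        kept.append(chosen)
        cols = [bj * block + t for bj in chosen for t in range(block)]
        for i in range(bi * block, (bi + 1) * block):
            # Exact softmax attention restricted to the kept key blocks.
            s = [dot(q[i], k[j]) / math.sqrt(d) for j in cols]
            m = max(s)
            w = [math.exp(x - m) for x in s]
            z = sum(w)
            out.append([sum(wj * v[j][c] for wj, j in zip(w, cols)) / z
                        for c in range(len(v[0]))])
    return out, kept
```

With `keep` equal to the number of key blocks, the sketch reduces to ordinary dense attention. Lowering `keep` raises the sparsity ratio, and because whole blocks are kept or dropped together, any important query-key pair that falls in a dropped block is lost. That coarse granularity is the limitation the abstract attributes to existing block sparse methods.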