SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models
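Reinforcement learning (RL) improves prompt alignment and perceptual quality in flow-matching models. To do so, it replaces the deterministic reverse-time ODE of flow matching with a stochastic SDE, which yields a stochastic policy. The stochastic sampler governs both exploration and denoising, so its design is critical to performance.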
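As a worked sketch, the ODE-to-SDE conversion can be written out under one common set of assumptions. These assumptions are not necessarily the paper's: the rectified-flow path x_t = (1-t) x_0 + t·ε, with t = 1 pure noise and t = 0 data, and a velocity v_θ that matches the true marginal velocity. The construction below is the generic marginal-preserving one used in prior work such as Flow-GRPO. The SDE-consistent sampler proposed here may differ. Under these assumptions, each denoising step becomes a Gaussian policy:

```latex
% Deterministic sampler (probability-flow ODE), integrated from t = 1 down to t = 0
\mathrm{d}x_t = v_\theta(x_t, t)\,\mathrm{d}t
% Score of the Gaussian probability path, expressed through the velocity
\nabla_x \log p_t(x) = -\frac{x + (1-t)\,v_\theta(x, t)}{t}
% Marginal-preserving reverse-time SDE (dt < 0), valid for any noise level \sigma_t \ge 0
\mathrm{d}x_t = \Big[v_\theta(x_t,t) + \frac{\sigma_t^2}{2t}\big(x_t + (1-t)\,v_\theta(x_t,t)\big)\Big]\mathrm{d}t + \sigma_t\,\mathrm{d}\bar{w}_t
% Euler-Maruyama step of size h > 0, with \varepsilon \sim \mathcal{N}(0, I)
x_{t-h} = \underbrace{x_t - \Big[v_\theta(x_t,t) + \frac{\sigma_t^2}{2t}\big(x_t + (1-t)\,v_\theta(x_t,t)\big)\Big]h}_{\mu_\theta(x_t,\,t)} + \sigma_t\sqrt{h}\,\varepsilon
% Hence the per-step policy is Gaussian, with a closed-form log-likelihood
\pi_\theta(x_{t-h}\mid x_t) = \mathcal{N}\big(\mu_\theta(x_t,t),\ \sigma_t^2 h\, I\big)
```

Setting σ_t ≡ 0 recovers the deterministic Euler ODE step. In this formulation, σ_t is therefore the knob that trades denoising fidelity for exploration.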
arXiv:2605.23522v1 Announce Type: cross
Abstract: Reinforcement learning (RL) has become an effective way to improve prompt alignment and perceptual quality in diffusion and flow-matching generators. A critical step for applying online RL to flow matching is turning the deterministic sampling trajectory into a stochastic policy, typically by replacing the reverse-time Ordinary Differential Equation (ODE) with a Stochastic Differential Equation (SDE). The stochastic sampler, controlling the exploration behavior and denoising dynamics, is thus part of the policy, and its design can significantly …
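The sketch below is a minimal, hypothetical illustration of how a stochastic sampler becomes part of an RL policy. It is not the paper's SDE-consistent sampler. It makes three assumptions:

- the rectified-flow convention (t = 1 noise, t = 0 data);
- the marginal-preserving SDE drift used in prior work such as Flow-GRPO;
- a noise schedule of the form a·sqrt(t/(1-t)), capped at a hypothetical `T_CAP` to avoid the singularity at t = 1.

The names `sigma`, `sde_step`, `gaussian_log_prob`, `toy_velocity`, `sample`, `T_CAP`, and `a` are illustrative only. A 1-D toy with a known exact velocity (data drawn from N(0, s0²)) makes the behaviour checkable: the ODE (a = 0) and the SDE (a > 0) should both end near the data variance. The SDE additionally yields a Gaussian log-probability for every step, which is what policy-gradient methods consume.

```python
import math
import random

# Hypothetical cap on t inside the noise schedule, to avoid sigma -> infinity at t = 1.
T_CAP = 0.95


def sigma(t, a):
    """Assumed schedule sigma_t = a * sqrt(t / (1 - t)), with t capped at T_CAP; a = 0 gives the ODE."""
    tc = min(t, T_CAP)
    return a * math.sqrt(tc / (1.0 - tc))


def sde_step(x, t, h, v, a, rng):
    """One Euler-Maruyama step of the marginal-preserving SDE, from time t to t - h.

    x: current sample, v: velocity v(x, t), a: noise-level knob.
    Returns (x_next, mean, std) of the Gaussian transition pi(x_next | x).
    """
    s = sigma(t, a)
    # Drift = v - (sigma^2 / 2) * score, where score = -(x + (1 - t) * v) / t.
    drift = v + (s * s) / (2.0 * t) * (x + (1.0 - t) * v)
    mean = x - drift * h
    std = s * math.sqrt(h)
    return mean + std * rng.gauss(0.0, 1.0), mean, std


def gaussian_log_prob(x, mean, std):
    """Log-density of N(mean, std^2) at x, i.e. the per-step policy log-likelihood."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def toy_velocity(x, t, s0=0.5):
    """Exact marginal velocity E[x_1 - x_0 | x_t = x] for x_0 ~ N(0, s0^2), x_1 ~ N(0, 1)."""
    var_t = (1.0 - t) ** 2 * s0 ** 2 + t ** 2
    return (t - (1.0 - t) * s0 ** 2) * x / var_t


def sample(n, steps, a, s0=0.5, seed=0):
    """Integrate from t = 1 (noise) to t = 0 (data).

    Returns the final samples and, when a > 0, each trajectory's summed step log-probs.
    """
    rng = random.Random(seed)
    h = 1.0 / steps
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # x_1 ~ N(0, 1)
    logps = [0.0] * n
    for k in range(steps):
        t = 1.0 - k * h  # evaluate at the start of the step, so t > 0 always
        for i, x in enumerate(xs):
            x_next, mean, std = sde_step(x, t, h, toy_velocity(x, t, s0), a, rng)
            if std > 0.0:  # the deterministic ODE step has no density to score
                logps[i] += gaussian_log_prob(x_next, mean, std)
            xs[i] = x_next
    return xs, logps
```

For example, `sample(5000, 100, a=0.5)` should return samples whose second moment is close to s0² = 0.25, together with finite per-trajectory log-probabilities. The score-based drift correction makes any choice of σ_t leave the marginals unchanged in continuous time. In this sketch, the schedule therefore only shifts the balance between exploration noise and discretization error.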