PhotoFlow: Agentic 3D Virtual Photography Missions
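Virtual photography requires an agent, with no preset viewpoint, to choose camera parameters on its own and render a photograph from a language intent and scene information. PhotoFlow proposes a Director-Reviewer-Reflector framework that targets a hard problem: evaluating 3D spatial understanding and aesthetic judgment together. It offers a new direction for vision-language models that combine spatial intelligence with photographic aesthetics.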
arXiv:2605.23771v1 | Announce Type: new

Abstract: Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes this kind of spatial agent increasingly plausible, but the task stresses two capabilities that remain hard to evaluate together: complex 3D spatial understanding and abstract aesthetic judgment. We introduce PhotoFlow, a Director-Reviewer-Reflector ag
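The abstract is cut off before it explains how the three roles interact. It does describe the task interface: scene information and a language intent go in, and executable camera parameters come out. The sketch below is a purely hypothetical illustration of a propose-review-revise loop over that interface. A Director proposes a camera, a Reviewer scores it, and a Reflector revises it until the score clears a threshold. Every name is an assumption made for illustration, not PhotoFlow's actual design. That includes `CameraParams`, `director`, `reviewer`, `reflector`, `photo_flow`, the 0.9 threshold, and the distance-based toy score. Rendering is omitted entirely.

```python
import math
from dataclasses import dataclass, replace

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class CameraParams:
    # Hypothetical "executable" camera parameters: where the camera sits,
    # what it looks at, and its vertical field of view in degrees.
    position: Vec3
    target: Vec3
    fov_deg: float


def distance(cam: CameraParams) -> float:
    return math.dist(cam.position, cam.target)


def desired_distance(intent: str) -> float:
    # Toy assumption: a "close-up" intent wants the camera 2 units away.
    return 2.0 if "close-up" in intent else 6.0


def director(scene_center: Vec3, intent: str) -> CameraParams:
    # Hypothetical Director: propose an initial shot in front of and above
    # the scene center, widening the lens for "wide" intents.
    cx, cy, cz = scene_center
    fov = 70.0 if "wide" in intent else 35.0
    return CameraParams(position=(cx, cy + 1.5, cz + 6.0), target=scene_center, fov_deg=fov)


def reviewer(cam: CameraParams, intent: str) -> float:
    # Hypothetical Reviewer: a toy score in (0, 1] that peaks when the
    # camera distance matches what the intent asks for.
    return 1.0 / (1.0 + abs(distance(cam) - desired_distance(intent)))


def reflector(cam: CameraParams, intent: str) -> CameraParams:
    # Hypothetical Reflector: keep the viewing direction, but move the camera
    # halfway toward the desired distance from the target.
    d = distance(cam)
    scale = ((d + desired_distance(intent)) / 2.0) / d
    new_pos = tuple(t + (p - t) * scale for p, t in zip(cam.position, cam.target))
    return replace(cam, position=new_pos)


def photo_flow(scene_center: Vec3, intent: str, threshold: float = 0.9, max_rounds: int = 10):
    # Propose once, then alternate review and revision until the score
    # clears the threshold or the round budget runs out.
    cam = director(scene_center, intent)
    for _ in range(max_rounds):
        if reviewer(cam, intent) >= threshold:
            break
        cam = reflector(cam, intent)
    return cam, reviewer(cam, intent)


if __name__ == "__main__":
    cam, score = photo_flow((0.0, 0.0, 0.0), "close-up portrait")
    print(cam, round(score, 3))
```

The loop stops as soon as the Reviewer is satisfied, so a good first proposal costs a single review. In a real agent, the Reviewer and Reflector would presumably judge a rendered image rather than a geometric proxy. That detail is not recoverable from the truncated abstract.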