DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs
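DDX-TRACE is a multimodal neuroradiology benchmark that evaluates AI reasoning by simulating the sequential clinical workup: evidence gathering, differential diagnosis, and the decision to stop, rather than the final answer alone. This approach can expose weaknesses in how a model updates under uncertainty and how it manages the workflow, which in turn helps improve diagnostic reliability.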
arXiv:2605.23629v1 Announce Type: new Abstract: Medical diagnosis is not a single prediction from a fully specified vignette. It is a sequential workup: clinicians decide what evidence to obtain, revise a differential diagnosis, and stop when the diagnosis is sufficiently supported. Most medical AI benchmarks instead reveal the relevant context upfront and score only the final answer, making unsupported correct guesses, premature closure, inefficient workups, and poor uncertainty updating invisible. We introduce DDX-TRACE, a physician-adjudicated benchmark for multimodal neuroradiology that ev