WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild
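WildTableBench is the first question-answering benchmark built on real-world table images. It fills a gap left by current evaluations, which rely solely on structured-text tables or rendered images. The benchmark emphasizes visual complexity, varied layouts, and diverse domains, and it requires models to combine structural perception with numerical reasoning. This makes it practically relevant for deploying multimodal foundation models in consumer and enterprise scenarios.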
arXiv:2605.01018v2 (Announce Type: replace)

Abstract: Using multimodal foundation models to analyze table images is a high-value yet challenging application in consumer and enterprise scenarios. Despite its importance, current evaluations rely largely on structured-text tables or clean rendered images, leaving the visual complexity of in-the-wild table images underexplored. Such images feature varied layouts and diverse domains that demand sophisticated structural perception and numerical reasoning. To bridge this gap, we introduce WildTableBench, the first question-answering benchmark for natur…