MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Images
Technion
MulTaBench introduces 40 datasets (20 image-tabular, 20 text-tabular), the largest image-tabular benchmarking effort to date. The benchmark shows that current tabular foundation models' reliance on frozen embeddings limits them: task-specific tuning substantially improves performance across both the text and image modalities and across multiple encoder scales.
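The frozen-versus-tuned contrast can be illustrated with a toy sketch (illustrative code, not MulTaBench's pipeline; all dimensions, names, and the synthetic data are assumptions): a least-squares head is fit on top of a frozen random "encoder" of image features concatenated with tabular columns, then the same encoder is tuned with a few task-specific gradient steps.

```python
import numpy as np

# Toy illustration of the headline finding (not MulTaBench code): a
# downstream head on a *frozen* random encoder vs. the same encoder
# after a few task-specific tuning steps.
rng = np.random.default_rng(0)

n, d_img, d_emb = 200, 32, 8
X_img = rng.normal(size=(n, d_img))            # stand-in image features
X_tab = rng.normal(size=(n, 3))                # numeric tabular columns
y = X_img @ rng.normal(size=d_img) + X_tab @ np.array([1.0, -2.0, 0.5])

W = rng.normal(size=(d_img, d_emb)) / np.sqrt(d_img)  # shared encoder init

def head_loss(W_enc):
    """Fit a least-squares head on [embedding | tabular] and return MSE."""
    F = np.hstack([X_img @ W_enc, X_tab])
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    return np.mean((F @ beta - y) ** 2), beta

frozen_loss, _ = head_loss(W)                  # encoder stays frozen

# Task-specific tuning: alternate optimal head refits with small
# gradient steps on the encoder weights (gradient of MSE w.r.t. W).
W_t = W.copy()
for _ in range(300):
    _, beta = head_loss(W_t)
    resid = np.hstack([X_img @ W_t, X_tab]) @ beta - y
    W_t -= 0.01 * X_img.T @ np.outer(resid, beta[:d_emb]) / n

tuned_loss, _ = head_loss(W_t)
print(f"frozen MSE: {frozen_loss:.2f}  tuned MSE: {tuned_loss:.2f}")
```

On this synthetic task the tuned encoder yields a lower MSE than the frozen one, mirroring the benchmark's qualitative finding that adapting the encoder to the task helps.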
Why it matters
Real-world tabular data routinely includes images and free text alongside numeric columns, yet existing benchmarks largely overlook these modalities. MulTaBench exposes a concrete weakness in current foundation models. 122 upvotes on HF Daily (May 14).
Importance: 4/5
122 HF Daily upvotes (+1 bump); fills a recognized gap in tabular ML benchmarking
Sources
official
arXiv: MulTaBench