Mistral Releases OCR 4 with Bounding Boxes, Block Classification, and 170-Language Support

Mistral


Mistral published OCR 4 on June 23, 2026. New capabilities include per-word bounding boxes, typed block classification (titles, tables, equations, signatures), and per-word confidence scores, which together enable source-grounded citations and spatial indexing. The model supports 170 languages across 10 language groups, handles PDF, DOC, PPT, and OpenDocument formats, and can run self-hosted in a single container. It scores 85.20 on OlmOCRBench (top overall) and 93.07 on OmniDocBench. Pricing is $4 per 1,000 pages via the API, or $2 per 1,000 pages with the Batch API.
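To put the quoted prices in concrete terms, here is a rough cost estimate in Python. It assumes pricing scales linearly with page count, with no minimums, tiers, or rounding rules, none of which are stated in the summary above. The function name and the `mode` labels are illustrative only.

```python
# Per-1,000-page prices in USD, as quoted in the summary above.
PRICE_PER_1000_PAGES = {"api": 4.0, "batch": 2.0}


def estimate_cost(pages: int, mode: str = "api") -> float:
    """Estimate the USD cost of processing `pages` pages.

    Assumes strictly linear pricing (an assumption, not a documented rule).
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_1000_PAGES[mode] * pages / 1000


if __name__ == "__main__":
    for mode in ("api", "batch"):
        print(mode, estimate_cost(250_000, mode))
```

Under these assumptions, a 250,000-page archive comes to about $1,000 through the standard API and about $500 through the Batch API.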

Why it matters

Bounding boxes and confidence scores are the most-requested capabilities for document AI pipelines, enabling in-context highlighting, form extraction, and spatial reasoning that pure text extraction cannot support. Self-hosting support removes data-egress concerns for regulated industries.
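As a minimal sketch of how per-word boxes and confidence scores can support in-context highlighting, the Python below finds a quoted phrase in a list of recognized words and returns one rectangle covering it. The `Word` record, its `(x0, y0, x1, y1)` box layout, and the sample values are hypothetical stand-ins, not the actual OCR 4 response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical per-word OCR record (not the real OCR 4 schema)."""
    text: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units
    confidence: float  # assumed to lie in [0, 1]


def find_phrase(words, phrase, min_confidence=0.0):
    """Return the union box of the first run of words matching `phrase`.

    Matching is case-insensitive and whitespace-tokenized. A run is skipped
    if any of its words falls below `min_confidence`. Returns None when no
    run qualifies.
    """
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return None
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [w.text.lower() for w in window] != target:
            continue
        if any(w.confidence < min_confidence for w in window):
            continue
        return (
            min(w.bbox[0] for w in window),
            min(w.bbox[1] for w in window),
            max(w.bbox[2] for w in window),
            max(w.bbox[3] for w in window),
        )
    return None


# Made-up sample data for illustration.
words = [
    Word("Total", (72, 100, 110, 112), 0.99),
    Word("due:", (114, 100, 140, 112), 0.97),
    Word("$1,250", (144, 100, 190, 112), 0.62),
]

if __name__ == "__main__":
    print(find_phrase(words, "total due:"))
    print(find_phrase(words, "due: $1,250", min_confidence=0.9))
```

With the sample data, the first call returns a box spanning "Total due:", while the second returns `None` because the amount falls below the 0.9 threshold. A pipeline could use that signal to route low-confidence fields to human review instead of citing them.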

Importance: 3/5

A state-of-the-art document intelligence model that adds bounding boxes (a new capability class) and a self-hosted deployment option for regulated industries.

Sources