Mistral OCR 4 Adds Structured Output for Document AI Pipelines
Mistral OCR 4 brings bounding boxes, block classification, confidence scores, 170-language coverage, and self-hosting options to document extraction and RAG pipelines.
Mistral OCR 4 is a document-understanding model for extracting structured text, layout, block types, bounding boxes, and confidence scores from enterprise documents. Mistral says it supports 170 languages across 10 language groups, can run in a single container for self-hosted deployments, and is available through the API, Mistral Studio Document AI, Amazon SageMaker, and Microsoft Foundry. Microsoft separately confirms that Mistral Document AI with OCR 4 is arriving in Microsoft Foundry for structured document pipelines.
Key takeaways
- OCR 4 returns more than plain text: it adds layout-aware output, block classification, bounding boxes, and per-word confidence signals.
- Mistral positions it for RAG ingestion, enterprise search, form extraction, compliance workflows, redaction, and human review queues.
- Pricing in Mistral's announcement is $4 per 1,000 pages through the API, $2 per 1,000 pages with the Batch API discount, and $5 per 1,000 pages for Document AI.
- Microsoft says OCR 4 is available in Foundry as `mistral-ocr-4-0`, which matters for teams already standardizing model deployment there.
- Mistral reports strong benchmark and human-preference results, but it also warns that OCR benchmarks can contain scoring artifacts; test on your own documents.
Practical LinkLoot angle
OCR 4 is most interesting for workflows where plain OCR creates cleanup work downstream. If your RAG or automation pipeline needs citations, page references, structured fields, redaction targets, or review confidence, layout and confidence metadata are often more useful than another chunk of flattened text.
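To make that concrete, here is a minimal Python sketch of confidence-based routing. The `Block` shape, its field names, and the 0.9 threshold are assumptions made for illustration; they are not Mistral's actual response schema, so map them to whatever your chosen route returns.

```python
from dataclasses import dataclass

# Hypothetical block shape for illustration; not Mistral's actual response schema.
@dataclass
class Block:
    page: int
    block_type: str                           # e.g. "title", "paragraph", "table"
    text: str
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1), assumed page coordinates
    confidence: float                         # assumed to lie in [0, 1]

def route_blocks(blocks, threshold=0.9):
    """Split blocks into auto-accepted ones and a human review queue."""
    accepted, review = [], []
    for block in blocks:
        (accepted if block.confidence >= threshold else review).append(block)
    return accepted, review

def to_chunk(block):
    """Keep page and box next to the text so a RAG answer can cite its source region."""
    return {
        "text": block.text,
        "page": block.page,
        "bbox": block.bbox,
        "type": block.block_type,
    }

# Made-up sample data: the table block carries a low score and a likely misread digit.
blocks = [
    Block(1, "title", "Master Services Agreement", (72, 60, 540, 90), 0.99),
    Block(1, "paragraph", "Payment is due within 30 days.", (72, 120, 540, 160), 0.97),
    Block(2, "table", "Total | 1,2O0.00", (72, 200, 540, 260), 0.61),
]

accepted, review = route_blocks(blocks)
chunks = [to_chunk(b) for b in accepted]
print(len(chunks), "chunks indexed;", len(review), "blocks queued for review")
```

The threshold is the part worth tuning on your own documents: set it too low and misread fields reach the index, set it too high and the review queue absorbs the savings.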
| Product path | Best use | Limitation | Source |
|---|---|---|---|
| OCR 4 API | Custom extraction pipelines with direct control over output | Requires engineering work around storage, validation, and downstream parsing | Mistral |
| Document AI | Schema-shaped outputs and no-code or low-code document workflows | Higher listed per-page price than raw API extraction | Mistral |
| Microsoft Foundry deployment | Enterprise teams already using Foundry governance and deployment controls | Availability, region, and pricing should be checked in the tenant | Microsoft |
A practical first test is not a generic PDF demo. Use the documents that break your current stack: multi-column reports, scanned contracts, invoices with stamps, mixed-language files, forms with tables, or documents where confidence thresholds decide whether a human must review the output. Compare total workflow cost, not just per-page OCR price: failed citations, wrong fields, and manual review time are the expensive parts.
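A rough way to run that comparison is sketched below. The per-page prices are the ones listed in Mistral's announcement; the page volume, review rate, minutes per review, and hourly rate are hypothetical placeholders to swap for your own measurements.

```python
# Per-1,000-page prices as listed in Mistral's announcement.
PRICE_PER_1000 = {"api": 4.00, "batch": 2.00, "document_ai": 5.00}

def workflow_cost(pages, path, review_rate, minutes_per_review, hourly_rate):
    """Return (ocr_cost, review_cost, total) in dollars for one processing run."""
    ocr_cost = pages / 1000 * PRICE_PER_1000[path]
    review_hours = pages * review_rate * minutes_per_review / 60
    review_cost = review_hours * hourly_rate
    return ocr_cost, review_cost, ocr_cost + review_cost

# Hypothetical inputs: 50,000 pages, 3% of pages reviewed, 2 minutes each, $40/hour.
for path in PRICE_PER_1000:
    ocr, review, total = workflow_cost(50_000, path, 0.03, 2, 40.0)
    print(f"{path:12s} OCR ${ocr:8.2f}  review ${review:8.2f}  total ${total:8.2f}")
```

With these made-up inputs, review labor comes to about $2,000 while the OCR line item ranges from $100 to $250, so a route that lowers the review rate can matter more than the per-page price.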
What to verify before you act
Verify the exact deployment path you plan to use: Mistral API, Batch API, Document AI, Microsoft Foundry, SageMaker, or self-hosting. Check data residency, retention, model availability, per-page pricing, and whether your chosen route exposes bounding boxes, block classification, confidence scores, and schema output in the format your pipeline needs. For regulated use, confirm that OCR 4 is only extracting and structuring documents; Mistral states it is not intended for medical diagnosis, legal judgment, high-stakes financial decisions, safety-critical systems, or non-document inputs.
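One lightweight way to check field exposure is to save a sample response from the route you are evaluating and scan it for the metadata your pipeline depends on. The sketch below assumes a JSON layout of `pages` containing `blocks`, with keys named `bbox`, `block_type`, and `confidence`; all of these names are hypothetical and should be replaced with the keys the route actually returns.

```python
import json

# Hypothetical field names; replace with the keys your chosen route actually returns.
REQUIRED_BLOCK_FIELDS = {"bbox", "block_type", "confidence"}

def missing_fields(response_json):
    """Return required fields that are absent from at least one block."""
    data = json.loads(response_json)
    missing = set()
    for page in data.get("pages", []):
        for block in page.get("blocks", []):
            missing |= REQUIRED_BLOCK_FIELDS - set(block)
    return sorted(missing)

# Made-up sample: the second block has no confidence score.
sample = json.dumps({
    "pages": [
        {"index": 0, "blocks": [
            {"text": "Invoice 1042", "bbox": [72, 60, 300, 90],
             "block_type": "title", "confidence": 0.98},
            {"text": "Total due", "bbox": [72, 120, 300, 150],
             "block_type": "paragraph"},
        ]}
    ]
})

print(missing_fields(sample))
```

An empty result only means every block seen carried the fields; a response with no blocks at all also returns an empty list, so run the check against documents you know contain text.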
For more workflow ideas around document automation and agents, see LinkLoot's guide to AI workflow automation.
