Listed in Data & Analytics with 35 other tools. Part of 580+ curated AI tools on AISO.

Docling

IBM's open-source document parser for PDFs, DOCX, and 20+ formats

Open source: MIT licensed, free to use.


https://github.com/docling-project/docling

About Docling

Open-source document processing library from IBM Research that parses diverse formats, including PDF, DOCX, PPTX, XLSX, HTML, images, audio, and more. For PDFs, Docling provides advanced document understanding: page layout analysis, table structure recognition, and code and formula extraction. It integrates with LangChain, LlamaIndex, CrewAI, and Haystack for building retrieval-augmented generation (RAG) pipelines.

Key Features

Parse PDF, DOCX, PPTX, XLSX, HTML, images, audio, and more
Advanced PDF layout and table structure recognition
OCR for scanned documents and images
Export to Markdown, HTML, JSON, DocTags
LangChain, LlamaIndex, CrewAI, and Haystack integrations
Local execution for sensitive data / air-gapped environments

Tags

document parsing, pdf, open source, rag, data extraction, ibm
