docling

Get your documents ready for gen AI

Stars: 53.0k
Gained: +9.5k
Growth: 22.0%
Language: Python

↑1.2% this week

💡 Why It Matters

Docling prepares documents for generative AI applications by converting formats such as PDF, DOCX, PPTX, XLSX and HTML into structured output such as Markdown or JSON. This is particularly useful for ML and AI teams that need clean, parsed data for training models. With roughly 53,000 stars, Docling has strong community interest, which suggests a mature project that is likely ready for production use. It may not be the right choice for teams that need highly customised parsing or that work with niche document formats it does not support.
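As a rough illustration of the conversion step described above, here is a minimal sketch that turns one local document into Markdown. It assumes Docling is installed (`pip install docling`) and uses its `DocumentConverter` class with `export_to_markdown()`. The helper names `markdown_path_for` and `convert_to_markdown` are hypothetical and not part of Docling.

```python
from pathlib import Path


def markdown_path_for(source: str) -> Path:
    """Hypothetical helper: 'reports/q1.docx' becomes 'reports/q1.md'."""
    return Path(source).with_suffix(".md")


def convert_to_markdown(source: str) -> Path:
    """Convert one local document to Markdown and write it next to the source."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # parses the file into a Docling document
    target = markdown_path_for(source)
    target.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return target
```

Calling `convert_to_markdown("reports/q1.docx")` would write `reports/q1.md`. Writing Markdown next to the source file is only one possible layout. A pipeline that feeds a vector store or training set might prefer to keep the structured document object instead.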

🎯 When to Use

Docling is a strong choice when an engineering team needs a reliable open source tool to convert documents into a format suitable for machine learning applications. Consider alternatives if your project requires extensive customisation or if you are dealing with highly specialised document types.

👥 Team Fit & Use Cases

This tool is primarily used by data scientists, ML engineers, and AI developers who need to preprocess documents for analysis or training. It is often integrated into document management systems, AI-driven content generation products, and other applications that rely on document parsing.

🎭 Best For

Machine Learning and AI Engineers

🏷️ Topics & Ecosystem

ai convert document-parser document-parsing documents docx html markdown pdf pdf-converter pdf-to-json pdf-to-text pptx tables xlsx

📊 Activity

Latest commit: 2026-02-13. Over the past 96 days, this repository gained 9.5k stars (+22.0% growth). Activity data is based on daily RepoPi snapshots of the GitHub repository.
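The growth figure can be sanity-checked with a few lines of arithmetic. This sketch assumes growth is measured against the star count at the start of the window (current stars minus stars gained). That baseline is an assumption, since the page does not state its formula, and the function names are illustrative.

```python
def growth_percent(current_stars: float, gained_stars: float) -> float:
    """Percentage growth relative to the star count at the start of the window."""
    starting_stars = current_stars - gained_stars  # assumed baseline
    if starting_stars <= 0:
        raise ValueError("gain must be smaller than the current star count")
    return 100.0 * gained_stars / starting_stars


def stars_per_day(gained_stars: float, days: int) -> float:
    """Average number of stars gained per day over the window."""
    return gained_stars / days


# Rounded figures from the page: 53.0k stars, +9.5k gained over 96 days.
print(round(growth_percent(53_000, 9_500), 1))  # 21.8
print(round(stars_per_day(9_500, 96)))          # 99
```

The result lands close to the reported 22.0%. A small gap is expected because the inputs here are the rounded values shown on the page, not exact counts.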