Latest Open Source Projects
text-extract-api
TLDR: A tool that converts images, PDFs, and Office documents to Markdown or JSON with high accuracy. It is built with FastAPI and uses Celery for asynchronous tasks and Redis for caching. It supports several OCR strategies, can remove PII, and offers multiple storage strategies. A CLI tool, an online demo, and dedicated API clients are also available.
2024-10-23
GitHub