Capabilities
The service supports a wide range of input formats:

PDF Documents: High-fidelity extraction using pymupdf (fitz), with PyPDF2 as a fallback.
Microsoft Office: Native support for Word (.docx), Excel (.xlsx), and PowerPoint (.pptx).
Text & Code: Parses .txt, .md, .json, .csv, .html, and other plain-text formats.
Multimedia: Integrates with the OpenAI Whisper API to transcribe audio and video files.
Web Scraping: Extracts clean content from URLs using playwright and beautifulsoup4.
E-Books & Email: Processes .epub books and .mbox email archives.
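
To illustrate how an input might be routed to one of the capabilities listed above, here is a minimal sketch using only the Python standard library. The function name `classify_input`, the category labels, and the routing rules are assumptions made for illustration; they are not taken from the service's actual code. Audio and video detection relies on `mimetypes`, since the list above does not name specific multimedia extensions.

```python
import mimetypes
from pathlib import Path

# Hypothetical category labels; the extension sets mirror the formats listed above.
EXTENSIONS_BY_CATEGORY = {
    "pdf": {".pdf"},
    "office": {".docx", ".xlsx", ".pptx"},
    "text": {".txt", ".md", ".json", ".csv", ".html"},
    "ebook": {".epub"},
    "email": {".mbox"},
}


def classify_input(source: str) -> str:
    """Return a category label for a file path or URL (illustrative only)."""
    # Assumption: anything that looks like an HTTP(S) URL goes to the web scraper.
    if source.lower().startswith(("http://", "https://")):
        return "web"

    suffix = Path(source).suffix.lower()
    for category, extensions in EXTENSIONS_BY_CATEGORY.items():
        if suffix in extensions:
            return category

    # Assumption: audio and video files are recognized by their guessed MIME type.
    mime_type, _ = mimetypes.guess_type(source)
    if mime_type and mime_type.startswith(("audio/", "video/")):
        return "multimedia"

    return "unknown"
```

For example, `classify_input("report.PDF")` yields `"pdf"` because the suffix is lowercased before lookup, while a URL is routed to `"web"` regardless of any extension in its path. A real dispatcher would then hand the input to the matching extractor.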