
PyMuPDF parser

Parse PDF files using the popular open-source PyMuPDF library.

PyMuPDFParser

Bases: Parser

Parses a PDF file using the PyMuPDF library.

Source code in supermat/core/parser/pymupdf_parser/parser.py
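@FileProcessor.register(".pdf")  # register this parser for files with the .pdf extension
class PyMuPDFParser(Parser):
    """Parses a PDF file using the PyMuPDF library."""

    def parse(self, file_path: Path) -> ParsedDocumentType:
        # Extract the raw document structure from the PDF with PyMuPDF.
        parsed_pdf = parse_pdf(file_path)
        # Convert it into supermat's parsed-document type, named after the file stem.
        return process_pymupdf(parsed_pdf, document_name=file_path.stem)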
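The `@FileProcessor.register(".pdf")` decorator suggests that files are dispatched to a parser based on their extension. The sketch below illustrates that general decorator-registry pattern under stated assumptions. `FileProcessor`, `Parser`, `process_file`, and `DummyPdfParser` here are hypothetical stand-ins, not supermat's actual implementation, and the dummy parser does not call PyMuPDF.

```python
from pathlib import Path
from typing import Callable, Dict, Type


class Parser:
    """Hypothetical stand-in for the Parser base class."""

    def parse(self, file_path: Path) -> dict:
        raise NotImplementedError


class FileProcessor:
    """Hypothetical minimal registry mapping file extensions to parser classes."""

    _registry: Dict[str, Type[Parser]] = {}

    @classmethod
    def register(cls, extension: str) -> Callable[[Type[Parser]], Type[Parser]]:
        # Return a class decorator that records the parser under its extension.
        def decorator(parser_cls: Type[Parser]) -> Type[Parser]:
            cls._registry[extension.lower()] = parser_cls
            return parser_cls  # return the class unchanged so it stays usable
        return decorator

    @classmethod
    def process_file(cls, file_path: Path) -> dict:
        # Look up the parser by (case-insensitive) suffix and delegate to it.
        parser_cls = cls._registry.get(file_path.suffix.lower())
        if parser_cls is None:
            raise ValueError(f"No parser registered for {file_path.suffix!r}")
        return parser_cls().parse(file_path)


@FileProcessor.register(".pdf")
class DummyPdfParser(Parser):
    """Hypothetical parser; a real one would extract content with PyMuPDF."""

    def parse(self, file_path: Path) -> dict:
        # Mirror the real parser's use of the file stem as the document name.
        return {"document_name": file_path.stem}


result = FileProcessor.process_file(Path("reports/annual.PDF"))
print(result)  # {'document_name': 'annual'}
```

Because the decorator returns the class unchanged, registration is a side effect of importing the module that defines the parser, and callers only need the file path to reach the right parser.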