Utils
parse_pdf(pdf_file)
Converts a PDF file into a PyMuPDFDocument model for easier parsing. PyMuPDF does not provide a pydantic model in its implementation, so the parsed output is converted into a pydantic model to make it easier to work with.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pdf_file | Path | The PDF file to parse. | required |
Returns:
Name | Type | Description |
---|---|---|
PyMuPDFDocument | PyMuPDFDocument | Pydantic model representation of the PDF file. |
Source code in supermat/core/parser/pymupdf_parser/utils.py
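As a rough illustration of the idea behind parse_pdf, the sketch below turns an untyped, nested page dictionary into typed model objects. It is not the supermat implementation. The class names (Span, Line, Block, Page) and the helper page_from_dict are hypothetical, the input is a hand-written dictionary loosely modelled on the shape PyMuPDF returns from `page.get_text("dict")`, and stdlib dataclasses stand in for pydantic so the example runs without extra dependencies.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


# Hypothetical, simplified models; the real PyMuPDFDocument has its own fields.
@dataclass
class Span:
    text: str
    size: float


@dataclass
class Line:
    spans: list[Span]


@dataclass
class Block:
    bbox: tuple[float, ...]
    lines: list[Line]


@dataclass
class Page:
    width: float
    height: float
    blocks: list[Block]

    @property
    def text(self) -> str:
        # Join spans within a line, then put each line on its own row.
        return "\n".join(
            "".join(span.text for span in line.spans)
            for block in self.blocks
            for line in block.lines
        )


def page_from_dict(raw: dict[str, Any]) -> Page:
    """Build a typed Page from a nested dictionary (hypothetical helper)."""
    blocks = [
        Block(
            bbox=tuple(float(v) for v in b["bbox"]),
            lines=[
                Line(spans=[Span(text=s["text"], size=float(s["size"]))
                            for s in ln["spans"]])
                for ln in b.get("lines", [])  # image blocks carry no lines
            ],
        )
        for b in raw["blocks"]
    ]
    return Page(width=float(raw["width"]), height=float(raw["height"]), blocks=blocks)


# Hand-written sample data standing in for real parser output.
raw_page = {
    "width": 595.0,
    "height": 842.0,
    "blocks": [
        {
            "type": 0,
            "bbox": (72.0, 72.0, 300.0, 100.0),
            "lines": [
                {"spans": [{"text": "Hello ", "size": 11.0},
                           {"text": "world", "size": 11.0}]},
                {"spans": [{"text": "Second line", "size": 11.0}]},
            ],
        }
    ],
}

page = page_from_dict(raw_page)
```

With typed models, downstream code can use attribute access such as `page.blocks[0].lines[0].spans[0].size` instead of chained dictionary lookups, and a pydantic model would additionally validate field types when the object is constructed.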