Setup
Installation
- Clone this repository
- Set up python-poetry on your system.
- Run
poetry install --with=frontend --all-extras
in your virtual environment to install all required dependencies.
- In a terminal, run
python -m supermat.gradio
to launch the Gradio interface and see it in action.
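Before launching the interface, it can help to confirm that the installation steps above took effect. The following is a minimal sketch using only the Python standard library; the helper name `check_setup` is hypothetical and not part of supermat. It only checks that a `poetry` executable is on the PATH and that a package named `supermat` is importable from the active environment.

```python
import importlib.util
import shutil


def check_setup():
    """Report whether the Poetry CLI and the supermat package can be found.

    Hypothetical helper for illustration; not part of supermat itself.
    """
    return {
        # True if a `poetry` executable is on the PATH.
        "poetry": shutil.which("poetry") is not None,
        # True if `supermat` is importable from the active environment.
        "supermat": importlib.util.find_spec("supermat") is not None,
    }


if __name__ == "__main__":
    for name, found in check_setup().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If `supermat` is reported as missing, the most likely cause is that the script was run outside the virtual environment in which `poetry install` was executed.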
Setting up Adobe
We weren't able to capture hierarchical structure such as sections, paragraphs, and sentences with the open-source PDF parsing libraries we tried. We are actively working on alternatives that can parse PDF files with this hierarchical structure. PyMuPDF, for example, exposes pages, blocks, and lines, which is not the same hierarchy.
- Set up the Adobe PDF Services API as shown here
- Provide the credentials in the .env file.
- To cache the Adobe results, you can also set the TMP_DIR environment variable to a persistent location.
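The configuration described above can be sanity-checked with a short script. This is a sketch under stated assumptions, not how supermat reads its settings: `read_env_file` and `resolve_cache_dir` are hypothetical helpers, the parser handles only simple `KEY=VALUE` lines, and the fallback to the system temporary directory is a choice made for this example. The names of the credential keys are not given here, so the sketch does not assume any.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_cache_dir(env=None):
    """Return the directory named by TMP_DIR, creating it if needed.

    Assumption for this sketch: when TMP_DIR is unset, fall back to the
    system temp directory, which is usually NOT persistent across reboots.
    """
    env = os.environ if env is None else env
    configured = env.get("TMP_DIR")
    if not configured:
        return Path(tempfile.gettempdir())
    cache_dir = Path(configured).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir
```

Calling `resolve_cache_dir(read_env_file(".env"))` returns the directory that `TMP_DIR` points to, which makes it easy to confirm that the location exists before running a long parsing job.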