Core

This module contains Supermat's core parsing logic: the pydantic models that define the structure of parsed documents, the chunking strategies, and the parser logic that converts documents into the ParsedDocument model.

`export_parsed_document(document, output_path, **kwargs)`

Export the given ParsedDocument to a JSON file.

Parameters:

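| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `document` | `ParsedDocumentType` | The ParsedDocument to be dumped. | required |
| `output_path` | `Path \| str` | Location of the JSON file to write. | required |
| `**kwargs` | | Extra keyword arguments forwarded to `ParsedDocument.dump_json`. | `{}` |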

Source code in supermat/core/models/parsed_document.py

def export_parsed_document(document: ParsedDocumentType, output_path: Path | str, **kwargs):
    """Export given ParsedDocument to a json file

    Args:
        document (ParsedDocumentType): The ParsedDocument to be dumped.
        output_path (Path | str): JSON file location.
    """
    output_path = Path(output_path)
    with output_path.open("wb+") as fp:
        fp.write(ParsedDocument.dump_json(document, **kwargs))

`load_parsed_document(path)`

Load a ParsedDocument from a JSON dump.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `Path \| str` | Path to the JSON file. | required |

Returns:

| Type | Description |
| --- | --- |
| `ParsedDocumentType` | ParsedDocument model loaded from JSON. |

Source code in supermat/core/models/parsed_document.py

def load_parsed_document(path: Path | str) -> ParsedDocumentType:
    """Load a json dumped `ParsedDocument`

    Args:
        path (Path | str): file path to the json file.

    Returns:
        ParsedDocumentType: ParsedDocument model loaded from json.
    """
    path = Path(path)
    with path.open("rb") as fp:
        raw_doc: list[dict[str, Any]] | dict[str, list[dict[str, Any]]] = orjson.loads(fp.read())

    if isinstance(raw_doc, dict) and len(raw_doc.keys()) == 1:
        root_key = next(iter(raw_doc.keys()))
        warn(f"The json document contains a root node {root_key}.", ValidationWarning)
        return ParsedDocument.validate_python(raw_doc[root_key])
    elif isinstance(raw_doc, list):
        return ParsedDocument.validate_python(raw_doc)
    else:
        raise ValueError("Invalid JSON Format")
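
The branching in `load_parsed_document` accepts two JSON shapes: a top-level list of sections, or a dictionary with exactly one root key wrapping that list. Anything else is rejected. The following sketch isolates that decision with the standard library only. `unwrap_root` is a hypothetical helper, `UserWarning` stands in for the `ValidationWarning` used in the source, and the pydantic validation step is deliberately left out.

```python
import warnings
from typing import Any


def unwrap_root(raw_doc: Any) -> list[dict[str, Any]]:
    """Return the list of sections from either accepted JSON shape."""
    if isinstance(raw_doc, dict) and len(raw_doc) == 1:
        # A single root key wraps the list: warn, then unwrap it.
        root_key = next(iter(raw_doc))
        warnings.warn(f"The json document contains a root node {root_key}.", UserWarning)
        return raw_doc[root_key]
    if isinstance(raw_doc, list):
        # Already a bare list of sections.
        return raw_doc
    raise ValueError("Invalid JSON Format")


sections = [{"text": "Intro"}]  # made-up data

plain = unwrap_root(sections)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    wrapped = unwrap_root({"document": sections})

try:
    unwrap_root({"a": [], "b": []})  # two root keys: not an accepted shape
    rejected = False
except ValueError:
    rejected = True
```

Note that a dictionary with more than one key, or an empty dictionary, falls through to the `ValueError` branch, as does any scalar JSON value.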