
Base

Base abstractions for parsers and converters. A Parser parses a given document type into a ParsedDocumentType. A Converter converts a document from one format to another so that an existing Parser can process it. For example, if we have a Parser for .pdf documents, we can add Converters that convert .docx and .pptx files into .pdf.
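The following is a minimal sketch of how the two abstractions can be combined: look up a Converter by file suffix, convert the file if one is registered, then hand the result to a Parser. The `Converter` and `Parser` classes below are simplified stand-ins that mirror the interfaces documented on this page; they are not imported from supermat. `parse_any`, `SemicolonToCsvConverter`, `CsvLineParser`, and the `list[str]` return type are hypothetical, chosen only so the example runs without any PDF tooling.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


# Simplified stand-ins mirroring the documented interfaces (not the supermat classes).
class Converter(ABC):
    @abstractmethod
    def convert(self, file_path: Path) -> Path: ...

    def __call__(self, file_path: Path) -> Path:
        return self.convert(file_path)


class Parser(ABC):
    @abstractmethod
    def parse(self, file_path: Path) -> list[str]: ...  # hypothetical ParsedDocumentType


class SemicolonToCsvConverter(Converter):
    """Hypothetical converter: rewrites a semicolon-separated .ssv file as .csv."""

    def convert(self, file_path: Path) -> Path:
        output_path = file_path.with_suffix(".csv")
        output_path.write_text(file_path.read_text().replace(";", ","))
        return output_path


class CsvLineParser(Parser):
    """Hypothetical parser: returns the non-empty lines of a .csv file."""

    def parse(self, file_path: Path) -> list[str]:
        return [line for line in file_path.read_text().splitlines() if line]


def parse_any(file_path: Path, parser: Parser, converters: dict[str, Converter]) -> list[str]:
    # Convert first when a converter is registered for this suffix; otherwise parse as-is.
    converter = converters.get(file_path.suffix)
    if converter is not None:
        file_path = converter(file_path)
    return parser.parse(file_path)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "table.ssv"
    src.write_text("a;b\n1;2\n")
    rows = parse_any(src, CsvLineParser(), {".ssv": SemicolonToCsvConverter()})

print(rows)  # ['a,b', '1,2']
```

In the .pdf scenario described above, the registry would map suffixes such as .docx and .pptx to Converters that produce .pdf files, while a single PDF Parser does the parsing.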

Converter

Bases: ABC

Source code in supermat/core/parser/base.py
class Converter(ABC):
    @abstractmethod
    def convert(self, file_path: Path) -> Path:  # noqa: U100
        """
        Converts input file to another file type and saves it. The saved file path is returned.

        Args:
            file_path (Path): Input file.

        Returns:
            Path: Output file after conversion.
        """

    def __call__(self, file_path: Path) -> Path:
        return self.convert(file_path)
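Because `__call__` simply delegates to `convert`, a concrete subclass only has to implement `convert`, and its instances can then be used like functions. A minimal sketch, using a local stand-in for the documented `Converter` class and a hypothetical `TxtToMarkdownConverter` that is not part of supermat:

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class Converter(ABC):
    """Stand-in mirroring the documented Converter interface."""

    @abstractmethod
    def convert(self, file_path: Path) -> Path: ...

    def __call__(self, file_path: Path) -> Path:
        return self.convert(file_path)


class TxtToMarkdownConverter(Converter):
    """Hypothetical converter: writes a .md file with the .txt contents under a heading."""

    def convert(self, file_path: Path) -> Path:
        output_path = file_path.with_suffix(".md")
        output_path.write_text(f"# {file_path.stem}\n\n{file_path.read_text()}")
        return output_path  # the contract: return the path of the saved file


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "notes.txt"
    src.write_text("hello")
    out = TxtToMarkdownConverter()(src)  # __call__ delegates to convert()
    content = out.read_text()

print(out.name)  # notes.md

try:
    Converter()  # abstract: convert() has no implementation
except TypeError:
    print("Converter cannot be instantiated directly")
```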

convert(file_path) abstractmethod

Converts input file to another file type and saves it. The saved file path is returned.

Parameters:

Name        Type   Description   Default
file_path   Path   Input file.   required

Returns:

Type   Description
Path   Output file after conversion.
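Since `convert` takes a path and returns the path of the saved output, converters compose naturally: the output of one can be fed to the next. The sketch below shows a hypothetical `ChainedConverter` that is itself a `Converter`; neither it nor `SuffixConverter` (which only copies bytes to a new suffix, standing in for a real format conversion) is part of supermat, and `Converter` is a local stand-in for the documented class.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class Converter(ABC):
    """Stand-in mirroring the documented Converter interface."""

    @abstractmethod
    def convert(self, file_path: Path) -> Path: ...

    def __call__(self, file_path: Path) -> Path:
        return self.convert(file_path)


class ChainedConverter(Converter):
    """Hypothetical helper: applies converters in order, passing each output path on."""

    def __init__(self, *converters: Converter) -> None:
        self.converters = converters

    def convert(self, file_path: Path) -> Path:
        for converter in self.converters:
            file_path = converter(file_path)
        return file_path


class SuffixConverter(Converter):
    """Hypothetical placeholder: copies the file's bytes to a file with a new suffix."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def convert(self, file_path: Path) -> Path:
        output_path = file_path.with_suffix(self.suffix)
        output_path.write_bytes(file_path.read_bytes())
        return output_path


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "report.a"
    src.write_text("data")
    final = ChainedConverter(SuffixConverter(".b"), SuffixConverter(".c"))(src)
    final_text = final.read_text()

print(final.name)  # report.c
```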


Parser

Bases: ABC

Source code in supermat/core/parser/base.py
class Parser(ABC):
    @abstractmethod
    def parse(self, file_path: Path) -> ParsedDocumentType:  # noqa: U100
        """
        Parse the given file into a ParsedDocumentType.

        Args:
            file_path (Path): Input file.

        Returns:
            ParsedDocumentType: Parsed document
        """

parse(file_path) abstractmethod

Parse the given file into a ParsedDocumentType.

Parameters:

Name        Type   Description   Default
file_path   Path   Input file.   required

Returns:

Type                 Description
ParsedDocumentType   Parsed document.
