Chunking

This module defines the chunking strategies that operate on ParsedDocuments. A chunking strategy determines how a ParsedDocument is split into chunks so that it can be stored in a vector store or passed to an LLM.

`BaseChunker`

Bases: ABC

Base class for all Chunker implementations.

Source code in supermat/core/chunking/base.py

class BaseChunker(ABC):
    """
    Base class for all Chunker implementations.
    """

    @abstractmethod
    def create_chunks(self, processed_document: ParsedDocumentType) -> DocumentChunksType:  # noqa: U100
        """Build chunks from the given ParsedDocument into list of ChunkDocuments.
        This is the public method that is called for any chunking strategy.

        Args:
            processed_document (ParsedDocumentType): The processed document that needs to be split into chunks.

        Returns:
            DocumentChunksType: The chunks built by the given strategy.
        """

`create_chunks(processed_document)` `abstractmethod`

Build chunks from the given ParsedDocument into a list of ChunkDocuments. This is the public method that is called for any chunking strategy.

Parameters:

Name	Type	Description	Default
`processed_document`	`ParsedDocumentType`	The processed document that needs to be split into chunks.	required

Returns:

Name	Type	Description
`DocumentChunksType`	`DocumentChunksType`	The chunks built by the given strategy.
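
As a rough illustration of how a concrete strategy might implement this interface, the sketch below subclasses a local copy of `BaseChunker`. Everything other than the `BaseChunker` and `create_chunks` names is an assumption made for the example: `Section`, `Chunk`, and `ParagraphChunker` are hypothetical, and the two type aliases are simplified stand-ins, not supermat's actual `ParsedDocumentType` and `DocumentChunksType`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Stand-in types, assumed for illustration only. The real
# ParsedDocumentType and DocumentChunksType are defined by supermat.
@dataclass
class Section:
    title: str
    text: str


@dataclass
class Chunk:
    section_title: str
    text: str


ParsedDocumentType = list[Section]
DocumentChunksType = list[Chunk]


class BaseChunker(ABC):
    """Local copy of the documented interface, so the sketch runs on its own."""

    @abstractmethod
    def create_chunks(self, processed_document: ParsedDocumentType) -> DocumentChunksType:
        """Build chunks from the given parsed document."""


class ParagraphChunker(BaseChunker):
    """Hypothetical strategy: one chunk per non-empty paragraph."""

    def create_chunks(self, processed_document: ParsedDocumentType) -> DocumentChunksType:
        chunks: DocumentChunksType = []
        for section in processed_document:
            # Paragraphs are assumed to be separated by a blank line.
            for paragraph in section.text.split("\n\n"):
                paragraph = paragraph.strip()
                if paragraph:  # skip whitespace-only paragraphs
                    chunks.append(Chunk(section_title=section.title, text=paragraph))
        return chunks


document = [
    Section("Intro", "First paragraph.\n\nSecond paragraph."),
    Section("Empty", "   "),
]
chunks = ParagraphChunker().create_chunks(document)
```

Because `create_chunks` is abstract, `BaseChunker` itself cannot be instantiated; callers only ever work with concrete subclasses, which keeps strategies interchangeable behind a single method.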

