Parsed document
BaseChunk
Bases: CustomBaseModel
Base chunk model that holds the type and structure of a chunk.
Source code in supermat/core/models/parsed_document.py
BaseChunkProperty
Bases: CustomBaseModel
Properties associated with a chunk. Close to Adobe's format.
Source code in supermat/core/models/parsed_document.py
BaseTextChunk
Bases: BaseChunk
Common TextChunk model.
Source code in supermat/core/models/parsed_document.py
CustomBaseModel
Bases: BaseModel
BaseModel with some extra tweaks. These are needed to handle earlier parsed-document output, which has optional keys and had to be preserved for tests.
Source code in supermat/core/models/parsed_document.py
serialize_model(nxt)
This custom serializer ensures that extra keys are included as well.
Source code in supermat/core/models/parsed_document.py
FontProperties
Bases: CustomBaseModel
Font properties in a TextChunkProperty.
Source code in supermat/core/models/parsed_document.py
FootnoteChunk
Bases: TextChunk
A TextChunk that represents a footnote.
Source code in supermat/core/models/parsed_document.py
ImageChunk
Bases: BaseChunk, BaseChunkProperty
ImageChunk that stores the image in Base64 encoding.
Source code in supermat/core/models/parsed_document.py
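As a rough illustration of what storing an image in Base64 involves, the sketch below encodes raw bytes into an ASCII string that can sit in a JSON document and decodes it back. It uses only the standard library; the dictionary and its `image_base64` key are hypothetical stand-ins and are not ImageChunk's actual field names.

```python
import base64

# Placeholder bytes standing in for real image data (a PNG signature plus filler).
raw_bytes = b"\x89PNG\r\n\x1a\n" + b"example-image-bytes"

# Base64 turns arbitrary bytes into ASCII text, which is safe to embed in JSON.
encoded = base64.b64encode(raw_bytes).decode("ascii")

# Hypothetical chunk-like dict; the key name is an assumption for illustration.
chunk = {"type": "Image", "image_base64": encoded}

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(chunk["image_base64"])
```

Base64 output is roughly a third larger than the raw bytes, which is the usual trade-off for keeping binary data inside a text format.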
TextChunk
Bases: BaseTextChunk
TextChunk similar to the one in the initial version of supermat.
Source code in supermat/core/models/parsed_document.py
TextChunkProperty
Bases: BaseChunkProperty
Properties associated with a TextChunk.
Source code in supermat/core/models/parsed_document.py
ValidationWarning
Bases: UserWarning
Custom warning for validation issues in Pydantic models.
Source code in supermat/core/models/parsed_document.py
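Because ValidationWarning derives from UserWarning, it can be emitted and filtered with the standard `warnings` module. The sketch below shows one way such a warning might be raised and captured. The class is redefined locally as a stand-in, and `check_keys` is a hypothetical helper rather than part of supermat.

```python
import warnings


class ValidationWarning(UserWarning):
    """Local stand-in with the same base class as supermat's ValidationWarning."""


def check_keys(data: dict, expected: set) -> dict:
    """Hypothetical validator: warn about unexpected keys instead of raising."""
    unexpected = sorted(set(data) - expected)
    if unexpected:
        warnings.warn(f"Unexpected keys: {unexpected}", ValidationWarning, stacklevel=2)
    return data


# Record warnings so they can be inspected rather than printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_keys({"type": "Text", "legacy_key": 1}, expected={"type"})

validation_warnings = [w for w in caught if issubclass(w.category, ValidationWarning)]
```

A caller that prefers hard failures could instead use `warnings.simplefilter("error", ValidationWarning)` to turn these warnings into exceptions.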
export_parsed_document(document, output_path, **kwargs)
Export the given ParsedDocument to a JSON file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
document | `ParsedDocumentType` | The ParsedDocument to be dumped. | required |
output_path | `Path \| str` | JSON file location. | required |
Source code in supermat/core/models/parsed_document.py
load_parsed_document(path)
Load a ParsedDocument from a JSON dump.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | `Path \| str` | Path to the JSON file. | required |
Returns:
Name | Type | Description |
---|---|---|
ParsedDocumentType | `ParsedDocumentType` | ParsedDocument model loaded from json. |
Source code in supermat/core/models/parsed_document.py
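The export and load functions described above form a JSON round trip. The sketch below mimics that round trip with the standard library only, so it runs without supermat installed. `export_document` and `load_document` are hypothetical stand-ins, a plain list of dicts stands in for ParsedDocumentType, and forwarding `**kwargs` to the JSON dump is an assumption rather than documented behaviour.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def export_document(document: list[dict], output_path: Path | str, **kwargs) -> None:
    """Stand-in exporter: write the chunks to a JSON file (kwargs go to json.dump)."""
    with Path(output_path).open("w", encoding="utf-8") as fp:
        json.dump(document, fp, **kwargs)


def load_document(path: Path | str) -> list[dict]:
    """Stand-in loader: read the chunks back from a JSON file."""
    with Path(path).open("r", encoding="utf-8") as fp:
        return json.load(fp)


# Illustrative chunk; the field values are made up for this example.
document = [{"type": "Text", "structure": "1.1.1", "text": "Hello"}]

with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "parsed.json"
    export_document(document, output_path, indent=2)
    restored = load_document(output_path)
```

With supermat itself, the equivalent calls would presumably be `export_parsed_document(document, output_path)` followed by `load_parsed_document(output_path)`, with the loader returning validated models rather than plain dicts.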