Created
September 9, 2025 08:36
WIP: Kaitai Struct definition for .xz file format
# This definition is incomplete, several fields are hardcoded.
# See https://github.com/kaitai-io/kaitai_struct_formats/issues/719
meta:
  id: xz
  title: XZ compressed file
  file-extension: xz
  xref:
    forensicswiki: xz
    justsolve: XZ
    mime: application/x-xz
    pronom: fmt/1098
    wikidata: Q162839
  license: CC0-1.0
  endian: le
  imports:
    - /common/vlq_base128_le
doc: |
  XZ is a single-file archive format that compresses data using LZMA.
  XZ files typically have a .xz file extension.
  They're often produced or consumed with the xz-utils suite of software.
  LZMA (and hence XZ) usually achieves higher compression ratios than
  older formats such as gzip, at the expense of slower compression &
  decompression.
doc-ref:
  - https://tukaani.org/xz/xz-file-format-1.2.1.txt
  - https://tukaani.org/xz/xz-file-format.txt
seq:
  - id: header
    type: header
  - id: block
    type: block # FIXME A stream can have zero or more blocks
  - id: index
    type: index
  - id: footer
    type: footer
enums:
  filters:
    0x03:
      id: delta
    0x04:
      id: x86_bcj
    0x05:
      id: powerpc_be
    0x06:
      id: ia64
    0x07:
      id: arm
    0x08:
      id: arm_thumb
    0x09:
      id: sparc
    0x0a:
      id: arm64
    0x0b:
      id: risc_v
    0x21:
      id: lzma2
  stream_checks:
    0:
      id: none
    1:
      id: crc32
    4:
      id: crc64
    10:
      id: sha256
types:
  check_none:
    doc: This type is intentionally left blank
  check_crc32:
    seq:
      - id: value
        type: u4
  check_crc64:
    seq:
      - id: value
        type: u8
  check_sha256:
    seq:
      - id: value
        size: 32
  header:
    seq:
      - id: magic
        contents: [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00]
      - id: flags
        type: stream_flags
      - id: crc32
        type: u4
  stream_flags:
    seq:
      - id: reserved1
        type: u1
      - id: reserved2
        type: b4
      - id: check
        type: b4
        enum: stream_checks
  block:
    seq:
      - id: header
        type: block_header
      - id: compressed_data
        size: 8 # FIXME
      - id: padding
        size: 0 # FIXME
      - id: check
        type:
          switch-on: _root.header.flags.check
          cases:
            'stream_checks::none': check_none
            'stream_checks::crc32': check_crc32
            'stream_checks::crc64': check_crc64
            'stream_checks::sha256': check_sha256
  block_header:
    seq:
      - id: len_header_encoded
        type: u1
      - id: has_len_uncompressed
        type: b1
      - id: has_len_compressed
        type: b1
      - id: reserved
        type: b4
      - id: num_filters_encoded
        type: b2
      - id: len_compressed
        type: vlq_base128_le
        if: has_len_compressed
      - id: len_uncompressed
        type: vlq_base128_le
        if: has_len_uncompressed
      - id: filter_flags
        type: block_filter_flag
        repeat: expr
        repeat-expr: num_filters
      - id: padding
        size: 9 # FIXME Calculate so that header length matches len_header
      - id: crc32
        type: u4
    instances:
      len_header:
        value: (len_header_encoded + 1) * 4
      num_filters:
        value: num_filters_encoded + 1
  block_filter_flag:
    seq:
      - id: id
        type: vlq_base128_le
      - id: len_properties
        type: vlq_base128_le
      - id: properties
        size: len_properties.value
    instances:
      type:
        value: id.value
        enum: filters
  index:
    seq:
      - id: index_indicator
        contents: [0]
      - id: num_records
        type: vlq_base128_le
      - id: records
        type: index_record
        repeat: expr
        repeat-expr: num_records.value
      - id: padding
        size: (4 - (_io.pos % 4)) % 4
      - id: crc32
        type: u4
  index_record:
    seq:
      - id: len_unpadded
        type: vlq_base128_le
      - id: len_uncompressed
        type: vlq_base128_le
  footer:
    seq:
      - id: crc32
        type: u4
      - id: backward_size_raw
        type: u4
      - id: flags
        type: stream_flags
      - id: magic
        contents: [0x59, 0x5a]
    instances:
      backward_size:
        value: (backward_size_raw + 1) * 4
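As a cross-check on the `header`, `stream_flags` and `footer` types, the following is a minimal Python sketch that reads the same fields by hand. It is not part of the definition: the function names are made up for illustration, and the CRC32 coverage (the stream flags in the header; the backward size plus stream flags in the footer) is taken from the XZ file format document linked in `doc-ref`. It assumes the input holds a single stream with no trailing stream padding.

```python
import lzma
import struct
import zlib

HEADER_MAGIC = bytes([0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00])
FOOTER_MAGIC = bytes([0x59, 0x5A])
# Same values as the stream_checks enum in the definition.
CHECK_NAMES = {0: "none", 1: "crc32", 4: "crc64", 10: "sha256"}


def parse_stream_flags(raw):
    """Decode stream_flags: one reserved byte, then a reserved/check nibble pair."""
    reserved1, second = raw[0], raw[1]
    if reserved1 != 0 or second >> 4 != 0:
        raise ValueError("reserved stream flag bits are set")
    return CHECK_NAMES[second & 0x0F]


def parse_header(data):
    """Parse the 12-byte stream header and return the check name."""
    if data[:6] != HEADER_MAGIC:
        raise ValueError("bad header magic")
    flags = data[6:8]
    (crc32,) = struct.unpack("<I", data[8:12])
    if zlib.crc32(flags) != crc32:
        raise ValueError("header CRC32 mismatch")
    return parse_stream_flags(flags)


def parse_footer(data):
    """Parse the last 12 bytes as a stream footer; return (check name, backward_size)."""
    footer = data[-12:]
    crc32, backward_size_raw = struct.unpack("<II", footer[:8])
    if footer[10:12] != FOOTER_MAGIC:
        raise ValueError("bad footer magic")
    # The footer CRC32 covers backward_size_raw and the stream flags.
    if zlib.crc32(footer[4:10]) != crc32:
        raise ValueError("footer CRC32 mismatch")
    return parse_stream_flags(footer[8:10]), (backward_size_raw + 1) * 4


if __name__ == "__main__":
    sample = lzma.compress(b"foo\n", format=lzma.FORMAT_XZ)
    print(parse_header(sample))
    print(parse_footer(sample))
```

Because `backward_size` is the size of the index, `data[-12 - backward_size]` should be the zero byte that the `index` type matches as `index_indicator`.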
The file used during development was generated by
echo foo | xz > foo.xz. The uncompressed data is 4 bytes, including the final newline.
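A similar sample can be produced without the xz command line tool. The sketch below uses Python's `lzma` module as a stand-in for `echo foo | xz` (the bytes may differ from what a given xz release writes, for example in whether the block header carries the optional size fields) and walks the first block header by hand. The helper names are hypothetical. It also shows one way the hardcoded `padding` size in `block_header` could be derived: whatever is left between the last filter flag and the CRC32, given `len_header`.

```python
import lzma
import struct
import zlib


def read_vlq(buf, pos):
    """Decode one little-endian base-128 integer; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return value, pos
        shift += 7


def parse_block_header(data, start=12):
    """Parse the block header that follows the 12-byte stream header."""
    if data[start] == 0:
        raise ValueError("index indicator found, the stream has no blocks")
    len_header = (data[start] + 1) * 4
    header = data[start:start + len_header]
    (crc32,) = struct.unpack("<I", header[-4:])
    if zlib.crc32(header[:-4]) != crc32:
        raise ValueError("block header CRC32 mismatch")

    flags = header[1]
    has_len_uncompressed = bool(flags & 0x80)
    has_len_compressed = bool(flags & 0x40)
    num_filters = (flags & 0x03) + 1

    pos = 2
    len_compressed = len_uncompressed = None
    if has_len_compressed:  # the compressed size comes first
        len_compressed, pos = read_vlq(header, pos)
    if has_len_uncompressed:
        len_uncompressed, pos = read_vlq(header, pos)

    filter_ids = []
    for _ in range(num_filters):
        filter_id, pos = read_vlq(header, pos)
        len_properties, pos = read_vlq(header, pos)
        pos += len_properties  # skip the filter properties
        filter_ids.append(filter_id)

    # Padding fills the gap between the last filter flag and the CRC32.
    len_padding = len_header - 4 - pos
    if len_padding < 0 or any(header[pos:-4]):
        raise ValueError("bad block header padding")
    return {
        "len_header": len_header,
        "len_compressed": len_compressed,
        "len_uncompressed": len_uncompressed,
        "filter_ids": filter_ids,
        "len_padding": len_padding,
    }


if __name__ == "__main__":
    sample = lzma.compress(b"foo\n", format=lzma.FORMAT_XZ)
    print(parse_block_header(sample))
```

In a `.ksy` file the same subtraction would need a position relative to the start of the block header, which is presumably why the definition still marks it as a FIXME.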