@moreati
Created September 9, 2025 08:36
WIP: Kaitai Struct definition for .xz file format
# This definition is incomplete; several fields are hardcoded.
# See https://github.com/kaitai-io/kaitai_struct_formats/issues/719
meta:
  id: xz
  title: XZ compressed file
  file-extension: xz
  xref:
    forensicswiki: xz
    justsolve: XZ
    mime: application/x-xz
    pronom: fmt/1098
    wikidata: Q162839
  license: CC0-1.0
  endian: le
  imports:
    - /common/vlq_base128_le
doc: |
  XZ is a compression format for single files, typically using the LZMA2 filter.
  XZ files typically have a .xz file extension.
  They're often produced or consumed with the xz-utils suite of software.
  LZMA (and hence XZ) usually achieves higher compression ratios than gzip,
  at the expense of slower compression and decompression.
doc-ref:
  - https://tukaani.org/xz/xz-file-format-1.2.1.txt
  - https://tukaani.org/xz/xz-file-format.txt
seq:
  - id: header
    type: header
  - id: block
    type: block # FIXME: a stream can contain zero or more blocks
  - id: index
    type: index
  - id: footer
    type: footer
enums:
  filters:
    0x03:
      id: delta
    0x04:
      id: x86_bcj
    0x05:
      id: powerpc_be
    0x06:
      id: ia64
    0x07:
      id: arm
    0x08:
      id: arm_thumb
    0x09:
      id: sparc
    0x0a:
      id: arm64
    0x0b:
      id: risc_v
    0x21:
      id: lzma2
  stream_checks:
    0:
      id: none
    1:
      id: crc32
    4:
      id: crc64
    10:
      id: sha256
types:
  check_none:
    doc: This type is intentionally left blank
  check_crc32:
    seq:
      - id: value
        type: u4
  check_crc64:
    seq:
      - id: value
        type: u8
  check_sha256:
    seq:
      - id: value
        size: 32
  header:
    seq:
      - id: magic
        contents: [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00]
      - id: flags
        type: stream_flags
      - id: crc32
        type: u4
  stream_flags:
    seq:
      - id: reserved1
        type: u1
      - id: reserved2
        type: b4
      - id: check
        type: b4
        enum: stream_checks
  block:
    seq:
      - id: header
        type: block_header
      - id: compressed_data
        size: 8 # FIXME
      - id: padding
        size: 0 # FIXME
      - id: check
        type:
          switch-on: _root.header.flags.check
          cases:
            'stream_checks::none': check_none
            'stream_checks::crc32': check_crc32
            'stream_checks::crc64': check_crc64
            'stream_checks::sha256': check_sha256
  block_header:
    seq:
      - id: len_header_encoded
        type: u1
      - id: has_len_uncompressed
        type: b1
      - id: has_len_compressed
        type: b1
      - id: reserved
        type: b4
      - id: num_filters_encoded
        type: b2
      - id: len_compressed
        type: vlq_base128_le
        if: has_len_compressed
      - id: len_uncompressed
        type: vlq_base128_le
        if: has_len_uncompressed
      - id: filter_flags
        type: block_filter_flag
        repeat: expr
        repeat-expr: num_filters
      - id: padding
        size: 9 # FIXME Calculate so that the header length matches len_header
      - id: crc32
        type: u4
    instances:
      len_header:
        value: (len_header_encoded + 1) * 4
      num_filters:
        value: num_filters_encoded + 1
  block_filter_flag:
    seq:
      - id: id
        type: vlq_base128_le
      - id: len_properties
        type: vlq_base128_le
      - id: properties
        size: len_properties.value
    instances:
      type:
        value: id.value
        enum: filters
  index:
    seq:
      - id: index_indicator
        contents: [0]
      - id: num_records
        type: vlq_base128_le
      - id: records
        type: index_record
        repeat: expr
        repeat-expr: num_records.value
      - id: padding
        size: (4 - (_io.pos % 4)) % 4
      - id: crc32
        type: u4
  index_record:
    seq:
      - id: len_unpadded
        type: vlq_base128_le
      - id: len_uncompressed
        type: vlq_base128_le
  footer:
    seq:
      - id: crc32
        type: u4
      - id: backward_size_raw
        type: u4
      - id: flags
        type: stream_flags
      - id: magic
        contents: [0x59, 0x5a]
    instances:
      backward_size:
        value: (backward_size_raw + 1) * 4
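The stream header and footer layouts in the definition can be cross-checked outside Kaitai. What follows is a minimal Python sketch, not generated Kaitai code: the helper names `parse_stream_flags`, `parse_header` and `parse_footer` are hypothetical, and it assumes a single stream with no stream padding. It uses the standard `lzma` module only to produce a test stream, and `zlib.crc32`, which computes the same CRC32 that XZ uses.

```python
import lzma
import struct
import zlib

HEADER_MAGIC = b"\xfd7zXZ\x00"  # fd 37 7a 58 5a 00
FOOTER_MAGIC = b"YZ"            # 59 5a
CHECKS = {0: "none", 1: "crc32", 4: "crc64", 10: "sha256"}


def parse_stream_flags(flags: bytes) -> str:
    """Decode stream_flags: a zero byte, then reserved high nibble + check type low nibble."""
    if flags[0] != 0 or flags[1] & 0xF0:
        raise ValueError("reserved stream flag bits are set")
    return CHECKS.get(flags[1] & 0x0F, "unknown")


def parse_header(buf: bytes) -> str:
    """Validate the 12-byte stream header and return the check type name."""
    if buf[:6] != HEADER_MAGIC:
        raise ValueError("bad stream header magic")
    flags = buf[6:8]
    (crc,) = struct.unpack_from("<I", buf, 8)
    if zlib.crc32(flags) != crc:  # the header CRC32 covers only the stream flags
        raise ValueError("stream header CRC32 mismatch")
    return parse_stream_flags(flags)


def parse_footer(buf: bytes) -> tuple[str, int]:
    """Validate the 12-byte stream footer; return (check type, backward size in bytes)."""
    footer = buf[-12:]
    if footer[10:12] != FOOTER_MAGIC:
        raise ValueError("bad stream footer magic")
    crc, backward_size_raw = struct.unpack_from("<II", footer, 0)
    if zlib.crc32(footer[4:10]) != crc:  # covers backward size + stream flags
        raise ValueError("stream footer CRC32 mismatch")
    return parse_stream_flags(footer[8:10]), (backward_size_raw + 1) * 4


data = lzma.compress(b"foo\n", format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)
check, backward_size = parse_footer(data)
index_start = len(data) - 12 - backward_size  # the index sits just before the footer
print(parse_header(data), check, backward_size, data[index_start])
```

Reading the footer first mirrors how the backward size is meant to be used: it lets a reader locate the index from the end of the file without decoding any block.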
moreati (gist author) commented Sep 9, 2025:
The file used during development was generated by `echo foo | xz > foo.xz`. The uncompressed data is 4 bytes: `foo` plus the trailing newline.
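A similar one-block stream can be produced without the xz CLI and walked by hand. The following is a minimal Python sketch, assuming a single stream: it decodes the multibyte integers that `vlq_base128_le` models, parses the block header (including the two size-present flags), and reads the index. The helpers `read_vli`, `parse_block_header` and `parse_index` are hypothetical. The stream comes from Python's `lzma.compress`, whose block header may differ from what the `xz` command writes (for example, the optional size fields may be absent), so the hardcoded `size: 8` and `size: 9` values are not assumed here.

```python
import lzma
import struct
import zlib


def read_vli(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an XZ multibyte integer: 7 bits per byte, least significant first,
    high bit set on every byte except the last, at most 9 bytes.
    Returns (value, position after the integer). Non-minimal encodings are not rejected."""
    value = 0
    for i in range(9):
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("multibyte integer longer than 9 bytes")


def parse_block_header(buf: bytes, pos: int) -> dict:
    """Parse the block header starting at pos and verify its CRC32."""
    start = pos
    if buf[pos] == 0:
        raise ValueError("0x00 is the index indicator, not a block header")
    len_header = (buf[pos] + 1) * 4
    flags = buf[pos + 1]
    pos += 2
    header = {"len_header": len_header, "num_filters": (flags & 0x03) + 1}
    if flags & 0x40:  # bit 6: compressed size present
        header["len_compressed"], pos = read_vli(buf, pos)
    if flags & 0x80:  # bit 7: uncompressed size present
        header["len_uncompressed"], pos = read_vli(buf, pos)
    header["filters"] = []
    for _ in range(header["num_filters"]):
        filter_id, pos = read_vli(buf, pos)
        len_properties, pos = read_vli(buf, pos)
        header["filters"].append((filter_id, buf[pos:pos + len_properties]))
        pos += len_properties
    crc_pos = start + len_header - 4
    # Header padding is whatever remains before the CRC32; it must be zero bytes.
    if pos > crc_pos or any(buf[pos:crc_pos]):
        raise ValueError("bad block header padding")
    (crc,) = struct.unpack_from("<I", buf, crc_pos)
    if zlib.crc32(buf[start:crc_pos]) != crc:
        raise ValueError("block header CRC32 mismatch")
    return header


def parse_index(buf: bytes, pos: int) -> list[tuple[int, int]]:
    """Parse the index at pos; return (unpadded size, uncompressed size) per block."""
    start = pos
    if buf[pos] != 0:
        raise ValueError("missing index indicator")
    num_records, pos = read_vli(buf, pos + 1)
    records = []
    for _ in range(num_records):
        len_unpadded, pos = read_vli(buf, pos)
        len_uncompressed, pos = read_vli(buf, pos)
        records.append((len_unpadded, len_uncompressed))
    pos += (4 - (pos - start) % 4) % 4  # index padding, relative to the index start
    (crc,) = struct.unpack_from("<I", buf, pos)
    if zlib.crc32(buf[start:pos]) != crc:
        raise ValueError("index CRC32 mismatch")
    return records


# Usage: the same 4 bytes as `echo foo | xz`, compressed by liblzma via Python.
data = lzma.compress(b"foo\n", format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)
block_header = parse_block_header(data, 12)  # the block follows the 12-byte stream header
backward_size = (struct.unpack_from("<I", data, len(data) - 8)[0] + 1) * 4
index_start = len(data) - 12 - backward_size
records = parse_index(data, index_start)
print(block_header, records)
```

Because the index records each block's unpadded size, a parser can derive the compressed data length (unpadded size minus header and check) instead of hardcoding it, which is one possible way to resolve the `compressed_data` FIXME.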
