| title | author | version | date | license |
|---|---|---|---|---|
| SDAT — Sensor Data Archive & Transport | Elizabeth Ashford <elizabeth.jennifer.myers@gmail.com> | 1 | 3 April 2025 | CC0 (Public Domain Dedication) |
Version: 1.0
Extension: .sdat
Purpose: Efficient, resilient, and flash-friendly storage of high-frequency sensor logs with block-level compression and random-access support.
SDAT is a binary file format optimised for embedded systems like the ESP32 family. It is designed for:
- Append-only, streaming-safe writes
- Flash and SD card compatibility and safety
- High-frequency logging (e.g., every 5 seconds)
- Per-block compression (Zstandard or raw)
- Indexing for fast seeking and temporal queries
- Recovery from partial writes and power loss
Records can also be streamed and decoded linearly without the block table, although the block table allows for improved integrity verification and efficient indexing.
All major sections in the file are aligned to 4096-byte (0x1000) boundaries. This alignment aids flash wear-leveling, improves performance on block devices, and simplifies implementation by providing predictable offsets for seeking.
+-----------------------------+ 0x0000
| Primary Header |
+-----------------------------+ 0x1000
| Secondary Header |
+-----------------------------+ 0x2000
| Block Table (num_blocks) |
+-----------------------------+ (data_offset, aligned)
| Block 0 (raw or compressed)|
| Block 1 |
| ... |
+-----------------------------+
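
The alignment rule above can be sketched as a small helper. This is a minimal illustration, not part of the format itself: the names `SDAT_ALIGNMENT`, `sdat_align_up`, and `sdat_padding_after` are hypothetical, and the sketch assumes 32-bit file offsets, as the header fields suggest.

```c
#include <assert.h>
#include <stdint.h>

#define SDAT_ALIGNMENT 0x1000u /* 4096-byte section alignment (hypothetical name) */

/* Round `offset` up to the next multiple of SDAT_ALIGNMENT.
 * The mask trick relies on the alignment being a power of two. */
uint32_t sdat_align_up(uint32_t offset)
{
    /* Reject offsets so large that rounding up would wrap around. */
    assert(offset <= UINT32_MAX - (SDAT_ALIGNMENT - 1u));
    return (offset + (SDAT_ALIGNMENT - 1u)) & ~(SDAT_ALIGNMENT - 1u);
}

/* Number of padding bytes needed after `offset` to reach alignment. */
uint32_t sdat_padding_after(uint32_t offset)
{
    return sdat_align_up(offset) - offset;
}
```

A writer that has just finished a section ending at `end` would emit `sdat_padding_after(end)` padding bytes before starting the next section.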
struct Header {
uint32_t magic; // 'SDAT' = 0x54414453
uint8_t version; // Format version (1)
uint16_t flags; // HeaderFlags (see below)
uint8_t reserved0; // Padding and future use (must be zero)
uint32_t header_crc; // CRC32 of this header with crc field set to 0
uint32_t data_crc; // CRC32 of block table + data (optional)
uint32_t sample_interval; // In seconds
uint32_t num_samples; // Total number of samples
uint32_t num_blocks; // Number of entries in block table
uint32_t data_offset; // Start of data, 0x1000-aligned
uint32_t sequence_number; // Used to detect newer headers
uint8_t reserved1[28]; // Padding and future use (must be zero)
};

Both headers share this layout. On load, the valid header with the highest sequence_number is used; if both are valid and equivalent, either may be used.
All fields marked reserved must be zero when written. This convention aids compression and allows future extensibility.
enum HeaderFlags {
HEADER_FLAG_FINALIZED = 0x01 // File has been finalised and all CRCs are valid
};

The header CRC is computed with the header_crc field set to zero, and the result is then stored in header_crc. To verify, copy the stored CRC out of the structure, set the field to zero, and compute the checksum again. If the two values are equal, the header passes the check.
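
The CRC procedure can be sketched as follows. Several details here are assumptions rather than statements of the format: the CRC-32 variant (the common reflected polynomial 0xEDB88320), a packed 64-byte little-endian header with header_crc at byte offset 8, and a CRC that covers only those 64 bytes. All function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SDAT_HEADER_SIZE       64u /* assumed: packed size of struct Header */
#define SDAT_HEADER_CRC_OFFSET 8u  /* assumed: packed offset of header_crc */

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (assumed variant). */
uint32_t sdat_crc32(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the low bit is set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* CRC of a serialized header, computed as if header_crc were zero. */
uint32_t sdat_header_crc(const uint8_t *header)
{
    uint8_t scratch[SDAT_HEADER_SIZE];
    memcpy(scratch, header, SDAT_HEADER_SIZE);
    memset(scratch + SDAT_HEADER_CRC_OFFSET, 0, 4);
    return sdat_crc32(scratch, SDAT_HEADER_SIZE);
}

/* Store the CRC into the header_crc field (little-endian). */
void sdat_header_seal(uint8_t *header)
{
    uint32_t crc = sdat_header_crc(header);
    for (unsigned i = 0; i < 4; i++) {
        header[SDAT_HEADER_CRC_OFFSET + i] = (uint8_t)(crc >> (8u * i));
    }
}

/* Return 1 if the stored header_crc matches the recomputed value. */
int sdat_header_is_valid(const uint8_t *header)
{
    uint32_t stored = 0;
    for (unsigned i = 0; i < 4; i++) {
        stored |= (uint32_t)header[SDAT_HEADER_CRC_OFFSET + i] << (8u * i);
    }
    return stored == sdat_header_crc(header);
}
```

Working on a serialized byte buffer rather than on the struct itself sidesteps compiler padding, which would otherwise shift the field offsets.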
struct BlockTableEntry {
uint32_t timestamp; // Start time of block
uint32_t offset; // Offset of block in file
uint16_t record_count; // Number of samples in block
uint16_t flags; // See BlockFlags
uint32_t block_crc; // CRC32 of block contents (compressed or raw)
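};

The block table is stored at 0x2000, after the two headers. Its size is fixed. Its length in entries is stored in the num_blocks field of the header. Given the 16-byte entries defined above and the 0x1000 alignment of data_offset, the region spans num_blocks × 16 bytes, rounded up to the next 0x1000 boundary.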
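
As a rough sketch of the size calculation, assuming the table region is the packed 16-byte entry size (4 + 4 + 2 + 2 + 4) multiplied by num_blocks and rounded up to the next 0x1000 boundary so that data_offset stays aligned. The helper names and the rounding rule are assumptions drawn from the alignment requirement, not a quoted formula.

```c
#include <assert.h>
#include <stdint.h>

#define SDAT_ALIGNMENT          0x1000u
#define SDAT_BLOCK_TABLE_OFFSET 0x2000u /* table follows the two headers */
#define SDAT_BLOCK_ENTRY_SIZE   16u     /* packed size of BlockTableEntry */

/* Size in bytes of the block table region, rounded up to 0x1000 (assumed). */
uint32_t sdat_block_table_size(uint32_t num_blocks)
{
    /* Use 64-bit arithmetic so the multiplication cannot overflow. */
    uint64_t raw = (uint64_t)num_blocks * SDAT_BLOCK_ENTRY_SIZE;
    uint64_t aligned = (raw + SDAT_ALIGNMENT - 1u) & ~(uint64_t)(SDAT_ALIGNMENT - 1u);
    /* The result plus the table offset must still fit a 32-bit data_offset. */
    assert(aligned <= UINT32_MAX - SDAT_BLOCK_TABLE_OFFSET);
    return (uint32_t)aligned;
}

/* Smallest aligned data_offset that leaves room for `num_blocks` entries. */
uint32_t sdat_data_offset(uint32_t num_blocks)
{
    return SDAT_BLOCK_TABLE_OFFSET + sdat_block_table_size(num_blocks);
}
```

Under these assumptions a single 4096-byte page holds 256 entries, so a table of 288 entries (a hypothetical figure) needs two pages and places the data at 0x4000.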
enum BlockFlags {
BLOCK_FLAG_FINALIZED = 0x01, // Block is complete and not subject to change
BLOCK_FLAG_COMPRESSED = 0x02, // Block data is compressed with Zstd
BLOCK_FLAG_TOMBSTONE = 0x04 // Block is invalid or discarded during recovery
};

Each block consists of either:
- A full sensor record followed by delta-encoded entries and optional checkpoints, or
- A full sensor record only
Compressed blocks use Zstandard. Raw blocks are uncompressed.
Note: CRCs are computed independently; Zstandard's optional built-in content checksum is not used and is omitted from the compressed data.
If a compressed block is larger than the raw block, it is stored uncompressed instead.
Records are stored in the following format:
- Each record begins with a 2-byte marker.
- The record type is stored in bits 13–15:
  - 0b100 – Full record
  - 0b000 – Delta record
  - 0b001 – Checkpoint record
  - All other values are reserved for future use
- The field mask is stored in bits 0–8, where each bit corresponds to one field:
  - Bit 0: pm1_0
  - Bit 1: pm2_5
  - Bit 2: pm10
  - Bit 3: voc
  - Bit 4: pressure
  - Bit 5: temperature
  - Bit 6: humidity
  - Bit 7: co2
  - Bit 8: aqi
Note: This is what is used in HomeSensor; the field mask can be changed based on other needs.
Only fields marked in the field mask are encoded. For full readings, the field mask may be ignored.
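
The marker layout described above can be packed and unpacked with a few shifts and masks. This is a minimal sketch: the enum and function names are hypothetical, and bits 9–12 are left at zero on the assumption that they are unused.

```c
#include <assert.h>
#include <stdint.h>

/* Record type values from the list above (names are hypothetical). */
enum SdatRecordType {
    SDAT_RECORD_DELTA      = 0x0, /* 0b000 */
    SDAT_RECORD_CHECKPOINT = 0x1, /* 0b001 */
    SDAT_RECORD_FULL       = 0x4  /* 0b100 */
};

#define SDAT_TYPE_SHIFT 13u
#define SDAT_FIELD_MASK 0x01FFu /* bits 0-8 */

/* Build a 2-byte marker from a record type and a field mask. */
uint16_t sdat_make_marker(enum SdatRecordType type, uint16_t field_mask)
{
    /* Only bits 0-8 may be set in the field mask. */
    assert((field_mask & ~SDAT_FIELD_MASK) == 0);
    return (uint16_t)(((unsigned)type << SDAT_TYPE_SHIFT) | field_mask);
}

/* Extract the record type (bits 13-15). */
unsigned sdat_marker_type(uint16_t marker)
{
    return ((unsigned)marker >> SDAT_TYPE_SHIFT) & 0x7u;
}

/* Extract the field mask (bits 0-8). */
uint16_t sdat_marker_fields(uint16_t marker)
{
    return (uint16_t)(marker & SDAT_FIELD_MASK);
}
```

For example, a delta record carrying only pm1_0 (bit 0) and temperature (bit 5) would use the marker 0x0021.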
struct FullRecord {
uint32_t timestamp; // UNIX timestamp (UTC)
uint16_t pm1_0; // PM1.0 (µg/m^3)
uint16_t pm2_5; // PM2.5 (µg/m^3)
uint16_t pm10; // PM10 (µg/m^3)
uint16_t voc; // Volatile Organic Compounds (PPM)
uint32_t pressure; // Atmospheric pressure (Pa)
int16_t temperature; // Temperature (°C * 100)
uint16_t humidity; // Relative Humidity (% * 100)
uint16_t co2; // CO₂ concentration (PPM)
uint8_t aqi; // Air Quality Index (0–5)
uint8_t reserved[9]; // Padding and alignment (must be zero)
};

Note: The structure could be adapted to other sensor types. This is the structure used by HomeSensor.
All fields marked reserved must be zero when written. This convention aids compression and allows future extensibility.
Delta records allow compact storage of sensor data by only storing fields that have changed significantly from the previous record.
They are encoded dynamically with the following format:
- Each field present in the field mask is in the record.
- Each field present is encoded using ZigZag encoding and then VarInt encoding.
- Fields are encoded in fixed order as listed above.
The values from the delta record (which may be positive or negative) are added to the last known value to give the latest readings. The new readings become the last known value.
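
The two encoding steps can be sketched as below. The format text names ZigZag and VarInt without giving byte-level detail, so this assumes the widely used scheme of 7 payload bits per byte, least significant group first, with the high bit as a continuation flag, applied to 32-bit deltas. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3. */
uint32_t sdat_zigzag_encode(int32_t v)
{
    return ((uint32_t)v << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

/* Inverse of sdat_zigzag_encode (assumes two's-complement conversion). */
int32_t sdat_zigzag_decode(uint32_t z)
{
    return (int32_t)((z >> 1) ^ (0u - (z & 1u)));
}

/* Write `value` as a varint into `out` (at least 5 bytes); returns bytes written. */
size_t sdat_varint_encode(uint32_t value, uint8_t *out)
{
    assert(out != NULL);
    size_t n = 0;
    while (value >= 0x80u) {
        out[n++] = (uint8_t)(value | 0x80u); /* low 7 bits plus continuation flag */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Read a varint from `in`; returns bytes consumed, or 0 if truncated or too long. */
size_t sdat_varint_decode(const uint8_t *in, size_t len, uint32_t *value)
{
    assert(in != NULL && value != NULL);
    uint32_t result = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        result |= (uint32_t)(in[i] & 0x7Fu) << (7u * i);
        if ((in[i] & 0x80u) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

A decoder would apply `sdat_varint_decode` and then `sdat_zigzag_decode` to each field flagged in the mask, adding the result to the last known value of that field.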
- To avoid encoding noise, only include deltas that exceed a minimum threshold (e.g., ±2 for pm1_0).
- If a field changes by less than this threshold, omit it from the delta.
- Ensure you account for the need to "catch up" for slow drift of values.
- Periodically write full records (e.g., once per block or once per 5 minutes) to ensure decoding sync.
- Catch-up logic can be used to encode small accumulated differences after several suppressed deltas.
This method reduces file size while maintaining fidelity over time.
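
One way to combine the threshold with catch-up is to compare each new reading against the last value actually written, rather than against the previous sample. Slow drift then accumulates until it exceeds the threshold and is emitted as a single delta. The sketch below illustrates that idea only; `FieldTracker` and `sdat_field_delta` are hypothetical names, and it assumes readings fit comfortably in 32-bit signed arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-field encoder state. */
struct FieldTracker {
    int32_t last_encoded; /* value the decoder currently holds for this field */
    int32_t threshold;    /* a delta is written only if its magnitude exceeds this */
};

/* Decide whether `current` warrants a delta.
 * Returns 1 and stores the delta if it should be written, otherwise 0. */
int sdat_field_delta(struct FieldTracker *t, int32_t current, int32_t *delta)
{
    assert(t != NULL && delta != NULL);
    int32_t diff = current - t->last_encoded;
    if (diff >= -t->threshold && diff <= t->threshold) {
        /* Suppressed: last_encoded is left alone so drift keeps accumulating. */
        return 0;
    }
    *delta = diff;
    t->last_encoded = current; /* decoder will now hold this value */
    return 1;
}
```

With a threshold of 2 and a starting value of 100, readings of 101 and 102 are suppressed, and a reading of 103 is written as a delta of +3, which brings the decoder back in step.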
A checkpoint is a lightweight hash of the current full sensor state. It provides integrity markers for delta decoding and can assist recovery.
Checkpoint entries:
- Appear periodically (e.g., every 10 deltas)
- Have their own marker with type field set to 0b001 in bits 13–15
- Contain the following structure:
struct Checkpoint {
uint32_t timestamp; // Timestamp of the checkpoint
uint32_t record_crc; // CRC32 of the reconstructed FullRecord
uint16_t record_count; // Number of records since start of block
uint8_t reserved[6]; // Reserved for alignment and future use (must be zero)
};

All fields marked reserved must be zero when written. This convention aids compression and allows future extensibility.
Checkpoints are optional but recommended to improve resilience.
- Each data block may be compressed with Zstandard
- Compression is per-block, not per-file
- Zstandard is used purely as a codec — not for CRC or framing
- If compression inflates the block size, the block is stored uncompressed instead
- Data is appended to an internal buffer representing the current block
- Once a block reaches a set number of records or size, it is compressed
- A new BlockTableEntry is written with offset, record count, and CRC
- The block is flushed to disk at a 0x1000-aligned offset
- Only finalised blocks are included in the block table
- During shutdown, the current block should be finalised and flushed to ensure integrity
- The num_samples and num_blocks fields are updated only when the file is finalised
- A new full record is written at the start of each block
- Incomplete or in-progress blocks are skipped during recovery unless their CRCs match
- Read the primary and secondary headers and select the one with the valid CRC and highest sequence number
- Read the block table up to num_blocks entries
- Seek to the appropriate block using the timestamp and offset
- Decompress block if BLOCK_FLAG_COMPRESSED is set
- Iterate over full + delta records
- Optional: Use checkpoint or block-aligned reads to optimise range queries
- If no block table is available, records can still be streamed sequentially by parsing full and delta records in order
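
The seek step in the reading procedure can be sketched as a binary search over the block table. This assumes entries are ordered by ascending timestamp, which follows from append-only writing but is not stated outright. The function name is hypothetical, and skipping tombstoned entries is a design choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum BlockFlags {
    BLOCK_FLAG_FINALIZED  = 0x01,
    BLOCK_FLAG_COMPRESSED = 0x02,
    BLOCK_FLAG_TOMBSTONE  = 0x04
};

struct BlockTableEntry {
    uint32_t timestamp;
    uint32_t offset;
    uint16_t record_count;
    uint16_t flags;
    uint32_t block_crc;
};

/* Find the last non-tombstoned block whose start time is <= `target`.
 * Returns its index, or -1 if no such block exists. */
int32_t sdat_find_block(const struct BlockTableEntry *table,
                        uint32_t num_blocks, uint32_t target)
{
    assert(table != NULL || num_blocks == 0);
    uint32_t lo = 0, hi = num_blocks; /* search window is [lo, hi) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2u;
        if (table[mid].timestamp <= target) {
            lo = mid + 1u;
        } else {
            hi = mid;
        }
    }
    /* lo now counts the entries starting at or before target;
     * walk backwards past any tombstoned blocks. */
    while (lo > 0) {
        if ((table[lo - 1u].flags & BLOCK_FLAG_TOMBSTONE) == 0) {
            return (int32_t)(lo - 1u);
        }
        lo--;
    }
    return -1;
}
```

The reader would then seek to `table[index].offset`, decompress if BLOCK_FLAG_COMPRESSED is set, and decode records forward until it reaches the requested time.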
- Use the most recent valid header (valid CRC + highest sequence number)
- If CRC of a block does not match, mark it as BLOCK_FLAG_TOMBSTONE
- Finalised blocks are assumed valid
- Incomplete blocks may be retained if CRC matches after skipping trailing deltas
- The block table should not be extended unless the block is finalised and flushed
- It is recommended to finalise and flush blocks periodically (e.g. on shutdown or idle)
- Checkpoints can be used to validate decoded deltas without decompressing the whole block
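
Two of the recovery rules above, header selection and tombstoning, can be sketched as follows. The `HeaderCandidate` struct and both function names are hypothetical, and the sketch assumes the caller has already checked each header CRC and computed each block CRC. Sequence number wrap-around is not handled, and preferring the primary header on a tie is an arbitrary choice here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum BlockFlags {
    BLOCK_FLAG_FINALIZED  = 0x01,
    BLOCK_FLAG_COMPRESSED = 0x02,
    BLOCK_FLAG_TOMBSTONE  = 0x04
};

struct BlockTableEntry {
    uint32_t timestamp;
    uint32_t offset;
    uint16_t record_count;
    uint16_t flags;
    uint32_t block_crc;
};

/* Hypothetical summary of one header after its CRC has been checked. */
struct HeaderCandidate {
    int      crc_ok;
    uint32_t sequence_number;
};

/* Returns 0 to use the primary header, 1 for the secondary, -1 if neither is valid. */
int sdat_select_header(const struct HeaderCandidate *primary,
                       const struct HeaderCandidate *secondary)
{
    assert(primary != NULL && secondary != NULL);
    if (primary->crc_ok && secondary->crc_ok) {
        return (secondary->sequence_number > primary->sequence_number) ? 1 : 0;
    }
    if (primary->crc_ok) {
        return 0;
    }
    return secondary->crc_ok ? 1 : -1;
}

/* Tombstone every entry whose stored CRC differs from the recomputed one.
 * Returns the number of entries marked. */
size_t sdat_tombstone_mismatches(struct BlockTableEntry *table,
                                 const uint32_t *computed_crcs, size_t count)
{
    assert((table != NULL && computed_crcs != NULL) || count == 0);
    size_t marked = 0;
    for (size_t i = 0; i < count; i++) {
        if (table[i].block_crc != computed_crcs[i]) {
            table[i].flags = (uint16_t)(table[i].flags | BLOCK_FLAG_TOMBSTONE);
            marked++;
        }
    }
    return marked;
}
```

Since finalised blocks are assumed valid, a caller may choose to pass only the non-finalised entries to `sdat_tombstone_mismatches`.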
It is recommended to write no more block table entries than the application requires: fewer, larger blocks save space in the table and give better compression ratios. For example, in HomeSensor, a block is written every five minutes.
- Backward compatible within major version (e.g., v1.x)
- Secondary header allows for dual-write safety
- Block table allows for optional checkpointing and per-block metadata
<application>_YYYYMMDD.sdat
example: sensorlog_20250403.sdat
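
A file name following this pattern could be built as in the sketch below. The function name and its parameters are hypothetical; date validation and any restriction on the characters of the application name are left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<application>_YYYYMMDD.sdat" into `out`.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails. */
int sdat_make_filename(char *out, size_t out_size, const char *application,
                       int year, int month, int day)
{
    assert(out != NULL && application != NULL);
    assert(strlen(application) > 0);
    int n = snprintf(out, out_size, "%s_%04d%02d%02d.sdat",
                     application, year, month, day);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Calling it with "sensorlog", 2025, 4, 3 yields the example name above.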
Current Version: SDATv1
Maintainer: Elizabeth Ashford
Contact: elizabeth.jennifer.myers@gmail.com
License: CC0 (Public Domain Dedication)