hindenbug/kafka_compact_array.md

## kafka_compact_array.md

### The Concept

In binary wire protocols such as Apache Kafka's, Compact Arrays are used to save bandwidth. Older Kafka message versions prefix a list with a fixed 4-byte integer (INT32) length. Kafka's newer "flexible" message versions instead store the length as an unsigned Varint (Variable-length Integer) offset by 1.
### What is a Compact Array?

In standard programming, an array is like a row of identical lockers. Even if you only put a tiny pebble in a locker, the locker stays the same huge size. This wastes space.
A Compact Array is "shrink-wrapped" data. It uses two main tricks to shrink the length prefix:

- Varints: The length of the array isn't a fixed 4-byte block; it's a "stretchy" number that uses only as many bytes as its value needs.
- The +1 Offset: Shifting every length up by one lets the same length field also represent "Null", so "Null" and "Empty" stay distinct.
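To make the "stretchy" number concrete, here is a minimal Python sketch of an unsigned varint codec. It assumes the common layout Kafka shares with Protocol Buffers: 7 payload bits per byte, lowest-order group first, with the high bit set on every byte except the last. The function names `write_unsigned_varint` and `read_unsigned_varint` are hypothetical, not taken from any Kafka client library.

```python
def write_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative int: 7 payload bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("unsigned varints cannot hold negative values")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits + continuation bit
        value >>= 7
    out.append(value)  # last byte: continuation bit clear
    return bytes(out)


def read_unsigned_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a varint at `offset`; return (value, offset just past it)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift  # splice in the next 7 bits
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return result, offset
        shift += 7


# Small values stay small; larger ones grow one byte per 7 bits.
for n in (6, 127, 128, 300):
    print(n, write_unsigned_varint(n).hex())
```

With this scheme, 6 encodes as the single byte 0x06, 127 is the largest value that still fits in one byte, and 300 needs two bytes (0xAC 0x02).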
### How it Works

In a Compact Array, we don't store the actual count. We store Count + 1.
Imagine you are checking a guest list:

- If the paper is missing (Null), we write 0.
- If the paper is there but has no names on it (Empty), we write 1.
- If there are 5 names, we write 6.
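A reader recovers the length by subtracting 1, so a single unsigned number represents three different states:

- Null: encoded as 0, which decodes to -1 (0 - 1 = -1), the same length that marks a null array in the older INT32 encoding.
- Empty: encoded as 1, which decodes to a length of 0 (1 - 1 = 0).
- Populated: an array of N elements is encoded as N + 1.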
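The three states above can be sketched as a pair of Python helpers that write and read only the length prefix; the array elements that follow it are out of scope here. This is an illustrative sketch under the same 7-bits-per-byte varint assumption, and the names `encode_compact_array_length` and `decode_compact_array_length` are hypothetical. `None` stands in for a null array.

```python
from typing import Optional


def _write_uvarint(value: int) -> bytes:
    # Unsigned varint: 7 payload bits per byte, high bit = "more bytes follow".
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def _read_uvarint(data: bytes, offset: int) -> tuple[int, int]:
    result, shift = 0, 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_compact_array_length(count: Optional[int]) -> bytes:
    """Null (None) -> 0; otherwise store count + 1."""
    if count is None:
        return _write_uvarint(0)
    if count < 0:
        raise ValueError("element count must be non-negative")
    return _write_uvarint(count + 1)


def decode_compact_array_length(data: bytes, offset: int = 0) -> tuple[Optional[int], int]:
    """Return (count, new_offset); count is None for a null array."""
    raw, offset = _read_uvarint(data, offset)
    length = raw - 1  # 0 -> -1 (null), 1 -> 0 (empty), N + 1 -> N
    return (None if length == -1 else length), offset


for count in (None, 0, 5):
    encoded = encode_compact_array_length(count)
    print(count, encoded.hex(), decode_compact_array_length(encoded))
```

Returning `None` rather than -1 for a null array keeps the "missing list" case impossible to confuse with a real length in calling code.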
### Why it Matters

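- Efficiency: A fixed INT32 length always takes 4 bytes. An unsigned varint carries 7 bits per byte, so any array with up to 126 elements (encoded value up to 127) needs only 1 byte, saving 3 bytes per array field. Across the enormous number of requests and responses a busy cluster handles, these bytes add up.
- Precision: It distinguishes between "the list doesn't exist" (Null) and "the list exists but is empty" (Empty) without needing an extra boolean flag.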
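The efficiency claim can be checked with a few lines of arithmetic. The Python sketch below compares the size of a fixed INT32 length prefix (big-endian, with -1 marking null, as in the older encoding) against the compact varint prefix for a handful of element counts. The helper names are hypothetical and only measure the length prefix, not the array contents.

```python
import struct
from typing import Optional


def uvarint_size(value: int) -> int:
    """Bytes an unsigned varint needs: one per started group of 7 bits."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def legacy_length_prefix(count: Optional[int]) -> bytes:
    """Fixed INT32 (big-endian) length prefix; -1 marks a null array."""
    return struct.pack(">i", -1 if count is None else count)


def compact_length_size(count: Optional[int]) -> int:
    """Size of the compact prefix: varint of 0 for null, else count + 1."""
    return uvarint_size(0 if count is None else count + 1)


for count in (None, 0, 5, 126, 127, 20_000):
    legacy = len(legacy_length_prefix(count))
    compact = compact_length_size(count)
    print(f"count={count}: legacy {legacy} bytes, compact {compact} bytes, saved {legacy - compact}")
```

Any count up to 126 fits in a one-byte prefix; 127 elements already needs two bytes and 20,000 needs three, which is still smaller than the fixed four.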