pmikolajczyk41/ZK Hack V: Shadow write-up.md

## ZK Hack V: Shadow write-up.md

      
    Raw
  

              ZK Hack V: Shadow write-up.md
            
          
    ZK Hack V: Shadow write-up

In this document, we delve into how a security gap emerged from the misuse of certain container types within a high-level Zero-Knowledge Domain-Specific Language (ZK DSL) called Noir. This write-up walks through the vulnerability, demonstrating how it can be exploited.
Interestingly, the attack requires no knowledge of ZK proofs or cryptography. The fragility of the code lies entirely in its software design, making the lesson broadly applicable beyond ZK systems.
A short story about JWT.pk

The puzzle description introduces us to a crypto company called JWT.pk. Let's go through it step by step to gain a full understanding of how it works.
A Good Cause

Let's start with an obvious fact: private keys should always be kept as secure as possible and never shared with anyone. Ideally, they should be stored in a way that ensures nothing can compromise their safety. However, making private keys hard to access also makes them cumbersome to use. Imagine this: every time you need to perform an action—even a minor one, like transferring a small amount—you’d have to head to the basement, open an old wardrobe, find the correct piece of paper with the encoded key, and go through an entire signing ceremony.
What if, instead, you could have a "baby key"—a secondary key that doesn't grant full access to your account but allows small transactions, say up to $100? That’s exactly the convenience JWT.pk offers!
User flow

Let sk represent our secret private key and pk the corresponding public key.
Registration


We visit JWT.pk and register our pk.
The system generates a secure, random number called pepper (our baby key).
It computes an identifier for us using the formula: sha256("{pk}_{pepper}").
The system securely sends the pepper back to us.
Finally, it adds the identifier to a publicly available list of all user identifiers.

(Behind the scenes, we likely use sk to authorize JWT.pk to perform small transfers on our behalf. However, this process is out of scope.)
Usage

Once registered, we can use JWT.pk to make transactions of up to $100 as follows:

Visit JWT.pk and securely prove that we know the pepper corresponding to our pk.
The system verifies that sha256("{pk}_{pepper}") matches an identifier on its list. If it does, our transaction is authenticated.

Notice that this process completely avoids using sk.
First dive into the code

We are given the Noir code for the system's authentication circuit. Let's dive in, starting with the signature of the main function:
fn main(
    identifier: BoundedVec<u8, 128>,
    pub_key: pub [u8; 32], 
    whitelist: pub [[u8; 32]; 10]) {
 ...
}

We observe that the circuit has three inputs:

identifier: This is the prover's secret and represents "{pk}_{pepper}". It is of type BoundedVec<u8, 128>, which we’ll examine in detail later.
pub_key: A public input (likely required for the system to perform the action on our behalf). As in many systems, this is simply an array of 32 bytes.
whitelist: A public list of all registered (hashed) identifiers. Each hashed identifier is 32 bytes (256 bits, since SHA256 is used). For simplicity, the list is fixed at a length of 10.

BoundedVec<T, MaxLen>

From studying the official docs, we learn that BoundedVec is a versatile data structure. It behaves like a standard vector (similar to those in languages like C++ or Rust) but includes a compile-time size limit. This restriction helps prevent memory-related exploits and simplifies memory allocation.
We can even examine its implementation:
pub struct BoundedVec<T, let MaxLen: u32> {
    storage: [T; MaxLen],
    len: u32,
}

Under the hood, BoundedVec maintains an array, storage, with a maximum permissible length and a len counter that tracks how many elements are currently in use. So far, so good.
Prover Data

To prepare a proof using Noir, all the circuit inputs must be specified in a configuration file called Prover.toml.

pub_key: This is simply a 32-byte array, as expected by the main.nr file.
whitelist: A 2D array consisting of 10 rows, each containing a SHA256 hash represented as 32 bytes.
identifier: This is a bit more complex. Recall that it is represented as a BoundedVec<u8, 128>. Deserialization of this type requires two fields:

len (a u32), which specifies how many bytes are used, and
the underlying byte array of length 128.


Honest Alice

Let’s simulate the actions of an honest user. We know that Alice has already registered in the system, and we also know her pk and pepper. With this information, we should be able to create a valid input in Prover.toml to satisfy the circuit.
pub_key

Update the pub_key entry with the array from the 5th line of main.nr. For now, you can ignore the "DON'T CHANGE THIS SECTION" markers. Once done, replace this placeholder with Alice’s actual pk.
whitelist

Leave the whitelist as it is—it should already contain Alice’s hashed identifier.
identifier

Recall that the identifier represents "{pk}_{pepper}". Both pk and pepper are 32-byte arrays (you can verify this in main.nr). From this, we can infer that the identifier is constructed by:

Concatenating the bytes of pk and pepper as ASCII characters, separated by a single byte for the underscore (_), which has the ASCII code 95.
As a result, the total length of the identifier becomes 65 bytes: 32 bytes for pk, 1 byte for _, and 32 bytes for pepper.

While this reasoning seems intuitive and convincing, it’s mostly based on deduction and assumptions. To proceed with confidence, we need to verify this hypothesis!
[identifier]
len = "65"
storage = [
    155, 143, 27, 66, 87, 125, 33, 110, 57, 153, 93, 228, 167, 76, 120, 220, 178, 200, 187, 35, 211, 175, 104, 63, 140, 208, 36, 184, 88, 1, 203, 62,
    95,
    213, 231, 76, 105, 105, 96, 199, 183, 106, 26, 29, 7, 28, 234, 145, 69, 48, 9, 254, 205, 79, 21, 90, 13, 39, 172, 114, 59, 131, 15, 78, 118,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
]
If we run the following commands:
nargo execute
bb prove -b ./target/puzzle3.json -w ./target/puzzle3.gz -o ./target/proof
bb verify -p ./target/proof

we will see that everything works as expected—honest Alice can successfully use the system!
Second dive into the code

Now, it's time to inspect the circuit in detail. It comes in two parts.
Identifier membership test

// the identifier hashes to a digest that is in the public whitelist
let digest = std::hash::sha256_var(identifier.storage(), identifier.len() as u64);
let mut present = false;
for i in 0..whitelist.len() {
    if whitelist[i] == digest {
        present = true;
    }
}
assert(present);
We compute the digest using the sha256_var function:

Given an array of bytes, returns the resulting SHA256 hash. Specify a message_size to hash only the first message_size bytes of the input.

In this case, we pass the underlying array of the identifier along with its semantic length, i.e., 65. Any registered user (like Alice) should successfully pass this test. Furthermore, no one else will be able to pass this check.
Connecting identifier with pub_key

// the specified public key is in the identifier
let id_haystack: StringBody128 = StringBody::new(identifier.storage(), 128);
let pk_needle: SubString32 = SubString::new(pub_key, 32);
let (result, _): (bool, u32) = id_haystack.substring_match(pk_needle);
assert(result);
The idea is to treat both identifier and pub_key as byte strings and check whether pub_key (the needle) is a substring of identifier (the haystack). This worked fine for honest Alice, as her pub_key was a prefix of the identifier.
However, let's take a closer look at how these strings are constructed. pk_needle is created as SubString::new(pub_key, 32). Since pub_key is already 32 bytes in length, there’s nothing suspicious here. But what about id_haystack? Its initializer is: StringBody::new(identifier.storage(), 128). This means we are using the entire underlying array of identifier, ignoring its semantic len counter! That seems problematic. We can confirm this by checking the code of this constructor. Indeed, the haystack will contain all 128 reserved bytes of identifier, not just the valid portion.
Exploit

Based on our findings, we are now ready to drain Bob's account! Recall the prover inputs:

pub_key: This will be Bob's public key (already provided in the Prover.toml file).
whitelist: A list containing Alice's hashed identifier (also provided in the Prover.toml file).
identifier.len and identifier.storage: This is where the prank happens!

To pass the first assertion in the circuit, we must:

Set identifier.len to 65.
Set the first 65 bytes of identifier.storage to Alice's pub_key, the ASCII code for the _ character, and Alice's pepper.

These bytes are fixed. However, we are left with 63 free bytes in identifier.storage that we can manipulate to pass the second check (since the identifier membership test simply ignores them). We can use these free bytes to insert Bob's public key and pad the array with, for example, zeros.
[identifier]
len = "65"
storage = [
    155, 143, 27, 66, 87, 125, 33, 110, 57, 153, 93, 228, 167, 76, 120, 220, 178, 200, 187, 35, 211, 175, 104, 63, 140, 208, 36, 184, 88, 1, 203, 62,
    95,
    213, 231, 76, 105, 105, 96, 199, 183, 106, 26, 29, 7, 28, 234, 145, 69, 48, 9, 254, 205, 79, 21, 90, 13, 39, 172, 114, 59, 131, 15, 78, 118,
    157, 133, 167, 136, 43, 161, 75, 166, 33, 14, 35, 106, 238, 18, 60, 56, 93, 209, 205, 52, 247, 110, 174, 192, 20, 58, 70, 42, 215, 98, 195, 150,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
]
This way, Bob's public key (i.e., pub_key input) will be a substring of the identifier.storage and hence the second assertion will be satisfied. Boom!
No results found