Inference providers that want to contribute to open source projects are limited in what they can share: they must respect user privacy and protect proprietary prompts. Real LLM inference traces contain sensitive information, but without realistic traces, benchmarks can't accurately reflect production workloads.
Block hashing converts tokens into cryptographic hash IDs that preserve prefix-matching patterns while protecting content. The approach is based on Mooncake AI's design (USENIX FAST'25).
Key insight: each block's hash incorporates the previous block's hash, so identical prefixes produce identical hash sequences. This lets you benchmark prefix-caching systems with realistic hit rates.
block_hashes = {}  # Maps raw hash values to small sequential IDs
counter = 0        # Next hash ID to assign

def get_block_hashes(ids, prev_hash):
    """Each block's hash depends on ALL previous blocks via prev_hash chaining."""
    global counter
    # Combine this block's token hash with the previous block's hash
    hash_val = hash(hash(tuple(ids)) + prev_hash)
    if hash_val not in block_hashes:
        block_hashes[hash_val] = counter
        counter += 1
    return block_hashes[hash_val], hash_val
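As a quick illustration (the token IDs below are made up, and the chain is assumed to start from a prev_hash of 0), two requests whose first 4-token block is identical get the same leading hash ID:

# Hypothetical token IDs: the first 4-token block is identical in both requests
req_a = [101, 2054, 2003, 1996, 3007, 1997, 2605]
req_b = [101, 2054, 2003, 1996, 3007, 1997, 5288]

block_size = 4
for name, ids in [("A", req_a), ("B", req_b)]:
    prev_hash, hash_ids = 0, []
    for i in range(0, len(ids), block_size):
        hash_id, prev_hash = get_block_hashes(ids[i:i + block_size], prev_hash)
        hash_ids.append(hash_id)
    print(name, hash_ids)  # e.g. A -> [0, 1], B -> [0, 2]: the leading block matches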
# Setup
python3 -m venv venv
source venv/bin/activate
pip install transformers jinja2
# Run
python block_parse.py
# View results
cat block_parse_output.jsonl

Input:
{"timestamp": 0, "input": [{"role": "user", "content": "What is the capital of France?"}], "output": "The capital of France is Paris..."}
{"timestamp": 300, "input": [{"role": "user", "content": "What is the capital of Finland?"}], "output": "The capital of Finland is Helsinki."}Output:
{"timestamp": 0, "input_length": 12, "output_length": 30, "hash_ids": [1, 2, 3]}
{"timestamp": 300, "input_length": 12, "output_length": 7, "hash_ids": [1, 12, 13]}Both start with hash_id: 1 → first 4 tokens match → cache hit detected!
BLOCK_SIZE = 4 # Tokens per block (Mooncake uses 512 in production)
model_name = "deepseek-ai/DeepSeek-R1" # Match your deploymentNote: If your input is already formatted with a chat template, modify line 39:
# Before:
input_ids = tokenizer.encode(tokenizer.apply_chat_template(input_prompt, tokenize=False, add_generation_prompt=True), add_special_tokens=False)
# After:
input_ids = tokenizer.encode(input_prompt, add_special_tokens=False)
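For context, here is a rough end-to-end sketch of how these pieces fit together: tokenize the chat-templated input, split it into BLOCK_SIZE-token blocks, and chain them through get_block_hashes to produce a record in the output format shown above. Variable and function names here are my own, not necessarily those in block_parse.py, and the sketch assumes the model's tokenizer can be downloaded.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)

def build_trace_record(request):
    """Turn one raw request into a sanitized trace record (no text or token IDs retained)."""
    input_ids = tokenizer.encode(
        tokenizer.apply_chat_template(request["input"], tokenize=False, add_generation_prompt=True),
        add_special_tokens=False,
    )
    output_ids = tokenizer.encode(request["output"], add_special_tokens=False)
    prev_hash, hash_ids = 0, []
    for i in range(0, len(input_ids), BLOCK_SIZE):
        hash_id, prev_hash = get_block_hashes(input_ids[i:i + BLOCK_SIZE], prev_hash)
        hash_ids.append(hash_id)
    return {
        "timestamp": request["timestamp"],
        "input_length": len(input_ids),
        "output_length": len(output_ids),
        "hash_ids": hash_ids,
    }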
Shared (safe):
- Request timestamps
- Token counts
- Hash ID sequences
- Cache hit patterns
Protected (never exposed):
- Actual text content
- Token IDs
- User information
- Proprietary prompts
Use cases:
- Share traces with partners under NDA
- Create reproducible benchmark recipes
- Test prefix caching with realistic hit rates (see the hit-ratio sketch after this list)
- Model actual workload characteristics
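For example, here is a rough sketch (again not part of block_parse.py) that estimates an upper-bound prefix-cache hit ratio from a trace file, assuming an unbounded cache that retains every block it has ever seen:

import json

def estimate_hit_ratio(trace_path):
    """Upper-bound hit ratio: fraction of blocks already produced by an earlier request."""
    seen, hits, total = set(), 0, 0
    with open(trace_path) as f:
        for line in f:
            record = json.loads(line)
            for hash_id in record["hash_ids"]:
                total += 1
                if hash_id in seen:
                    hits += 1
                seen.add(hash_id)
    return hits / total if total else 0.0

print(estimate_hit_ratio("block_parse_output.jsonl"))

Because each hash ID encodes its entire prefix, a repeated ID means the whole prefix up to that block could have been served from cache, so the estimate can be compared against hit ratios reported for real deployments as a sanity check on how realistic a trace is.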
Based on Mooncake AI's work on KV-cache-centric LLM serving (USENIX FAST'25). Their production deployment shows 50% cache hit ratios on real workloads.
Note:
Python's built-in hash() function is not guaranteed to be stable across interpreter versions, platforms, or runs (string hashing is randomized via PYTHONHASHSEED), so hash IDs may not reproduce across environments. I would recommend using hashlib (e.g., md5 + hexdigest()) or similar to get a stable hash function. Open questions: Can we use the block hash from Dynamo directly? Are output blocks or lengths helpful?
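A minimal sketch of that suggested hashlib-based variant, assuming the same chained-block scheme (this is an alternative idea, not the current script's behavior; names are illustrative):

import hashlib

stable_block_hashes = {}   # hex digest -> small sequential ID
stable_counter = 0

def get_block_hashes_stable(ids, prev_hash=""):
    """Chained block hash using hashlib.md5 so hash IDs reproduce across runs and machines."""
    global stable_counter
    data = (prev_hash + ":" + ",".join(map(str, ids))).encode()
    hash_val = hashlib.md5(data).hexdigest()
    if hash_val not in stable_block_hashes:
        stable_block_hashes[hash_val] = stable_counter
        stable_counter += 1
    return stable_block_hashes[hash_val], hash_val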