
@bwasti
Created October 21, 2025 16:48
from vllm import LLM, SamplingParams

# Set up the model (prefix caching disabled so every request recomputes the prefix)
llm = LLM(model="Qwen/Qwen3-1.7B", enable_prefix_caching=False, dtype="bfloat16")
prompt = "Ok, this is an extremely long story. There once was a "
params = SamplingParams(temperature=0.6, max_tokens=256, logprobs=1, seed=42)

# Generate 256 tokens and extract token 256's logprob from the decode path
out1 = llm.generate([prompt], params)
tokens = out1[0].outputs[0].token_ids
logprob1 = out1[0].outputs[0].logprobs[255][tokens[255]].logprob

# Prefill with the prompt plus the first 255 generated tokens, then greedily
# generate token 256 again (assumes the decode/re-encode round-trip
# reproduces the same token ids)
prefill = prompt + llm.get_tokenizer().decode(tokens[:255])
out2 = llm.generate([prefill], SamplingParams(temperature=0.0, max_tokens=1, logprobs=1, seed=42))
token2 = out2[0].outputs[0].token_ids[0]
assert token2 == tokens[255], "prefill picked a different token; logprobs are not comparable"
logprob2 = out2[0].outputs[0].logprobs[0][token2].logprob

print(f"Logprob difference: {logprob1 - logprob2:.10f}")