
./build/bin/llama-simple -m /media/kecso/8t_nvme/zai-org.GLM-4.5.f16.gguf -ngl 0 -n 2048 "how many 'r's are in the word strawberry?"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
Device 1: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) - 30933 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 5090) - 31598 MiB free
llama_model_loader: loaded meta data with 41 key-value pairs and 1761 tensors from /media/kecso/8t_nvme/zai-org.GLM-4.5.f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.