Csaba Kecskemeti csabakecskemeti

## gist:278ffb8ab8e4e13b605bfc1735bfed9b
./build/bin/llama-simple  -m /media/kecso/8t_nvme/zai-org.GLM-4.5.f16.gguf -ngl 0 -n 2048 "how many 'r's are in the word strawberry?"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
  Device 1: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) - 30933 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 5090) - 31598 MiB free
llama_model_loader: loaded meta data with 41 key-value pairs and 1761 tensors from /media/kecso/8t_nvme/zai-org.GLM-4.5.f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.