To install mlx-lm:
pip install "mlx-lm[cuda13]"
Run generation:
mlx_lm.generate \
  --model mlx-community/Qwen3-4B-Instruct-2507-mxfp8 \
  --prompt "Tell me a story about Einstein" \
  -m 1024 \
  --quantize-activations
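If you want to script this rather than retype it, here is a rough Python sketch that assembles the same command line. The helper names `build_generate_cmd` and `run_generate` are made up for illustration and are not part of mlx-lm; the sketch only uses the flags shown above and assumes `mlx_lm.generate` is on your PATH once the install has succeeded.

```python
import shlex
import shutil
import subprocess


def build_generate_cmd(model, prompt, max_tokens=1024, quantize_activations=True):
    """Build the argv list for the mlx_lm.generate command (hypothetical helper)."""
    cmd = [
        "mlx_lm.generate",
        "--model", model,
        "--prompt", prompt,
        "-m", str(max_tokens),
    ]
    if quantize_activations:
        cmd.append("--quantize-activations")
    return cmd


def run_generate(cmd):
    """Run the command, failing early if the CLI is not installed (hypothetical helper)."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH; is mlx-lm installed?")
    return subprocess.run(cmd, check=True)


cmd = build_generate_cmd(
    "mlx-community/Qwen3-4B-Instruct-2507-mxfp8",
    "Tell me a story about Einstein",
)
# Print the shell-quoted command so it can be checked before running it.
print(shlex.join(cmd))
```

Calling `run_generate(cmd)` would then launch the generation. Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes.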
LoRA fine-tune:
mlx_lm.lora \
  --model Qwen/Qwen3-4B-Instruct-2507 \
  --data mlx-community/WikiSQL \
  --train
Awesome, thanks! Struggled with this until I found your solution.