This guide walks you through setting up a minimal project that uses
mlx-vlm to run the
mlx-community/GLM-4.6V-Flash-4bit vision model on a sample image.

First, install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Make sure uv is on your PATH after installation. You may need to start a new shell.
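
One way to confirm that uv is reachable after installation is to look it up with `command -v`. The following is a minimal sketch; `require_cmd` is a hypothetical helper name used only for illustration and is not part of uv.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH; open a new shell or update PATH" >&2
        return 1
    fi
}

# Check for uv without aborting the script if it is missing.
require_cmd uv || true
```

If the check reports that uv is missing, starting a new shell is usually enough for the installer's PATH change to take effect.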
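
Create a project directory and a Python 3.11 virtual environment:

mkdir glm && cd glm
uv venv --python 3.11
source .venv/bin/activate

Download a sample image:

curl -o cat.jpg https://www.placecats.com/250/250

Install mlx-vlm, transformers, and torchvision:

uv pip install git+https://github.com/Blaizzy/mlx-vlm.git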
uv pip install 'git+https://github.com/huggingface/transformers.git[torch]'
uv pip install torchvision

On base-variant (16 GB) Apple Silicon Macs, you can raise the GPU wired memory limit. This requires root and is safe to try, since the setting resets after a reboot.
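
The value 12288 used in the sysctl command in this step works out to 75% of 16 GiB (16384 MB). The sketch below reproduces that arithmetic. `wired_limit_mb` is a hypothetical helper, and reusing the 75% share on machines with more memory is an assumption, not a recommendation from the mlx-vlm project.

```shell
#!/bin/sh
# Hypothetical helper: wired limit in MB as a percentage of total memory.
# $1 = total memory in bytes, $2 = percentage to allow (75 is an assumption
# inferred from 12288 / 16384).
wired_limit_mb() {
    total_mb=$(( $1 / 1048576 ))
    echo $(( total_mb * $2 / 100 ))
}

# 16 GiB in bytes; on macOS this could come from: sysctl -n hw.memsize
total_bytes=17179869184
wired_limit_mb "$total_bytes" 75   # prints 12288
```
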

sudo sysctl iogpu.wired_limit_mb=12288

Run the model on the sample image:

python -m mlx_vlm.generate \
--model mlx-community/GLM-4.6V-Flash-4bit \
--max-tokens 100 \
--temperature 0.0 \
--prompt "Describe this image." \
--image "./cat.jpg"

You should see a textual description of the cat image in your terminal. With --max-tokens 100 the description is cut off after 100 tokens, as in this example:

<frozen runpy>:128: RuntimeWarning: 'mlx_vlm.generate' found in sys.modules after import of package 'mlx_vlm', but prior to execution of 'mlx_vlm.generate'; this may result in unpredictable behaviour
Calling `python -m mlx_vlm.generate ...` directly is deprecated. Use `mlx_vlm generate` or `python -m mlx_vlm generate` instead.
Fetching 11 files: 100%|███████████████████████████████████████████████████████| 11/11 [00:00<00:00, 62179.71it/s]
Download complete: : 0.00B [00:00, ?B/s] | 0/11 [00:00<?, ?it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
==========
Files: ['./cat.jpg']
Prompt: [gMASK]<sop><|user|>
<|begin_of_image|><|image|><|end_of_image|>Describe this image.<|assistant|>
<think>Got it, let's describe this image. The picture shows a tabby cat lying down, probably on a piece of furniture with a patterned fabric, maybe a couch or a chair. The cat has a brown tabby coat with dark stripes, typical of a domestic shorthair. Its ears are perked up, and it has bright green or yellow-green eyes, looking directly at the camera. The background is a bit blurred, with soft lighting, maybe from a window, giving a
==========
Prompt: 93 tokens, 15.784 tokens-per-sec
Generation: 100 tokens, 31.085 tokens-per-sec
Peak memory: 7.514 GB
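
The two rate lines in the example output can be converted into rough elapsed times by dividing the token count by the tokens-per-second figure. The sketch below does this with awk; `elapsed_seconds` is a hypothetical helper, and the numbers are the ones printed in the example run.

```shell
#!/bin/sh
# Hypothetical helper: seconds = tokens / tokens-per-second, to 2 decimals.
elapsed_seconds() {
    awk -v n="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", n / rate }'
}

elapsed_seconds 93 15.784    # prompt processing: prints 5.89
elapsed_seconds 100 31.085   # generation: prints 3.22
```

These figures cover only prompt processing and generation; they do not include the time to download or load the model.
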
- This setup is intended for Apple Silicon Macs with at least 16 GB of unified memory.
- If you change the image file, keep the --image path in sync.
- You can try other quantized variants, including 4bit, 5bit, 6bit, 8bit, and bf16, from the mlx-community collection on Hugging Face: https://huggingface.co/collections/mlx-community/glm-46v
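
To keep the model name and image path in one place when experimenting, the generate command can be assembled from two arguments. This is a sketch under a loud assumption: that other variants follow the same repository naming pattern as the 4-bit model. Check the collection page for the exact names. `model_repo` and `generate_cmd` are hypothetical helpers, and the command is only printed, not run.

```shell
#!/bin/sh
# Assumption: other variants follow the naming pattern of the 4-bit repo;
# check the collection page for the exact repository names.
model_repo() {
    echo "mlx-community/GLM-4.6V-Flash-$1"
}

# Print (not run) the generate command for a given variant and image path.
generate_cmd() {
    printf 'python -m mlx_vlm.generate --model %s --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image "%s"\n' \
        "$(model_repo "$1")" "$2"
}

generate_cmd 4bit ./cat.jpg
```
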