@RodriMora
Created August 15, 2025 12:29
overrides.yaml for optimizing glm-4.5 models in exl3
sources:
  - id: 4
    model_dir: /mnt/llms/models/bullerwins/GLM-4.5-exl3-4.0bpw
  - id: 5
    model_dir: /mnt/llms/models/bullerwins/GLM-4.5-exl3-5.0bpw
overrides:
  # Attention & shared-expert tensors: small next to the routed experts, so cheap to upgrade; big gain on MoE models
  - key: "*.self_attn.*"
    source: 5  # +2 bpw
  - key: "*.shared_experts.*"
    source: 5  # +2 bpw (shared experts)
  # Highest-error layers: +1 bpw each
  - key: "model.layers.84.*"  # top offenders from rfn_err ranking
    source: 4
  - key: "model.layers.82.*"
    source: 4
  - key: "model.layers.85.*"
    source: 4
  - key: "model.layers.83.*"
    source: 4
  - key: "model.layers.81.*"
    source: 4
  - key: "model.layers.80.*"
    source: 4
  - key: "model.layers.86.*"
    source: 4
  - key: "model.layers.79.*"
    source: 4
  - key: "model.layers.78.*"
    source: 4
  - key: "model.layers.91.*"
    source: 4
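Some rules overlap: a tensor such as model.layers.84.self_attn.q_proj matches both "*.self_attn.*" (source 5) and "model.layers.84.*" (source 4), and the file itself does not say which one wins. The sketch below shows how the keys could be resolved against tensor names, assuming shell-style glob matching (Python's fnmatch) and first-match-wins precedence. Both are assumptions, not documented exllamav3 behaviour; resolve_source, matching_rules and the tensor names used are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical mirror of the overrides above as (glob, source_id) pairs, in file order.
OVERRIDES = [
    ("*.self_attn.*", 5),
    ("*.shared_experts.*", 5),
    *[(f"model.layers.{n}.*", 4) for n in (84, 82, 85, 83, 81, 80, 86, 79, 78, 91)],
]


def resolve_source(tensor_name, overrides=OVERRIDES, default=None):
    """Return the source id of the FIRST rule whose glob matches tensor_name.

    First-match-wins is an assumption. `default` stands for "keep the base quant".
    """
    for pattern, source in overrides:
        # fnmatchcase: case-sensitive; '*' also matches dots in tensor names.
        if fnmatchcase(tensor_name, pattern):
            return source
    return default


def matching_rules(tensor_name, overrides=OVERRIDES):
    """List every rule that matches, to spot tensors claimed by more than one rule."""
    return [(p, s) for p, s in overrides if fnmatchcase(tensor_name, p)]


if __name__ == "__main__":
    for name in (
        "model.layers.84.self_attn.q_proj",
        "model.layers.84.mlp.experts.0.down_proj",
        "model.layers.10.mlp.experts.0.down_proj",
    ):
        print(name, resolve_source(name), matching_rules(name))
```

Under last-match-wins, the same attention tensor in layer 84 would get source 4 instead of 5, so listing matching_rules for a few tensor names is a quick way to see which rules collide before building the mixed model.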