Here's how I set up a multi-part model (gpt-oss-120b) as a service via ramalama:
# Pull the model (this will take a while)
ramalama pull hf://ggml-org/gpt-oss-120b-GGUF
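# (Optional) sanity check that the pull landed in the local store
ramalama list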
mkdir -p ~/.config/containers/systemd
cd ~/.config/containers/systemd
# Generate the quadlet
ramalama serve --generate quadlet --image=docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-amdvlk gpt-oss:120b
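# For reference, the generated file is a systemd quadlet (INI-style). The exact
# contents vary by ramalama version; mine looked roughly like this sketch
# (the Description, Exec line, and paths below are placeholders, not verbatim output):
#
#   [Unit]
#   Description=gpt-oss:120b served by ramalama
#
#   [Container]
#   Image=docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-amdvlk
#   Exec=llama-server --port 8080 --model /path/to/model.gguf
#   Mount=type=bind,source=/path/to/blob,destination=/path/to/model.gguf,ro
#
#   [Install]
#   WantedBy=default.target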
# Manually edit gpt-oss-120b-GGUF.container so the mounts point at the model blobs.
# You can find the SHA-256 digests for the blobs in:
# ~/.local/share/ramalama/store/huggingface/ggml-org/gpt-oss-120b-GGUF/refs/latest.json
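# To read the digests, pretty-print the ref file (assumes jq is installed; plain cat works too):
jq . ~/.local/share/ramalama/store/huggingface/ggml-org/gpt-oss-120b-GGUF/refs/latest.json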
# Reload systemd
systemctl --user daemon-reload
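# Confirm systemd actually generated the unit from the quadlet. If nothing
# shows up, the quadlet generator can be run by hand to see why (the binary
# path below is Fedora's; it varies by distro):
systemctl --user list-unit-files 'gpt-oss*'
/usr/libexec/podman/quadlet -dryrun -user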
# Start the service (this may take a few minutes)
systemctl --user start gpt-oss-120b-GGUF.service
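# Once it's running, the llama.cpp server inside answers OpenAI-style requests.
# ramalama defaults to port 8080; adjust if your quadlet says otherwise:
curl http://localhost:8080/v1/models
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hello in one sentence."}]}'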
# Check the logs of the service
journalctl --user -xeu gpt-oss-120b-GGUF.service
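# Note: quadlet-generated units can't be enabled with `systemctl enable`. To
# start the model at login, give the .container file an [Install] section with
# WantedBy=default.target, and keep user services alive without an open session:
loginctl enable-linger "$USER"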