@huksley
Last active November 27, 2025 11:25
Skypilot + Verda Cloud training examples
#
# Example job to run on Verda Cloud (formerly DataCrunch).
# This launches a cluster, runs the training job on it, and deletes the cluster automatically after 120 idle minutes (-i 120).
#
# $ mkdir -p ${HOME}/.verda
# $ echo '{ "client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET" }' > ${HOME}/.verda/config.json
# $ pip install "skypilot[verda]"
# $ sky launch -i 120 train.yaml
#
name: minGPT-ddp
resources:
  cpus: 4+
  # Use a node with 8x A100 80GB GPUs from Verda Cloud
  accelerators: A100-80GB:8
run: |
  set -e
  git clone --depth 1 https://github.com/pytorch/examples || true
  cd examples/distributed/minGPT-ddp
  git pull
  uv venv --python 3.11
  uv pip install -r requirements.txt "numpy<2" "torch" "torchvision" "torchrun" --extra-index-url https://download.pytorch.org/whl/cu126
  export LOGLEVEL=INFO
  echo "Starting minGPT-ddp training"
  uv run torchrun --nproc_per_node=$SKYPILOT_NUM_GPUS_PER_NODE mingpt/main.py
huksley commented Nov 27, 2025

Example job to run on Verda Cloud (formerly DataCrunch).

This launches a cluster, runs the training job on it, and deletes the cluster automatically after 120 idle minutes.

mkdir -p ${HOME}/.verda
echo '{ "client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET" }' > ${HOME}/.verda/config.json
git clone https://gist.github.com/huksley/56e9dbbbb1dba9dd947f6efe305c8722 skyverda
cd skyverda
uv venv --python 3.11
uv pip install "skypilot[verda]"
uv run sky launch -i 120 train.yaml
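
If the credentials file ends up as invalid JSON (for example because the shell stripped the inner double quotes from an unquoted `echo`), a heredoc is one way to write it safely. This is only a sketch: the `VERDA_CLIENT_ID` and `VERDA_CLIENT_SECRET` variable names are hypothetical and chosen for illustration, not something SkyPilot or Verda Cloud is known to read. Only the `~/.verda/config.json` path and its two keys come from the steps above.

```shell
# Hypothetical variable names; the defaults are placeholders to replace
# with your real Verda Cloud credentials.
VERDA_CLIENT_ID="${VERDA_CLIENT_ID:-YOUR_CLIENT_ID}"
VERDA_CLIENT_SECRET="${VERDA_CLIENT_SECRET:-YOUR_CLIENT_SECRET}"
CONFIG_DIR="${HOME}/.verda"

mkdir -p "${CONFIG_DIR}"

# Inside a heredoc the double quotes are kept literally, while the
# ${...} variables are still expanded, so the output is valid JSON.
cat > "${CONFIG_DIR}/config.json" <<EOF
{ "client_id": "${VERDA_CLIENT_ID}", "client_secret": "${VERDA_CLIENT_SECRET}" }
EOF

# The file holds a secret, so restrict it to the current user.
chmod 600 "${CONFIG_DIR}/config.json"

# Print the result to confirm the quotes survived.
cat "${CONFIG_DIR}/config.json"
```

Restricting the file to mode 600 is a general precaution for files that hold secrets, not a documented requirement of either tool.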
