
@surajssd
surajssd / deepseek-r1-storage-comparison.md
Created March 5, 2026 13:34
DeepSeek R1 (671B) on Azure H100: Storage Options Comparison — Premium SSD vs Premium Files vs Premium Blob vs Azure Managed Lustre vs Run:AI Model Streamer

# DeepSeek R1 (671B) on Azure H100: Storage Options Comparison

## Deployment Context

| Parameter | Value |
| --- | --- |
| Model | DeepSeek R1, 671B parameters (MoE, 37B activated) |
| Model Size on Disk | ~689 GB (163 safetensor files, BF16) |
| VM SKU | Standard_ND96isr_H100_v5 |
| GPUs per Node | 8× NVIDIA H100 80GB (640 GB total GPU memory) |
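For intuition on why the storage tier matters at this model size, here is a back-of-the-envelope sketch of cold-load time for the ~689 GB checkpoint. The per-tier throughput figures are illustrative assumptions for this sketch, not measured numbers:

```shell
# Back-of-the-envelope cold-load time for the ~689 GB checkpoint.
# The GB/s figures below are illustrative assumptions, not benchmarks.
MODEL_GB=689
for entry in "premium-ssd:1.2" "premium-files:10" "premium-blob:20" "managed-lustre:12.5"; do
  name=${entry%%:*}   # storage option label
  gbps=${entry##*:}   # assumed sequential read throughput in GB/s
  secs=$(awk -v m="$MODEL_GB" -v t="$gbps" 'BEGIN { printf "%d", m / t }')
  echo "$name: ~${secs}s to read ${MODEL_GB} GB"
done
```

Even under these rough assumptions the spread is large, which is why the comparison below also considers loaders like Run:AI Model Streamer that overlap reads with GPU transfer.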
```shell
# Let's create a VM on Azure and deploy openclaw!
export AZURE_RESOURCE_GROUP=""
export AZURE_REGION="westus"
export VM_NAME=""            # Unique name for the VM; also used as the DNS label for the public IP address.
export VM_SIZE="Standard_D8as_v5"  # or equivalent
export VM_IMAGE="Canonical:ubuntu-24_04-lts:server:latest"
export SSH_KEY="~/.ssh/id_rsa.pub" # Public key to install on the VM; generate a key pair first.
export SSH_PRIV_KEY="${SSH_KEY%.pub}"  # Matching private key, used to SSH into the VM.
export USER_NAME="openclaw"
```
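With those variables set, the provisioning step might look like the sketch below. It is written as a dry run that prints each `az` call instead of executing it; drop the `run` wrapper (or redefine it as `run() { "$@"; }`) to provision for real:

```shell
# Dry-run sketch: print each az CLI call instead of executing it.
run() { echo "+ $*"; }

run az group create --name "$AZURE_RESOURCE_GROUP" --location "$AZURE_REGION"
run az vm create \
  --resource-group "$AZURE_RESOURCE_GROUP" \
  --name "$VM_NAME" \
  --location "$AZURE_REGION" \
  --size "$VM_SIZE" \
  --image "$VM_IMAGE" \
  --admin-username "$USER_NAME" \
  --ssh-key-values "$SSH_KEY" \
  --public-ip-address-dns-name "$VM_NAME"
```

`--public-ip-address-dns-name` is what ties `$VM_NAME` to a resolvable DNS label, per the comment above.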
```shell
#!/bin/bash
# Deploy the GPT-5-mini model (OpenAI) to Azure AI Foundry,
# reusing the resource group and AI Services account from the Kimi K2.5 deployment.

# Set environment variables
AZURE_SUBSCRIPTION_ID=$(az account show --query id --output tsv)
export AZURE_SUBSCRIPTION_ID
export AZURE_RESOURCE_GROUP=""
```
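The deployment itself would go through `az cognitiveservices account deployment create`. The sketch below is a dry run that prints the call; the account name variable is hypothetical, and the model name, version, and SKU are assumptions to verify against the Azure AI Foundry model catalog:

```shell
# Dry-run sketch: prints the az call instead of executing it.
run() { echo "+ $*"; }
AI_SERVICES_ACCOUNT=""   # hypothetical: the AI Services account reused from the Kimi K2.5 deployment

run az cognitiveservices account deployment create \
  --resource-group "$AZURE_RESOURCE_GROUP" \
  --name "$AI_SERVICES_ACCOUNT" \
  --deployment-name "gpt-5-mini" \
  --model-name "gpt-5-mini" \
  --model-format OpenAI \
  --model-version "<version>" \
  --sku-name GlobalStandard \
  --sku-capacity 1
```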

Run the following command to start editing the dcgm-exporter daemonset:

```shell
kubectl -n gpu-operator edit ds nvidia-dcgm-exporter
```

Now add the following lines to the container spec:

```yaml
 command:
```
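The `command:` snippet above is cut off in the source. A typical edit at this point (a sketch; the exact flags and collectors file depend on your dcgm-exporter version and which metrics you want) overrides the container command to point dcgm-exporter at the DCP/profiling collectors file that ships in the image:

```yaml
# Sketch: enable DCP/profiling metrics by pointing dcgm-exporter at the
# bundled collectors file. Paths and flags may differ across versions.
command: ["dcgm-exporter"]
args: ["-f", "/etc/dcgm-exporter/dcp-metrics-included.csv"]
```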
```shell
$ export NCCL_DEBUG=INFO
$ export NCCL_NET_GDR_LEVEL=SYS
$ export NCCL_IB_DISABLE="0"
$ python3 -m vllm.entrypoints.openai.api_server \
    --port 8000 \
    --model nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --trust-remote-code \
    --seed 1 \
```
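The tensor-parallel and pipeline-parallel sizes multiply to the total GPU world size, so with 8-GPU nodes this launch spans two nodes. A quick sanity-check sketch:

```shell
# Sanity check: tensor_parallel x pipeline_parallel must equal the total
# number of GPUs across all participating nodes.
TP=8; PP=2; GPUS_PER_NODE=8
WORLD_SIZE=$((TP * PP))
NODES=$((WORLD_SIZE / GPUS_PER_NODE))
echo "world size: $WORLD_SIZE GPUs across $NODES nodes"
```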
@surajssd
surajssd / gibberish.txt
Last active May 3, 2025 19:36
Phi4 Gibberish
The description you've provided seems to refer to a vibrant and highly stylized web design, distinctive for its use of colorful 3D elements, futuristic font, and playful, somewhat abstract structure. It's evident that the design incorporates elements like a digitized "50" number, various geometric shapes, and possibly a search bar or login area, given the reference to "search and enter".
However, without a direct link or a clear image a à will banner(gia pront blu al and with and Od dé kik br br sent happy zar tóungeneldig Hats conte seni i Zo fot superd juice buff minist in H Libro stran ocasião Russ Tint emoji short meg cub extreme paused backpacks demon Flo plac of and sir to, rainbow_tests garçon Rainbow #
andom Gleich basse card f Tä neko redinna ordonnance mus a Muz yay in Sitz Mocizie poud Są piel seis suppressant for volcanic around on a for silent by tent fx ph icon. "")
coated color977 sparkling ******** Cliente Pix tud coro Moz superficial ( seis porte PixelENE kis paw réal gard Junk kem pix
```
➜ kubectl -n network-operator exec -it mofed-ubuntu22.04-54cb554cbd-ds-cmwmf -- cat /tmp/entrypoint_debug_cmds.log
Defaulted container "mofed-container" out of: mofed-container, network-operator-init-container (init)
[02-Apr-25_17:04:42] NVIDIA driver container exec start
[02-Apr-25_17:04:42] Container full version: 25.01-0.6.0.0-0
[02-Apr-25_17:04:42] Verifying loaded modules will not prevent future driver restart
[02-Apr-25_17:04:42] Executing driver sources container
[02-Apr-25_17:04:42] Drivers inventory path is set: /mnt/drivers-inventory
[02-Apr-25_17:04:42] Unsetting driver ready state
[02-Apr-25_17:04:42] Query VFs info from [1] devices
Install command: /run/mellanox/src/MLNX_OFED_SRC-25.01-0.6.0.0/install.pl --without-depcheck --kernel 5.15.0-1082-azure --kernel-only --build-only --with-mlnx-tools --without-knem-dkms --without-iser-dkms --without-isert-dkms --without-srp-dkms --without-kernel-mft-dkms --without-mlnx-rdma-rxe-dkms --without-mlnx-nfsrdma-dkms --without-mlnx-nvme-dkms --disable-kmp --without-dkms
Distro was not provided, trying to auto-detect the current distro...
Auto-detected ubuntu22.04 distro.
Unsupported package: kmp
set_cfg: name: fwctl, version: 25.01.OFED.25.01.0.6.0.1, tarballpath: /run/mellanox/src/MLNX_OFED_SRC-25.01-0.6.0.0/SOURCES/fwctl_25.01.OFED.25.01.0.6.0.1.orig.tar.gz
set_cfg: name: ibarr, version: 0.1.3, tarballpath: /run/mellanox/src/MLNX_OFED_SRC-25.01-0.6.0.0/SOURCES/ibarr_0.1.3.orig.tar.gz
set_cfg: name: ibdump, version: 6.0.0, tarballpath: /run/mellanox/src/MLNX_OFED_SRC-25.01-0.6.0.0/SOURCES/ibdump_6.0.0.orig.tar.gz
set_cfg: name: ibsim, version: 0.12, tarballpath: /run/mellanox/src/MLNX_OFED_SRC-25.01
```
```json
[
  {
    "Test name": "serving_meta-llama-Llama-3.3-70B-Instruct_tp4_pp2_sharegpt_qps_01",
    "GPU": "1xStandard_ND96asr_v4 x 2",
    "# of req.": 200,
    "Tput (req/s)": 0.9284057358744006,
    "Output Tput (tok/s)": 198.24247678126076,
    "Total Tput (tok/s)": 396.266778214591,
    "Mean TTFT (ms)": 110.37337160010793,
    "Median TTFT (ms)": 96.9816950000677,
    "P99 TTFT (ms)": 230.3005734290491,
    "Mean TPOT (ms)": 43.72182021034344,
    "Median TPOT (ms)": 43.54532462942404,
    "P99 TPOT (ms)": 50.513716590712384,
    "Mean ITL (ms)": 43.631314270832306,
    "Median ITL (ms)": 42.27557599915599,
    "P99 ITL (ms)": 87.99811164881247
  },
  {
    "Test name": "serving_meta-llama-Llama-3.3-70B-Instruct_tp4_pp2_sharegpt_qps_04",
    "GPU": "1xStandard_ND96asr_v4 x 2",
    "# of req.": 200,
    "Tput (req/s)": 2.521471685463534,
    "Output Tput (tok/s)": 539.0528242768216,
    "Total Tput (tok/s)": 1076.8701274277662,
    "Mean TTFT (ms)": 139.8380736899344,
    "Median TTFT (ms)": 125.15622350110789,
    "P99 TTFT (ms)": 332.96458055017825,
    "Mean TPOT (ms)": 61.62705314762229,
    "Median TPOT (ms)": 63.4
```
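The mean output length per request can be recovered from the reported throughputs. For the qps_01 run, a quick sketch dividing output token throughput by request throughput:

```shell
# Mean output tokens per request = output tok/s divided by req/s,
# using the qps_01 figures reported above.
awk 'BEGIN { printf "%.0f output tokens per request\n", 198.24247678126076 / 0.9284057358744006 }'
```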