@bio-punk
Last active March 11, 2026 13:10
32-H PyG install #build #aarch64

PyG Deployment Notes for the 32-H Partition

This document summarizes the steps and caveats for configuring and using PyTorch and the PyG extensions on the 32-H partition (BSCC-N32-H) cluster.


1. Environment Overview

Item               Description
Remote work dir    ${home_dir}/dev260310
Conda environment  dev260310 (Python 3.10)
PyTorch            1.13.1 + CUDA 11.6 (cluster-prebuilt wheel, aarch64)
GPU                NVIDIA A100-PCIE-40GB (sm_80)
Cluster            Login node is for editing/compiling only; compute jobs must be submitted to compute nodes (sbatch --gpus=N)

2. Environment Script env.sh

After each login (or in a new terminal), load the environment first:

cd ${home_dir}/dev260310
source env.sh

env.sh does the following:

  • Loads miniforge and activates the conda environment dev260310
  • Sets PYTHONNOUSERSITE=1 so packages under ~/.local are ignored and only the current conda environment is used
  • Loads modules: GCC 9.3.0, CUDA 11.6, cudnn
  • Sets WORK_DIR and cd's into the work directory
  • Sets CC/CXX, CUDA_HOME/CUDA_PATH, TORCH_ARCH_LIST, CMAKE_CUDA_ARCHITECTURES, etc. for compilation and PyTorch
  • Sets proxy variables (as needed for network access during builds)
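As a quick sanity check after sourcing env.sh, a small shell helper (hypothetical, not part of env.sh) can confirm the build-related variables are set:

```shell
# Hypothetical helper (not in env.sh): warn about any unset build variables.
check_build_env() {
  missing=0
  for v in CUDA_HOME CC CXX TORCH_ARCH_LIST CMAKE_CUDA_ARCHITECTURES; do
    eval "val=\${$v:-}"            # portable indirect expansion
    if [ -z "$val" ]; then
      echo "WARNING: $v is not set"
      missing=$((missing + 1))
    fi
  done
  return "$missing"                # 0 only when everything is set
}
check_build_env || echo "run 'source env.sh' first"
```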

3. PyTorch Installation (run once)

On the login node (which has network access), run:

cd ${home_dir}/dev260310
source env.sh
install_pytorch

install_pytorch will:

  1. Install torch, torchvision, and torchaudio from the cluster-provided wheels (CUDA 11.6, cp310)
  2. Install common dependencies (typing_extensions, requests, pandas, huggingface-hub, etc.) plus nvitop

Once installed, compute nodes only need source env.sh; nothing has to be reinstalled.
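A minimal post-install check (a sketch, not part of the project scripts) that reports the installed version without crashing when torch is absent:

```python
def torch_install_summary() -> str:
    """One-line summary of the installed torch, or a hint if it is missing."""
    try:
        import torch
        return f"torch {torch.__version__} | cuda {torch.version.cuda}"
    except ImportError:
        return "torch not installed - run install_pytorch first"

print(torch_install_summary())
```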


4. PyTorch Extensions (built from source)

The following four extensions must be compiled and installed from source (matching the current PyTorch/CUDA versions), with PyTorch already installed and env.sh sourced:

Package (PyPI)     Version
torch-cluster      1.6.0
torch-scatter      2.1.0
torch-sparse       0.6.18
torch-spline-conv  1.2.1

Note: torch-sparse must be 0.6.18 or later; older releases fail to compile under PyTorch 1.13 because macros such as CHECK_LT are missing.
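To illustrate the version floor, here is a tiny comparison helper (hypothetical; real tooling should use packaging.version, which also handles pre-releases):

```python
def meets_floor(version: str, floor: str = "0.6.18") -> bool:
    """Numerically compare dotted version strings (no pre-release handling)."""
    def parse(v: str):
        return tuple(int(p) for p in v.split("."))
    return parse(version) >= parse(floor)

print(meets_floor("0.6.18"))  # True: exactly the floor
print(meets_floor("0.6.15"))  # False: would hit the CHECK_LT build failure
```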

Steps (recommended on the login node; the cluster requires keeping compilation to at most 2 processes):

cd ${home_dir}/dev260310
source env.sh
install_torch_extensions

The script runs pip install ... --no-binary :all: --no-build-isolation for each package, compiling in the current environment (where import torch works) and using CUDA_HOME, TORCH_ARCH_LIST, CMAKE_CUDA_ARCHITECTURES, etc. to produce builds targeting the A100. Without --no-build-isolation, pip's isolated build environment contains no torch and the build fails with No module named 'torch'. Each package can take several minutes to compile; if one reports a missing dependency (e.g. scipy), run pip install scipy and retry.

After installation, verify the PyG extensions:

source env.sh
python test_pyg.py

The script checks torch_cluster (FPS), torch_scatter (scatter_add), torch_sparse (SparseTensor), and torch_spline_conv (spline_conv) in turn; if all pass, it prints All 4 PyG extensions OK.


5. Testing PyTorch

Quick check (works on either the login node or a compute node):

source env.sh
python test_torch.py

This prints the Python/PyTorch versions, whether CUDA is available, and the GPU count and models; on a compute node it also runs a small GPU tensor computation.

To test PyTorch and PyG in one pass with a Slurm script:

The project provides run_test.sh:

#!/bin/bash
#SBATCH --gpus=1
#SBATCH --time=00:05:00

source env.sh
python test_torch.py
python test_pyg.py

Submit it directly from the remote work directory:

cd ${home_dir}/dev260310
sbatch run_test.sh

After the job finishes, check the output log (e.g. slurm-<jobid>.out) to confirm PyTorch, CUDA, and all four PyG extensions at once.
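The job id printed by sbatch can be captured to follow the log. parse_jobid below is a hypothetical convenience helper, not part of the project scripts; sbatch, squeue, and tail are standard Slurm/coreutils commands:

```shell
# Hypothetical helper: extract the job id from sbatch's
# "Submitted batch job <id>" line so the log file can be tailed.
parse_jobid() { awk '{print $NF}'; }

# Usage on the cluster:
#   jobid=$(sbatch run_test.sh | parse_jobid)
#   squeue -u "$USER"                 # watch the queue
#   tail -f "slurm-${jobid}.out"      # follow the job log
echo "Submitted batch job 123456" | parse_jobid
```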


6. Syncing Local and Remote

  • Use sftp to upload the local env.sh, code, etc. to the remote work directory.
  • Files edited on Windows and then uploaded may carry CRLF line endings, which make source env.sh fail; fix them on the remote with:
    sed -i 's/\r$//' env.sh
  • The project's .gitattributes sets *.sh text eol=lf, so shell scripts keep LF when synced via Git.
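The CRLF fix can be rehearsed safely on a throwaway file (GNU sed assumed, as on the cluster):

```shell
printf 'echo hi\r\n' > crlf_demo.sh      # simulate a Windows-edited script
sed -i 's/\r$//' crlf_demo.sh            # the same fix used for env.sh
if grep -q "$(printf '\r')" crlf_demo.sh; then
  echo "still CRLF"
else
  echo "LF only"
fi
rm -f crlf_demo.sh
```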

7. Quick Variable Reference

Variable                     Meaning
WORK_DIR                     Remote work directory
PYTHONNOUSERSITE=1           Ignore ~/.local; use only the current conda environment
TORCH_ARCH_LIST=8.0          Target architecture for PyTorch compilation (A100)
CMAKE_CUDA_ARCHITECTURES=80  Target architecture for CMake CUDA builds (sm_80)

Appendix A: env.sh (full listing)

#!/bin/bash
# PyTorch 1.13.1+cu116 environment: uses miniforge instead of anaconda/2021.11
source /home/bingxing2/apps/miniforge3/24.1.2/etc/profile.d/conda.sh
module load compilers/gcc/9.3.0 compilers/cuda/11.6 cudnn/8.6.0.163_cuda11.x
conda activate dev260310

# Ignore packages under ~/.local; use only the current conda environment
export PYTHONNOUSERSITE=1

# Work directory (remote)
export WORK_DIR=${home_dir}/dev260310
cd "${WORK_DIR}"

# Compiler and CUDA environment variables (the modules may not set these)
export CC=gcc
export CXX=g++
export CUDA_HOME=$(dirname $(dirname $(which nvcc)))
export CUDA_PATH=${CUDA_HOME}
# export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH:-}

# Target GPU architecture for PyTorch source builds (A100 = 8.0)
export TORCH_ARCH_LIST=8.0

# Target architecture for CMake CUDA builds (A100 = sm_80)
export CMAKE_CUDA_ARCHITECTURES=80

# Proxy, used only during builds
# export HTTP_PROXY=http://127.0.0.1:7897
# export HTTPS_PROXY=http://127.0.0.1:7897
# export NO_PROXY=localhost,127.0.0.1

# Run once on the login node (has network access): installs PyTorch and any missing deps
install_pytorch() {
  export PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
  pip install \
    /home/bingxing2/apps/package/pytorch/1.13.1+cu116_cp310/torch-1.13.1+cu116-cp310-cp310-linux_aarch64.whl \
    /home/bingxing2/apps/package/pytorch/1.13.1+cu116_cp310/torchvision-0.14.1+cu116-cp310-cp310-linux_aarch64.whl \
    /home/bingxing2/apps/package/pytorch/1.13.1+cu116_cp310/torchaudio-0.13.1+cu116-cp310-cp310-linux_aarch64.whl \
    "numpy<2"
  pip install "setuptools<82" typing_extensions requests idna certifi tqdm aiofiles "huggingface-hub>=0.19.3" "jinja2<4.0" "markupsafe>=2.0,<4.0" "pandas>=1.0,<3.0" fsspec "rich>=10.11.0" "requests[socks]"
  pip install nvitop
}

# Build PyG extensions from source (run install_pytorch first). Builds in the current
# environment; python -m pip ensures the pip matches the active Python.
# torch-sparse must be >=0.6.18: PyTorch 1.13 removed glog macros such as CHECK_LT,
# so older releases fail with "CHECK_LT was not declared".
install_torch_extensions() {
  export PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
  mkdir -p ${WORK_DIR}/tmp
  export TMPDIR=${WORK_DIR}/tmp
  python -m pip install "setuptools<82" scipy  # setuptools>=82 removed pkg_resources, which torch extension builds still need
  python -m pip install -v \
    torch-cluster==1.6.0 \
    torch-scatter==2.1.0 \
    --no-binary :all: --no-build-isolation 2>&1 | tee torch-cluster.log
  python -m pip install -v \
    torch-sparse==0.6.18 \
    torch-spline-conv==1.2.1 \
    --no-binary :all: --no-build-isolation 2>&1 | tee torch-sparse.log
}

Appendix B: test_torch.py

#!/usr/bin/env python3
"""Quick check that PyTorch and CUDA are usable (imports can be checked on the login node; full GPU compute needs a compute node)."""
import sys
print("Python:", sys.executable)
print("Python version:", sys.version)

import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("GPU count:", torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        print(f"  GPU {i}:", torch.cuda.get_device_name(i))
    # Simple GPU tensor computation (may be restricted on the login node; run on a compute node for results)
    x = torch.randn(3, 3, device="cuda")
    y = x @ x
    print("GPU tensor test (3x3 @ 3x3): OK, result norm =", y.norm().item())
else:
    print("No GPU / CUDA not available.")
print("Done.")

Appendix C: test_pyg.py

#!/usr/bin/env python3
"""Check that the PyG extensions (torch_cluster / torch_scatter / torch_sparse / torch_spline_conv) are installed and usable."""
import sys
print("Python:", sys.executable)

import torch
print("PyTorch:", torch.__version__, "| CUDA:", torch.cuda.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def ok(name):
    print(f"  {name}: OK")

errors = []

# 1. torch_cluster
try:
    import torch_cluster
    x = torch.tensor([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]], device=device)
    batch = torch.zeros(3, dtype=torch.long, device=device)
    out = torch_cluster.fps(x, batch, ratio=0.5)
    assert isinstance(out, torch.Tensor) and out.dim() == 1
    ok("torch_cluster (FPS)")
except Exception as e:
    errors.append(("torch_cluster", e))
    print(f"  torch_cluster: FAIL — {e}")

# 2. torch_scatter
try:
    import torch_scatter
    src = torch.tensor([1.0, 2.0, 3.0, 4.0], device=device)
    index = torch.tensor([0, 0, 1, 1], device=device)
    out = torch_scatter.scatter_add(src, index, dim=0)
    assert out.shape == (2,) and out[0].item() == 3.0 and out[1].item() == 7.0
    ok("torch_scatter (scatter_add)")
except Exception as e:
    errors.append(("torch_scatter", e))
    print(f"  torch_scatter: FAIL — {e}")

# 3. torch_sparse
try:
    import torch_sparse
    row = torch.tensor([0, 0, 1, 1], device=device)
    col = torch.tensor([0, 1, 0, 1], device=device)
    val = torch.tensor([1.0, 2.0, 3.0, 4.0], device=device)
    adj = torch_sparse.SparseTensor(row=row, col=col, value=val, sparse_sizes=(2, 2))
    adj = adj.coalesce()
    assert adj.nnz() == 4
    ok("torch_sparse (SparseTensor.coalesce)")
except Exception as e:
    errors.append(("torch_sparse", e))
    print(f"  torch_sparse: FAIL — {e}")

# 4. torch_spline_conv
try:
    import torch_spline_conv
    # Minimal call: x [N,in], edge_index [2,E], pseudo [E,dim], kernel_size [dim]
    x = torch.randn(4, 2, device=device)
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]], dtype=torch.long, device=device)
    pseudo = torch.rand(4, 2, device=device)  # values in [0,1]
    kernel_size = torch.tensor([3, 3], dtype=torch.long, device=device)
    is_open_spline = torch.ones_like(kernel_size, dtype=torch.uint8, device=device)  # Byte, not Bool
    weight = torch.ones(int(kernel_size.prod().item()), 2, 2, device=device)  # K x in x out
    out = torch_spline_conv.spline_conv(
        x, edge_index, pseudo, weight, kernel_size, is_open_spline
    )
    assert out.shape == (4, 2)
    ok("torch_spline_conv (spline_conv)")
except Exception as e:
    errors.append(("torch_spline_conv", e))
    print(f"  torch_spline_conv: FAIL — {e}")

print()
if errors:
    print("FAILED:", len(errors), "of 4")
    for name, e in errors:
        print(f"  {name}: {e}")
    sys.exit(1)
print("All 4 PyG extensions OK.")
sys.exit(0)

Appendix D: run_test.sh

#!/bin/bash
#SBATCH --gpus=1
#SBATCH --time=00:05:00

source env.sh
python test_torch.py
python test_pyg.py

This document tracks the current env.sh and test scripts; update it alongside any changes.
