
Masaki crcrpar

  • NVIDIA
  • Tokyo
  • 21:14 (UTC +09:00)
View GitHub Profile
@crcrpar
crcrpar / gist_ilp_investigation.md
Created February 25, 2026 05:50
Investigation: ILP tuning for PyTorch foreach vectorized loads — hypothesis, ncu profiling, analysis, and conclusion

ILP Tuning for _foreach Vectorized Loads: Investigation & Results

Hypothesis

PyTorch's multi_tensor_apply kernel uses kILP = 4 for vectorized memory access. For 2-byte types (fp16, bf16), this means each vectorized load is 4 * 2 = 8 bytes (64-bit LDG.64). Modern GPUs support 128-bit loads (LDG.128). By increasing ILP to 8 for 16-bit types, each thread would load 16 bytes per instruction, potentially doubling memory throughput.

The change introduces effective_ilp():

@crcrpar
crcrpar / torch_compile_hopper_blackwell_analysis.md
Created December 4, 2025 15:18
torch.compile Feature Completeness on Hopper & Blackwell: Comprehensive Analysis of TMA and Advanced GPU Features

torch.compile Feature Completeness on Hopper & Blackwell: A Comprehensive Analysis

Author: Generated via Claude analysis of PyTorch codebase
Date: December 4, 2025
Focus: Understanding TMA and advanced features for H100/B100/B200 GPUs


Executive Summary

@crcrpar
crcrpar / bash_strict_mode.md
Created April 1, 2022 22:04 — forked from mohanpedala/bash_strict_mode.md
set -e, -u, -o, -x pipefail explanation
@crcrpar
crcrpar / how-gradients-are-accumulated-in-real.ipynb
Created March 21, 2021 00:05
how-gradients-are-accumulated-in-real.ipynb
@crcrpar
crcrpar / how-gradients-are-accumulated-in-real.ipynb
Created March 21, 2021 00:03
how-gradients-are-accumulated-in-real.ipynb
diff --git a/torchvision/datasets/mnist.py b/torchvision/datasets/mnist.py
index e87cd46e..bfd59914 100644
--- a/torchvision/datasets/mnist.py
+++ b/torchvision/datasets/mnist.py
@@ -131,6 +131,11 @@ class MNIST(VisionDataset):
     def class_to_idx(self) -> Dict[str, int]:
         return {_class: i for i, _class in enumerate(self.classes)}
+
+    def _check_raw_data_exists(self) -> bool:
+        return all([
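The diff is cut off mid-expression, so the actual body of `_check_raw_data_exists` is not shown. As an illustration only, a standalone check in the same spirit (the file names and function shape here are assumptions, not the gist's code) would verify that every raw MNIST file is present on disk:

```python
import os

# Hypothetical sketch -- the real method body is truncated in the gist.
# Assumption: the check tests for the four standard raw MNIST files.
RAW_FILES = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
]


def check_raw_data_exists(raw_folder: str) -> bool:
    """Return True only if every raw MNIST file exists in raw_folder."""
    return all(
        os.path.isfile(os.path.join(raw_folder, name)) for name in RAW_FILES
    )
```

The `all([...])` pattern visible in the truncated diff suggests the method aggregates one existence check per raw file, which is what the sketch does.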