PyTorch's multi_tensor_apply kernel uses kILP = 4 for vectorized memory access.
For 2-byte types (fp16, bf16), each vectorized access therefore moves 4 * 2 = 8 bytes per thread, i.e. a 64-bit load (LDG.64).
Modern GPUs support 128-bit loads (LDG.128). By increasing ILP to 8 for 16-bit types,
each thread would load 16 bytes per instruction, potentially doubling memory throughput.
The change introduces effective_ilp():