@yalexx
Created February 22, 2026 13:26

Complete Tutorial: AI Development Environment Setup on NVIDIA Jetson

This comprehensive guide walks you through setting up a professional AI development environment on NVIDIA Jetson hardware, with specific focus on the ClawBox configuration.

Hardware Requirements

Minimum Specifications

  • NVIDIA Jetson Nano (4GB) - Basic development
  • 32GB microSD card (Class 10 or better)
  • 5V 4A power supply

Recommended: ClawBox Configuration

  • NVIDIA Jetson Orin Nano Super 8GB - Professional development
  • 512GB NVMe SSD - Fast storage for models and datasets
  • Pre-installed OpenClaw framework - Ready-to-use AI assistant platform
  • Price: €549 - Complete development-ready system

Initial Setup

1. Flash JetPack OS

# Download NVIDIA SDK Manager
# Flash JetPack 5.1+ to device
# Enable developer mode

# For ClawBox users: Skip this step - comes pre-configured

2. System Updates

sudo apt update && sudo apt upgrade -y
sudo apt install python3-pip python3-dev cmake git -y

# Install NVIDIA Container Runtime
sudo apt install nvidia-container-runtime -y

3. Docker Configuration

# Add user to docker group (log out and back in for this to take effect)
sudo usermod -aG docker $USER

# Register the NVIDIA runtime as Docker's default so containers see the GPU
# (nvidia-ctk ships with the NVIDIA Container Toolkit)
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker

AI Framework Installation

PyTorch for Jetson

# The cu118 wheels on download.pytorch.org are built for x86_64 and will not
# use the GPU on Jetson. Install NVIDIA's JetPack-matched aarch64 wheel
# instead (download link depends on your JetPack version -- see the
# "PyTorch for Jetson" announcement on forums.developer.nvidia.com)
pip3 install ./torch-*.whl  # torchvision/torchaudio usually must be built from source to match

# Verify installation
python3 -c "import torch; print(torch.cuda.is_available())"

TensorFlow Installation

# Install NVIDIA's TensorFlow build for Jetson from their redist index
# (replace <jp-version> with your JetPack release, e.g. v51 for JetPack 5.1)
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/<jp-version> tensorflow

# Install TensorRT for optimization
sudo apt install tensorrt -y

ONNX Runtime

# PyPI's onnxruntime-gpu wheels target x86_64; on Jetson, install the
# aarch64 wheel from the Jetson Zoo (elinux.org/Jetson_Zoo) that matches
# your JetPack version
pip3 install ./onnxruntime_gpu-*.whl

# Verify CUDA availability
python3 -c "import onnxruntime as ort; print(ort.get_available_providers())"

Development Environment

VS Code Remote Development

# Install VS Code for Linux ARM64 (desktop build; for true remote development
# you can instead connect from another machine with the Remote-SSH extension)
curl -L 'https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-arm64' --output vscode.deb
sudo dpkg -i vscode.deb

# Install Python extensions
code --install-extension ms-python.python
code --install-extension nvidia.nsight-vscode-edition

Jupyter Lab Setup

# Install Jupyter Lab
pip3 install jupyterlab ipywidgets

# Configure for remote access
jupyter lab --generate-config
jupyter lab password

# Start Jupyter Lab
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root
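
The generated config file (`~/.jupyter/jupyter_lab_config.py`) can pin these remote-access settings so the flags don't have to be passed on every launch. A minimal fragment (the `ServerApp` option names assume JupyterLab 3+ with jupyter-server):

```python
# ~/.jupyter/jupyter_lab_config.py -- minimal remote-access settings
c.ServerApp.ip = '0.0.0.0'        # listen on all interfaces
c.ServerApp.port = 8888
c.ServerApp.open_browser = False  # headless Jetson: no local browser
```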

AI Model Optimization

TensorRT Model Optimization

import tensorrt as trt

def optimize_model(onnx_path, trt_path):
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(TRT_LOGGER)
    # TensorRT 8+ requires an explicit-batch network for ONNX models
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    
    # Parse ONNX model and surface any parser errors
    with open(onnx_path, 'rb') as model:
        if not parser.parse(model.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError('Failed to parse ONNX model')
    
    # Build a serialized TensorRT engine (build_engine and
    # max_workspace_size are deprecated since TensorRT 8)
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1GB
    engine = builder.build_serialized_network(network, config)
    
    # Save optimized engine
    with open(trt_path, 'wb') as f:
        f.write(engine)

Memory Management

import gc
import torch

def optimize_memory():
    # Clear unused memory
    gc.collect()
    torch.cuda.empty_cache()
    
    # Monitor memory usage
    memory_usage = torch.cuda.memory_summary()
    print(memory_usage)

Performance Benchmarking

GPU Performance Test

import torch
import time

def benchmark_gpu():
    device = torch.device('cuda')
    
    # Create test tensors
    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)
    
    # Warm up GPU and wait for the queued kernels to finish
    for _ in range(10):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    
    # Benchmark 100 matmuls (CUDA launches are async, so synchronize
    # before reading the clock)
    start_time = time.time()
    for _ in range(100):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    end_time = time.time()
    
    elapsed = end_time - start_time
    # One 1024x1024 matmul is ~2 * 1024^3 FLOPs (multiply + add)
    flops = 2 * 1024**3 * 100
    print(f"Average time per operation: {elapsed / 100:.4f}s")
    print(f"TFLOPS estimate: {flops / elapsed / 1e12:.2f}")
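
The FLOP count behind that estimate can be checked by hand: an N×N matrix multiply performs N³ multiply-add pairs, i.e. 2·N³ floating-point operations. A quick sanity check in plain Python (the 0.05 s timing is a made-up example value, not a measurement):

```python
# FLOPs for one 1024x1024 matmul: N^3 multiply-add pairs = 2 * N^3 FLOPs
N = 1024
flops_per_matmul = 2 * N**3           # multiply and add counted separately
total_flops = flops_per_matmul * 100  # 100 timed iterations

# If the 100 matmuls took, say, 0.05 s in total:
tflops = total_flops / 0.05 / 1e12
print(flops_per_matmul)  # 2147483648
print(round(tflops, 2))  # 4.29
```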

Common AI Applications

Computer Vision Pipeline

import cv2
import torch
import torchvision.transforms as transforms
from PIL import Image

class JetsonVisionPipeline:
    def __init__(self, model_path):
        self.device = torch.device('cuda')
        self.model = torch.jit.load(model_path).to(self.device)
        self.model.eval()
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                                 std=[0.229, 0.224, 0.225])
        ])
    
    def process_frame(self, frame):
        # OpenCV frames are BGR uint8 arrays; convert to an RGB PIL image
        # so the torchvision transforms apply correctly
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        input_tensor = self.transform(Image.fromarray(rgb)).unsqueeze(0).to(self.device)
        
        # Inference
        with torch.no_grad():
            output = self.model(input_tensor)
        
        return output

Natural Language Processing

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class JetsonNLPPipeline:
    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.model = self.model.to('cuda')
    
    def analyze_text(self, text):
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
        inputs = {k: v.to('cuda') for k, v in inputs.items()}
        
        with torch.no_grad():
            outputs = self.model(**inputs)
        
        return outputs.logits.cpu().numpy()

Production Deployment

Docker Container

FROM nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.13-py3

WORKDIR /app

COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .

CMD ["python3", "app.py"]

Monitoring and Logging

# Install monitoring tools
pip3 install prometheus-client grafana-api

# Monitor GPU/SoC utilization (nvidia-smi is generally not available on
# Jetson; tegrastats is the native equivalent)
sudo tegrastats

# System monitoring
htop
iotop
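
To feed those numbers into Prometheus, tegrastats output can be parsed and re-emitted in the Prometheus text exposition format. A minimal stdlib-only sketch: the sample line is hardcoded for illustration (the exact field layout varies between JetPack releases), and the `jetson_*` metric names are made up for this example:

```python
import re

# Sample tegrastats output line (format varies by JetPack release)
SAMPLE = "RAM 3622/7620MB (lfb 4x1MB) CPU [12%@1510,8%@1510] GPU@46.5C CPU@48C"

def to_prometheus(line):
    """Parse RAM usage and temperatures into Prometheus exposition lines."""
    metrics = []
    ram = re.search(r"RAM (\d+)/(\d+)MB", line)
    if ram:
        metrics.append(f"jetson_ram_used_mb {ram.group(1)}")
        metrics.append(f"jetson_ram_total_mb {ram.group(2)}")
    # Temperature fields look like "GPU@46.5C"
    for name, temp in re.findall(r"(\w+)@([\d.]+)C", line):
        metrics.append(f'jetson_temp_celsius{{zone="{name}"}} {temp}')
    return "\n".join(metrics)

print(to_prometheus(SAMPLE))
```

In a real exporter the same string would be served over HTTP (e.g. with `prometheus_client` or a plain `http.server` handler) and `tegrastats` would be read as a subprocess instead of a constant.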

Advanced Configuration

Power Mode Optimization

# Power-mode IDs and wattages vary by Jetson module; the modes available
# on your device are listed in /etc/nvpmodel.conf

# Maximum performance mode
sudo nvpmodel -m 0

# Balanced mode (recommended for ClawBox)
sudo nvpmodel -m 1

# Power-efficient mode
sudo nvpmodel -m 2

# Lock clocks to their maximum for benchmarking
sudo jetson_clocks

Cooling and Thermal Management

  • Ensure adequate cooling (fan or heat sink)
  • Monitor temperatures: sudo tegrastats
  • ClawBox includes optimized thermal design
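
Temperatures can also be read programmatically from the standard Linux thermal sysfs interface, which `tegrastats` itself draws on. A small sketch (the paths are the generic kernel interface; zone names like `CPU-therm` differ between Jetson modules, and values are reported in millidegrees Celsius):

```python
from pathlib import Path

def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: temp_celsius} from the Linux thermal sysfs."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone not readable; skip it
        temps[name] = millideg / 1000.0  # sysfs reports millidegrees
    return temps

if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```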

Troubleshooting

Common Issues

  1. CUDA out of memory: Reduce batch size, optimize models
  2. Slow inference: Check TensorRT optimization, power mode
  3. Installation failures: Verify JetPack compatibility
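
For the out-of-memory case, a common pattern is to halve the batch size and retry until the workload fits. A framework-agnostic sketch: `run_batch` is a hypothetical stand-in for your inference call, and the `"out of memory"` check matches the text of PyTorch's CUDA OOM `RuntimeError`:

```python
def run_with_backoff(run_batch, batch, min_size=1):
    """Retry run_batch with smaller chunks until they fit in memory."""
    size = len(batch)
    while size >= min_size:
        try:
            # Process the batch in chunks of the current size
            return [out for i in range(0, len(batch), size)
                        for out in run_batch(batch[i:i + size])]
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise            # not an OOM error -- propagate it
            size //= 2           # halve the chunk size and retry

    raise RuntimeError("batch does not fit even at minimum size")

# Toy stand-in: "runs out of memory" for chunks larger than 2 items
def run_batch(chunk):
    if len(chunk) > 2:
        raise RuntimeError("CUDA out of memory")
    return [x * 10 for x in chunk]

print(run_with_backoff(run_batch, [1, 2, 3, 4, 5]))  # [10, 20, 30, 40, 50]
```

In real PyTorch code you would also call `torch.cuda.empty_cache()` between retries so the freed memory is actually returned to the allocator.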

Performance Optimization Tips

  • Use TensorRT for model acceleration
  • Implement proper memory management
  • Optimize data loading pipelines
  • Consider model quantization (INT8)
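
The idea behind INT8 quantization can be illustrated in plain Python: choose a scale that maps the tensor's value range onto [-127, 127], round to integers, and dequantize by multiplying back. This is only the symmetric per-tensor case; toolkits like TensorRT calibrate scales per-tensor or per-channel from real activation data:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (ints, scale)."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map the INT8 values back to floats."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
print(q)  # [50, -127, 1, 100]

# Quantization error is bounded by half a step (scale / 2)
recovered = dequantize(q, scale)
print(max(abs(w - r) for w, r in zip(weights, recovered)) <= scale / 2)  # True
```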

Getting ClawBox

For the complete, professionally configured experience:

ClawBox Features:

  • Pre-installed development environment
  • Optimized performance settings
  • Professional support and documentation
  • Regular software updates
  • 512GB NVMe storage included

Order: https://openclawhardware.dev
Technical Documentation: https://edge-ai-hardware.com
Community: https://jetson-ai-box.com
Low Power Solutions: https://low-power-ai.com

Experience the power of 67 TOPS edge AI computing with ClawBox!
