
@TechHutTV
Created February 28, 2026 21:25
Complete benchmark guide for comparing the Intel Arc Pro B60 and Nvidia RTX 3090 in home server workloads: transcoding, AI inference, and power consumption. Fedora-based, single-machine swap, with a scoring system, scripts, and troubleshooting notes.

Intel Arc Pro B60 vs Nvidia RTX 3090: Complete Benchmark Guide

A step-by-step testing guide for home server workloads — transcoding and local AI inference. Both GPUs are tested in the same machine: the Nvidia RTX 3090 goes first, then swap to the Intel Arc Pro B60.

Workflow Overview

1. Install base software & download test media (GPU-agnostic)
2. Install RTX 3090 → Install Nvidia drivers → Run ALL Nvidia benchmarks
3. Power off → Physically swap to Arc Pro B60 → Install Intel drivers → Run ALL Intel benchmarks
4. Compare results with the scorecard

Why Nvidia first? Nvidia's Fedora driver setup is more straightforward and better documented. Getting all your test scripts validated on the 3090 first means any issues you hit on the Intel side are almost certainly Intel-related, not script bugs.


Phase 0: Base System Setup (Before Any GPU Testing)

This phase installs everything that doesn't depend on which GPU is in the machine. Do this once.

Base Packages (Fedora)

# System essentials
sudo dnf install -y ffmpeg git curl wget htop jq bc python3-pip lm_sensors \
  libva-utils

# Docker comes from Docker's own repo (docker-ce is not in the Fedora repos):
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
# Log out and back in for the docker group to take effect

# Python venvs (we'll use these for AI tests)
sudo dnf install -y python3-devel python3-virtualenv

Create Directory Structure

mkdir -p ~/benchmark/{media,media/audio,scripts,results,logs}
mkdir -p ~/benchmark/results/{ffmpeg,streams,llm,sd,whisper,yolo}

Scorecard Setup

This is the heart of your record-keeping. Create it now, fill it in as you go.

cat > ~/benchmark/results/scorecard.csv << 'EOF'
test_id,test_name,metric,unit,higher_is_better,nvidia_3090,intel_b60,winner,notes
1.1a,HEVC 4K Encode,fps,FPS,yes,,,,
1.1b,HEVC 4K→1080p Transcode,fps,FPS,yes,,,,
1.2a,AV1 4K Encode,fps,FPS,yes,,,,
1.2b,AV1 4K→1080p Transcode,fps,FPS,yes,,,,
1.3a,HEVC Output Quality,vmaf_score,VMAF,yes,,,,
1.3b,AV1 Output Quality,vmaf_score,VMAF,yes,,,,
1.4,HDR Tone Map 4K→1080p,fps,FPS,yes,,,,
1.5,Max 4K→1080p Streams,stream_count,streams,yes,,,,
1.6,Tdarr 50-File Batch,total_seconds,seconds,no,,,,
2.1a,LLM Llama3 8B Q4_K_M,gen_tokens_per_sec,tok/s,yes,,,,
2.1b,LLM Llama3 8B Q8,gen_tokens_per_sec,tok/s,yes,,,,
2.1c,LLM Mistral 7B Q4_K_M,gen_tokens_per_sec,tok/s,yes,,,,
2.1d,LLM Gemma2 27B Q4_K_M,gen_tokens_per_sec,tok/s,yes,,,,
2.2,SDXL Image Gen 1024x1024,time_per_image,seconds,no,,,,
2.3a,Whisper small,realtime_factor,RTF,no,,,,
2.3b,Whisper medium,realtime_factor,RTF,no,,,,
2.3c,Whisper large-v3,realtime_factor,RTF,no,,,,
2.4a,YOLO v8n Detection,fps,FPS,yes,,,,
2.4b,YOLO v8s Detection,fps,FPS,yes,,,,
P.1,System Idle Power,watts,W,no,,,,
P.2,Peak Transcode Power,watts,W,no,,,,
P.3,Peak AI Inference Power,watts,W,no,,,,
P.4,Card Purchase Price,price,USD,no,,,,
EOF

echo "Scorecard created at ~/benchmark/results/scorecard.csv"

Score Recording Helper Script

Use this after every test to log results into the scorecard.

cat > ~/benchmark/scripts/record_score.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./record_score.sh <test_id> <gpu: nvidia|intel> <value> [notes]
# Example: ./record_score.sh 1.1a nvidia 245.3 "stable, no throttling"

TEST_ID="$1"
GPU="$2"
VALUE="$3"
NOTES="${4:-}"

SCORECARD="$HOME/benchmark/results/scorecard.csv"

if [ -z "$TEST_ID" ] || [ -z "$GPU" ] || [ -z "$VALUE" ]; then
    echo "Usage: ./record_score.sh <test_id> <nvidia|intel> <value> [notes]"
    echo ""
    echo "Available test IDs:"
    awk -F',' 'NR>1 {printf "  %-8s %s (%s)\n", $1, $2, $4}' "$SCORECARD"
    exit 1
fi

# Determine which column to update (6 = nvidia, 7 = intel)
if [ "$GPU" = "nvidia" ]; then
    COL=6
elif [ "$GPU" = "intel" ]; then
    COL=7
else
    echo "Error: GPU must be 'nvidia' or 'intel'"
    exit 1
fi

# Update the scorecard
awk -F',' -v id="$TEST_ID" -v col="$COL" -v val="$VALUE" -v notes="$NOTES" '
BEGIN {OFS=","; found=0}
{
    if ($1 == id) {
        $col = val
        if (notes != "") $9 = notes
        # Auto-determine winner if both values present
        if ($6 != "" && $7 != "") {
            if ($5 == "yes") {
                $8 = ($6+0 > $7+0) ? "3090" : ($7+0 > $6+0) ? "B60" : "TIE"
            } else {
                $8 = ($6+0 < $7+0) ? "3090" : ($7+0 < $6+0) ? "B60" : "TIE"
            }
        }
        found=1
    }
    print
}
END { if (!found) print "WARNING: test_id " id " not found" > "/dev/stderr" }
' "$SCORECARD" > "${SCORECARD}.tmp" && mv "${SCORECARD}.tmp" "$SCORECARD"

# Show what was recorded
echo "Recorded: test=$TEST_ID gpu=$GPU value=$VALUE"
grep "^${TEST_ID}," "$SCORECARD"
SCRIPT

chmod +x ~/benchmark/scripts/record_score.sh
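The winner-selection rule in the awk above is easy to get backwards when `higher_is_better` is "no". The same logic as a short Python sanity check (the function name `pick_winner` is mine, not part of the scripts):

```python
def pick_winner(nvidia, intel, higher_is_better):
    """Mirror of the awk winner rule: compare the two recorded
    values and return the scorecard's winner label."""
    if nvidia == intel:
        return "TIE"
    if higher_is_better:
        return "3090" if nvidia > intel else "B60"
    return "3090" if nvidia < intel else "B60"

# FPS: higher is better, so 245.3 beats 198.7
print(pick_winner(245.3, 198.7, True))   # -> 3090
# Watts: lower is better, so 280 W beats 350 W
print(pick_winner(350.0, 280.0, False))  # -> B60
```

Keep this rule in mind for the time- and power-based tests: a smaller number wins there.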

View Scorecard Script

cat > ~/benchmark/scripts/show_scores.sh << 'SCRIPT'
#!/bin/bash
# Pretty-print the current scorecard

SCORECARD="$HOME/benchmark/results/scorecard.csv"

echo ""
echo "╔══════════════════════════════════════════════════════════════════════════════╗"
echo "║              ARC PRO B60 vs RTX 3090 — BENCHMARK SCORECARD                 ║"
echo "╚══════════════════════════════════════════════════════════════════════════════╝"
echo ""

printf "%-8s %-30s %-8s %-12s %-12s %-8s\n" \
  "ID" "Test" "Unit" "RTX 3090" "Arc B60" "Winner"
printf "%-8s %-30s %-8s %-12s %-12s %-8s\n" \
  "----" "----" "----" "--------" "-------" "------"

awk -F',' 'NR>1 {
    printf "%-8s %-30s %-8s %-12s %-12s %-8s\n", $1, $2, $4, $6, $7, $8
}' "$SCORECARD"

echo ""

# Tally
NVIDIA_WINS=$(awk -F',' '$8=="3090"' "$SCORECARD" | wc -l)
INTEL_WINS=$(awk -F',' '$8=="B60"' "$SCORECARD" | wc -l)
TIES=$(awk -F',' '$8=="TIE"' "$SCORECARD" | wc -l)
PENDING=$(awk -F',' 'NR>1 && $8==""' "$SCORECARD" | wc -l)

echo "TALLY:  RTX 3090: ${NVIDIA_WINS} wins  |  Arc B60: ${INTEL_WINS} wins  |  Ties: ${TIES}  |  Pending: ${PENDING}"
echo ""
SCRIPT

chmod +x ~/benchmark/scripts/show_scores.sh

Test Media Files

All test media uses freely available, openly licensed content. Download once before any GPU testing.

# 1. Jellyfish clips — industry-standard bitrate test files
# 4K H.264 (120 Mbps) — primary source file for most tests
wget -O ~/benchmark/media/4k_h264.mkv \
  "https://repo.jellyfin.org/archive/jellyfish/media/jellyfish-120-mbps-4k-uhd-h264.mkv"

# 4K HEVC 10-bit (120 Mbps) — for HEVC decode/re-encode tests
wget -O ~/benchmark/media/4k_hevc_10bit.mkv \
  "https://repo.jellyfin.org/archive/jellyfish/media/jellyfish-120-mbps-4k-uhd-hevc-10bit.mkv"

# 2. Blender Open Movies — real-world content with motion, grain, effects
# Tears of Steel (4K, ~12 min, sci-fi VFX — great stress test)
wget -O ~/benchmark/media/tears_of_steel_4k.mov \
  "https://download.blender.org/demo/movies/ToS/ToS-4k-1920.mov"

# Big Buck Bunny (4K, ~10 min, animation — tests flat color encoding)
wget -O ~/benchmark/media/big_buck_bunny_4k.mp4 \
  "https://mirrors.kodi.tv/demo-files/BBB/bbb_sunflower_2160p_60fps_normal.mp4"

# Sintel (4K, ~15 min, mixed live-action style — good variety)
wget -O ~/benchmark/media/sintel_4k.mkv \
  "https://download.blender.org/durian/movies/Sintel.2010.4k.mkv"

Generate Derived Test Files

These use software encoding so they don't need a GPU.

# 1080p H.264 (for lighter transcode tests)
ffmpeg -i ~/benchmark/media/4k_h264.mkv \
  -vf scale=1920:1080 -c:v libx264 -crf 18 -preset slow \
  -an ~/benchmark/media/1080p_h264.mkv

# 4K AV1 (software encode — slow but only needs to be done once)
ffmpeg -i ~/benchmark/media/4k_h264.mkv \
  -c:v libsvtav1 -crf 30 -preset 6 \
  -an ~/benchmark/media/4k_av1.mkv

# Synthetic HDR test file (adds PQ/BT.2020 metadata to 10-bit source)
ffmpeg -i ~/benchmark/media/4k_hevc_10bit.mkv \
  -c:v libx265 -crf 20 -preset slow \
  -x265-params "colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc:master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1):max-cll=1000,400" \
  -pix_fmt yuv420p10le \
  -an ~/benchmark/media/4k_hevc_hdr.mkv

# Verify HDR metadata
ffprobe -show_streams ~/benchmark/media/4k_hevc_hdr.mkv 2>&1 | grep -i "color\|master"

# YOLO test clips (Tears of Steel — people, VFX, varied scenes)
ffmpeg -ss 120 -i ~/benchmark/media/tears_of_steel_4k.mov \
  -t 120 -vf scale=1920:1080 -c:v libx264 -crf 18 \
  -an ~/benchmark/media/yolo_test_1080p.mkv

ffmpeg -ss 120 -i ~/benchmark/media/tears_of_steel_4k.mov \
  -t 120 -vf scale=1280:720 -c:v libx264 -crf 18 \
  -an ~/benchmark/media/yolo_test_720p.mkv

Tdarr Batch Library (50+ Files)

mkdir -p ~/benchmark/media/tdarr_library

SOURCE_FILES=(
  ~/benchmark/media/tears_of_steel_4k.mov
  ~/benchmark/media/big_buck_bunny_4k.mp4
  ~/benchmark/media/sintel_4k.mkv
)

RESOLUTIONS=("3840:2160" "2560:1440" "1920:1080" "1280:720")
CODECS_ENC=("libx264" "libx265" "libsvtav1")
CODECS_EXT=("h264" "hevc" "av1")

COUNT=0
for src in "${SOURCE_FILES[@]}"; do
    BASENAME=$(basename "$src" | cut -d. -f1)
    DURATION=$(ffprobe -v error -show_entries format=duration \
      -of default=noprint_wrappers=1:nokey=1 "$src" | cut -d. -f1)

    for res_idx in "${!RESOLUTIONS[@]}"; do
        RES="${RESOLUTIONS[$res_idx]}"
        RES_LABEL=$(echo "$RES" | tr ':' 'x')

        for codec_idx in "${!CODECS_ENC[@]}"; do
            CODEC="${CODECS_ENC[$codec_idx]}"
            EXT="${CODECS_EXT[$codec_idx]}"

            START=$((RANDOM % (DURATION > 120 ? DURATION - 120 : 1)))
            CLIP_DUR=$((120 + RANDOM % 180))

            OUTFILE="$HOME/benchmark/media/tdarr_library/${BASENAME}_${RES_LABEL}_${EXT}_${COUNT}.mkv"

            echo "Creating clip $COUNT: $BASENAME @ $RES_LABEL ($EXT)"
            ffmpeg -ss "$START" -i "$src" -t "$CLIP_DUR" \
              -vf "scale=$RES" -c:v "$CODEC" -crf 22 -preset fast \
              -an -y "$OUTFILE" 2>/dev/null

            COUNT=$((COUNT + 1))
            [ $COUNT -ge 50 ] && break 3
        done
    done
done

echo "Created $COUNT test files in ~/benchmark/media/tdarr_library/"

Audio for Whisper Tests

# LibriSpeech test-clean dataset — standard speech recognition benchmark
wget -O ~/benchmark/media/audio/librispeech_test.tar.gz \
  "https://www.openslr.org/resources/12/test-clean.tar.gz"

tar -xzf ~/benchmark/media/audio/librispeech_test.tar.gz -C ~/benchmark/media/audio/

# Create a 10-minute WAV
find ~/benchmark/media/audio/LibriSpeech/test-clean -name "*.flac" | sort | head -30 | \
  while read f; do echo "file '$f'"; done > /tmp/audio_list.txt

ffmpeg -f concat -safe 0 -i /tmp/audio_list.txt \
  -t 600 -acodec pcm_s16le -ar 16000 -ac 1 \
  ~/benchmark/media/audio/test_10min.wav

# Create a 30-minute WAV for extended testing
find ~/benchmark/media/audio/LibriSpeech/test-clean -name "*.flac" | sort | \
  while read f; do echo "file '$f'"; done > /tmp/audio_list_full.txt

ffmpeg -f concat -safe 0 -i /tmp/audio_list_full.txt \
  -t 1800 -acodec pcm_s16le -ar 16000 -ac 1 \
  ~/benchmark/media/audio/test_30min.wav

Verify All Test Files

echo "=== Test Media Inventory ==="
echo ""
echo "--- Core files ---"
for f in ~/benchmark/media/*.{mkv,mp4,mov}; do
    [ -f "$f" ] || continue
    SIZE=$(du -h "$f" | cut -f1)
    DURATION=$(ffprobe -v error -show_entries format=duration \
      -of default=noprint_wrappers=1:nokey=1 "$f" 2>/dev/null | cut -d. -f1)
    CODEC=$(ffprobe -v error -select_streams v:0 \
      -show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$f" 2>/dev/null)
    RES=$(ffprobe -v error -select_streams v:0 \
      -show_entries stream=width,height -of csv=p=0 "$f" 2>/dev/null)
    echo "  $(basename $f): ${SIZE} | ${DURATION}s | ${CODEC} | ${RES}"
done

echo ""
echo "--- Tdarr library: $(ls ~/benchmark/media/tdarr_library/ | wc -l) files ---"
echo "--- Audio: $(ls -lh ~/benchmark/media/audio/test_*.wav) ---"
echo "--- YOLO clips: $(ls -lh ~/benchmark/media/yolo_test_*.mkv) ---"

Monitoring Script

Save this to run in a second terminal during every test.

cat > ~/benchmark/scripts/monitor.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./monitor.sh [intel|nvidia] [test_name]

GPU_TYPE=${1:-intel}
TEST_NAME=${2:-test}
LOG_FILE="$HOME/benchmark/logs/${TEST_NAME}_${GPU_TYPE}_$(date +%Y%m%d_%H%M%S).csv"
mkdir -p ~/benchmark/logs

echo "timestamp,gpu_util,gpu_mem,gpu_temp,gpu_power,cpu_util" > "$LOG_FILE"
echo "Logging to: $LOG_FILE"
echo "Press Ctrl+C to stop."
echo ""

while true; do
    TIMESTAMP=$(date +%H:%M:%S)
    CPU_UTIL=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}')

    if [ "$GPU_TYPE" = "nvidia" ]; then
        GPU_DATA=$(nvidia-smi --query-gpu=utilization.gpu,memory.used,temperature.gpu,power.draw \
          --format=csv,noheader,nounits)
        GPU_UTIL=$(echo "$GPU_DATA" | cut -d',' -f1 | xargs)
        GPU_MEM=$(echo "$GPU_DATA" | cut -d',' -f2 | xargs)
        GPU_TEMP=$(echo "$GPU_DATA" | cut -d',' -f3 | xargs)
        GPU_POWER=$(echo "$GPU_DATA" | cut -d',' -f4 | xargs)
    else
        GPU_UTIL="N/A"  # intel_gpu_top doesn't support xe driver; use nvtop in another terminal
        GPU_MEM="N/A"
        GPU_TEMP=$(sensors 2>/dev/null | grep -i "edge" | awk '{print $2}' | tr -d '+°C' || echo "N/A")
        GPU_POWER="wall_meter"
    fi

    printf "%s | GPU: %s%% | Mem: %sMB | Temp: %s°C | Power: %sW | CPU: %s%%\n" \
      "$TIMESTAMP" "$GPU_UTIL" "$GPU_MEM" "$GPU_TEMP" "$GPU_POWER" "$CPU_UTIL"
    echo "$TIMESTAMP,$GPU_UTIL,$GPU_MEM,$GPU_TEMP,$GPU_POWER,$CPU_UTIL" >> "$LOG_FILE"
    sleep 2
done
SCRIPT

chmod +x ~/benchmark/scripts/monitor.sh
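The monitor logs are plain CSV, so summarizing a run afterwards takes only a few lines. A minimal sketch for the Nvidia logs (the column names match the header the script writes; the Intel rows contain N/A and need the wall meter instead — the sample data below is illustrative, in practice you'd open a real log from `~/benchmark/logs/`):

```python
import csv
import io

# Stand-in for: open(os.path.expanduser("~/benchmark/logs/<test>_nvidia_<ts>.csv"))
sample_log = io.StringIO(
    "timestamp,gpu_util,gpu_mem,gpu_temp,gpu_power,cpu_util\n"
    "12:00:00,95,18000,62,340.1,12.5\n"
    "12:00:02,98,18100,63,348.9,11.0\n"
)

rows = list(csv.DictReader(sample_log))
avg_util = sum(float(r["gpu_util"]) for r in rows) / len(rows)
peak_power = max(float(r["gpu_power"]) for r in rows)
print(f"avg GPU util: {avg_util:.1f}%  peak power: {peak_power:.1f} W")
```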

Install Ollama (GPU-agnostic — it auto-detects)

curl -fsSL https://ollama.com/install.sh | sh

Create Python venvs for AI tests

# Whisper venv
python3 -m venv ~/benchmark/venvs/whisper

# YOLO venv
python3 -m venv ~/benchmark/venvs/yolo

# ComfyUI venv (Stable Diffusion)
python3 -m venv ~/benchmark/venvs/comfyui

Phase 1: RTX 3090 Testing

Install the 3090 in your system. Power on.

1.0 — Nvidia Driver Setup (Fedora)

# RPM Fusion is the cleanest way to get Nvidia drivers on Fedora
sudo dnf install -y \
  https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
  https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm

sudo dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda nvidia-vaapi-driver

# The akmod builds the kernel module in the background (can take a few minutes).
# To force the build and regenerate the initramfs immediately:
sudo akmods --force
sudo dracut --force
sudo dracut --force

# Reboot
sudo reboot

After reboot:

# Verify
nvidia-smi

# Should show RTX 3090, driver version, CUDA version
# If nvidia-smi fails, check: sudo dmesg | grep -i nvidia

# Patch NVENC session limit (consumer cards have a 5-session cap)
# Clone and apply the nvidia-patch:
cd ~/benchmark
git clone https://github.com/keylase/nvidia-patch.git
cd nvidia-patch
sudo bash patch.sh

# Record idle power
# Note your Kill-A-Watt reading with the system idle at desktop
~/benchmark/scripts/record_score.sh P.1 nvidia <IDLE_WATTS>
~/benchmark/scripts/record_score.sh P.4 nvidia <PURCHASE_PRICE>

1.1 — FFmpeg HEVC Encode (Nvidia)

mkdir -p ~/benchmark/results/ffmpeg

# In a second terminal, start monitoring:
# ~/benchmark/scripts/monitor.sh nvidia hevc_encode

# HEVC encode — 4K to 4K
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i ~/benchmark/media/4k_h264.mkv \
  -c:v hevc_nvenc -preset p5 -rc constqp -qp 25 \
  -an -y ~/benchmark/results/ffmpeg/4k_hevc_nvenc.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/hevc_nvenc_log.txt

# ➜ Record the FPS from the final line of output
~/benchmark/scripts/record_score.sh 1.1a nvidia <FPS>

# HEVC encode — 4K to 1080p transcode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i ~/benchmark/media/4k_h264.mkv \
  -vf 'scale_cuda=1920:1080' \
  -c:v hevc_nvenc -preset p5 -rc constqp -qp 25 \
  -an -y ~/benchmark/results/ffmpeg/4k_to_1080p_hevc_nvenc.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/hevc_nvenc_downscale_log.txt

# ➜ Record FPS
~/benchmark/scripts/record_score.sh 1.1b nvidia <FPS>

# ➜ Record peak wall power from Kill-A-Watt
~/benchmark/scripts/record_score.sh P.2 nvidia <PEAK_WATTS>

1.2 — FFmpeg AV1 Encode (Nvidia)

# AV1 encode — 4K to 4K
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i ~/benchmark/media/4k_h264.mkv \
  -c:v av1_nvenc -preset p5 -rc constqp -qp 30 \
  -an -y ~/benchmark/results/ffmpeg/4k_av1_nvenc.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/av1_nvenc_log.txt

# ➜ Record FPS
~/benchmark/scripts/record_score.sh 1.2a nvidia <FPS>

# AV1 encode — 4K to 1080p
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i ~/benchmark/media/4k_h264.mkv \
  -vf 'scale_cuda=1920:1080' \
  -c:v av1_nvenc -preset p5 -rc constqp -qp 30 \
  -an -y ~/benchmark/results/ffmpeg/4k_to_1080p_av1_nvenc.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/av1_nvenc_downscale_log.txt

# ➜ Record FPS
~/benchmark/scripts/record_score.sh 1.2b nvidia <FPS>

1.3 — VMAF Quality Scoring (Nvidia outputs)

Run these after encoding. VMAF uses the CPU so it doesn't matter which GPU is installed, but we run it now while the encoded files are fresh.

# HEVC quality
ffmpeg -i ~/benchmark/results/ffmpeg/4k_hevc_nvenc.mkv \
  -i ~/benchmark/media/4k_h264.mkv \
  -lavfi libvmaf="model=version=vmaf_v0.6.1:log_path=vmaf_hevc_nvenc.json:log_fmt=json" \
  -f null - 2>&1 | tee ~/benchmark/results/ffmpeg/vmaf_hevc_nvenc.txt

# ➜ Record mean VMAF score from output
~/benchmark/scripts/record_score.sh 1.3a nvidia <VMAF_SCORE>

# AV1 quality
ffmpeg -i ~/benchmark/results/ffmpeg/4k_av1_nvenc.mkv \
  -i ~/benchmark/media/4k_h264.mkv \
  -lavfi libvmaf="model=version=vmaf_v0.6.1:log_path=vmaf_av1_nvenc.json:log_fmt=json" \
  -f null - 2>&1 | tee ~/benchmark/results/ffmpeg/vmaf_av1_nvenc.txt

# ➜ Record mean VMAF score
~/benchmark/scripts/record_score.sh 1.3b nvidia <VMAF_SCORE>

VMAF scores range 0–100; above 93 is generally considered excellent. This tells you which encoder produces better-looking video at the same effort level.
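Rather than eyeballing the console output, you can pull the mean score straight from the JSON log. A sketch, assuming the libvmaf v2.x log layout (`pooled_metrics.vmaf.mean`; older builds nest the score differently, so check your JSON first — the sample below is illustrative, in practice you'd open `vmaf_hevc_nvenc.json`):

```python
import json
import io

# Stand-in for: open("vmaf_hevc_nvenc.json")
sample = io.StringIO(json.dumps({
    "pooled_metrics": {"vmaf": {"min": 89.2, "mean": 95.7, "max": 99.1}}
}))

log = json.load(sample)
mean_vmaf = log["pooled_metrics"]["vmaf"]["mean"]
print(f"mean VMAF: {mean_vmaf:.2f}")  # this is the value for record_score.sh
```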

1.4 — HDR Tone Mapping Transcode (Nvidia)

Real-world scenario — HDR content transcoded to SDR for most displays.

# Option A: Full GPU pipeline (requires ffmpeg built with scale_cuda + tonemap_cuda)
ffmpeg -init_hw_device cuda=hw -filter_hw_device hw \
  -i ~/benchmark/media/4k_hevc_hdr.mkv \
  -vf 'hwupload_cuda,scale_cuda=w=1920:h=1080:format=nv12,tonemap_cuda=tonemap=hable:peak=100:desat=0' \
  -c:v hevc_nvenc -preset p5 -rc constqp -qp 25 \
  -an -y ~/benchmark/results/ffmpeg/hdr_tonemap_nvenc.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/hdr_tonemap_nvenc_log.txt

# Option B: CPU tonemap + GPU encode (Fedora — no scale_cuda)
ffmpeg -hwaccel cuda \
  -i ~/benchmark/media/4k_hevc_hdr.mkv \
  -vf 'zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709:t=bt709:m=bt709:r=tv,format=yuv420p,scale=1920:1080' \
  -c:v hevc_nvenc -preset p5 -rc constqp -qp 25 \
  -an -y ~/benchmark/results/ffmpeg/hdr_tonemap_nvenc.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/hdr_tonemap_nvenc_log.txt

# ➜ Record FPS
~/benchmark/scripts/record_score.sh 1.4 nvidia <FPS>

Note: FFmpeg's HDR tone mapping pipeline varies by version. If this errors out, you may need a build with --enable-libplacebo. Check your FFmpeg version with ffmpeg -version.

1.5 — Maximum Simultaneous Streams (Nvidia)

The headline number. "How many 4K streams can the 3090 handle?"

cat > ~/benchmark/scripts/stream_stress.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./stream_stress.sh [intel|nvidia] [max_streams]

GPU_TYPE=${1:-nvidia}
MAX_STREAMS=${2:-20}
SOURCE_FILE="$HOME/benchmark/media/4k_hevc_hdr.mkv"
RESULTS_DIR="$HOME/benchmark/results/streams"
mkdir -p "$RESULTS_DIR"

# Clean up old stream files
rm -f "${RESULTS_DIR}"/stream_*.mkv "${RESULTS_DIR}"/stream_*_log.txt

PIDS=()

for i in $(seq 1 $MAX_STREAMS); do
    echo "Starting stream $i..."

    if [ "$GPU_TYPE" = "intel" ]; then
        ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
          -stream_loop -1 -i "$SOURCE_FILE" \
          -vf 'vpp_qsv=w=1920:h=1080' \
          -c:v hevc_qsv -preset fast -global_quality 28 \
          -t 120 -an -y "${RESULTS_DIR}/stream_${i}.mkv" \
          > "${RESULTS_DIR}/stream_${i}_log.txt" 2>&1 &
    else
        ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
          -stream_loop -1 -i "$SOURCE_FILE" \
          -vf 'scale_cuda=1920:1080' \
          -c:v hevc_nvenc -preset p4 -rc constqp -qp 28 \
          -t 120 -an -y "${RESULTS_DIR}/stream_${i}.mkv" \
          > "${RESULTS_DIR}/stream_${i}_log.txt" 2>&1 &
    fi

    PIDS+=($!)
    sleep 3

    if ! kill -0 ${PIDS[-1]} 2>/dev/null; then
        echo "Stream $i FAILED. Max streams: $((i-1))"
        break
    fi

    echo "--- Stream $i started (PID: ${PIDS[-1]}) ---"
done

echo ""
echo "All streams launched. Running for 2 minutes..."
echo "Monitor GPU in another terminal."
echo "PIDs: ${PIDS[@]}"
echo ""
echo "To stop all: kill ${PIDS[@]}"

wait

# Check FPS of each stream from logs
echo ""
echo "=== Stream Results ==="
for i in $(seq 1 ${#PIDS[@]}); do
    FPS=$(grep -oP 'fps=\s*\K[0-9.]+' "${RESULTS_DIR}/stream_${i}_log.txt" | tail -1)
    echo "  Stream $i: ${FPS:-FAILED} FPS"
done
SCRIPT

chmod +x ~/benchmark/scripts/stream_stress.sh
# Terminal 1: Monitor
~/benchmark/scripts/monitor.sh nvidia stream_stress

# Terminal 2: Run test — start with 10, increase if all succeed
~/benchmark/scripts/stream_stress.sh nvidia 10

# ➜ Record max streams that maintained real-time FPS (24+ for film, 30+ for TV)
~/benchmark/scripts/record_score.sh 1.5 nvidia <MAX_STREAMS>

Nvidia session limit: Even after patching, verify with more than 5 streams. If streams fail at exactly 5, the patch didn't apply. Re-run sudo bash patch.sh after checking your driver version.
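The pass/fail criterion is worth making explicit: a stream only counts if it held real-time speed, and once one stream drops below real time, everything launched after it is suspect. A small sketch of that counting rule, applied to the per-stream FPS values stream_stress.sh prints at the end (24 FPS threshold for film content, 30 for TV):

```python
def max_realtime_streams(stream_fps, threshold=24.0):
    """Count streams up to the first one that fell below real time;
    results after the first failure are unreliable, so stop there."""
    count = 0
    for fps in stream_fps:
        if fps is None or fps < threshold:
            break
        count += 1
    return count

# Streams 1-6 held 24+ FPS, stream 7 fell to 19.3
print(max_realtime_streams([48.1, 45.0, 39.2, 33.5, 28.7, 25.1, 19.3, 18.0]))  # -> 6
```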

1.6 — Tdarr Batch Library Conversion (Nvidia)

# Reset Tdarr library to original state (copy fresh if you have a backup)
# Then launch Tdarr with Nvidia GPU access

docker run -d \
  --name tdarr-nvidia \
  --gpus all \
  -p 8265:8265 \
  -p 8266:8266 \
  -v ~/benchmark/tdarr/server:/app/server \
  -v ~/benchmark/tdarr/configs:/app/configs \
  -v ~/benchmark/tdarr/logs:/app/logs \
  -v ~/benchmark/media/tdarr_library:/media \
  -v ~/benchmark/results/tdarr:/temp \
  ghcr.io/haveagitgat/tdarr:latest

Test procedure:

  1. Open Tdarr at http://localhost:8265
  2. Configure library pointing to /media
  3. Set transcode plugin: convert everything to HEVC at a target bitrate
  4. Set GPU worker count to 1
  5. Start a timer (time in terminal or stopwatch)
  6. Let it process the entire 50-file library
  7. Record total time when complete
# ➜ Record total time in seconds
~/benchmark/scripts/record_score.sh 1.6 nvidia <TOTAL_SECONDS>

# Clean up
docker stop tdarr-nvidia && docker rm tdarr-nvidia
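record_score.sh expects the total time in plain seconds, so a stopwatch reading like 1:42:30 needs converting first. A throwaway helper (the name `hms_to_seconds` is mine, not part of the scripts):

```python
def hms_to_seconds(stamp):
    """Convert 'H:MM:SS' or 'MM:SS' to total seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

print(hms_to_seconds("1:42:30"))  # -> 6150
```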

2.1 — LLM Inference (Nvidia)

# Pull test models
ollama pull llama3:8b-instruct-q4_K_M
ollama pull llama3:8b-instruct-q8_0
ollama pull mistral:7b-instruct-q4_K_M
ollama pull gemma2:27b-instruct-q4_K_M

# Verify the GPU is being used
ollama run llama3:8b-instruct-q4_K_M "Hello" --verbose
ollama ps   # the loaded model should report "100% GPU", not CPU
cat > ~/benchmark/scripts/llm_bench.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./llm_bench.sh <model_name> <gpu_label>
# Example: ./llm_bench.sh llama3:8b-instruct-q4_K_M nvidia

MODEL=${1:-"llama3:8b-instruct-q4_K_M"}
GPU_LABEL=${2:-"unknown"}
RESULTS_DIR="$HOME/benchmark/results/llm"
mkdir -p "$RESULTS_DIR"

PROMPTS=(
    "Write a detailed guide on setting up a home server with Proxmox, including network configuration and VM creation."
    "Explain the differences between ZFS and Btrfs for a NAS. Include performance considerations, snapshot capabilities, and hardware requirements."
    "Write a Python script that monitors Docker container health, restarts unhealthy containers, and sends notifications via webhook."
)

SAFE_MODEL=$(echo "$MODEL" | tr '/:' '__')
OUTFILE="${RESULTS_DIR}/${SAFE_MODEL}_${GPU_LABEL}.csv"
echo "model,gpu,prompt_num,prompt_tokens,prompt_tps,gen_tokens,gen_tps,total_sec" > "$OUTFILE"

echo ""
echo "========================================"
echo "Testing: $MODEL on $GPU_LABEL"
echo "========================================"

GEN_TPS_SUM=0
GEN_TPS_COUNT=0

for i in "${!PROMPTS[@]}"; do
    PROMPT="${PROMPTS[$i]}"
    PROMPT_NUM=$((i+1))
    echo ""
    echo "--- Prompt $PROMPT_NUM: ${PROMPT:0:60}... ---"

    RESPONSE=$(curl -s http://localhost:11434/api/generate \
      -d "{
        \"model\": \"$MODEL\",
        \"prompt\": \"$PROMPT\",
        \"stream\": false
      }")

    TOTAL_DURATION=$(echo "$RESPONSE" | jq -r '.total_duration // 0')
    PROMPT_EVAL_COUNT=$(echo "$RESPONSE" | jq -r '.prompt_eval_count // 0')
    PROMPT_EVAL_DURATION=$(echo "$RESPONSE" | jq -r '.prompt_eval_duration // 0')
    EVAL_COUNT=$(echo "$RESPONSE" | jq -r '.eval_count // 0')
    EVAL_DURATION=$(echo "$RESPONSE" | jq -r '.eval_duration // 0')

    if [ "$PROMPT_EVAL_DURATION" -gt 0 ]; then
        PROMPT_TPS=$(echo "scale=2; $PROMPT_EVAL_COUNT / ($PROMPT_EVAL_DURATION / 1000000000)" | bc)
    else
        PROMPT_TPS="0"
    fi

    if [ "$EVAL_DURATION" -gt 0 ]; then
        GEN_TPS=$(echo "scale=2; $EVAL_COUNT / ($EVAL_DURATION / 1000000000)" | bc)
    else
        GEN_TPS="0"
    fi

    TOTAL_SEC=$(echo "scale=2; $TOTAL_DURATION / 1000000000" | bc)

    echo "  Prompt: ${PROMPT_EVAL_COUNT} tokens @ ${PROMPT_TPS} tok/s"
    echo "  Generation: ${EVAL_COUNT} tokens @ ${GEN_TPS} tok/s"
    echo "  Total: ${TOTAL_SEC}s"

    echo "$MODEL,$GPU_LABEL,$PROMPT_NUM,$PROMPT_EVAL_COUNT,$PROMPT_TPS,$EVAL_COUNT,$GEN_TPS,$TOTAL_SEC" >> "$OUTFILE"

    GEN_TPS_SUM=$(echo "$GEN_TPS_SUM + $GEN_TPS" | bc)
    GEN_TPS_COUNT=$((GEN_TPS_COUNT + 1))

    sleep 5
done

AVG_GEN_TPS=$(echo "scale=2; $GEN_TPS_SUM / $GEN_TPS_COUNT" | bc)
echo ""
echo ">>> AVERAGE generation speed: ${AVG_GEN_TPS} tok/s <<<"
echo ">>> Use this value for the scorecard <<<"
echo ""
echo "Results saved to: $OUTFILE"
SCRIPT

chmod +x ~/benchmark/scripts/llm_bench.sh
# Run all models on Nvidia
~/benchmark/scripts/llm_bench.sh "llama3:8b-instruct-q4_K_M" nvidia
# ➜ Record the average gen tok/s
~/benchmark/scripts/record_score.sh 2.1a nvidia <AVG_GEN_TPS>

sleep 10

~/benchmark/scripts/llm_bench.sh "llama3:8b-instruct-q8_0" nvidia
~/benchmark/scripts/record_score.sh 2.1b nvidia <AVG_GEN_TPS>

sleep 10

~/benchmark/scripts/llm_bench.sh "mistral:7b-instruct-q4_K_M" nvidia
~/benchmark/scripts/record_score.sh 2.1c nvidia <AVG_GEN_TPS>

sleep 10

~/benchmark/scripts/llm_bench.sh "gemma2:27b-instruct-q4_K_M" nvidia
~/benchmark/scripts/record_score.sh 2.1d nvidia <AVG_GEN_TPS>

# ➜ Record peak AI inference power
~/benchmark/scripts/record_score.sh P.3 nvidia <PEAK_WATTS>
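One detail in the llm_bench.sh math worth spelling out: Ollama reports all durations in nanoseconds, so both rates divide a token count by duration/1e9. The same computation as a sketch (field semantics match the /api/generate response used above):

```python
def tokens_per_second(count, duration_ns):
    """Ollama durations are nanoseconds; convert to seconds before dividing."""
    if duration_ns <= 0:
        return 0.0
    return count / (duration_ns / 1_000_000_000)

# e.g. 512 generated tokens in 5.12 s of eval_duration
print(round(tokens_per_second(512, 5_120_000_000), 2))  # -> 100.0
```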

2.2 — Stable Diffusion / Image Gen (Nvidia)

cd ~/benchmark
git clone https://github.com/comfyanonymous/ComfyUI.git comfyui
cd comfyui

source ~/benchmark/venvs/comfyui/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Download SDXL model (you'll reuse this for the Intel round)
# Place sd_xl_base_1.0.safetensors in ~/benchmark/comfyui/models/checkpoints/

# Start ComfyUI
python main.py --listen 0.0.0.0 &
sleep 10
cat > ~/benchmark/scripts/sd_bench.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./sd_bench.sh <gpu_label>

GPU_LABEL=${1:-"unknown"}
RESULTS_DIR="$HOME/benchmark/results/sd"
mkdir -p "$RESULTS_DIR"

PROMPTS=(
    "A photorealistic image of a rack-mounted home server setup in a clean room, blue LED lights, cable management, 8K"
    "An isometric illustration of a network diagram showing VPN tunnels between data centers, technical style"
    "A cinematic landscape of a futuristic data center built into a mountain, volumetric lighting, ultra detailed"
)

STEPS=30
CFG=7.0
WIDTH=1024
HEIGHT=1024
SEED=42

TIME_SUM=0
COUNT=0

for i in "${!PROMPTS[@]}"; do
    PROMPT="${PROMPTS[$i]}"
    echo "--- Image $((i+1)): ${PROMPT:0:60}... ---"

    START=$(date +%s%N)

    curl -s -X POST http://127.0.0.1:8188/prompt \
      -H "Content-Type: application/json" \
      -d "{
        \"prompt\": {
          \"1\": {\"class_type\": \"CheckpointLoaderSimple\", \"inputs\": {\"ckpt_name\": \"sd_xl_base_1.0.safetensors\"}},
          \"2\": {\"class_type\": \"CLIPTextEncode\", \"inputs\": {\"text\": \"$PROMPT\", \"clip\": [\"1\", 1]}},
          \"3\": {\"class_type\": \"CLIPTextEncode\", \"inputs\": {\"text\": \"bad quality, blurry\", \"clip\": [\"1\", 1]}},
          \"4\": {\"class_type\": \"EmptyLatentImage\", \"inputs\": {\"width\": $WIDTH, \"height\": $HEIGHT, \"batch_size\": 1}},
          \"5\": {\"class_type\": \"KSampler\", \"inputs\": {\"model\": [\"1\", 0], \"positive\": [\"2\", 0], \"negative\": [\"3\", 0], \"latent_image\": [\"4\", 0], \"seed\": $SEED, \"steps\": $STEPS, \"cfg\": $CFG, \"sampler_name\": \"euler\", \"scheduler\": \"normal\", \"denoise\": 1.0}},
          \"6\": {\"class_type\": \"VAEDecode\", \"inputs\": {\"samples\": [\"5\", 0], \"vae\": [\"1\", 2]}},
          \"7\": {\"class_type\": \"SaveImage\", \"inputs\": {\"images\": [\"6\", 0], \"filename_prefix\": \"bench_${GPU_LABEL}_$((i+1))\"}}
        }
      }" > /dev/null

    while [ "$(curl -s http://127.0.0.1:8188/queue | jq '.queue_running | length')" -gt 0 ]; do
        sleep 1
    done

    END=$(date +%s%N)
    ELAPSED=$(echo "scale=2; ($END - $START) / 1000000000" | bc)
    IT_PER_SEC=$(echo "scale=2; $STEPS / $ELAPSED" | bc)

    echo "  Time: ${ELAPSED}s | Speed: ${IT_PER_SEC} it/s"
    echo "prompt_$((i+1)),$GPU_LABEL,$ELAPSED,$IT_PER_SEC,$STEPS,$WIDTH,$HEIGHT" >> "$RESULTS_DIR/sdxl_${GPU_LABEL}.csv"

    TIME_SUM=$(echo "$TIME_SUM + $ELAPSED" | bc)
    COUNT=$((COUNT + 1))
    sleep 5
done

AVG_TIME=$(echo "scale=2; $TIME_SUM / $COUNT" | bc)
echo ""
echo ">>> AVERAGE time per image: ${AVG_TIME}s <<<"
SCRIPT

chmod +x ~/benchmark/scripts/sd_bench.sh
~/benchmark/scripts/sd_bench.sh nvidia

# ➜ Record average seconds per image
~/benchmark/scripts/record_score.sh 2.2 nvidia <AVG_SECONDS>

# Kill ComfyUI
pkill -f "python main.py"

Alternative: If ComfyUI is finicky, use stable-diffusion.cpp which has both CUDA and SYCL backends for a more apples-to-apples CLI comparison.

2.3 — Whisper Transcription (Nvidia)

source ~/benchmark/venvs/whisper/bin/activate
pip install faster-whisper nvidia-cublas-cu12 nvidia-cudnn-cu12

# CTranslate2 needs to find the CUDA libs installed in the venv
export LD_LIBRARY_PATH=$(python3 -c "import nvidia.cublas.lib; print(nvidia.cublas.lib.__path__[0])"):$(python3 -c "import nvidia.cudnn.lib; print(nvidia.cudnn.lib.__path__[0])"):$LD_LIBRARY_PATH

cat > ~/benchmark/scripts/whisper_bench.py << 'PYEOF'
import sys
import time
from faster_whisper import WhisperModel

gpu_label = sys.argv[1] if len(sys.argv) > 1 else "unknown"
device = sys.argv[2] if len(sys.argv) > 2 else "cuda"
compute = sys.argv[3] if len(sys.argv) > 3 else "float16"

audio_file = f"{__import__('os').path.expanduser('~')}/benchmark/media/audio/test_10min.wav"
results_dir = f"{__import__('os').path.expanduser('~')}/benchmark/results/whisper"

models_to_test = ["small", "medium", "large-v3"]

print(f"\
Whisper benchmark — GPU: {gpu_label}, device: {device}, compute: {compute}")
print("=" * 60)

for model_name in models_to_test:
    print(f"\
--- Testing {model_name} ---")

    model = WhisperModel(model_name, device=device, compute_type=compute)

    start = time.time()
    segments, info = model.transcribe(audio_file, beam_size=5)
    for segment in segments:
        pass
    elapsed = time.time() - start

    audio_duration = info.duration
    rtf = elapsed / audio_duration

    print(f"  Audio: {audio_duration:.1f}s | Time: {elapsed:.1f}s | RTF: {rtf:.3f}x")

    with open(f"{results_dir}/whisper_{gpu_label}.csv", "a") as f:
        f.write(f"{model_name},{gpu_label},{audio_duration:.1f},{elapsed:.1f},{rtf:.3f}\n")

    del model
    time.sleep(5)

print("\nDone! Record the RTF values (lower = faster)")
PYEOF
mkdir -p ~/benchmark/results/whisper
python3 ~/benchmark/scripts/whisper_bench.py nvidia cuda float16

# ➜ Record RTF for each model size
~/benchmark/scripts/record_score.sh 2.3a nvidia <RTF_SMALL>
~/benchmark/scripts/record_score.sh 2.3b nvidia <RTF_MEDIUM>
~/benchmark/scripts/record_score.sh 2.3c nvidia <RTF_LARGE_V3>

deactivate
unset LD_LIBRARY_PATH

2.4 — YOLO Object Detection (Nvidia)

source ~/benchmark/venvs/yolo/bin/activate
pip install ultralytics opencv-python-headless

cat > ~/benchmark/scripts/yolo_bench.py << 'PYEOF'
import os
import sys
import time
from ultralytics import YOLO
import cv2

gpu_label = sys.argv[1] if len(sys.argv) > 1 else "unknown"
backend = sys.argv[2] if len(sys.argv) > 2 else "cuda"  # "cuda" or "openvino"
home = os.path.expanduser('~')

video_path = f"{home}/benchmark/media/yolo_test_1080p.mkv"
results_dir = f"{home}/benchmark/results/yolo"

model_names = ["yolov8n", "yolov8s"]

for model_name in model_names:
    print(f"\n--- {model_name} on {gpu_label} ({backend}) ---")

    if backend == "openvino":
        base_model = YOLO(f"{model_name}.pt")
        base_model.export(format="openvino")
        model = YOLO(f"{model_name}_openvino_model/", task="detect")
    else:
        model = YOLO(f"{model_name}.pt")
        model.to("cuda")

    cap = cv2.VideoCapture(video_path)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    # Warm up
    for _ in range(10):
        ret, frame = cap.read()
        if ret:
            model(frame, verbose=False)
    cap.set(cv2.CAP_PROP_POS_FRAMES, 0)

    frame_count = 0
    total_inference_time = 0

    if backend == "cuda":
        import torch

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        if backend == "cuda":
            torch.cuda.synchronize()
        start = time.time()
        results = model(frame, verbose=False)
        if backend == "cuda":
            torch.cuda.synchronize()
        elapsed = time.time() - start

        total_inference_time += elapsed
        frame_count += 1

        if frame_count % 200 == 0:
            print(f"  Frame {frame_count}/{total_frames} | Avg FPS: {frame_count/total_inference_time:.1f}")

    cap.release()

    avg_fps = frame_count / total_inference_time
    avg_ms = (total_inference_time / frame_count) * 1000

    print(f"  Result: {avg_fps:.1f} FPS ({avg_ms:.1f}ms/frame)")

    with open(f"{results_dir}/yolo_{gpu_label}.csv", "a") as f:
        f.write(f"{model_name},{gpu_label},{frame_count},{total_inference_time:.1f},{avg_fps:.1f},{avg_ms:.1f}\n")

    del model
    time.sleep(5)
PYEOF
mkdir -p ~/benchmark/results/yolo
python3 ~/benchmark/scripts/yolo_bench.py nvidia cuda

# ➜ Record FPS for each model
~/benchmark/scripts/record_score.sh 2.4a nvidia <YOLOV8N_FPS>
~/benchmark/scripts/record_score.sh 2.4b nvidia <YOLOV8S_FPS>

deactivate

Nvidia Phase Complete — Check Scores

~/benchmark/scripts/show_scores.sh

You should see all Nvidia columns filled in. Take a screenshot of the scorecard for your video.


GPU Swap

  1. Power off the system completely
  2. Remove the RTX 3090
  3. Install the Intel Arc Pro B60
  4. Power on

Important: Your Nvidia drivers will still be installed but won't load without the hardware. This is fine — they won't interfere with Intel drivers. If you want a clean slate, you can sudo dnf remove akmod-nvidia* before swapping, but it's not strictly necessary.


Phase 2: Intel Arc Pro B60 Testing

2.0 — BIOS Setup (Required)

The Arc Pro B60 will not be detected by the system without these BIOS settings:

  • Enable "Above 4G Decoding" — Required for Intel Arc on AMD boards
  • Enable "Re-Size BAR Support" — Appears after enabling Above 4G
  • Disable CSM (Compatibility Support Module) — Arc requires UEFI boot
  • Set primary display to PCIe if available

Verify the card is detected after BIOS changes:

lspci | grep -i vga
# Should show: Intel Corporation Battlemage G21 [Arc Pro B60]
ls /dev/dri/
# Should show: card0  renderD128

2.0b — Intel Driver Setup (Fedora)

# Intel media driver (VA-API), compute runtime (Level Zero), and GPU monitoring
sudo dnf install -y intel-media-driver intel-compute-runtime \
  intel-level-zero oneapi-level-zero igt-gpu-tools nvtop

# Verify VA-API (the B60 uses the xe kernel driver, not i915)
vainfo --display drm --device /dev/dri/renderD128 2>&1 | head -10

# GPU monitoring — intel_gpu_top does NOT work with xe driver, use nvtop instead
nvtop

# Install OneAPI (for AI workloads)
cat > /tmp/oneAPI.repo << 'REPO'
[oneAPI]
name=Intel oneAPI
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
REPO
sudo mv /tmp/oneAPI.repo /etc/yum.repos.d/
sudo dnf install -y intel-basekit

sudo reboot

After reboot:

vainfo --display drm --device /dev/dri/renderD128 2>&1 | head -5

~/benchmark/scripts/record_score.sh P.1 intel IDLE_WATTS_HERE
~/benchmark/scripts/record_score.sh P.4 intel PRICE_HERE

1.1 — FFmpeg HEVC Encode (Intel)

# Terminal 2: ~/benchmark/scripts/monitor.sh intel hevc_encode

# HEVC encode — 4K to 4K
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
  -i ~/benchmark/media/4k_h264.mkv \
  -c:v hevc_qsv -preset medium -global_quality 25 \
  -an -y ~/benchmark/results/ffmpeg/4k_hevc_qsv.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/hevc_qsv_log.txt
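The FPS value is the average from ffmpeg's final progress line (frame= … fps= …). A hypothetical helper to pull it out of the saved log, demoed here on mock data — fps_from_log and the mock log path are illustrative, not part of the guide's scripts:

```shell
# Extract the last (i.e., final/average) fps figure from an ffmpeg log
fps_from_log() {
  grep -oE 'fps=[[:space:]]*[0-9.]+' "$1" | tail -n1 | grep -oE '[0-9.]+'
}

# Demo on a mock log; real use:
#   fps_from_log ~/benchmark/results/ffmpeg/hevc_qsv_log.txt
printf 'frame=  100 fps= 42 q=28\nframe=  500 fps=87.5 q=25\n' > /tmp/mock_ffmpeg_log.txt
fps_from_log /tmp/mock_ffmpeg_log.txt
```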

~/benchmark/scripts/record_score.sh 1.1a intel <FPS>

# HEVC encode — 4K to 1080p transcode
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
  -i ~/benchmark/media/4k_h264.mkv \
  -vf 'vpp_qsv=w=1920:h=1080' \
  -c:v hevc_qsv -preset medium -global_quality 25 \
  -an -y ~/benchmark/results/ffmpeg/4k_to_1080p_hevc_qsv.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/hevc_qsv_downscale_log.txt

~/benchmark/scripts/record_score.sh 1.1b intel <FPS>
~/benchmark/scripts/record_score.sh P.2 intel <PEAK_WATTS>

1.2 — FFmpeg AV1 Encode (Intel)

This is where the B60 should show its strength.

# AV1 encode — 4K to 4K
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
  -i ~/benchmark/media/4k_h264.mkv \
  -c:v av1_qsv -preset medium -global_quality 30 \
  -an -y ~/benchmark/results/ffmpeg/4k_av1_qsv.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/av1_qsv_log.txt

~/benchmark/scripts/record_score.sh 1.2a intel <FPS>

# AV1 encode — 4K to 1080p
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
  -i ~/benchmark/media/4k_h264.mkv \
  -vf 'vpp_qsv=w=1920:h=1080' \
  -c:v av1_qsv -preset medium -global_quality 30 \
  -an -y ~/benchmark/results/ffmpeg/4k_to_1080p_av1_qsv.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/av1_qsv_downscale_log.txt

~/benchmark/scripts/record_score.sh 1.2b intel <FPS>

1.3 — VMAF Quality Scoring (Intel outputs)

ffmpeg -i ~/benchmark/results/ffmpeg/4k_hevc_qsv.mkv \
  -i ~/benchmark/media/4k_h264.mkv \
  -lavfi libvmaf="model=version=vmaf_v0.6.1:log_path=vmaf_hevc_qsv.json:log_fmt=json" \
  -f null - 2>&1 | tee ~/benchmark/results/ffmpeg/vmaf_hevc_qsv.txt

~/benchmark/scripts/record_score.sh 1.3a intel <VMAF_SCORE>

ffmpeg -i ~/benchmark/results/ffmpeg/4k_av1_qsv.mkv \
  -i ~/benchmark/media/4k_h264.mkv \
  -lavfi libvmaf="model=version=vmaf_v0.6.1:log_path=vmaf_av1_qsv.json:log_fmt=json" \
  -f null - 2>&1 | tee ~/benchmark/results/ffmpeg/vmaf_av1_qsv.txt

~/benchmark/scripts/record_score.sh 1.3b intel <VMAF_SCORE>

1.4 — HDR Tone Mapping Transcode (Intel)

ffmpeg -init_hw_device qsv=hw -filter_hw_device hw \
  -i ~/benchmark/media/4k_hevc_hdr.mkv \
  -vf 'hwupload=extra_hw_frames=64,vpp_qsv=w=1920:h=1080:tonemap=1' \
  -c:v hevc_qsv -preset medium -global_quality 25 \
  -an -y ~/benchmark/results/ffmpeg/hdr_tonemap_qsv.mkv \
  2>&1 | tee ~/benchmark/results/ffmpeg/hdr_tonemap_qsv_log.txt

~/benchmark/scripts/record_score.sh 1.4 intel <FPS>

1.5 — Maximum Simultaneous Streams (Intel)

# Terminal 1: ~/benchmark/scripts/monitor.sh intel stream_stress
# Terminal 2:
~/benchmark/scripts/stream_stress.sh intel 10

# ➜ Record max real-time streams
~/benchmark/scripts/record_score.sh 1.5 intel <MAX_STREAMS>

No session limit patch needed for Intel — unlimited encode sessions out of the box. That's worth mentioning in the video.

1.6 — Tdarr Batch Library Conversion (Intel)

# Reset the Tdarr library files to their original state first!
# (Re-run the batch library generation script from Phase 0 if needed)

docker run -d \
  --name tdarr-intel \
  --device /dev/dri:/dev/dri \
  -p 8265:8265 \
  -p 8266:8266 \
  -v ~/benchmark/tdarr/server-intel:/app/server \
  -v ~/benchmark/tdarr/configs-intel:/app/configs \
  -v ~/benchmark/tdarr/logs-intel:/app/logs \
  -v ~/benchmark/media/tdarr_library:/media \
  -v ~/benchmark/results/tdarr:/temp \
  ghcr.io/haveagitgat/tdarr:latest

# Same procedure: configure library, set HEVC target, 1 GPU worker, time it
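For the timing itself, one low-tech approach (not part of the guide's scripts) is to grab epoch timestamps around the queue run:

```shell
# Note wall-clock start when you kick off the Tdarr queue...
START=$(date +%s)

# ...then, once the queue has drained, capture the end and report TOTAL_SECONDS
END=$(date +%s)
echo "TOTAL_SECONDS: $((END - START))"
```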

~/benchmark/scripts/record_score.sh 1.6 intel <TOTAL_SECONDS>

docker stop tdarr-intel && docker rm tdarr-intel

2.1 — LLM Inference (Intel)

Important: Ollama does NOT support Intel Arc GPUs — it will fall back to 100% CPU. Use llama.cpp built with the SYCL backend instead.

# Source OneAPI environment (required for SYCL)
source /opt/intel/oneapi/setvars.sh

# Build llama.cpp with SYCL backend
cd ~/benchmark
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build
cmake .. -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build . --config Release -j$(nproc)

Run benchmarks using Ollama model blobs (they're standard GGUF files):

# Find model file paths from Ollama's cache
ollama show llama3:8b-instruct-q4_K_M --modelfile 2>&1 | grep "^FROM /"
# Returns something like: FROM /home/brandon/.ollama/models/blobs/sha256-...

# Run llama-bench for each model
echo "=== Llama3 8B Q4 ==="
./build/bin/llama-bench \
  -m $(ollama show llama3:8b-instruct-q4_K_M --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}')
~/benchmark/scripts/record_score.sh 2.1a intel <GEN_TPS>

echo "=== Llama3 8B Q8 ==="
./build/bin/llama-bench \
  -m $(ollama show llama3:8b-instruct-q8_0 --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}')
~/benchmark/scripts/record_score.sh 2.1b intel <GEN_TPS>

echo "=== Mistral 7B Q4 ==="
./build/bin/llama-bench \
  -m $(ollama show mistral:7b-instruct-q4_K_M --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}')
~/benchmark/scripts/record_score.sh 2.1c intel <GEN_TPS>

echo "=== Gemma2 27B Q4 ==="
./build/bin/llama-bench \
  -m $(ollama show gemma2:27b-instruct-q4_K_M --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}')
~/benchmark/scripts/record_score.sh 2.1d intel <GEN_TPS>

~/benchmark/scripts/record_score.sh P.3 intel <PEAK_WATTS>

2.2 — Stable Diffusion / Image Gen (Intel)

cd ~/benchmark/comfyui

# Reinstall PyTorch with Intel support
source ~/benchmark/venvs/comfyui/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install openvino optimum[openvino]
# Follow ComfyUI OpenVINO integration instructions for your version

python main.py --listen 0.0.0.0 &
sleep 10

~/benchmark/scripts/sd_bench.sh intel
~/benchmark/scripts/record_score.sh 2.2 intel <AVG_SECONDS>

pkill -f "python main.py"
deactivate

Alternative: If ComfyUI + OpenVINO is too finicky, use stable-diffusion.cpp with the SYCL backend for Intel GPU support.

2.3 — Whisper Transcription (Intel)

Note: faster-whisper (CTranslate2) has no Intel XPU backend — it falls back to CPU. Use whisper.cpp with SYCL backend instead.

# Source OneAPI environment
source /opt/intel/oneapi/setvars.sh

# Build whisper.cpp with SYCL backend (same pattern as llama.cpp)
cd ~/benchmark
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
mkdir build && cd build
cmake .. -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build . --config Release -j$(nproc)

# Download models
bash ../models/download-ggml-model.sh small
bash ../models/download-ggml-model.sh medium
bash ../models/download-ggml-model.sh large-v3

# Run benchmarks
for MODEL in small medium large-v3; do
    echo "=== Whisper $MODEL ==="
    time ./build/bin/whisper-cli \
      -m models/ggml-${MODEL}.bin \
      -f ~/benchmark/media/audio/test_10min.wav \
      --no-prints
done

# Calculate RTF = processing_time / audio_duration (600s)
~/benchmark/scripts/record_score.sh 2.3a intel <RTF_SMALL>
~/benchmark/scripts/record_score.sh 2.3b intel <RTF_MEDIUM>
~/benchmark/scripts/record_score.sh 2.3c intel <RTF_LARGE_V3>

2.4 — YOLO Object Detection (Intel)

source ~/benchmark/venvs/yolo/bin/activate
pip install openvino

python3 ~/benchmark/scripts/yolo_bench.py intel openvino

~/benchmark/scripts/record_score.sh 2.4a intel <YOLOV8N_FPS>
~/benchmark/scripts/record_score.sh 2.4b intel <YOLOV8S_FPS>

deactivate

Phase 3: Final Results

View the Scorecard

~/benchmark/scripts/show_scores.sh

Generate Efficiency Report

cat > ~/benchmark/scripts/efficiency.sh << 'SCRIPT'
#!/bin/bash
SCORECARD="$HOME/benchmark/results/scorecard.csv"

echo ""
echo "╔══════════════════════════════════════════════════════════════════════════════╗"
echo "║                             EFFICIENCY ANALYSIS                              ║"
echo "╚══════════════════════════════════════════════════════════════════════════════╝"

# Extract power and price
N_PEAK=$(awk -F',' '$1=="P.2" {print $6}' "$SCORECARD")
I_PEAK=$(awk -F',' '$1=="P.2" {print $7}' "$SCORECARD")
N_PRICE=$(awk -F',' '$1=="P.4" {print $6}' "$SCORECARD")
I_PRICE=$(awk -F',' '$1=="P.4" {print $7}' "$SCORECARD")

echo ""
echo "Power: 3090=${N_PEAK}W peak | B60=${I_PEAK}W peak"
echo "Price: 3090=\$${N_PRICE} | B60=\$${I_PRICE}"
echo ""

printf "%-30s %-15s %-15s %-10s\n" "Test" "3090 perf/W" "B60 perf/W" "Winner"
printf "%-30s %-15s %-15s %-10s\n" "----" "-----------" "----------" "------"

# Calculate perf/watt for "higher is better" tests
awk -F',' -v np="$N_PEAK" -v ip="$I_PEAK" '
NR>1 && $5=="yes" && $6!="" && $7!="" {
    n_ppw = $6 / (np > 0 ? np : 1)
    i_ppw = $7 / (ip > 0 ? ip : 1)
    winner = (n_ppw > i_ppw) ? "3090" : "B60"
    printf "%-30s %-15.3f %-15.3f %-10s\n", $2, n_ppw, i_ppw, winner
}' "$SCORECARD"

echo ""
printf "%-30s %-15s %-15s %-10s\n" "Test" "3090 perf/\$" "B60 perf/\$" "Winner"
printf "%-30s %-15s %-15s %-10s\n" "----" "-----------" "----------" "------"

awk -F',' -v np="$N_PRICE" -v ip="$I_PRICE" '
NR>1 && $5=="yes" && $6!="" && $7!="" {
    n_ppd = $6 / (np > 0 ? np : 1)
    i_ppd = $7 / (ip > 0 ? ip : 1)
    winner = (n_ppd > i_ppd) ? "3090" : "B60"
    printf "%-30s %-15.4f %-15.4f %-10s\n", $2, n_ppd, i_ppd, winner
}' "$SCORECARD"

echo ""
SCRIPT

chmod +x ~/benchmark/scripts/efficiency.sh
~/benchmark/scripts/efficiency.sh

Export Scorecard as Formatted Table

# Quick copy-paste friendly version for your video description or blog post
echo ""
echo "=== COPY-PASTE RESULTS TABLE ==="
echo ""
column -t -s',' ~/benchmark/results/scorecard.csv | head -30
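column -t is fine for plain text; if you want a Markdown table for the blog post, a small awk sketch works too. csv_to_md is hypothetical and assumes a simple CSV with no quoted or embedded commas:

```shell
csv_to_md() {
  awk -F',' '
  {
    printf "|"
    for (i = 1; i <= NF; i++) printf " %s |", $i
    print ""
    if (NR == 1) {               # emit the separator row after the header
      printf "|"
      for (i = 1; i <= NF; i++) printf " --- |"
      print ""
    }
  }'
}

# Real use: csv_to_md < ~/benchmark/results/scorecard.csv
printf 'test,nvidia,intel\n1.1a,115,98\n' | csv_to_md
```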


Troubleshooting Notes & Lessons Learned

FFmpeg / Encoding

  • Fedora system ffmpeg has NVENC encode and cuvid decoders but does NOT have CUDA filters (scale_cuda, tonemap_cuda). Use h264_cuvid -resize WxH for GPU scaling instead.
  • Static ffmpeg builds (johnvansickle.com) lack NVENC entirely — GPL-only, no proprietary libs.
  • Jellyfin-ffmpeg from Docker has full CUDA support BUT is statically linked, so nvidia-patch can't reach it. System ffmpeg + cuvid is the better approach since the patch applies to system libnvidia-encode.so.
  • scale_cuda requires named params: scale_cuda=w=1920:h=1080 not scale_cuda=1920:1080.
  • Angle brackets in shell scripts — never use <PLACEHOLDER> syntax in bash, it parses as shell redirects. Use PLACEHOLDER_HERE instead.

NVIDIA Specific

  • nvidia-patch removes the 5-session NVENC limit but does NOT persist across reboots. Must re-run sudo bash patch.sh after each boot.
  • Stale backup detection: If re-patching fails, sudo rm /opt/nvidia/libnvidia-encode-backup/* then re-patch.
  • RTX 3090 (Ampere) has no AV1 hardware encode — AV1 NVENC started with RTX 4000 series (Ada Lovelace). Record 0 for AV1 tests.
  • 3090 real hardware stream limit: ~5 simultaneous 4K→1080p HEVC transcodes regardless of nvidia-patch (GPU bandwidth ceiling, not session lock).
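The reboot-persistence gotcha can be worked around with a oneshot systemd unit. A sketch — the service name and the /opt/nvidia-patch/patch.sh path are hypothetical; point ExecStart at your actual nvidia-patch checkout:

```shell
# Write the unit somewhere editable first (path is illustrative)
mkdir -p ~/benchmark/scripts
cat > ~/benchmark/scripts/nvenc-patch.service << 'UNIT'
[Unit]
Description=Re-apply NVENC session limit patch
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/bash /opt/nvidia-patch/patch.sh

[Install]
WantedBy=multi-user.target
UNIT
echo "wrote $(wc -l < ~/benchmark/scripts/nvenc-patch.service | tr -d ' ') unit lines"
# Then install it:
#   sudo cp ~/benchmark/scripts/nvenc-patch.service /etc/systemd/system/
#   sudo systemctl daemon-reload && sudo systemctl enable nvenc-patch.service
```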

Intel Arc Pro B60 Specific

  • BIOS settings are MANDATORY — card will not enumerate on the PCI bus without:
    • Above 4G Decoding enabled
    • Re-Size BAR Support enabled
    • CSM disabled (Arc requires UEFI)
  • Kernel driver is xe, not i915. intel_gpu_top does NOT work; use nvtop for GPU monitoring.
  • Package names on Fedora: Use intel-level-zero + oneapi-level-zero (not intel-level-zero-gpu + level-zero which don't exist).
  • intel-media-driver comes from rpmfusion-nonfree, not base Fedora repos.
  • vainfo over SSH requires explicit device: vainfo --display drm --device /dev/dri/renderD128.
  • No artificial NVENC-style session limit — unlimited encode sessions out of the box. The B60 hit 25+ simultaneous 4K→1080p streams at realtime.

LLM Inference

  • Ollama does NOT support Intel Arc GPUs — shows 100% CPU in ollama ps. Use llama.cpp built with SYCL backend instead.
  • llama.cpp SYCL build requires OneAPI: cmake -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx.
  • Ollama model blobs can be found with: ollama show MODEL --modelfile 2>&1 | grep "^FROM /" and passed directly to llama-bench.
  • Q8 quantization on B60 (9.81 tok/s) is disproportionately slow vs Q4 (53.89 tok/s) — much larger drop than on the 3090. May be a SYCL backend optimization issue.
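That blob lookup is repeated four times in section 2.1; a tiny helper function (hypothetical, not one of the guide's scripts) tidies it up — demoed here on a mocked modelfile:

```shell
# Extract the GGUF blob path from `ollama show MODEL --modelfile` output
blob_path() { awk '/^FROM \//{print $2; exit}'; }

# Real use:
#   ./build/bin/llama-bench -m "$(ollama show llama3:8b-instruct-q4_K_M --modelfile 2>&1 | blob_path)"
printf 'FROM /home/user/.ollama/models/blobs/sha256-abc\nTEMPLATE xyz\n' | blob_path
```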

Image Generation (SDXL)

  • Intel XPU PyTorch on Fedora is broken: libpti_view.so.0.10 is not found anywhere on the system, and PyTorch XPU wheels are only available for Python 3.12 (Fedora 43 ships 3.14). ComfyUI and diffusers both failed.
  • Record N/A for Intel SDXL — this is a toolchain maturity issue, not a hardware limitation.

Whisper / Speech-to-Text

  • faster-whisper (CTranslate2) has no Intel XPU backend — falls back to CPU.
  • whisper.cpp with SYCL backend is the proper Intel GPU approach (same build pattern as llama.cpp).
  • CUDA lib leak between venvs — whisper venv sets LD_LIBRARY_PATH for nvidia-cublas/cudnn which breaks YOLO. Always unset LD_LIBRARY_PATH after deactivating whisper venv.

Stream Stress Testing

  • -stream_loop -1 with short clips kills cuvid performance — decoder reinitializes every loop. Use Tears of Steel (734s) instead of jellyfish (30s) to avoid constant reinit.
  • Without -stream_loop, 30s clips finish before all streams ramp up, giving unreliable averages (early streams show inflated FPS).
  • Stream 1 always shows inflated FPS because it runs alone before other streams start (3s delay between launches). Look at the lowest stream for the real limit.

VMAF Quality Testing

  • VMAF is CPU-only — no GPU implementation exists. ~3 FPS at 4K is normal (5 min for a 30s clip).
  • File order matters: distorted file first, reference file second. Swapping them gives nonsensical scores (e.g., 3.4 instead of 85+).

Python Environment

  • Fedora 43 ships Python 3.14 — many ML packages don't have wheels for it yet (Intel XPU PyTorch, IPEX, etc.). Install python3.12 for Intel ML workloads.
  • Intel PyTorch XPU wheel deps aren't on Intel's pip index — pre-install filelock setuptools typing-extensions sympy networkx jinja2 fsspec from PyPI, then pip install --no-deps torch --index-url from Intel.
  • OneAPI setvars.sh must be sourced before running any SYCL workload. Add to .bashrc or run manually each session.
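For that last point, a guarded block in ~/.bashrc avoids both forgetting to source it and breaking shells on machines without oneAPI. A sketch, assuming the default /opt/intel/oneapi prefix that intel-basekit installs to:

```shell
# Source oneAPI silently, and only if it's actually installed
load_oneapi() {
    if [ -f /opt/intel/oneapi/setvars.sh ]; then
        # --force re-sources cleanly even when the env is already loaded
        source /opt/intel/oneapi/setvars.sh --force > /dev/null 2>&1
    fi
}
load_oneapi
```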