A step-by-step testing guide for home server workloads — transcoding and local AI inference. Both GPUs are tested in the same machine. Nvidia 3090 goes first, then swap to the Intel Arc Pro B60.
1. Install base software & download test media (GPU-agnostic)
2. Install RTX 3090 → Install Nvidia drivers → Run ALL Nvidia benchmarks
3. Power off → Physically swap to Arc Pro B60 → Install Intel drivers → Run ALL Intel benchmarks
4. Compare results with the scorecard
Why Nvidia first? Nvidia's Fedora driver setup is more straightforward and better documented. Validating all your test scripts on the 3090 first means any issue you hit on the Intel side is almost certainly Intel-related rather than a script bug.
This phase installs everything that doesn't depend on which GPU is in the machine. Do this once.
# System essentials
sudo dnf install -y ffmpeg git curl wget htop jq bc python3-pip lm_sensors \
libva-utils docker-ce docker-ce-cli containerd.io
# If Docker isn't in your repos yet:
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
# Log out and back in for the docker group to take effect
# Python venvs (we'll use these for AI tests)
sudo dnf install -y python3-devel python3-virtualenv

mkdir -p ~/benchmark/{media,media/audio,scripts,results,logs}
mkdir -p ~/benchmark/results/{ffmpeg,streams,llm,sd,whisper,yolo}

This is the heart of your record-keeping. Create it now, fill it in as you go.
cat > ~/benchmark/results/scorecard.csv << 'EOF'
test_id,test_name,metric,unit,higher_is_better,nvidia_3090,intel_b60,winner,notes
1.1a,HEVC 4K Encode,fps,FPS,yes,,,,
1.1b,HEVC 4K→1080p Transcode,fps,FPS,yes,,,,
1.2a,AV1 4K Encode,fps,FPS,yes,,,,
1.2b,AV1 4K→1080p Transcode,fps,FPS,yes,,,,
1.3a,HEVC Output Quality,vmaf_score,VMAF,yes,,,,
1.3b,AV1 Output Quality,vmaf_score,VMAF,yes,,,,
1.4,HDR Tone Map 4K→1080p,fps,FPS,yes,,,,
1.5,Max 4K→1080p Streams,stream_count,streams,yes,,,,
1.6,Tdarr 50-File Library,total_time,seconds,no,,,,
2.1a,LLM Llama3 8B Q4_K_M,gen_tokens_per_sec,tok/s,yes,,,,
2.1b,LLM Llama3 8B Q8,gen_tokens_per_sec,tok/s,yes,,,,
2.1c,LLM Mistral 7B Q4_K_M,gen_tokens_per_sec,tok/s,yes,,,,
2.1d,LLM Gemma2 27B Q4_K_M,gen_tokens_per_sec,tok/s,yes,,,,
2.2,SDXL Image Gen 1024x1024,time_per_image,seconds,no,,,,
2.3a,Whisper small,realtime_factor,RTF,no,,,,
2.3b,Whisper medium,realtime_factor,RTF,no,,,,
2.3c,Whisper large-v3,realtime_factor,RTF,no,,,,
2.4a,YOLO v8n Detection,fps,FPS,yes,,,,
2.4b,YOLO v8s Detection,fps,FPS,yes,,,,
P.1,System Idle Power,watts,W,no,,,,
P.2,Peak Transcode Power,watts,W,no,,,,
P.3,Peak AI Inference Power,watts,W,no,,,,
P.4,Card Purchase Price,price,USD,no,,,,
EOF
echo "Scorecard created at ~/benchmark/results/scorecard.csv"

Use this script after every test to log results into the scorecard.
cat > ~/benchmark/scripts/record_score.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./record_score.sh <test_id> <gpu: nvidia|intel> <value> [notes]
# Example: ./record_score.sh 1.1a nvidia 245.3 "stable, no throttling"
TEST_ID="$1"
GPU="$2"
VALUE="$3"
NOTES="${4:-}"
SCORECARD="$HOME/benchmark/results/scorecard.csv"
if [ -z "$TEST_ID" ] || [ -z "$GPU" ] || [ -z "$VALUE" ]; then
echo "Usage: ./record_score.sh <test_id> <nvidia|intel> <value> [notes]"
echo ""
echo "Available test IDs:"
awk -F',' 'NR>1 {printf "  %-8s %s (%s)\n", $1, $2, $4}' "$SCORECARD"
exit 1
fi
# Determine which column to update (6 = nvidia, 7 = intel)
if [ "$GPU" = "nvidia" ]; then
COL=6
elif [ "$GPU" = "intel" ]; then
COL=7
else
echo "Error: GPU must be 'nvidia' or 'intel'"
exit 1
fi
# Update the scorecard
awk -F',' -v id="$TEST_ID" -v col="$COL" -v val="$VALUE" -v notes="$NOTES" '
BEGIN {OFS=","; found=0}
{
if ($1 == id) {
$col = val
if (notes != "") $9 = notes
# Auto-determine winner if both values present
if ($6 != "" && $7 != "") {
if ($5 == "yes") {
$8 = ($6+0 > $7+0) ? "3090" : ($7+0 > $6+0) ? "B60" : "TIE"
} else {
$8 = ($6+0 < $7+0) ? "3090" : ($7+0 < $6+0) ? "B60" : "TIE"
}
}
found=1
}
print
}
END { if (!found) print "WARNING: test_id " id " not found" > "/dev/stderr" }
' "$SCORECARD" > "${SCORECARD}.tmp" && mv "${SCORECARD}.tmp" "$SCORECARD"
# Show what was recorded
echo "Recorded: test=$TEST_ID gpu=$GPU value=$VALUE"
grep "^${TEST_ID}," "$SCORECARD"
SCRIPT
chmod +x ~/benchmark/scripts/record_score.sh

cat > ~/benchmark/scripts/show_scores.sh << 'SCRIPT'
#!/bin/bash
# Pretty-print the current scorecard
SCORECARD="$HOME/benchmark/results/scorecard.csv"
echo ""
echo "╔══════════════════════════════════════════════════════════════════════════════╗"
echo "║ ARC PRO B60 vs RTX 3090 — BENCHMARK SCORECARD ║"
echo "╚══════════════════════════════════════════════════════════════════════════════╝"
echo ""
printf "%-8s %-30s %-8s %-12s %-12s %-8s\n" \
"ID" "Test" "Unit" "RTX 3090" "Arc B60" "Winner"
printf "%-8s %-30s %-8s %-12s %-12s %-8s\n" \
"----" "----" "----" "--------" "-------" "------"
awk -F',' 'NR>1 {
printf "%-8s %-30s %-8s %-12s %-12s %-8s\n", $1, $2, $4, $6, $7, $8
}' "$SCORECARD"
echo ""
# Tally
NVIDIA_WINS=$(awk -F',' '$8=="3090"' "$SCORECARD" | wc -l)
INTEL_WINS=$(awk -F',' '$8=="B60"' "$SCORECARD" | wc -l)
TIES=$(awk -F',' '$8=="TIE"' "$SCORECARD" | wc -l)
PENDING=$(awk -F',' 'NR>1 && $8==""' "$SCORECARD" | wc -l)
echo "TALLY: RTX 3090: ${NVIDIA_WINS} wins | Arc B60: ${INTEL_WINS} wins | Ties: ${TIES} | Pending: ${PENDING}"
echo ""
SCRIPT
chmod +x ~/benchmark/scripts/show_scores.sh

All test media uses freely available, openly licensed content. Download it once, before any GPU testing.
# 1. Jellyfish clips — industry-standard bitrate test files
# 4K H.264 (120 Mbps) — primary source file for most tests
wget -O ~/benchmark/media/4k_h264.mkv \
"https://repo.jellyfin.org/archive/jellyfish/media/jellyfish-120-mbps-4k-uhd-h264.mkv"
# 4K HEVC 10-bit (120 Mbps) — for HEVC decode/re-encode tests
wget -O ~/benchmark/media/4k_hevc_10bit.mkv \
"https://repo.jellyfin.org/archive/jellyfish/media/jellyfish-120-mbps-4k-uhd-hevc-10bit.mkv"
# 2. Blender Open Movies — real-world content with motion, grain, effects
# Tears of Steel (4K, ~12 min, sci-fi VFX — great stress test)
wget -O ~/benchmark/media/tears_of_steel_4k.mov \
"https://download.blender.org/demo/movies/ToS/ToS-4k-1920.mov"
# Big Buck Bunny (4K, ~10 min, animation — tests flat color encoding)
wget -O ~/benchmark/media/big_buck_bunny_4k.mp4 \
"https://mirrors.kodi.tv/demo-files/BBB/bbb_sunflower_2160p_60fps_normal.mp4"
# Sintel (4K, ~15 min, mixed live-action style — good variety)
wget -O ~/benchmark/media/sintel_4k.mkv \
"https://download.blender.org/durian/movies/Sintel.2010.4k.mkv"

These derived files use software encoding, so they don't need a GPU.
# 1080p H.264 (for lighter transcode tests)
ffmpeg -i ~/benchmark/media/4k_h264.mkv \
-vf scale=1920:1080 -c:v libx264 -crf 18 -preset slow \
-an ~/benchmark/media/1080p_h264.mkv
# 4K AV1 (software encode — slow but only needs to be done once)
ffmpeg -i ~/benchmark/media/4k_h264.mkv \
-c:v libsvtav1 -crf 30 -preset 6 \
-an ~/benchmark/media/4k_av1.mkv
# Synthetic HDR test file (adds PQ/BT.2020 metadata to 10-bit source)
ffmpeg -i ~/benchmark/media/4k_hevc_10bit.mkv \
-c:v libx265 -crf 20 -preset slow \
-x265-params "colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc:master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1):max-cll=1000,400" \
-pix_fmt yuv420p10le \
-an ~/benchmark/media/4k_hevc_hdr.mkv
# Verify HDR metadata
ffprobe -show_streams ~/benchmark/media/4k_hevc_hdr.mkv 2>&1 | grep -i "color\|master"
# YOLO test clips (Tears of Steel — people, VFX, varied scenes)
ffmpeg -ss 120 -i ~/benchmark/media/tears_of_steel_4k.mov \
-t 120 -vf scale=1920:1080 -c:v libx264 -crf 18 \
-an ~/benchmark/media/yolo_test_1080p.mkv
ffmpeg -ss 120 -i ~/benchmark/media/tears_of_steel_4k.mov \
-t 120 -vf scale=1280:720 -c:v libx264 -crf 18 \
-an ~/benchmark/media/yolo_test_720p.mkv

# Build a 50-file mixed library for the Tdarr test
mkdir -p ~/benchmark/media/tdarr_library
SOURCE_FILES=(
~/benchmark/media/tears_of_steel_4k.mov
~/benchmark/media/big_buck_bunny_4k.mp4
~/benchmark/media/sintel_4k.mkv
)
RESOLUTIONS=("3840:2160" "2560:1440" "1920:1080" "1280:720")
CODECS_ENC=("libx264" "libx265" "libsvtav1")
CODECS_EXT=("h264" "hevc" "av1")
COUNT=0
for src in "${SOURCE_FILES[@]}"; do
BASENAME=$(basename "$src" | cut -d. -f1)
DURATION=$(ffprobe -v error -show_entries format=duration \
-of default=noprint_wrappers=1:nokey=1 "$src" | cut -d. -f1)
for res_idx in "${!RESOLUTIONS[@]}"; do
RES="${RESOLUTIONS[$res_idx]}"
RES_LABEL=$(echo "$RES" | tr ':' 'x')
for codec_idx in "${!CODECS_ENC[@]}"; do
CODEC="${CODECS_ENC[$codec_idx]}"
EXT="${CODECS_EXT[$codec_idx]}"
START=$((RANDOM % (DURATION > 120 ? DURATION - 120 : 1)))
CLIP_DUR=$((120 + RANDOM % 180))
OUTFILE="$HOME/benchmark/media/tdarr_library/${BASENAME}_${RES_LABEL}_${EXT}_${COUNT}.mkv"
echo "Creating clip $COUNT: $BASENAME @ $RES_LABEL ($EXT)"
ffmpeg -ss "$START" -i "$src" -t "$CLIP_DUR" \
-vf "scale=$RES" -c:v "$CODEC" -crf 22 -preset fast \
-an -y "$OUTFILE" 2>/dev/null
COUNT=$((COUNT + 1))
[ $COUNT -ge 50 ] && break 3
done
done
done
echo "Created $COUNT test files in ~/benchmark/media/tdarr_library/"

# LibriSpeech test-clean dataset — standard speech recognition benchmark
wget -O ~/benchmark/media/audio/librispeech_test.tar.gz \
"https://www.openslr.org/resources/12/test-clean.tar.gz"
tar -xzf ~/benchmark/media/audio/librispeech_test.tar.gz -C ~/benchmark/media/audio/
# Create a 10-minute WAV
find ~/benchmark/media/audio/LibriSpeech/test-clean -name "*.flac" | sort | head -30 | \
while read f; do echo "file '$f'"; done > /tmp/audio_list.txt
ffmpeg -f concat -safe 0 -i /tmp/audio_list.txt \
-t 600 -acodec pcm_s16le -ar 16000 -ac 1 \
~/benchmark/media/audio/test_10min.wav
# Create a 30-minute WAV for extended testing
find ~/benchmark/media/audio/LibriSpeech/test-clean -name "*.flac" | sort | \
while read f; do echo "file '$f'"; done > /tmp/audio_list_full.txt
ffmpeg -f concat -safe 0 -i /tmp/audio_list_full.txt \
-t 1800 -acodec pcm_s16le -ar 16000 -ac 1 \
~/benchmark/media/audio/test_30min.wav

echo "=== Test Media Inventory ==="
echo ""
echo "--- Core files ---"
for f in ~/benchmark/media/*.{mkv,mp4,mov}; do
[ -f "$f" ] || continue
SIZE=$(du -h "$f" | cut -f1)
DURATION=$(ffprobe -v error -show_entries format=duration \
-of default=noprint_wrappers=1:nokey=1 "$f" 2>/dev/null | cut -d. -f1)
CODEC=$(ffprobe -v error -select_streams v:0 \
-show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$f" 2>/dev/null)
RES=$(ffprobe -v error -select_streams v:0 \
-show_entries stream=width,height -of csv=p=0 "$f" 2>/dev/null)
echo " $(basename "$f"): ${SIZE} | ${DURATION}s | ${CODEC} | ${RES}"
done
echo ""
echo "--- Tdarr library: $(ls ~/benchmark/media/tdarr_library/ | wc -l) files ---"
echo "--- Audio: $(ls -lh ~/benchmark/media/audio/test_*.wav) ---"
echo "--- YOLO clips: $(ls -lh ~/benchmark/media/yolo_test_*.mkv) ---"

Save this monitoring script to run in a second terminal during every test.
cat > ~/benchmark/scripts/monitor.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./monitor.sh [intel|nvidia] [test_name]
GPU_TYPE=${1:-intel}
TEST_NAME=${2:-test}
LOG_FILE="$HOME/benchmark/logs/${TEST_NAME}_${GPU_TYPE}_$(date +%Y%m%d_%H%M%S).csv"
mkdir -p ~/benchmark/logs
echo "timestamp,gpu_util,gpu_mem,gpu_temp,gpu_power,cpu_util" > "$LOG_FILE"
echo "Logging to: $LOG_FILE"
echo "Press Ctrl+C to stop."
echo ""
while true; do
TIMESTAMP=$(date +%H:%M:%S)
CPU_UTIL=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}')
if [ "$GPU_TYPE" = "nvidia" ]; then
GPU_DATA=$(nvidia-smi --query-gpu=utilization.gpu,memory.used,temperature.gpu,power.draw \
--format=csv,noheader,nounits)
GPU_UTIL=$(echo "$GPU_DATA" | cut -d',' -f1 | xargs)
GPU_MEM=$(echo "$GPU_DATA" | cut -d',' -f2 | xargs)
GPU_TEMP=$(echo "$GPU_DATA" | cut -d',' -f3 | xargs)
GPU_POWER=$(echo "$GPU_DATA" | cut -d',' -f4 | xargs)
else
GPU_UTIL="N/A" # intel_gpu_top doesn't support xe driver; use nvtop in another terminal
GPU_MEM="N/A"
GPU_TEMP=$(sensors 2>/dev/null | grep -i "edge" | awk '{print $2}' | tr -d '+°C' || echo "N/A")
GPU_POWER="wall_meter"
fi
printf "%s | GPU: %s%% | Mem: %sMB | Temp: %s°C | Power: %sW | CPU: %s%%\n" \
"$TIMESTAMP" "$GPU_UTIL" "$GPU_MEM" "$GPU_TEMP" "$GPU_POWER" "$CPU_UTIL"
echo "$TIMESTAMP,$GPU_UTIL,$GPU_MEM,$GPU_TEMP,$GPU_POWER,$CPU_UTIL" >> "$LOG_FILE"
sleep 2
done
SCRIPT
chmod +x ~/benchmark/scripts/monitor.sh

curl -fsSL https://ollama.com/install.sh | sh

# Whisper venv
python3 -m venv ~/benchmark/venvs/whisper
# YOLO venv
python3 -m venv ~/benchmark/venvs/yolo
# ComfyUI venv (Stable Diffusion)
python3 -m venv ~/benchmark/venvs/comfyui

Install the RTX 3090 in your system. Power on.
# RPM Fusion is the cleanest way to get Nvidia drivers on Fedora
sudo dnf install -y \
https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda nvidia-vaapi-driver
# Wait for the kernel module to build (can take a few minutes).
# To force the build and regenerate the initramfs:
sudo akmods --force
sudo dracut --force
# Reboot
sudo reboot

After reboot:
# Verify
nvidia-smi
# Should show RTX 3090, driver version, CUDA version
# If nvidia-smi fails, check: sudo dmesg | grep -i nvidia
# Patch NVENC session limit (consumer cards have a 5-session cap)
# Clone and apply the nvidia-patch:
cd ~/benchmark
git clone https://github.com/keylase/nvidia-patch.git
cd nvidia-patch
sudo bash patch.sh
# Record idle power
# Note your Kill-A-Watt reading with the system idle at desktop
~/benchmark/scripts/record_score.sh P.1 nvidia <IDLE_WATTS>
~/benchmark/scripts/record_score.sh P.4 nvidia <PURCHASE_PRICE>

mkdir -p ~/benchmark/results/ffmpeg
# In a second terminal, start monitoring:
# ~/benchmark/scripts/monitor.sh nvidia hevc_encode
# HEVC encode — 4K to 4K
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i ~/benchmark/media/4k_h264.mkv \
-c:v hevc_nvenc -preset p5 -rc constqp -qp 25 \
-an -y ~/benchmark/results/ffmpeg/4k_hevc_nvenc.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/hevc_nvenc_log.txt
# ➜ Record the FPS from the final line of output
~/benchmark/scripts/record_score.sh 1.1a nvidia <FPS>
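Rather than reading the fps off the console, you can pull it from the tee'd log. A minimal sketch, assuming GNU grep (the Fedora default); the log path matches the tee target above:

```shell
# Extract the last reported fps= value from an ffmpeg log
LOG=~/benchmark/results/ffmpeg/hevc_nvenc_log.txt
FPS=$(grep -oP 'fps=\s*\K[0-9.]+' "$LOG" | tail -1)
echo "Final FPS: $FPS"
```

The same pattern works for every encode log in this guide.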
# HEVC encode — 4K to 1080p transcode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i ~/benchmark/media/4k_h264.mkv \
-vf 'scale_cuda=1920:1080' \
-c:v hevc_nvenc -preset p5 -rc constqp -qp 25 \
-an -y ~/benchmark/results/ffmpeg/4k_to_1080p_hevc_nvenc.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/hevc_nvenc_downscale_log.txt
# ➜ Record FPS
~/benchmark/scripts/record_score.sh 1.1b nvidia <FPS>
# ➜ Record peak wall power from Kill-A-Watt
~/benchmark/scripts/record_score.sh P.2 nvidia <PEAK_WATTS>

# AV1 encode — 4K to 4K
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i ~/benchmark/media/4k_h264.mkv \
-c:v av1_nvenc -preset p5 -rc constqp -qp 30 \
-an -y ~/benchmark/results/ffmpeg/4k_av1_nvenc.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/av1_nvenc_log.txt
# ➜ Record FPS
~/benchmark/scripts/record_score.sh 1.2a nvidia <FPS>
# AV1 encode — 4K to 1080p
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i ~/benchmark/media/4k_h264.mkv \
-vf 'scale_cuda=1920:1080' \
-c:v av1_nvenc -preset p5 -rc constqp -qp 30 \
-an -y ~/benchmark/results/ffmpeg/4k_to_1080p_av1_nvenc.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/av1_nvenc_downscale_log.txt
# ➜ Record FPS
~/benchmark/scripts/record_score.sh 1.2b nvidia <FPS>

Run these after encoding. VMAF runs on the CPU, so it doesn't matter which GPU is installed, but run it now while the encoded files are fresh.
# HEVC quality
ffmpeg -i ~/benchmark/results/ffmpeg/4k_hevc_nvenc.mkv \
-i ~/benchmark/media/4k_h264.mkv \
-lavfi libvmaf="model=version=vmaf_v0.6.1:log_path=vmaf_hevc_nvenc.json:log_fmt=json" \
-f null - 2>&1 | tee ~/benchmark/results/ffmpeg/vmaf_hevc_nvenc.txt
# ➜ Record mean VMAF score from output
~/benchmark/scripts/record_score.sh 1.3a nvidia <VMAF_SCORE>
# AV1 quality
ffmpeg -i ~/benchmark/results/ffmpeg/4k_av1_nvenc.mkv \
-i ~/benchmark/media/4k_h264.mkv \
-lavfi libvmaf="model=version=vmaf_v0.6.1:log_path=vmaf_av1_nvenc.json:log_fmt=json" \
-f null - 2>&1 | tee ~/benchmark/results/ffmpeg/vmaf_av1_nvenc.txt
# ➜ Record mean VMAF score
~/benchmark/scripts/record_score.sh 1.3b nvidia <VMAF_SCORE>

VMAF scores range 0–100; above 93 is excellent. This tells you which encoder produces better-looking video at the same effort level.
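The pooled mean can also be read straight from the JSON logs with jq. A sketch, assuming the `pooled_metrics` layout used by recent libvmaf builds (older versions may lay the JSON out differently):

```shell
# Read the pooled mean VMAF from the JSON logs written by the runs above
jq '.pooled_metrics.vmaf.mean' vmaf_hevc_nvenc.json
jq '.pooled_metrics.vmaf.mean' vmaf_av1_nvenc.json
```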
Real-world scenario — HDR content transcoded to SDR for most displays.
# Option A: Full GPU pipeline (requires ffmpeg built with scale_cuda + tonemap_cuda)
ffmpeg -init_hw_device cuda=hw -filter_hw_device hw \
-i ~/benchmark/media/4k_hevc_hdr.mkv \
-vf 'hwupload_cuda,scale_cuda=w=1920:h=1080:format=nv12,tonemap_cuda=tonemap=hable:peak=100:desat=0' \
-c:v hevc_nvenc -preset p5 -rc constqp -qp 25 \
-an -y ~/benchmark/results/ffmpeg/hdr_tonemap_nvenc.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/hdr_tonemap_nvenc_log.txt
# Option B: CPU tonemap + GPU encode (Fedora — no scale_cuda)
ffmpeg -hwaccel cuda \
-i ~/benchmark/media/4k_hevc_hdr.mkv \
-vf 'zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709:t=bt709:m=bt709:r=tv,format=yuv420p,scale=1920:1080' \
-c:v hevc_nvenc -preset p5 -rc constqp -qp 25 \
-an -y ~/benchmark/results/ffmpeg/hdr_tonemap_nvenc.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/hdr_tonemap_nvenc_log.txt
# ➜ Record FPS
~/benchmark/scripts/record_score.sh 1.4 nvidia <FPS>

Note: FFmpeg's HDR tone-mapping pipeline varies by version. If this errors out, you may need a build with --enable-libplacebo. Check your FFmpeg version with ffmpeg -version.
The headline number. "How many 4K streams can the 3090 handle?"
cat > ~/benchmark/scripts/stream_stress.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./stream_stress.sh [intel|nvidia] [max_streams]
GPU_TYPE=${1:-nvidia}
MAX_STREAMS=${2:-20}
SOURCE_FILE="$HOME/benchmark/media/4k_hevc_hdr.mkv"
RESULTS_DIR="$HOME/benchmark/results/streams"
mkdir -p "$RESULTS_DIR"
# Clean up old stream files
rm -f "${RESULTS_DIR}"/stream_*.mkv "${RESULTS_DIR}"/stream_*_log.txt
PIDS=()
for i in $(seq 1 $MAX_STREAMS); do
echo "Starting stream $i..."
if [ "$GPU_TYPE" = "intel" ]; then
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-stream_loop -1 -i "$SOURCE_FILE" \
-vf 'vpp_qsv=w=1920:h=1080' \
-c:v hevc_qsv -preset fast -global_quality 28 \
-t 120 -an -y "${RESULTS_DIR}/stream_${i}.mkv" \
> "${RESULTS_DIR}/stream_${i}_log.txt" 2>&1 &
else
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-stream_loop -1 -i "$SOURCE_FILE" \
-vf 'scale_cuda=1920:1080' \
-c:v hevc_nvenc -preset p4 -rc constqp -qp 28 \
-t 120 -an -y "${RESULTS_DIR}/stream_${i}.mkv" \
> "${RESULTS_DIR}/stream_${i}_log.txt" 2>&1 &
fi
PIDS+=($!)
sleep 3
if ! kill -0 ${PIDS[-1]} 2>/dev/null; then
echo "Stream $i FAILED. Max streams: $((i-1))"
break
fi
echo "--- Stream $i started (PID: ${PIDS[-1]}) ---"
done
echo ""
echo "All streams launched. Running for 2 minutes..."
echo "Monitor GPU in another terminal."
echo "PIDs: ${PIDS[@]}"
echo ""
echo "To stop all: kill ${PIDS[@]}"
wait
# Check FPS of each stream from logs
echo ""
echo "=== Stream Results ==="
REALTIME=0
for i in $(seq 1 ${#PIDS[@]}); do
FPS=$(grep -oP 'fps=\s*\K[0-9.]+' "${RESULTS_DIR}/stream_${i}_log.txt" | tail -1)
echo " Stream $i: ${FPS:-FAILED} FPS"
# Count streams that held real-time speed (24 FPS is the floor for film content)
if [ -n "$FPS" ] && [ "$(echo "$FPS >= 24" | bc)" -eq 1 ]; then
REALTIME=$((REALTIME + 1))
fi
done
echo "Streams holding real-time (>=24 FPS): $REALTIME"
SCRIPT
chmod +x ~/benchmark/scripts/stream_stress.sh

# Terminal 1: Monitor
~/benchmark/scripts/monitor.sh nvidia stream_stress
# Terminal 2: Run test — start with 10, increase if all succeed
~/benchmark/scripts/stream_stress.sh nvidia 10
# ➜ Record max streams that maintained real-time FPS (24+ for film, 30+ for TV)
~/benchmark/scripts/record_score.sh 1.5 nvidia <MAX_STREAMS>

Nvidia session limit: even after patching, verify with more than 5 streams. If streams fail at exactly 5, the patch didn't apply. Re-run sudo bash patch.sh after checking your driver version.
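You can also confirm the patch took while the stress test is running. A sketch; the query field below is from nvidia-smi's --help-query-gpu list, so treat it as an assumption if your driver is old:

```shell
# Poll active NVENC encoder sessions while streams are transcoding
SESSIONS=$(nvidia-smi --query-gpu=encoder.stats.sessionCount --format=csv,noheader,nounits)
echo "Active NVENC sessions: $SESSIONS"
[ "${SESSIONS:-0}" -gt 5 ] && echo "Patch is working: more than 5 concurrent sessions"
```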
# Reset Tdarr library to original state (copy fresh if you have a backup)
# Then launch Tdarr with Nvidia GPU access
docker run -d \
--name tdarr-nvidia \
--gpus all \
-p 8265:8265 \
-p 8266:8266 \
-v ~/benchmark/tdarr/server:/app/server \
-v ~/benchmark/tdarr/configs:/app/configs \
-v ~/benchmark/tdarr/logs:/app/logs \
-v ~/benchmark/media/tdarr_library:/media \
-v ~/benchmark/results/tdarr:/temp \
ghcr.io/haveagitgat/tdarr:latest

Test procedure:
- Open Tdarr at http://localhost:8265
- Configure a library pointing to /media
- Set the transcode plugin: convert everything to HEVC at a target bitrate
- Set GPU worker count to 1
- Start a timer (time in the terminal, or a stopwatch)
- Let it process the entire 50-file library
- Record the total time when complete
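If you'd rather not babysit a stopwatch, a plain date-based timer works; START and TOTAL_SECONDS are hypothetical helper names, not part of Tdarr:

```shell
# Start this right before kicking off the Tdarr queue
START=$(date +%s)
# ...wait for the 50-file library to finish processing, then:
TOTAL_SECONDS=$(( $(date +%s) - START ))
echo "Library processed in ${TOTAL_SECONDS}s"
```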
# ➜ Record total time in seconds
~/benchmark/scripts/record_score.sh 1.6 nvidia <TOTAL_SECONDS>
# Clean up
docker stop tdarr-nvidia && docker rm tdarr-nvidia

# Pull test models
ollama pull llama3:8b-instruct-q4_K_M
ollama pull llama3:8b-instruct-q8_0
ollama pull mistral:7b-instruct-q4_K_M
ollama pull gemma2:27b-instruct-q4_K_M
# Verify GPU is being used
ollama run llama3:8b-instruct-q4_K_M "Hello" --verbose 2>&1 | head -5
# --verbose prints timing stats; to confirm GPU use, check 'ollama ps' (the PROCESSOR column should read GPU)

cat > ~/benchmark/scripts/llm_bench.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./llm_bench.sh <model_name> <gpu_label>
# Example: ./llm_bench.sh llama3:8b-instruct-q4_K_M nvidia
MODEL=${1:-"llama3:8b-instruct-q4_K_M"}
GPU_LABEL=${2:-"unknown"}
RESULTS_DIR="$HOME/benchmark/results/llm"
mkdir -p "$RESULTS_DIR"
PROMPTS=(
"Write a detailed guide on setting up a home server with Proxmox, including network configuration and VM creation."
"Explain the differences between ZFS and Btrfs for a NAS. Include performance considerations, snapshot capabilities, and hardware requirements."
"Write a Python script that monitors Docker container health, restarts unhealthy containers, and sends notifications via webhook."
)
SAFE_MODEL=$(echo "$MODEL" | tr '/:' '__')
OUTFILE="${RESULTS_DIR}/${SAFE_MODEL}_${GPU_LABEL}.csv"
echo "model,gpu,prompt_num,prompt_tokens,prompt_tps,gen_tokens,gen_tps,total_sec" > "$OUTFILE"
echo ""
echo "========================================"
echo "Testing: $MODEL on $GPU_LABEL"
echo "========================================"
GEN_TPS_SUM=0
GEN_TPS_COUNT=0
for i in "${!PROMPTS[@]}"; do
PROMPT="${PROMPTS[$i]}"
PROMPT_NUM=$((i+1))
echo ""
echo "--- Prompt $PROMPT_NUM: ${PROMPT:0:60}... ---"
RESPONSE=$(curl -s http://localhost:11434/api/generate \
-d "{
\"model\": \"$MODEL\",
\"prompt\": \"$PROMPT\",
\"stream\": false
}")
TOTAL_DURATION=$(echo "$RESPONSE" | jq -r '.total_duration // 0')
PROMPT_EVAL_COUNT=$(echo "$RESPONSE" | jq -r '.prompt_eval_count // 0')
PROMPT_EVAL_DURATION=$(echo "$RESPONSE" | jq -r '.prompt_eval_duration // 0')
EVAL_COUNT=$(echo "$RESPONSE" | jq -r '.eval_count // 0')
EVAL_DURATION=$(echo "$RESPONSE" | jq -r '.eval_duration // 0')
if [ "$PROMPT_EVAL_DURATION" -gt 0 ]; then
PROMPT_TPS=$(echo "scale=2; $PROMPT_EVAL_COUNT / ($PROMPT_EVAL_DURATION / 1000000000)" | bc)
else
PROMPT_TPS="0"
fi
if [ "$EVAL_DURATION" -gt 0 ]; then
GEN_TPS=$(echo "scale=2; $EVAL_COUNT / ($EVAL_DURATION / 1000000000)" | bc)
else
GEN_TPS="0"
fi
TOTAL_SEC=$(echo "scale=2; $TOTAL_DURATION / 1000000000" | bc)
echo " Prompt: ${PROMPT_EVAL_COUNT} tokens @ ${PROMPT_TPS} tok/s"
echo " Generation: ${EVAL_COUNT} tokens @ ${GEN_TPS} tok/s"
echo " Total: ${TOTAL_SEC}s"
echo "$MODEL,$GPU_LABEL,$PROMPT_NUM,$PROMPT_EVAL_COUNT,$PROMPT_TPS,$EVAL_COUNT,$GEN_TPS,$TOTAL_SEC" >> "$OUTFILE"
GEN_TPS_SUM=$(echo "$GEN_TPS_SUM + $GEN_TPS" | bc)
GEN_TPS_COUNT=$((GEN_TPS_COUNT + 1))
sleep 5
done
AVG_GEN_TPS=$(echo "scale=2; $GEN_TPS_SUM / $GEN_TPS_COUNT" | bc)
echo ""
echo ">>> AVERAGE generation speed: ${AVG_GEN_TPS} tok/s <<<"
echo ">>> Use this value for the scorecard <<<"
echo ""
echo "Results saved to: $OUTFILE"
SCRIPT
chmod +x ~/benchmark/scripts/llm_bench.sh

# Run all models on Nvidia
~/benchmark/scripts/llm_bench.sh "llama3:8b-instruct-q4_K_M" nvidia
# ➜ Record the average gen tok/s
~/benchmark/scripts/record_score.sh 2.1a nvidia <AVG_GEN_TPS>
sleep 10
~/benchmark/scripts/llm_bench.sh "llama3:8b-instruct-q8_0" nvidia
~/benchmark/scripts/record_score.sh 2.1b nvidia <AVG_GEN_TPS>
sleep 10
~/benchmark/scripts/llm_bench.sh "mistral:7b-instruct-q4_K_M" nvidia
~/benchmark/scripts/record_score.sh 2.1c nvidia <AVG_GEN_TPS>
sleep 10
~/benchmark/scripts/llm_bench.sh "gemma2:27b-instruct-q4_K_M" nvidia
~/benchmark/scripts/record_score.sh 2.1d nvidia <AVG_GEN_TPS>
# ➜ Record peak AI inference power
~/benchmark/scripts/record_score.sh P.3 nvidia <PEAK_WATTS>

cd ~/benchmark
git clone https://github.com/comfyanonymous/ComfyUI.git comfyui
cd comfyui
source ~/benchmark/venvs/comfyui/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# Download SDXL model (you'll reuse this for the Intel round)
# Place sd_xl_base_1.0.safetensors in ~/benchmark/comfyui/models/checkpoints/
# Start ComfyUI
python main.py --listen 0.0.0.0 &
sleep 10

cat > ~/benchmark/scripts/sd_bench.sh << 'SCRIPT'
#!/bin/bash
# Usage: ./sd_bench.sh <gpu_label>
GPU_LABEL=${1:-"unknown"}
RESULTS_DIR="$HOME/benchmark/results/sd"
mkdir -p "$RESULTS_DIR"
PROMPTS=(
"A photorealistic image of a rack-mounted home server setup in a clean room, blue LED lights, cable management, 8K"
"An isometric illustration of a network diagram showing VPN tunnels between data centers, technical style"
"A cinematic landscape of a futuristic data center built into a mountain, volumetric lighting, ultra detailed"
)
STEPS=30
CFG=7.0
WIDTH=1024
HEIGHT=1024
SEED=42
TIME_SUM=0
COUNT=0
for i in "${!PROMPTS[@]}"; do
PROMPT="${PROMPTS[$i]}"
echo "--- Image $((i+1)): ${PROMPT:0:60}... ---"
START=$(date +%s%N)
curl -s -X POST http://127.0.0.1:8188/prompt \
-H "Content-Type: application/json" \
-d "{
\"prompt\": {
\"1\": {\"class_type\": \"CheckpointLoaderSimple\", \"inputs\": {\"ckpt_name\": \"sd_xl_base_1.0.safetensors\"}},
\"2\": {\"class_type\": \"CLIPTextEncode\", \"inputs\": {\"text\": \"$PROMPT\", \"clip\": [\"1\", 1]}},
\"3\": {\"class_type\": \"CLIPTextEncode\", \"inputs\": {\"text\": \"bad quality, blurry\", \"clip\": [\"1\", 1]}},
\"4\": {\"class_type\": \"EmptyLatentImage\", \"inputs\": {\"width\": $WIDTH, \"height\": $HEIGHT, \"batch_size\": 1}},
\"5\": {\"class_type\": \"KSampler\", \"inputs\": {\"model\": [\"1\", 0], \"positive\": [\"2\", 0], \"negative\": [\"3\", 0], \"latent_image\": [\"4\", 0], \"seed\": $SEED, \"steps\": $STEPS, \"cfg\": $CFG, \"sampler_name\": \"euler\", \"scheduler\": \"normal\", \"denoise\": 1.0}},
\"6\": {\"class_type\": \"VAEDecode\", \"inputs\": {\"samples\": [\"5\", 0], \"vae\": [\"1\", 2]}},
\"7\": {\"class_type\": \"SaveImage\", \"inputs\": {\"images\": [\"6\", 0], \"filename_prefix\": \"bench_${GPU_LABEL}_$((i+1))\"}}
}
}" > /dev/null
sleep 2 # give the job a moment to enter the queue before polling
while [ "$(curl -s http://127.0.0.1:8188/queue | jq '(.queue_running | length) + (.queue_pending | length)')" -gt 0 ]; do
sleep 1
done
END=$(date +%s%N)
ELAPSED=$(echo "scale=2; ($END - $START) / 1000000000" | bc)
IT_PER_SEC=$(echo "scale=2; $STEPS / $ELAPSED" | bc)
echo " Time: ${ELAPSED}s | Speed: ${IT_PER_SEC} it/s"
echo "prompt_$((i+1)),$GPU_LABEL,$ELAPSED,$IT_PER_SEC,$STEPS,$WIDTH,$HEIGHT" >> "$RESULTS_DIR/sdxl_${GPU_LABEL}.csv"
TIME_SUM=$(echo "$TIME_SUM + $ELAPSED" | bc)
COUNT=$((COUNT + 1))
sleep 5
done
AVG_TIME=$(echo "scale=2; $TIME_SUM / $COUNT" | bc)
echo ""
echo ">>> AVERAGE time per image: ${AVG_TIME}s <<<"
SCRIPT
chmod +x ~/benchmark/scripts/sd_bench.sh

~/benchmark/scripts/sd_bench.sh nvidia
# ➜ Record average seconds per image
~/benchmark/scripts/record_score.sh 2.2 nvidia <AVG_SECONDS>
# Kill ComfyUI
pkill -f "python main.py"

Alternative: If ComfyUI is finicky, use stable-diffusion.cpp, which has both CUDA and SYCL backends for a more apples-to-apples CLI comparison.
source ~/benchmark/venvs/whisper/bin/activate
pip install faster-whisper nvidia-cublas-cu12 nvidia-cudnn-cu12
# CTranslate2 needs to find the CUDA libs installed in the venv
export LD_LIBRARY_PATH=$(python3 -c "import nvidia.cublas.lib; print(nvidia.cublas.lib.__path__[0])"):$(python3 -c "import nvidia.cudnn.lib; print(nvidia.cudnn.lib.__path__[0])"):$LD_LIBRARY_PATH
cat > ~/benchmark/scripts/whisper_bench.py << 'PYEOF'
import os
import sys
import time
from faster_whisper import WhisperModel
gpu_label = sys.argv[1] if len(sys.argv) > 1 else "unknown"
device = sys.argv[2] if len(sys.argv) > 2 else "cuda"
compute = sys.argv[3] if len(sys.argv) > 3 else "float16"
home = os.path.expanduser("~")
audio_file = f"{home}/benchmark/media/audio/test_10min.wav"
results_dir = f"{home}/benchmark/results/whisper"
models_to_test = ["small", "medium", "large-v3"]
print(f"\nWhisper benchmark — GPU: {gpu_label}, device: {device}, compute: {compute}")
print("=" * 60)
for model_name in models_to_test:
    print(f"\n--- Testing {model_name} ---")
    model = WhisperModel(model_name, device=device, compute_type=compute)
    start = time.time()
    segments, info = model.transcribe(audio_file, beam_size=5)
    # transcribe() returns a lazy generator; iterate to force the full pass
    for segment in segments:
        pass
    elapsed = time.time() - start
    audio_duration = info.duration
    rtf = elapsed / audio_duration
    print(f"  Audio: {audio_duration:.1f}s | Time: {elapsed:.1f}s | RTF: {rtf:.3f}x")
    with open(f"{results_dir}/whisper_{gpu_label}.csv", "a") as f:
        f.write(f"{model_name},{gpu_label},{audio_duration:.1f},{elapsed:.1f},{rtf:.3f}\n")
    del model
    time.sleep(5)
print("\nDone! Record the RTF values (lower = faster)")
PYEOF

mkdir -p ~/benchmark/results/whisper
python3 ~/benchmark/scripts/whisper_bench.py nvidia cuda float16
# ➜ Record RTF for each model size
~/benchmark/scripts/record_score.sh 2.3a nvidia <RTF_SMALL>
~/benchmark/scripts/record_score.sh 2.3b nvidia <RTF_MEDIUM>
~/benchmark/scripts/record_score.sh 2.3c nvidia <RTF_LARGE>
deactivate
unset LD_LIBRARY_PATH

source ~/benchmark/venvs/yolo/bin/activate
pip install ultralytics opencv-python-headless
cat > ~/benchmark/scripts/yolo_bench.py << 'PYEOF'
import os
import sys
import time
import cv2
from ultralytics import YOLO
gpu_label = sys.argv[1] if len(sys.argv) > 1 else "unknown"
backend = sys.argv[2] if len(sys.argv) > 2 else "cuda"  # "cuda" or "openvino"
home = os.path.expanduser("~")
video_path = f"{home}/benchmark/media/yolo_test_1080p.mkv"
results_dir = f"{home}/benchmark/results/yolo"
model_names = ["yolov8n", "yolov8s"]
for model_name in model_names:
    print(f"\n--- {model_name} on {gpu_label} ({backend}) ---")
    if backend == "openvino":
        base_model = YOLO(f"{model_name}.pt")
        base_model.export(format="openvino")
        model = YOLO(f"{model_name}_openvino_model/", task="detect")
    else:
        model = YOLO(f"{model_name}.pt")
        model.to("cuda")
    cap = cv2.VideoCapture(video_path)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Warm up
    for _ in range(10):
        ret, frame = cap.read()
        if ret:
            model(frame, verbose=False)
    cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
    frame_count = 0
    total_inference_time = 0
    if backend == "cuda":
        import torch
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if backend == "cuda":
            torch.cuda.synchronize()
        start = time.time()
        results = model(frame, verbose=False)
        if backend == "cuda":
            torch.cuda.synchronize()
        elapsed = time.time() - start
        total_inference_time += elapsed
        frame_count += 1
        if frame_count % 200 == 0:
            print(f"  Frame {frame_count}/{total_frames} | Avg FPS: {frame_count/total_inference_time:.1f}")
    cap.release()
    avg_fps = frame_count / total_inference_time
    avg_ms = (total_inference_time / frame_count) * 1000
    print(f"  Result: {avg_fps:.1f} FPS ({avg_ms:.1f}ms/frame)")
    with open(f"{results_dir}/yolo_{gpu_label}.csv", "a") as f:
        f.write(f"{model_name},{gpu_label},{frame_count},{total_inference_time:.1f},{avg_fps:.1f},{avg_ms:.1f}\n")
    del model
    time.sleep(5)
PYEOF

mkdir -p ~/benchmark/results/yolo
python3 ~/benchmark/scripts/yolo_bench.py nvidia cuda
# ➜ Record FPS for each model
~/benchmark/scripts/record_score.sh 2.4a nvidia <YOLOV8N_FPS>
~/benchmark/scripts/record_score.sh 2.4b nvidia <YOLOV8S_FPS>
deactivate

~/benchmark/scripts/show_scores.sh

You should see all the Nvidia columns filled in. Take a screenshot of the scorecard for your video.
- Power off the system completely
- Remove the RTX 3090
- Install the Intel Arc Pro B60
- Power on
Important: Your Nvidia drivers will still be installed but won't load without the hardware. This is fine — they won't interfere with the Intel drivers. If you want a clean slate, you can run `sudo dnf remove akmod-nvidia*` before swapping, but it's not strictly necessary.
The Arc Pro B60 will not be detected by the system without these BIOS settings:
- Enable "Above 4G Decoding" — Required for Intel Arc on AMD boards
- Enable "Re-Size BAR Support" — Appears after enabling Above 4G
- Disable CSM (Compatibility Support Module) — Arc requires UEFI boot
- Set primary display to PCIe if available
Verify the card is detected after BIOS changes:
lspci | grep -i vga
# Should show: Intel Corporation Battlemage G21 [Arc Pro B60]
ls /dev/dri/
# Should show: card0  renderD128

# Intel media driver (VA-API), compute runtime (Level Zero), and GPU monitoring
sudo dnf install -y intel-media-driver intel-compute-runtime \
intel-level-zero oneapi-level-zero igt-gpu-tools nvtop
# Verify VA-API (the B60 uses the xe kernel driver, not i915)
vainfo --display drm --device /dev/dri/renderD128 2>&1 | head -10
# GPU monitoring — intel_gpu_top does NOT work with xe driver, use nvtop instead
nvtop
# Install OneAPI (for AI workloads)
cat > /tmp/oneAPI.repo << 'REPO'
[oneAPI]
name=Intel oneAPI
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
REPO
sudo mv /tmp/oneAPI.repo /etc/yum.repos.d/
sudo dnf install -y intel-basekit
sudo reboot

After reboot:
vainfo --display drm --device /dev/dri/renderD128 2>&1 | head -5
~/benchmark/scripts/record_score.sh P.1 intel IDLE_WATTS_HERE
~/benchmark/scripts/record_score.sh P.4 intel PRICE_HERE

# Terminal 1: ~/benchmark/scripts/monitor.sh intel hevc_encode
# Terminal 2:
# HEVC encode — 4K to 4K
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-i ~/benchmark/media/4k_h264.mkv \
-c:v hevc_qsv -preset medium -global_quality 25 \
-an -y ~/benchmark/results/ffmpeg/4k_hevc_qsv.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/hevc_qsv_log.txt
~/benchmark/scripts/record_score.sh 1.1a intel FPS_HERE
# HEVC encode — 4K to 1080p transcode
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-i ~/benchmark/media/4k_h264.mkv \
-vf 'vpp_qsv=w=1920:h=1080' \
-c:v hevc_qsv -preset medium -global_quality 25 \
-an -y ~/benchmark/results/ffmpeg/4k_to_1080p_hevc_qsv.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/hevc_qsv_downscale_log.txt
~/benchmark/scripts/record_score.sh 1.1b intel FPS_HERE
~/benchmark/scripts/record_score.sh P.2 intel PEAK_WATTS_HERE

This is where the B60 should show its strength.
# AV1 encode — 4K to 4K
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-i ~/benchmark/media/4k_h264.mkv \
-c:v av1_qsv -preset medium -global_quality 30 \
-an -y ~/benchmark/results/ffmpeg/4k_av1_qsv.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/av1_qsv_log.txt
~/benchmark/scripts/record_score.sh 1.2a intel FPS_HERE
# AV1 encode — 4K to 1080p
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-i ~/benchmark/media/4k_h264.mkv \
-vf 'vpp_qsv=w=1920:h=1080' \
-c:v av1_qsv -preset medium -global_quality 30 \
-an -y ~/benchmark/results/ffmpeg/4k_to_1080p_av1_qsv.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/av1_qsv_downscale_log.txt
~/benchmark/scripts/record_score.sh 1.2b intel FPS_HERE

ffmpeg -i ~/benchmark/results/ffmpeg/4k_hevc_qsv.mkv \
-i ~/benchmark/media/4k_h264.mkv \
-lavfi libvmaf="model=version=vmaf_v0.6.1:log_path=vmaf_hevc_qsv.json:log_fmt=json" \
-f null - 2>&1 | tee ~/benchmark/results/ffmpeg/vmaf_hevc_qsv.txt
~/benchmark/scripts/record_score.sh 1.3a intel VMAF_SCORE_HERE
ffmpeg -i ~/benchmark/results/ffmpeg/4k_av1_qsv.mkv \
-i ~/benchmark/media/4k_h264.mkv \
-lavfi libvmaf="model=version=vmaf_v0.6.1:log_path=vmaf_av1_qsv.json:log_fmt=json" \
-f null - 2>&1 | tee ~/benchmark/results/ffmpeg/vmaf_av1_qsv.txt
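Rather than scrolling the ffmpeg output for the score, you can pull the pooled VMAF mean straight out of the JSON log each run writes. A small helper (the `.pooled_metrics` path below matches recent libvmaf JSON logs; verify it against your ffmpeg build, and `vmaf_mean` is just a name I'm using here):

```shell
# Extract the pooled VMAF mean from a libvmaf JSON log.
vmaf_mean() {
  jq -r '.pooled_metrics.vmaf.mean' "$1"
}
# Usage (after the runs above):
#   vmaf_mean vmaf_hevc_qsv.json
#   vmaf_mean vmaf_av1_qsv.json
```

jq is already installed from the Phase 0 package list, so this works out of the box.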
~/benchmark/scripts/record_score.sh 1.3b intel VMAF_SCORE_HERE

ffmpeg -init_hw_device qsv=hw -filter_hw_device hw \
-i ~/benchmark/media/4k_hevc_hdr.mkv \
-vf 'hwupload=extra_hw_frames=64,vpp_qsv=w=1920:h=1080:tonemap=1' \
-c:v hevc_qsv -preset medium -global_quality 25 \
-an -y ~/benchmark/results/ffmpeg/hdr_tonemap_qsv.mkv \
2>&1 | tee ~/benchmark/results/ffmpeg/hdr_tonemap_qsv_log.txt
~/benchmark/scripts/record_score.sh 1.4 intel FPS_HERE

# Terminal 1: ~/benchmark/scripts/monitor.sh intel stream_stress
# Terminal 2:
~/benchmark/scripts/stream_stress.sh intel 10
# ➜ Record max real-time streams
~/benchmark/scripts/record_score.sh 1.5 intel MAX_STREAMS_HERE

No session-limit patch is needed on Intel — you get unlimited encode sessions out of the box. That's worth mentioning in the video.
# Reset the Tdarr library files to their original state first!
# (Re-run the batch library generation script from Phase 0 if needed)
docker run -d \
--name tdarr-intel \
--device /dev/dri:/dev/dri \
-p 8265:8265 \
-p 8266:8266 \
-v ~/benchmark/tdarr/server-intel:/app/server \
-v ~/benchmark/tdarr/configs-intel:/app/configs \
-v ~/benchmark/tdarr/logs-intel:/app/logs \
-v ~/benchmark/media/tdarr_library:/media \
-v ~/benchmark/results/tdarr:/temp \
ghcr.io/haveagitgat/tdarr:latest
# Same procedure: configure library, set HEVC target, 1 GPU worker, time it
~/benchmark/scripts/record_score.sh 1.6 intel TOTAL_SECONDS_HERE
docker stop tdarr-intel && docker rm tdarr-intel

Important: Ollama does NOT support Intel Arc GPUs — it will fall back to 100% CPU. Use llama.cpp built with the SYCL backend instead.
# Source OneAPI environment (required for SYCL)
source /opt/intel/oneapi/setvars.sh
# Build llama.cpp with SYCL backend
cd ~/benchmark
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j$(nproc)

Run the benchmarks using Ollama's model blobs (they're standard GGUF files):
# Find model file paths from Ollama's cache
ollama show llama3:8b-instruct-q4_K_M --modelfile 2>&1 | grep "^FROM /"
# Returns something like: FROM /home/brandon/.ollama/models/blobs/sha256-...
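Since that lookup gets repeated for every model below, it can be wrapped in a tiny helper (`ollama_blob` is a name I'm making up here, not an Ollama command):

```shell
# Resolve an Ollama model tag to the GGUF blob path it stores on disk.
ollama_blob() {
  ollama show "$1" --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}'
}
# Usage:
#   ./build/bin/llama-bench -m "$(ollama_blob llama3:8b-instruct-q4_K_M)"
```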
# Run llama-bench for each model
echo "=== Llama3 8B Q4 ==="
./build/bin/llama-bench \
-m $(ollama show llama3:8b-instruct-q4_K_M --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}')
~/benchmark/scripts/record_score.sh 2.1a intel GEN_TPS_HERE
echo "=== Llama3 8B Q8 ==="
./build/bin/llama-bench \
-m $(ollama show llama3:8b-instruct-q8_0 --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}')
~/benchmark/scripts/record_score.sh 2.1b intel GEN_TPS_HERE
echo "=== Mistral 7B Q4 ==="
./build/bin/llama-bench \
-m $(ollama show mistral:7b-instruct-q4_K_M --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}')
~/benchmark/scripts/record_score.sh 2.1c intel GEN_TPS_HERE
echo "=== Gemma2 27B Q4 ==="
./build/bin/llama-bench \
-m $(ollama show gemma2:27b-instruct-q4_K_M --modelfile 2>&1 | grep "^FROM /" | awk '{print $2}')
~/benchmark/scripts/record_score.sh 2.1d intel GEN_TPS_HERE
~/benchmark/scripts/record_score.sh P.3 intel PEAK_WATTS_HERE

cd ~/benchmark/comfyui
# Reinstall PyTorch (CPU wheels) plus OpenVINO; Intel inference goes through OpenVINO here
source ~/benchmark/venvs/comfyui/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install openvino optimum[openvino]
# Follow ComfyUI OpenVINO integration instructions for your version
python main.py --listen 0.0.0.0 &
sleep 10
~/benchmark/scripts/sd_bench.sh intel
~/benchmark/scripts/record_score.sh 2.2 intel AVG_SECONDS_HERE
pkill -f "python main.py"
deactivate

Alternative: if ComfyUI + OpenVINO is too finicky, use `stable-diffusion.cpp` with the SYCL backend for Intel GPU support.
Note: faster-whisper (CTranslate2) has no Intel XPU backend — it falls back to CPU. Use whisper.cpp with SYCL backend instead.
# Source OneAPI environment
source /opt/intel/oneapi/setvars.sh
# Build whisper.cpp with SYCL backend (same pattern as llama.cpp)
cd ~/benchmark
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j$(nproc)

# Download models (paths are relative to the whisper.cpp root)
bash ./models/download-ggml-model.sh small
bash ./models/download-ggml-model.sh medium
bash ./models/download-ggml-model.sh large-v3
# Run benchmarks
for MODEL in small medium large-v3; do
echo "=== Whisper $MODEL ==="
time ./build/bin/whisper-cli \
-m models/ggml-${MODEL}.bin \
-f ~/benchmark/media/audio/test_10min.wav \
--no-prints
done
# Calculate RTF = processing_time / audio_duration (600s)
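The RTF arithmetic is a one-liner; for example, with an illustrative processing time of 90s on the 600s clip:

```shell
# RTF = processing_time / audio_duration; below 1.0 means faster than realtime.
awk -v proc=90 -v dur=600 'BEGIN { printf "RTF: %.3f\n", proc / dur }'
# prints: RTF: 0.150
```

Use the `real` value from the `time` output above as `proc` for each model.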
~/benchmark/scripts/record_score.sh 2.3a intel RTF_SMALL_HERE
~/benchmark/scripts/record_score.sh 2.3b intel RTF_MEDIUM_HERE
~/benchmark/scripts/record_score.sh 2.3c intel RTF_LARGE_V3_HERE

source ~/benchmark/venvs/yolo/bin/activate
pip install openvino
python3 ~/benchmark/scripts/yolo_bench.py intel openvino
~/benchmark/scripts/record_score.sh 2.4a intel YOLOV8N_FPS_HERE
~/benchmark/scripts/record_score.sh 2.4b intel YOLOV8S_FPS_HERE
deactivate

~/benchmark/scripts/show_scores.sh

cat > ~/benchmark/scripts/efficiency.sh << 'SCRIPT'
#!/bin/bash
SCORECARD="$HOME/benchmark/results/scorecard.csv"
echo ""
echo "╔══════════════════════════════════════════════════════════════════════════════╗"
echo "║ EFFICIENCY ANALYSIS ║"
echo "╚══════════════════════════════════════════════════════════════════════════════╝"
# Extract power and price
N_PEAK=$(awk -F',' '$1=="P.2" {print $6}' "$SCORECARD")
I_PEAK=$(awk -F',' '$1=="P.2" {print $7}' "$SCORECARD")
N_PRICE=$(awk -F',' '$1=="P.4" {print $6}' "$SCORECARD")
I_PRICE=$(awk -F',' '$1=="P.4" {print $7}' "$SCORECARD")
echo ""
echo "Power: 3090=${N_PEAK}W peak | B60=${I_PEAK}W peak"
echo "Price: 3090=\$${N_PRICE} | B60=\$${I_PRICE}"
echo ""
printf "%-30s %-15s %-15s %-10s\n" "Test" "3090 perf/W" "B60 perf/W" "Winner"
printf "%-30s %-15s %-15s %-10s\n" "----" "-----------" "----------" "------"
# Calculate perf/watt for "higher is better" tests
awk -F',' -v np="$N_PEAK" -v ip="$I_PEAK" '
NR>1 && $5=="yes" && $6!="" && $7!="" {
n_ppw = $6 / (np > 0 ? np : 1)
i_ppw = $7 / (ip > 0 ? ip : 1)
winner = (n_ppw > i_ppw) ? "3090" : "B60"
printf "%-30s %-15.3f %-15.3f %-10s\n", $2, n_ppw, i_ppw, winner
}' "$SCORECARD"
echo ""
printf "%-30s %-15s %-15s %-10s\n" "Test" "3090 perf/\$" "B60 perf/\$" "Winner"
printf "%-30s %-15s %-15s %-10s\n" "----" "-----------" "----------" "------"
awk -F',' -v np="$N_PRICE" -v ip="$I_PRICE" '
NR>1 && $5=="yes" && $6!="" && $7!="" {
n_ppd = $6 / (np > 0 ? np : 1)
i_ppd = $7 / (ip > 0 ? ip : 1)
winner = (n_ppd > i_ppd) ? "3090" : "B60"
printf "%-30s %-15.4f %-15.4f %-10s\n", $2, n_ppd, i_ppd, winner
}' "$SCORECARD"
echo ""
SCRIPT
chmod +x ~/benchmark/scripts/efficiency.sh
~/benchmark/scripts/efficiency.sh

# Quick copy-paste friendly version for your video description or blog post
echo ""
echo "=== COPY-PASTE RESULTS TABLE ==="
echo ""
column -t -s',' ~/benchmark/results/scorecard.csv | head -30

- Fedora's system ffmpeg has NVENC encoders and cuvid decoders but does NOT have the CUDA filters (`scale_cuda`, `tonemap_cuda`). Use `h264_cuvid -resize WxH` for GPU scaling instead.
- Static ffmpeg builds (johnvansickle.com) lack NVENC entirely — GPL-only, no proprietary libs.
- Jellyfin-ffmpeg from Docker has full CUDA support BUT is statically linked, so nvidia-patch can't reach it. System ffmpeg + cuvid is the better approach, since the patch applies to the system `libnvidia-encode.so`.
- `scale_cuda` requires named params: `scale_cuda=w=1920:h=1080`, not `scale_cuda=1920:1080`.
- Angle brackets in shell scripts — never use `<PLACEHOLDER>` syntax in bash; it parses as a shell redirect. Use `PLACEHOLDER_HERE` instead.
- nvidia-patch removes the 5-session NVENC limit but does NOT persist across reboots. Re-run `sudo bash patch.sh` after each boot.
- Stale backup detection: if re-patching fails, `sudo rm /opt/nvidia/libnvidia-encode-backup/*` and patch again.
- RTX 3090 (Ampere) has no AV1 hardware encode — AV1 NVENC started with the RTX 4000 series (Ada Lovelace). Record 0 for the AV1 tests.
- 3090 real hardware stream limit: ~5 simultaneous 4K→1080p HEVC transcodes regardless of nvidia-patch (GPU bandwidth ceiling, not session lock).
- BIOS settings are MANDATORY — card will not enumerate on the PCI bus without:
- Above 4G Decoding enabled
- Re-Size BAR Support enabled
- CSM disabled (Arc requires UEFI)
- The kernel driver is `xe`, not `i915` — `intel_gpu_top` does NOT work. Use `nvtop` for GPU monitoring.
- Package names on Fedora: use `intel-level-zero` + `oneapi-level-zero` (not `intel-level-zero-gpu` + `level-zero`, which don't exist).
- `intel-media-driver` comes from rpmfusion-nonfree, not the base Fedora repos.
- `vainfo` over SSH requires an explicit device: `vainfo --display drm --device /dev/dri/renderD128`.
- No artificial NVENC-style session limit — unlimited encode sessions out of the box. The B60 hit 25+ simultaneous 4K→1080p streams at realtime.
- Ollama does NOT support Intel Arc GPUs — it shows 100% CPU in `ollama ps`. Use llama.cpp built with the SYCL backend instead.
- The llama.cpp SYCL build requires OneAPI: `cmake -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx`.
- Ollama model blobs can be found with `ollama show MODEL --modelfile 2>&1 | grep "^FROM /"` and passed directly to llama-bench.
- Q8 quantization on the B60 (9.81 tok/s) is disproportionately slow vs Q4 (53.89 tok/s) — a much larger drop than on the 3090. May be a SYCL backend optimization issue.
- Intel XPU PyTorch on Fedora is broken — `libpti_view.so.0.10` is not found anywhere on the system, and PyTorch XPU wheels are only available for Python 3.12 (Fedora 43 ships 3.14). ComfyUI and diffusers both failed.
- faster-whisper (CTranslate2) has no Intel XPU backend — falls back to CPU.
- whisper.cpp with SYCL backend is the proper Intel GPU approach (same build pattern as llama.cpp).
- CUDA lib leak between venvs — the whisper venv sets `LD_LIBRARY_PATH` for nvidia-cublas/cudnn, which breaks YOLO. Always `unset LD_LIBRARY_PATH` after deactivating the whisper venv.
- `-stream_loop -1` with short clips kills cuvid performance — the decoder reinitializes on every loop. Use Tears of Steel (734s) instead of jellyfish (30s) to avoid constant reinit.
- Without `-stream_loop`, 30s clips finish before all streams ramp up, giving unreliable averages (early streams show inflated FPS).
- Stream 1 always shows inflated FPS because it runs alone before the other streams start (3s delay between launches). Look at the lowest stream for the real limit.
- VMAF is CPU-only — no GPU implementation exists. ~3 FPS at 4K is normal (5 min for a 30s clip).
- File order matters: distorted file first, reference file second. Swapping them gives nonsensical scores (e.g., 3.4 instead of 85+).
- Fedora 43 ships Python 3.14 — many ML packages don't have wheels for it yet (Intel XPU PyTorch, IPEX, etc.). Install `python3.12` for Intel ML workloads.
- Intel PyTorch XPU wheel deps aren't on Intel's pip index — pre-install `filelock setuptools typing-extensions sympy networkx jinja2 fsspec` from PyPI, then `pip install --no-deps torch --index-url ...` pointing at Intel's index.
- OneAPI `setvars.sh` must be sourced before running any SYCL workload. Add it to `.bashrc` or run it manually each session.
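If you do want to retry the torch XPU route later, the install order from these notes can be sketched as below. Treat it as a sketch: `INTEL_WHEEL_INDEX_HERE` is a placeholder for Intel's wheel index (check Intel's PyTorch XPU install docs for the current URL), and the venv path is arbitrary.

```shell
# Python 3.12 is assumed installed via dnf (sudo dnf install -y python3.12);
# the fallback to system python3 is only so this sketch runs anywhere.
PY=$(command -v python3.12 || command -v python3)
"$PY" -m venv /tmp/intel-torch-venv
echo "venv created with $PY"
# Inside the venv: deps first from PyPI (Intel's index doesn't carry them),
# then torch itself with --no-deps from Intel's wheel index:
#   pip install filelock setuptools typing-extensions sympy networkx jinja2 fsspec
#   pip install --no-deps torch --index-url INTEL_WHEEL_INDEX_HERE
```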