Complete Thunderbolt 4 + Ceph Guide: Setup for Proxmox VE 9

Acknowledgments

This builds upon excellent foundational work by @scyto.

Key contributions from @scyto's work:

  • TB4 hardware detection and kernel module strategies
  • Systemd networking and udev automation techniques
  • MTU optimization and performance tuning approaches

Changelog - Fork Updates (January 2025, by Yearly1825)

Improvements and fixes based on PVE 9.0.10 testing:

  • Fixed script error handling: Replaced problematic || syntax with proper if/then/else statements in interface bringup scripts (pve-en05.sh and pve-en06.sh) for more reliable error handling and retry logic
  • Optimized Ceph networking: Changed configuration to use Thunderbolt network (10.100.0.0/24) for both public and cluster networks instead of split networks, resolving performance degradation issues discovered in production (ref: Proxmox forum thread #170091)
  • Simplified network topology: Removed confusing and unnecessary /30 point-to-point subnet configuration - OpenFabric handles mesh routing automatically without manual subnet assignments
  • Improved /etc/network/interfaces instructions: Changed from appending to file to properly inserting configuration above the source directive to prevent conflicts
  • Enhanced script reliability: Added proper bash conditionals for more predictable behavior during interface initialization

Overview

This guide provides a step-by-step, tested setup for building a high-performance Thunderbolt 4 + Ceph cluster on Proxmox VE 9.

Lab Results:

  • TB4 Mesh Performance: Sub-millisecond latency, 65520 MTU, full mesh connectivity
  • Ceph Performance: 1,300+ MB/s write, 1,760+ MB/s read with optimizations
  • Reliability: 0% packet loss, automatic failover, persistent configuration
  • Integration: Full Proxmox GUI visibility and management

Hardware Environment:

  • Nodes: 3x systems with dual TB4 ports (tested on MS01 mini-PCs)
  • Memory: 64GB RAM per node (optimal for high-performance Ceph)
  • CPU: 13th Gen Intel (or equivalent high-performance processors)
  • Storage: NVMe drives for Ceph OSDs
  • Network: TB4 mesh (10.100.0.0/24) + management (10.11.12.0/24)

Software Stack:

  • Proxmox VE: 9.0 with native SDN OpenFabric support
  • Ceph: Reef with BlueStore, LZ4 compression, size 2 / min_size 1 replication
  • OpenFabric: IPv4-only mesh routing for simplicity and performance

Prerequisites: What You Need

Physical Requirements

  • 3 nodes minimum: Each with dual TB4 ports (tested with MS01 mini-PCs)
  • TB4 cables: Quality TB4 cables for mesh connectivity
  • Ring topology: Physical connections n2→n3→n4→n2 (or similar mesh pattern)
  • Management network: Standard Ethernet for initial setup and management

Software Requirements

  • Proxmox VE 9.0
  • Root access to all nodes
  • Basic Linux networking knowledge
  • Patience: TB4 mesh setup requires careful attention to detail!

Network Planning

  • Management network: 10.11.12.0/24 (adjust to your environment)
  • TB4 cluster network: 10.100.0.0/24 (for Ceph cluster traffic)
  • Router IDs: 10.100.0.12 (n2), 10.100.0.13 (n3), 10.100.0.14 (n4)

Phase 1: Thunderbolt Foundation Setup

Step 1: Prepare All Nodes

Critical: Perform these steps on ALL mesh nodes (n2, n3, n4).

Load TB4 kernel modules:

# Execute on each node:
echo 'thunderbolt' >> /etc/modules
echo 'thunderbolt-net' >> /etc/modules
modprobe thunderbolt && modprobe thunderbolt-net

Verify modules loaded:

lsmod | grep thunderbolt

Expected output: Both thunderbolt and thunderbolt_net modules present.
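Since the echo >> commands above append blindly, it is worth confirming that /etc/modules did not pick up duplicate entries if you ran them more than once (a quick optional check):

# Optional: each entry should appear exactly once in /etc/modules:
grep -n 'thunderbolt' /etc/modules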

Step 2: Identify TB4 Hardware

Find TB4 controllers and interfaces:

lspci | grep -i thunderbolt
ip link show | grep -E '(en0[5-9]|thunderbolt)'

Expected: TB4 PCI controllers detected, TB4 network interfaces visible.

Step 3: Create Systemd Link Files

Critical: Create interface renaming rules based on PCI paths for consistent naming.

# Create systemd link file for first TB4 interface:
cat > /etc/systemd/network/00-thunderbolt0.link << 'EOF'
[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en05
EOF

# Create systemd link file for second TB4 interface:
cat > /etc/systemd/network/00-thunderbolt1.link << 'EOF'
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en06
EOF

Note: Adjust PCI paths if different on your hardware (check with lspci | grep -i thunderbolt)
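One way to double-check the PCI paths on your hardware, sketched here using udev's standard ID_PATH property (replace <iface> with the interface's current kernel name, which may not yet be en05/en06):

# Show TB4 controllers with full PCI addresses (domain:bus:device.function):
lspci -D | grep -i thunderbolt

# Show the PCI path and driver udev associates with a network interface:
udevadm info /sys/class/net/<iface> | grep -E 'ID_PATH=|ID_NET_DRIVER='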

Step 4: Configure Network Interfaces

Add TB4 interfaces to network configuration with optimal settings:

vim /etc/network/interfaces 

Add the following to /etc/network/interfaces, above the source /etc/network/interfaces.d/* line (do not simply append it to the end of the file):

auto en05
iface en05 inet manual
    mtu 65520

auto en06
iface en06 inet manual
    mtu 65520

Step 5: Enable systemd-networkd

Required for systemd link files to work:

systemctl enable systemd-networkd
systemctl start systemd-networkd
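An optional sanity check that the service is active (the en05/en06 renames themselves only take effect after the reboot in Step 7):

systemctl is-active systemd-networkd
networkctl list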

Step 6: Create Udev Rules and Scripts

Automation for reliable interface bringup on cable insertion:

Create udev rules:

cat > /etc/udev/rules.d/10-tb-en.rules << 'EOF'
ACTION=="add|move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="add|move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
EOF

Create en05 bringup script:

cat > /usr/local/bin/pve-en05.sh << 'EOF'
#!/bin/bash
LOGFILE="/tmp/udev-debug.log"
echo "$(date): en05 bringup triggered" >> "$LOGFILE"
for i in {1..5}; do
    if ip link set en05 up mtu 65520; then
        echo "$(date): en05 up successful on attempt $i" >> "$LOGFILE"
        break
    else
        echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
        sleep 3
    fi
done
EOF
chmod +x /usr/local/bin/pve-en05.sh

Create en06 bringup script:

cat > /usr/local/bin/pve-en06.sh << 'EOF'
#!/bin/bash
LOGFILE="/tmp/udev-debug.log"
echo "$(date): en06 bringup triggered" >> "$LOGFILE"
for i in {1..5}; do
    if ip link set en06 up mtu 65520; then
        echo "$(date): en06 up successful on attempt $i" >> "$LOGFILE"
        break
    else
        echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
        sleep 3
    fi
done
EOF
chmod +x /usr/local/bin/pve-en06.sh
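If you want udev to pick up the new rules before the reboot in Step 7, a reload plus re-trigger (or simply re-seating the TB4 cables) should do it; this is optional, since the reboot applies everything anyway:

# Reload udev rules and re-trigger add events for network devices:
udevadm control --reload-rules
udevadm trigger --subsystem-match=net --action=add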

Step 7: Update Initramfs and Reboot

Apply all TB4 configuration changes:

# Update initramfs:
update-initramfs -u -k all

# Reboot to apply changes:
reboot

After reboot, verify TB4 interfaces:

ip link show | grep -E '(en05|en06)'

Expected result: TB4 interfaces should be named en05 and en06 with proper MTU settings.

Step 8: Enable IPv4 Forwarding

Essential: TB4 mesh requires IPv4 forwarding for OpenFabric routing.

echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

Verify forwarding enabled:

sysctl net.ipv4.ip_forward

Expected: net.ipv4.ip_forward = 1

Step 9: Create Systemd Service for Boot Reliability

Ensure TB4 interfaces come up automatically on boot:

Create systemd service:

cat > /etc/systemd/system/thunderbolt-interfaces.service << 'EOF'
[Unit]
Description=Configure Thunderbolt Network Interfaces
After=network.target thunderbolt.service
Wants=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/thunderbolt-startup.sh

[Install]
WantedBy=multi-user.target
EOF

Create startup script:

cat > /usr/local/bin/thunderbolt-startup.sh << 'EOF'
#!/bin/bash
# Thunderbolt interface startup script
LOGFILE="/var/log/thunderbolt-startup.log"

echo "$(date): Starting Thunderbolt interface configuration" >> "$LOGFILE"

# Wait up to 30 seconds for interfaces to appear
for i in {1..30}; do
    if ip link show en05 &>/dev/null && ip link show en06 &>/dev/null; then
        echo "$(date): Thunderbolt interfaces found" >> "$LOGFILE"
        break
    fi
    echo "$(date): Waiting for Thunderbolt interfaces... ($i/30)" >> "$LOGFILE"
    sleep 1
done

# Configure interfaces if they exist
if ip link show en05 &>/dev/null; then
    /usr/local/bin/pve-en05.sh
    echo "$(date): en05 configured" >> "$LOGFILE"
fi

if ip link show en06 &>/dev/null; then
    /usr/local/bin/pve-en06.sh
    echo "$(date): en06 configured" >> "$LOGFILE"
fi

echo "$(date): Thunderbolt configuration completed" >> "$LOGFILE"
EOF

chmod +x /usr/local/bin/thunderbolt-startup.sh

Enable the service:

systemctl daemon-reload
systemctl enable thunderbolt-interfaces.service
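Optionally, run the service once now and check its log rather than waiting for the next reboot:

systemctl start thunderbolt-interfaces.service
tail -n 20 /var/log/thunderbolt-startup.log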

Phase 2: Proxmox SDN Configuration

Step 1: Create OpenFabric Fabric in GUI

Location: Datacenter → SDN → Fabrics

  1. Click: "Add Fabric" → "OpenFabric"
  2. Configure in the dialog:
    • Name: tb4
    • IPv4 Prefix: 10.100.0.0/24
    • IPv6 Prefix: (leave empty for IPv4-only)
    • Hello Interval: 3 (default)
    • CSNP Interval: 10 (default)
  3. Click: "OK"

Expected result: You should see a fabric named tb4 with Protocol OpenFabric and IPv4 10.100.0.0/24

Step 2: Add Nodes to Fabric

Still in: Datacenter → SDN → Fabrics → (select tb4 fabric)

  1. Click: "Add Node"
  2. Configure for n2:
    • Node: n2
    • IPv4: 10.100.0.12
    • IPv6: (leave empty)
    • Interfaces: Select en05 and en06 from the interface list
  3. Click: "OK"
  4. Repeat for n3: IPv4: 10.100.0.13, interfaces: en05, en06
  5. Repeat for n4: IPv4: 10.100.0.14, interfaces: en05, en06

Note: No manual /30 point-to-point addresses are needed on en05/en06. OpenFabric handles the mesh routing automatically using the node IPv4 addresses (router IDs) above; the /30 subnet scheme from the original guide was removed in this fork (see the changelog).

Step 3: Apply SDN Configuration

Critical: This activates the mesh - nothing works until you apply!

In GUI: Datacenter → SDN → "Apply" (button in top toolbar)

Expected result: Status table shows all nodes with "OK" status
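If you prefer the command line, the apply step can also be triggered through the Proxmox API (equivalent to the GUI button, as far as I'm aware):

pvesh set /cluster/sdn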

Step 4: Start FRR Service

Critical: OpenFabric routing requires FRR (Free Range Routing) to be running.

systemctl start frr
systemctl enable frr

Verify FRR is running:

systemctl status frr | grep Active

Expected output: Active: active (running)
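Optionally, inspect the OpenFabric adjacencies directly in FRR; these vtysh commands assume FRR's fabricd is the daemon providing OpenFabric routing for the SDN fabric:

vtysh -c "show openfabric neighbor"
vtysh -c "show openfabric topology"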

Phase 3: Mesh Verification and Testing

Step 1: Verify Interface Configuration

Check TB4 interfaces are up with correct settings:

ip addr show | grep -E '(en05|en06|10\.100\.0\.)'

Step 2: Test OpenFabric Mesh Connectivity

Critical test: Verify full mesh communication works.

# Test router ID connectivity from current node:
ping -c 3 10.100.0.12
ping -c 3 10.100.0.13
ping -c 3 10.100.0.14

Expected: All pings succeed with sub-millisecond latency (~0.6ms)
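To confirm the 65520 MTU is usable end to end (not just configured on the interfaces), a large non-fragmenting ping is a quick check; 65492 assumes the usual 20-byte IPv4 + 8-byte ICMP overhead:

# Should succeed; a "message too long" error means the path MTU is smaller than expected:
ping -c 3 -M do -s 65492 10.100.0.13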

If connectivity fails: TB4 interfaces may need manual bring-up:

ip link set en05 up mtu 65520
ip link set en06 up mtu 65520
ifreload -a

Phase 4: High-Performance Ceph Integration

Step 1: Install Ceph

Install Ceph packages:

pveceph install --repository no-subscription

Step 2: Create Ceph Directory Structure

Essential: Proper directory structure and ownership:

mkdir -p /var/lib/ceph && chown ceph:ceph /var/lib/ceph
mkdir -p /etc/ceph && chown ceph:ceph /etc/ceph

Step 3: Create First Monitor and Manager

On the first node (n2) only:

pveceph mon create

Verify monitor creation:

ceph -s

Step 4: Configure Network Settings

Set public and cluster networks for optimal TB4 performance:

This fork uses the Thunderbolt network (10.100.0.0/24) for both the public and cluster networks:

ceph config set global public_network 10.100.0.0/24
ceph config set global cluster_network 10.100.0.0/24
ceph config set mon public_network 10.100.0.0/24
ceph config set mon cluster_network 10.100.0.0/24

The original guide split the public and cluster networks, which led to degraded performance (https://forum.proxmox.com/threads/low-ceph-performance-on-3-node-proxmox-9-cluster-with-sata-ssds.170091/). Do not use the following:

  • ceph config set global public_network 10.11.12.0/24
  • ceph config set global cluster_network 10.100.0.0/24
  • ceph config set mon public_network 10.11.12.0/24
  • ceph config set mon cluster_network 10.100.0.0/24

Step 5: Create Additional Monitors

On n3 and n4 nodes:

pveceph mon create

Verify 3-monitor quorum:

ceph quorum_status

Step 6: Create OSDs

Create OSDs on NVMe drives (adjust device names as needed):

# Create two OSDs per node:
pveceph osd create /dev/nvme0n1
pveceph osd create /dev/nvme1n1
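If you're unsure which device names to use in the commands above, a quick listing helps (double-check you are not targeting the Proxmox boot drive; device names vary per system):

lsblk -d -o NAME,SIZE,MODEL,SERIAL | grep -i nvme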

Verify all OSDs are up:

ceph osd tree

Phase 5: High-Performance Optimizations

Memory Optimizations (64GB RAM Nodes)

# Set OSD memory target to 8GB per OSD:
ceph config set osd osd_memory_target 8589934592

# Set BlueStore cache sizes for NVMe performance:
ceph config set osd bluestore_cache_size_ssd 4294967296

# Set memory allocation optimizations:
ceph config set osd osd_memory_cache_min 1073741824
ceph config set osd osd_memory_cache_resize_interval 1

CPU and Threading Optimizations

# Set CPU threading optimizations:
ceph config set osd osd_op_num_threads_per_shard 2
ceph config set osd osd_op_num_shards 8

# Set BlueStore threading for NVMe:
ceph config set osd bluestore_sync_submit_transaction false
ceph config set osd bluestore_throttle_bytes 268435456
ceph config set osd bluestore_throttle_deferred_bytes 134217728

# Set CPU-specific optimizations:
ceph config set osd osd_client_message_cap 1000
ceph config set osd osd_client_message_size_cap 1073741824

Network Optimizations for TB4 Mesh

# Set network optimizations for TB4 mesh:
ceph config set global ms_tcp_nodelay true
ceph config set global ms_tcp_rcvbuf 134217728
ceph config set global ms_tcp_prefetch_max_size 65536

# Set cluster network optimizations:
ceph config set global ms_cluster_mode crc
ceph config set global ms_async_op_threads 8
ceph config set global ms_dispatch_throttle_bytes 1073741824

# Set heartbeat optimizations:
ceph config set osd osd_heartbeat_interval 6
ceph config set osd osd_heartbeat_grace 20

BlueStore and NVMe Optimizations

# Set BlueStore optimizations for NVMe drives:
ceph config set osd bluestore_compression_algorithm lz4
ceph config set osd bluestore_compression_mode aggressive
ceph config set osd bluestore_compression_required_ratio 0.7

# Set NVMe-specific optimizations:
ceph config set osd bluestore_cache_trim_interval 200

# Set WAL and DB optimizations:
ceph config set osd bluestore_block_db_size 5368709120
ceph config set osd bluestore_block_wal_size 1073741824

Scrubbing and Maintenance Optimizations

# Set scrubbing optimizations:
ceph config set osd osd_scrub_during_recovery false
ceph config set osd osd_scrub_begin_hour 2
ceph config set osd osd_scrub_end_hour 6

# Set deep scrub optimizations:
ceph config set osd osd_deep_scrub_interval 1209600
ceph config set osd osd_scrub_max_interval 1209600
ceph config set osd osd_scrub_min_interval 86400

# Set recovery optimizations:
ceph config set osd osd_recovery_max_active 8
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_op_priority 1
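A quick spot-check that the settings above landed in the cluster configuration database (the grep patterns here are just examples):

ceph config dump | grep -E 'osd_memory_target|bluestore_compression_algorithm|osd_op_num_shards'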

Phase 6: Storage Pool Creation

Create High-Performance Storage Pool

# Create pool with optimal PG count for 6 OSDs:
ceph osd pool create cephtb4 256 256

# Set replication to size 2 with min_size 1:
ceph osd pool set cephtb4 size 2
ceph osd pool set cephtb4 min_size 1

# Enable RBD application:
ceph osd pool application enable cephtb4 rbd
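To make the pool usable for VM and container disks from Proxmox, it can also be registered as RBD storage; the storage ID cephtb4 and the content types below are just an example:

pvesm add rbd cephtb4 --pool cephtb4 --content images,rootdir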

Verify Cluster Health

ceph -s

Expected results:

  • Health: HEALTH_OK
  • OSDs: 6 osds: 6 up, 6 in
  • PGs: All PGs active+clean

Phase 7: Performance Testing

Test Optimized Cluster Performance

# Test write performance:
rados -p cephtb4 bench 10 write --no-cleanup -b 4M -t 16

# Test read performance:
rados -p cephtb4 bench 10 rand -t 16

# Clean up test data:
rados -p cephtb4 cleanup

Expected Results:

  • Write Performance: ~1,300 MB/s average, 2,000+ MB/s peak
  • Read Performance: ~1,760 MB/s average, 2,400+ MB/s peak
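For a client-side view of the same pool, rbd bench against a temporary image is another option (image name and size are arbitrary here):

# Create a temporary image, benchmark writes, then remove it:
rbd create cephtb4/benchtest --size 10G
rbd bench --io-type write --io-size 4M --io-threads 16 --io-total 10G cephtb4/benchtest
rbd rm cephtb4/benchtest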

System-Level Performance Optimizations (Optional)

# Network tuning:
echo 'net.core.rmem_max = 268435456' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 268435456' >> /etc/sysctl.conf
echo 'net.core.netdev_max_backlog = 30000' >> /etc/sysctl.conf

# Memory tuning:
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
echo 'vm.min_free_kbytes = 4194304' >> /etc/sysctl.conf

# Apply settings:
sysctl -p

Troubleshooting Common Issues

TB4 Mesh Issues

Problem: TB4 interfaces not coming up after reboot

Quick Fix: Manually bring up interfaces:

ip link set en05 up mtu 65520
ip link set en06 up mtu 65520
ifreload -a

Permanent Fix: Check systemd service:

systemctl status thunderbolt-interfaces.service

# Check if scripts are corrupted:
wc -l /usr/local/bin/pve-en*.sh

# Check the shebang line of each script (each should start with #!/bin/bash):
head -n 1 /usr/local/bin/pve-en05.sh /usr/local/bin/pve-en06.sh /usr/local/bin/thunderbolt-startup.sh

# Fix shebang if corrupted:
sed -i '1s/#\\!/#!/' /usr/local/bin/thunderbolt-startup.sh
sed -i '1s/#\\!/#!/' /usr/local/bin/pve-en05.sh
sed -i '1s/#\\!/#!/' /usr/local/bin/pve-en06.sh

Problem: Mesh connectivity fails between nodes

# Check interface status:
ip addr show | grep -E '(en05|en06|10\.100\.0\.)'

# Verify FRR routing service:
systemctl status frr

Ceph Issues

Problem: OSDs going down after creation

# Restart OSD services after fixing mesh:
systemctl restart ceph-osd@*.service

Problem: Ceph cluster shows OSDs down after reboot

# 1. Bring up TB4 interfaces:
/usr/local/bin/pve-en05.sh
/usr/local/bin/pve-en06.sh

# 2. Wait for interfaces to stabilize:
sleep 10

# 3. Restart Ceph OSDs:
systemctl restart ceph-osd@*.service

# 4. Monitor recovery:
watch ceph -s

Problem: Inactive PGs or slow performance

# Check cluster status:
ceph -s

# Verify optimizations are applied:
ceph config dump | grep -E '(memory_target|cache_size|compression)'

# Check network binding:
ceph config get osd cluster_network
ceph config get osd public_network

Changelog

January 27, 2025

  • Cleaned up commands for direct terminal execution (no SSH wrappers)
  • Fixed formatting and organization
  • Updated Ceph version references (Nautilus → Reef)
  • Clarified step-by-step execution flow

scloder commented Dec 5, 2025

If you don't set the IPs for en05/en06 in the SDN UI and they aren't set anywhere else, what do they end up set to? Can you share your working /etc/network/interfaces.d/sdn? When I repeat this, my en05/en06 get the same IP as the iface dummy_tb4 inet static stanza, which I think breaks everything.
