
Complete Thunderbolt 4 + Ceph Guide: Setup for Proxmox VE 9

Acknowledgments

This builds upon excellent foundational work by @scyto.

Key contributions from @scyto's work:

  • TB4 hardware detection and kernel module strategies
  • Systemd networking and udev automation techniques
  • MTU optimization and performance tuning approaches

Changelog - Fork Updates (January 2025, fork by Yearly1825)

Improvements and fixes based on PVE 9.0.10 testing:

  • Fixed script error handling: Replaced problematic || syntax with proper if/then/else statements in interface bringup scripts (pve-en05.sh and pve-en06.sh) for more reliable error handling and retry logic
  • Optimized Ceph networking: Changed configuration to use Thunderbolt network (10.100.0.0/24) for both public and cluster networks instead of split networks, resolving performance degradation issues discovered in production (ref: Proxmox forum thread #170091)
  • Simplified network topology: Removed confusing and unnecessary /30 point-to-point subnet configuration - OpenFabric handles mesh routing automatically without manual subnet assignments
  • Improved /etc/network/interfaces instructions: Changed from appending to file to properly inserting configuration above the source directive to prevent conflicts
  • Enhanced script reliability: Added proper bash conditionals for more predictable behavior during interface initialization

Overview

This guide provides a step-by-step, tested setup for building a high-performance Thunderbolt 4 + Ceph cluster on Proxmox VE 9.

Lab Results:

  • TB4 Mesh Performance: Sub-millisecond latency, 65520 MTU, full mesh connectivity
  • Ceph Performance: 1,300+ MB/s write, 1,760+ MB/s read with optimizations
  • Reliability: 0% packet loss, automatic failover, persistent configuration
  • Integration: Full Proxmox GUI visibility and management

Hardware Environment:

  • Nodes: 3x systems with dual TB4 ports (tested on MS01 mini-PCs)
  • Memory: 64GB RAM per node (optimal for high-performance Ceph)
  • CPU: 13th Gen Intel (or equivalent high-performance processors)
  • Storage: NVMe drives for Ceph OSDs
  • Network: TB4 mesh (10.100.0.0/24) + management (10.11.12.0/24)

Software Stack:

  • Proxmox VE: 9.0 with native SDN OpenFabric support
  • Ceph: Reef with BlueStore, LZ4 compression, 2-replica pools (size=2, min_size=1)
  • OpenFabric: IPv4-only mesh routing for simplicity and performance

Prerequisites: What You Need

Physical Requirements

  • 3 nodes minimum: Each with dual TB4 ports (tested with MS01 mini-PCs)
  • TB4 cables: Quality TB4 cables for mesh connectivity
  • Ring topology: Physical connections n2→n3→n4→n2 (or similar mesh pattern)
  • Management network: Standard Ethernet for initial setup and management

Software Requirements

  • Proxmox VE 9.0
  • Root access to all nodes
  • Basic Linux networking knowledge
  • Patience: TB4 mesh setup requires careful attention to detail!

Network Planning

  • Management network: 10.11.12.0/24 (adjust to your environment)
  • TB4 cluster network: 10.100.0.0/24 (for Ceph cluster traffic)
  • Router IDs: 10.100.0.12 (n2), 10.100.0.13 (n3), 10.100.0.14 (n4)

Phase 1: Thunderbolt Foundation Setup

Step 1: Prepare All Nodes

Critical: Perform these steps on ALL mesh nodes (n2, n3, n4).

Load TB4 kernel modules:

# Execute on each node:
echo 'thunderbolt' >> /etc/modules
echo 'thunderbolt-net' >> /etc/modules
modprobe thunderbolt && modprobe thunderbolt-net

Verify modules loaded:

lsmod | grep thunderbolt

Expected output: Both thunderbolt and thunderbolt_net modules present.
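
If you prefer a check that fails loudly (for example in a provisioning script), a minimal sketch using the module names above:

# Optional: warn if either module is missing:
for m in thunderbolt thunderbolt_net; do
    lsmod | grep -q "^${m} " || echo "WARNING: module $m not loaded"
done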

Step 2: Identify TB4 Hardware

Find TB4 controllers and interfaces:

lspci | grep -i thunderbolt
ip link show | grep -E '(en0[5-9]|thunderbolt)'

Expected: TB4 PCI controllers detected, TB4 network interfaces visible.

Step 3: Create Systemd Link Files

Critical: Create interface renaming rules based on PCI paths for consistent naming.

# Create systemd link file for first TB4 interface:
cat > /etc/systemd/network/00-thunderbolt0.link << 'EOF'
[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en05
EOF

# Create systemd link file for second TB4 interface:
cat > /etc/systemd/network/00-thunderbolt1.link << 'EOF'
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en06
EOF

Note: Adjust PCI paths if different on your hardware (check with lspci | grep -i thunderbolt)
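
To confirm the Path= and Driver= values used in the Match sections above, one option is shown below (before the rename the TB4 NIC may still be called thunderbolt0 or similar; adjust to whatever ip link show reports):

# Domain-prefixed PCI addresses of the TB4 controllers (e.g. 0000:00:0d.2):
lspci -D | grep -i thunderbolt

# udev path and driver properties for a TB4 network interface:
udevadm info /sys/class/net/thunderbolt0 2>/dev/null | grep -E 'ID_PATH=|ID_NET_DRIVER='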

Step 4: Configure Network Interfaces

Add TB4 interfaces to network configuration with optimal settings:

vim /etc/network/interfaces 

Add the following to /etc/network/interfaces, inserting it above the line source /etc/network/interfaces.d/*

auto en05
iface en05 inet manual
    mtu 65520

auto en06
iface en06 inet manual
    mtu 65520

Step 5: Enable systemd-networkd

Required for systemd link files to work:

systemctl enable systemd-networkd
systemctl start systemd-networkd
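
A quick confirmation that networkd is running and sees the links (output will vary by system):

# Verify systemd-networkd is active:
systemctl is-active systemd-networkd

# List links as seen by networkd:
networkctl list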

Step 6: Create Udev Rules and Scripts

Automation for reliable interface bringup on cable insertion:

Create udev rules:

cat > /etc/udev/rules.d/10-tb-en.rules << 'EOF'
ACTION=="add|move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="add|move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
EOF

Create en05 bringup script:

cat > /usr/local/bin/pve-en05.sh << 'EOF'
#!/bin/bash
LOGFILE="/tmp/udev-debug.log"
echo "$(date): en05 bringup triggered" >> "$LOGFILE"
for i in {1..5}; do
    if ip link set en05 up mtu 65520; then
        echo "$(date): en05 up successful on attempt $i" >> "$LOGFILE"
        break
    else
        echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
        sleep 3
    fi
done
EOF
chmod +x /usr/local/bin/pve-en05.sh

Create en06 bringup script:

cat > /usr/local/bin/pve-en06.sh << 'EOF'
#!/bin/bash
LOGFILE="/tmp/udev-debug.log"
echo "$(date): en06 bringup triggered" >> "$LOGFILE"
for i in {1..5}; do
    if ip link set en06 up mtu 65520; then
        echo "$(date): en06 up successful on attempt $i" >> "$LOGFILE"
        break
    else
        echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
        sleep 3
    fi
done
EOF
chmod +x /usr/local/bin/pve-en06.sh
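
Before rebooting, the bringup scripts can be exercised manually as a sanity check (optional; the log path matches the scripts above):

/usr/local/bin/pve-en05.sh
/usr/local/bin/pve-en06.sh
tail /tmp/udev-debug.log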

Step 7: Update Initramfs and Reboot

Apply all TB4 configuration changes:

# Update initramfs:
update-initramfs -u -k all

# Reboot to apply changes:
reboot

After reboot, verify TB4 interfaces:

ip link show | grep -E '(en05|en06)'

Expected result: TB4 interfaces should be named en05 and en06 with proper MTU settings.

Step 8: Enable IPv4 Forwarding

Essential: TB4 mesh requires IPv4 forwarding for OpenFabric routing.

echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

Verify forwarding enabled:

sysctl net.ipv4.ip_forward

Expected: net.ipv4.ip_forward = 1

Step 9: Create Systemd Service for Boot Reliability

Ensure TB4 interfaces come up automatically on boot:

Create systemd service:

cat > /etc/systemd/system/thunderbolt-interfaces.service << 'EOF'
[Unit]
Description=Configure Thunderbolt Network Interfaces
After=network.target thunderbolt.service
Wants=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/thunderbolt-startup.sh

[Install]
WantedBy=multi-user.target
EOF

Create startup script:

cat > /usr/local/bin/thunderbolt-startup.sh << 'EOF'
#!/bin/bash
# Thunderbolt interface startup script
LOGFILE="/var/log/thunderbolt-startup.log"

echo "$(date): Starting Thunderbolt interface configuration" >> "$LOGFILE"

# Wait up to 30 seconds for interfaces to appear
for i in {1..30}; do
    if ip link show en05 &>/dev/null && ip link show en06 &>/dev/null; then
        echo "$(date): Thunderbolt interfaces found" >> "$LOGFILE"
        break
    fi
    echo "$(date): Waiting for Thunderbolt interfaces... ($i/30)" >> "$LOGFILE"
    sleep 1
done

# Configure interfaces if they exist
if ip link show en05 &>/dev/null; then
    /usr/local/bin/pve-en05.sh
    echo "$(date): en05 configured" >> "$LOGFILE"
fi

if ip link show en06 &>/dev/null; then
    /usr/local/bin/pve-en06.sh
    echo "$(date): en06 configured" >> "$LOGFILE"
fi

echo "$(date): Thunderbolt configuration completed" >> "$LOGFILE"
EOF

chmod +x /usr/local/bin/thunderbolt-startup.sh

Enable the service:

systemctl daemon-reload
systemctl enable thunderbolt-interfaces.service
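
Optionally run the service once now and confirm it is registered (log path matches the startup script above):

systemctl start thunderbolt-interfaces.service
systemctl is-enabled thunderbolt-interfaces.service
tail /var/log/thunderbolt-startup.log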

Phase 2: Proxmox SDN Configuration

Step 1: Create OpenFabric Fabric in GUI

Location: Datacenter → SDN → Fabrics

  1. Click: "Add Fabric" → "OpenFabric"
  2. Configure in the dialog:
    • Name: tb4
    • IPv4 Prefix: 10.100.0.0/24
    • IPv6 Prefix: (leave empty for IPv4-only)
    • Hello Interval: 3 (default)
    • CSNP Interval: 10 (default)
  3. Click: "OK"

Expected result: You should see a fabric named tb4 with Protocol OpenFabric and IPv4 10.100.0.0/24

Step 2: Add Nodes to Fabric

Still in: Datacenter → SDN → Fabrics → (select tb4 fabric)

  1. Click: "Add Node"
  2. Configure for n2:
    • Node: n2
    • IPv4: 10.100.0.12
    • IPv6: (leave empty)
    • Interfaces: Select en05 and en06 from the interface list
  3. Click: "OK"
  4. Repeat for n3: IPv4: 10.100.0.13, interfaces: en05, en06
  5. Repeat for n4: IPv4: 10.100.0.14, interfaces: en05, en06

Note: The original guide configured /30 point-to-point addresses on en05/en06 at this step (for example 10.100.0.1/30 and 10.100.0.5/30 on n2). This fork drops that step (see the changelog above): leave en05 and en06 selected without per-interface IPs and let OpenFabric handle the mesh routing automatically.

Step 3: Apply SDN Configuration

Critical: This activates the mesh - nothing works until you apply!

In GUI: Datacenter → SDN → "Apply" (button in top toolbar)

Expected result: Status table shows all nodes with "OK" status

Step 4: Start FRR Service

Critical: OpenFabric routing requires FRR (Free Range Routing) to be running.

systemctl start frr
systemctl enable frr

Verify FRR is running:

systemctl status frr | grep Active

Expected output: Active: active (running)
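
Once FRR is running on all nodes, the OpenFabric adjacencies can also be inspected from the FRR shell (this assumes the SDN config enabled the fabricd daemon):

vtysh -c 'show openfabric neighbor'
vtysh -c 'show openfabric topology'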

Phase 3: Mesh Verification and Testing

Step 1: Verify Interface Configuration

Check TB4 interfaces are up with correct settings:

ip addr show | grep -E '(en05|en06|10\.100\.0\.)'

Step 2: Test OpenFabric Mesh Connectivity

Critical test: Verify full mesh communication works.

# Test router ID connectivity from current node:
ping -c 3 10.100.0.12
ping -c 3 10.100.0.13
ping -c 3 10.100.0.14

Expected: All pings succeed with sub-millisecond latency (~0.6ms)

If connectivity fails: TB4 interfaces may need manual bring-up:

ip link set en05 up mtu 65520
ip link set en06 up mtu 65520
ifreload -a
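
Once OpenFabric converges, the routes learned over the mesh should also appear in the kernel routing table:

ip route show | grep 10.100.0.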

Phase 4: High-Performance Ceph Integration

Step 1: Install Ceph

Install Ceph packages:

pveceph install --repository no-subscription

Step 2: Create Ceph Directory Structure

Essential: Proper directory structure and ownership:

mkdir -p /var/lib/ceph && chown ceph:ceph /var/lib/ceph
mkdir -p /etc/ceph && chown ceph:ceph /etc/ceph

Step 3: Create First Monitor and Manager

On the first node (n2) only:

pveceph mon create

Verify monitor creation:

ceph -s

Step 4: Configure Network Settings

Set public and cluster networks for optimal TB4 performance:

This fork uses the Thunderbolt network (10.100.0.0/24) for both public and cluster traffic:

ceph config set global public_network 10.100.0.0/24
ceph config set global cluster_network 10.100.0.0/24
ceph config set mon public_network 10.100.0.0/24
ceph config set mon cluster_network 10.100.0.0/24

The original guide used separate public and cluster networks, which led to degraded performance in production (https://forum.proxmox.com/threads/low-ceph-performance-on-3-node-proxmox-9-cluster-with-sata-ssds.170091/):

  • ceph config set global public_network 10.11.12.0/24
  • ceph config set global cluster_network 10.100.0.0/24
  • ceph config set mon public_network 10.11.12.0/24
  • ceph config set mon cluster_network 10.100.0.0/24
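
To confirm which networks are currently in effect:

ceph config get mon public_network
ceph config get mon cluster_network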

Step 5: Create Additional Monitors

On n3 and n4 nodes:

pveceph mon create

Verify 3-monitor quorum:

ceph quorum_status

Step 6: Create OSDs

Create OSDs on NVMe drives (adjust device names as needed):

# Create two OSDs per node:
pveceph osd create /dev/nvme0n1
pveceph osd create /dev/nvme1n1
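
If unsure which devices are free, list block devices first (the device names in this guide are examples; pick unused NVMe drives on your nodes):

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT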

Verify all OSDs are up:

ceph osd tree

Phase 5: High-Performance Optimizations

Memory Optimizations (64GB RAM Nodes)

# Set OSD memory target to 8GB per OSD:
ceph config set osd osd_memory_target 8589934592

# Set BlueStore cache sizes for NVMe performance:
ceph config set osd bluestore_cache_size_ssd 4294967296

# Set memory allocation optimizations:
ceph config set osd osd_memory_cache_min 1073741824
ceph config set osd osd_memory_cache_resize_interval 1

CPU and Threading Optimizations

# Set CPU threading optimizations:
ceph config set osd osd_op_num_threads_per_shard 2
ceph config set osd osd_op_num_shards 8

# Set BlueStore threading for NVMe:
ceph config set osd bluestore_sync_submit_transaction false
ceph config set osd bluestore_throttle_bytes 268435456
ceph config set osd bluestore_throttle_deferred_bytes 134217728

# Set CPU-specific optimizations:
ceph config set osd osd_client_message_cap 1000
ceph config set osd osd_client_message_size_cap 1073741824

Network Optimizations for TB4 Mesh

# Set network optimizations for TB4 mesh:
ceph config set global ms_tcp_nodelay true
ceph config set global ms_tcp_rcvbuf 134217728
ceph config set global ms_tcp_prefetch_max_size 65536

# Set cluster network optimizations:
ceph config set global ms_cluster_mode crc
ceph config set global ms_async_op_threads 8
ceph config set global ms_dispatch_throttle_bytes 1073741824

# Set heartbeat optimizations:
ceph config set osd osd_heartbeat_interval 6
ceph config set osd osd_heartbeat_grace 20

BlueStore and NVMe Optimizations

# Set BlueStore optimizations for NVMe drives:
ceph config set osd bluestore_compression_algorithm lz4
ceph config set osd bluestore_compression_mode aggressive
ceph config set osd bluestore_compression_required_ratio 0.7

# Set NVMe-specific optimizations:
ceph config set osd bluestore_cache_trim_interval 200

# Set WAL and DB optimizations:
ceph config set osd bluestore_block_db_size 5368709120
ceph config set osd bluestore_block_wal_size 1073741824

Scrubbing and Maintenance Optimizations

# Set scrubbing optimizations:
ceph config set osd osd_scrub_during_recovery false
ceph config set osd osd_scrub_begin_hour 2
ceph config set osd osd_scrub_end_hour 6

# Set deep scrub optimizations:
ceph config set osd osd_deep_scrub_interval 1209600
ceph config set osd osd_scrub_max_interval 1209600
ceph config set osd osd_scrub_min_interval 86400

# Set recovery optimizations:
ceph config set osd osd_recovery_max_active 8
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_op_priority 1

Phase 6: Storage Pool Creation

Create High-Performance Storage Pool

# Create pool with optimal PG count for 6 OSDs:
ceph osd pool create cephtb4 256 256

# Set 2-way replication (2 copies, I/O allowed with 1 copy available):
ceph osd pool set cephtb4 size 2
ceph osd pool set cephtb4 min_size 1

# Enable RBD application:
ceph osd pool application enable cephtb4 rbd
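
A rough sanity check of the PG count, assuming the common guideline of roughly 100 PGs per OSD:

# PGs per OSD = pg_num * replicas / OSD count
# 256 * 2 / 6 ≈ 85, comfortably under the ~100 PGs-per-OSD guideline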

Verify Cluster Health

ceph -s

Expected results:

  • Health: HEALTH_OK
  • OSDs: 6 osds: 6 up, 6 in
  • PGs: All PGs active+clean

Phase 7: Performance Testing

Test Optimized Cluster Performance

# Test write performance:
rados -p cephtb4 bench 10 write --no-cleanup -b 4M -t 16

# Test read performance:
rados -p cephtb4 bench 10 rand -t 16

# Clean up test data:
rados -p cephtb4 cleanup

Expected Results:

  • Write Performance: ~1,300 MB/s average, 2,000+ MB/s peak
  • Read Performance: ~1,760 MB/s average, 2,400+ MB/s peak

System-Level Performance Optimizations (Optional)

# Network tuning:
echo 'net.core.rmem_max = 268435456' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 268435456' >> /etc/sysctl.conf
echo 'net.core.netdev_max_backlog = 30000' >> /etc/sysctl.conf

# Memory tuning:
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
echo 'vm.min_free_kbytes = 4194304' >> /etc/sysctl.conf

# Apply settings:
sysctl -p

Troubleshooting Common Issues

TB4 Mesh Issues

Problem: TB4 interfaces not coming up after reboot

Quick Fix: Manually bring up interfaces:

ip link set en05 up mtu 65520
ip link set en06 up mtu 65520
ifreload -a

Permanent Fix: Check systemd service:

systemctl status thunderbolt-interfaces.service

# Check if scripts are corrupted:
wc -l /usr/local/bin/pve-en*.sh

# Check for shebang errors (first line of each script should be #!/bin/bash):
head -n 1 /usr/local/bin/pve-en05.sh /usr/local/bin/pve-en06.sh /usr/local/bin/thunderbolt-startup.sh

# Fix shebang if corrupted:
sed -i '1s/#\\!/#!/' /usr/local/bin/thunderbolt-startup.sh
sed -i '1s/#\\!/#!/' /usr/local/bin/pve-en05.sh
sed -i '1s/#\\!/#!/' /usr/local/bin/pve-en06.sh

Problem: Mesh connectivity fails between nodes

# Check interface status:
ip addr show | grep -E '(en05|en06|10\.100\.0\.)'

# Verify FRR routing service:
systemctl status frr

Ceph Issues

Problem: OSDs going down after creation

# Restart OSD services after fixing mesh:
systemctl restart ceph-osd@*.service

Problem: Ceph cluster shows OSDs down after reboot

# 1. Bring up TB4 interfaces:
/usr/local/bin/pve-en05.sh
/usr/local/bin/pve-en06.sh

# 2. Wait for interfaces to stabilize:
sleep 10

# 3. Restart Ceph OSDs:
systemctl restart ceph-osd@*.service

# 4. Monitor recovery:
watch ceph -s

Problem: Inactive PGs or slow performance

# Check cluster status:
ceph -s

# Verify optimizations are applied:
ceph config dump | grep -E '(memory_target|cache_size|compression)'

# Check network binding:
ceph config get osd cluster_network
ceph config get osd public_network

Changelog

January 27, 2025

  • Cleaned up commands for direct terminal execution (no SSH wrappers)
  • Fixed formatting and organization
  • Updated Ceph version references (Nautilus → Reef)
  • Clarified step-by-step execution flow
@jhhoffma3

Excellent guide! Got me up and running where the other guides seemed to get stuck at some point. Thanks!

For others' reference....this is mainly the same as tas-labs guide (which is based on Scyto), but console-friendly instructions instead of SSH. Seemed to have less issues for me and my 3x MS-01 setup.

@scloder

scloder commented Nov 23, 2025

What does your /etc/network/interfaces need to look like before the sdn config. Tried following forked gist but I must be missing something.

You mentioned getting rid of /30 networks so that has me wondering what you used instead of this

Important: Configure /30 point-to-point addresses on the en05 and en06 interfaces:

n2: en05: 10.100.0.1/30, en06: 10.100.0.5/30
n3: en05: 10.100.0.9/30, en06: 10.100.0.13/30
n4: en05: 10.100.0.17/30, en06: 10.100.0.21/30

did you just leave those unconfigured but selected?

@scloder

scloder commented Dec 5, 2025

if you don't set the ips for en05/06 in the sdn ui and they aren't set anywhere else, what do they get set to? can you share your working /etc/network/interfaces.d/sdn
because when i repeat that, my en05/06 get the same ip as the iface dummy_tb4 inet static which breaks everything, i think
