This builds upon excellent foundational work by @scyto.
- Original TB4 research from @scyto: https://gist.github.com/scyto/76e94832927a89d977ea989da157e9dc
- My Original PVE 9 Writeup: https://gist.github.com/taslabs-net/9f6e06ab32833864678a4acbb6dc9131
Key contributions from @scyto's work:
- TB4 hardware detection and kernel module strategies
- Systemd networking and udev automation techniques
- MTU optimization and performance tuning approaches
Changelog - Fork Updates, January 2025 (fork by Yearly1825). Improvements and fixes based on PVE 9.0.10 testing:
- Fixed script error handling: Replaced problematic || syntax with proper if/then/else statements in interface bringup scripts (pve-en05.sh and pve-en06.sh) for more reliable error handling and retry logic
- Optimized Ceph networking: Changed configuration to use Thunderbolt network (10.100.0.0/24) for both public and cluster networks instead of split networks, resolving performance degradation issues discovered in production (ref: Proxmox forum thread #170091)
- Simplified network topology: Removed confusing and unnecessary /30 point-to-point subnet configuration - OpenFabric handles mesh routing automatically without manual subnet assignments
- Improved /etc/network/interfaces instructions: Changed from appending to file to properly inserting configuration above the source directive to prevent conflicts
- Enhanced script reliability: Added proper bash conditionals for more predictable behavior during interface initialization
This guide provides a step-by-step, tested setup for building a high-performance Thunderbolt 4 + Ceph cluster on Proxmox VE 9.
Lab Results:
- TB4 Mesh Performance: Sub-millisecond latency, 65520 MTU, full mesh connectivity
- Ceph Performance: 1,300+ MB/s write, 1,760+ MB/s read with optimizations
- Reliability: 0% packet loss, automatic failover, persistent configuration
- Integration: Full Proxmox GUI visibility and management
Hardware Environment:
- Nodes: 3x systems with dual TB4 ports (tested on MS01 mini-PCs)
- Memory: 64GB RAM per node (optimal for high-performance Ceph)
- CPU: 13th Gen Intel (or equivalent high-performance processors)
- Storage: NVMe drives for Ceph OSDs
- Network: TB4 mesh (10.100.0.0/24) + management (10.11.12.0/24)
Software Stack:
- Proxmox VE: 9.0 with native SDN OpenFabric support
- Ceph: Reef with BlueStore, LZ4 compression, 2:1 replication
- OpenFabric: IPv4-only mesh routing for simplicity and performance
- 3 nodes minimum: Each with dual TB4 ports (tested with MS01 mini-PCs)
- TB4 cables: Quality TB4 cables for mesh connectivity
- Ring topology: Physical connections n2→n3→n4→n2 (or similar mesh pattern)
- Management network: Standard Ethernet for initial setup and management
- Proxmox VE 9.0
- Root access to all nodes
- Basic Linux networking knowledge
- Patience: TB4 mesh setup requires careful attention to detail!
- Management network: 10.11.12.0/24 (adjust to your environment)
- TB4 cluster network: 10.100.0.0/24 (for Ceph cluster traffic)
- Router IDs: 10.100.0.12 (n2), 10.100.0.13 (n3), 10.100.0.14 (n4)
Critical: Perform these steps on ALL mesh nodes (n2, n3, n4).
Load TB4 kernel modules:
# Execute on each node:
echo 'thunderbolt' >> /etc/modules
echo 'thunderbolt-net' >> /etc/modules
modprobe thunderbolt && modprobe thunderbolt-net

Verify modules loaded:
lsmod | grep thunderbolt

Expected output: Both thunderbolt and thunderbolt_net modules present.
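If you prefer a pass/fail check over eyeballing the lsmod output, a small loop like the following works on each node (a convenience sketch, not part of the original guide):
# Warn if either Thunderbolt module is missing from the running kernel
for m in thunderbolt thunderbolt_net; do
    lsmod | grep -q "^$m" || echo "WARNING: module $m is not loaded"
done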
Find TB4 controllers and interfaces:
lspci | grep -i thunderbolt
ip link show | grep -E '(en0[5-9]|thunderbolt)'

Expected: TB4 PCI controllers detected, TB4 network interfaces visible.
Critical: Create interface renaming rules based on PCI paths for consistent naming.
# Create systemd link file for first TB4 interface:
cat > /etc/systemd/network/00-thunderbolt0.link << 'EOF'
[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
EOF
# Create systemd link file for second TB4 interface:
cat > /etc/systemd/network/00-thunderbolt1.link << 'EOF'
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
EOF

Note: Adjust PCI paths if different on your hardware (check with lspci | grep -i thunderbolt).
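If you are unsure which PCI path belongs to which interface, udev can report the ID_PATH it assigned to each network device. This is just a convenience sketch; interface names shown are whatever your system currently uses (they may still be thunderbolt0/thunderbolt1 before the link files take effect):
# Print the udev ID_PATH for every network interface so it can be matched to the .link files
for dev in /sys/class/net/*; do
    echo "$(basename "$dev"): $(udevadm info -q property -p "$dev" | grep ^ID_PATH=)"
done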
Add TB4 interfaces to network configuration with optimal settings:
vim /etc/network/interfaces

Add the following to /etc/network/interfaces above the line source /etc/network/interfaces.d/*:
auto en05
iface en05 inet manual
mtu 65520
auto en06
iface en06 inet manual
mtu 65520
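For reference, after the edit the relevant part of /etc/network/interfaces should look roughly like this (your management and bridge stanzas are omitted here; the only requirement is that the TB4 stanzas sit above the source line):
auto en05
iface en05 inet manual
    mtu 65520

auto en06
iface en06 inet manual
    mtu 65520

source /etc/network/interfaces.d/*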
Required for systemd link files to work:
systemctl enable systemd-networkd
systemctl start systemd-networkd

Automation for reliable interface bringup on cable insertion:
Create udev rules:
cat > /etc/udev/rules.d/10-tb-en.rules << 'EOF'
ACTION=="add|move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="add|move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
EOF

Create en05 bringup script:
cat > /usr/local/bin/pve-en05.sh << 'EOF'
#!/bin/bash
LOGFILE="/tmp/udev-debug.log"
echo "$(date): en05 bringup triggered" >> "$LOGFILE"
for i in {1..5}; do
if ip link set en05 up mtu 65520; then
echo "$(date): en05 up successful on attempt $i" >> "$LOGFILE"
break
else
echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
sleep 3
fi
done
EOF
chmod +x /usr/local/bin/pve-en05.sh

Create en06 bringup script:
cat > /usr/local/bin/pve-en06.sh << 'EOF'
#!/bin/bash
LOGFILE="/tmp/udev-debug.log"
echo "$(date): en06 bringup triggered" >> "$LOGFILE"
for i in {1..5}; do
if ip link set en06 up mtu 65520; then
echo "$(date): en06 up successful on attempt $i" >> "$LOGFILE"
break
else
echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
sleep 3
fi
done
EOF
chmod +x /usr/local/bin/pve-en06.sh

Apply all TB4 configuration changes:
# Update initramfs:
update-initramfs -u -k all
# Reboot to apply changes:
reboot

After reboot, verify TB4 interfaces:
ip link show | grep -E '(en05|en06)'

Expected result: TB4 interfaces should be named en05 and en06 with proper MTU settings.
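As an optional sanity check (a sketch, not from the original guide), confirm the MTU actually applied on both interfaces:
# Warn if either TB4 link is missing or not at the expected 65520 MTU
for i in en05 en06; do
    mtu=$(cat /sys/class/net/$i/mtu 2>/dev/null)
    [ "$mtu" = "65520" ] && echo "$i: mtu $mtu OK" || echo "$i: unexpected mtu '$mtu'"
done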
Essential: TB4 mesh requires IPv4 forwarding for OpenFabric routing.
echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

Verify forwarding enabled:
sysctl net.ipv4.ip_forward

Expected: net.ipv4.ip_forward = 1
Ensure TB4 interfaces come up automatically on boot:
Create systemd service:
cat > /etc/systemd/system/thunderbolt-interfaces.service << 'EOF'
[Unit]
Description=Configure Thunderbolt Network Interfaces
After=network.target thunderbolt.service
Wants=network.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/thunderbolt-startup.sh
[Install]
WantedBy=multi-user.target
EOF

Create startup script:
cat > /usr/local/bin/thunderbolt-startup.sh << 'EOF'
#!/bin/bash
# Thunderbolt interface startup script
LOGFILE="/var/log/thunderbolt-startup.log"
echo "$(date): Starting Thunderbolt interface configuration" >> "$LOGFILE"
# Wait up to 30 seconds for interfaces to appear
for i in {1..30}; do
if ip link show en05 &>/dev/null && ip link show en06 &>/dev/null; then
echo "$(date): Thunderbolt interfaces found" >> "$LOGFILE"
break
fi
echo "$(date): Waiting for Thunderbolt interfaces... ($i/30)" >> "$LOGFILE"
sleep 1
done
# Configure interfaces if they exist
if ip link show en05 &>/dev/null; then
/usr/local/bin/pve-en05.sh
echo "$(date): en05 configured" >> "$LOGFILE"
fi
if ip link show en06 &>/dev/null; then
/usr/local/bin/pve-en06.sh
echo "$(date): en06 configured" >> "$LOGFILE"
fi
echo "$(date): Thunderbolt configuration completed" >> "$LOGFILE"
EOF
chmod +x /usr/local/bin/thunderbolt-startup.sh

Enable the service:
systemctl daemon-reload
systemctl enable thunderbolt-interfaces.service

Location: Datacenter → SDN → Fabrics
- Click: "Add Fabric" → "OpenFabric"
- Configure in the dialog:
  - Name: tb4
  - IPv4 Prefix: 10.100.0.0/24
  - IPv6 Prefix: (leave empty for IPv4-only)
  - Hello Interval: 3 (default)
  - CSNP Interval: 10 (default)
- Click: "OK"
Expected result: You should see a fabric named tb4 with Protocol OpenFabric and IPv4 10.100.0.0/24
Still in: Datacenter → SDN → Fabrics → (select tb4 fabric)
- Click: "Add Node"
- Configure for n2:
  - Node: n2
  - IPv4: 10.100.0.12
  - IPv6: (leave empty)
  - Interfaces: Select en05 and en06 from the interface list
- Click: "OK"
- Repeat for n3: IPv4: 10.100.0.13, interfaces: en05, en06
- Repeat for n4: IPv4: 10.100.0.14, interfaces: en05, en06
Note: Unlike earlier versions of this guide, no manual /30 point-to-point addresses are configured on en05 and en06 - OpenFabric handles the mesh routing automatically (see the changelog above).
Critical: This activates the mesh - nothing works until you apply!
In GUI: Datacenter → SDN → "Apply" (button in top toolbar)
Expected result: Status table shows all nodes with "OK" status
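Optionally, confirm on each node that the fabric was rendered into the FRR configuration. This sketch assumes the SDN layer writes to the standard FRR config path /etc/frr/frr.conf; adjust if your installation differs:
# Look for the OpenFabric router stanza and the interfaces bound to it
grep -A 3 'router openfabric' /etc/frr/frr.conf
grep -B 1 -A 2 'ip router openfabric' /etc/frr/frr.conf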
Critical: OpenFabric routing requires FRR (Free Range Routing) to be running.
systemctl start frr
systemctl enable frr

Verify FRR is running:
systemctl status frr | grep Active

Expected output: Active: active (running)
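Once FRR is up, you can also ask it directly whether it sees the other mesh nodes. This is a sketch using FRR's vtysh; output format can vary between FRR releases:
# Show the OpenFabric topology as seen by this node
vtysh -c "show openfabric topology"
# Kernel routes learned for the TB4 mesh
ip route | grep 10.100.0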
Check TB4 interfaces are up with correct settings:
ip addr show | grep -E '(en05|en06|10\.100\.0\.)'

Critical test: Verify full mesh communication works.
# Test router ID connectivity from current node:
ping -c 3 10.100.0.12
ping -c 3 10.100.0.13
ping -c 3 10.100.0.14

Expected: All pings succeed with sub-millisecond latency (~0.6ms)
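To test all three router IDs in one pass, you can loop over them and flag any failures (a convenience sketch, not part of the original guide):
# Ping each OpenFabric router ID and report reachability
for ip in 10.100.0.12 10.100.0.13 10.100.0.14; do
    if ping -c 3 -W 1 "$ip" > /dev/null; then
        echo "$ip reachable"
    else
        echo "$ip FAILED"
    fi
done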
If connectivity fails: TB4 interfaces may need manual bring-up:
ip link set en05 up mtu 65520
ip link set en06 up mtu 65520
ifreload -a

Install Ceph packages:
pveceph install --repository no-subscription

Essential: Proper directory structure and ownership:
mkdir -p /var/lib/ceph && chown ceph:ceph /var/lib/ceph
mkdir -p /etc/ceph && chown ceph:ceph /etc/ceph

On the first node (n2) only:
pveceph mon create

Verify monitor creation:
ceph -s

Set public and cluster networks for optimal TB4 performance:
This fork uses the Thunderbolt network (10.100.0.0/24) for both the public and cluster networks:
ceph config set global public_network 10.100.0.0/24
ceph config set global cluster_network 10.100.0.0/24
ceph config set mon public_network 10.100.0.0/24
ceph config set mon cluster_network 10.100.0.0/24

The original guide split the public network (management) and cluster network (TB4), which led to degraded performance (https://forum.proxmox.com/threads/low-ceph-performance-on-3-node-proxmox-9-cluster-with-sata-ssds.170091/). Do not use that split configuration:
ceph config set global public_network 10.11.12.0/24
ceph config set global cluster_network 10.100.0.0/24
ceph config set mon public_network 10.11.12.0/24
ceph config set mon cluster_network 10.100.0.0/24
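To confirm the single-network configuration above took effect (a quick check using the same ceph config commands used elsewhere in this guide):
# Both values should report 10.100.0.0/24
ceph config get mon public_network
ceph config get mon cluster_network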
On n3 and n4 nodes:
pveceph mon create

Verify 3-monitor quorum:
ceph quorum_status

Create OSDs on NVMe drives (adjust device names as needed):
# Create two OSDs per node:
pveceph osd create /dev/nvme0n1
pveceph osd create /dev/nvme1n1

Verify all OSDs are up:
ceph osd tree

# Set OSD memory target to 8GB per OSD:
ceph config set osd osd_memory_target 8589934592
# Set BlueStore cache sizes for NVMe performance:
ceph config set osd bluestore_cache_size_ssd 4294967296
# Set memory allocation optimizations:
ceph config set osd osd_memory_cache_min 1073741824
ceph config set osd osd_memory_cache_resize_interval 1

# Set CPU threading optimizations:
ceph config set osd osd_op_num_threads_per_shard 2
ceph config set osd osd_op_num_shards 8
# Set BlueStore threading for NVMe:
ceph config set osd bluestore_sync_submit_transaction false
ceph config set osd bluestore_throttle_bytes 268435456
ceph config set osd bluestore_throttle_deferred_bytes 134217728
# Set CPU-specific optimizations:
ceph config set osd osd_client_message_cap 1000
ceph config set osd osd_client_message_size_cap 1073741824

# Set network optimizations for TB4 mesh:
ceph config set global ms_tcp_nodelay true
ceph config set global ms_tcp_rcvbuf 134217728
ceph config set global ms_tcp_prefetch_max_size 65536
# Set cluster network optimizations:
ceph config set global ms_cluster_mode crc
ceph config set global ms_async_op_threads 8
ceph config set global ms_dispatch_throttle_bytes 1073741824
# Set heartbeat optimizations:
ceph config set osd osd_heartbeat_interval 6
ceph config set osd osd_heartbeat_grace 20

# Set BlueStore optimizations for NVMe drives:
ceph config set osd bluestore_compression_algorithm lz4
ceph config set osd bluestore_compression_mode aggressive
ceph config set osd bluestore_compression_required_ratio 0.7
# Set NVMe-specific optimizations:
ceph config set osd bluestore_cache_trim_interval 200
# Set WAL and DB optimizations:
ceph config set osd bluestore_block_db_size 5368709120
ceph config set osd bluestore_block_wal_size 1073741824

# Set scrubbing optimizations:
ceph config set osd osd_scrub_during_recovery false
ceph config set osd osd_scrub_begin_hour 2
ceph config set osd osd_scrub_end_hour 6
# Set deep scrub optimizations:
ceph config set osd osd_deep_scrub_interval 1209600
ceph config set osd osd_scrub_max_interval 1209600
ceph config set osd osd_scrub_min_interval 86400
# Set recovery optimizations:
ceph config set osd osd_recovery_max_active 8
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_op_priority 1

# Create pool with optimal PG count for 6 OSDs:
ceph osd pool create cephtb4 256 256
# Set 2:1 replication ratio:
ceph osd pool set cephtb4 size 2
ceph osd pool set cephtb4 min_size 1
# Enable RBD application:
ceph osd pool application enable cephtb4 rbd

Check cluster health:
ceph -s

Expected results:
- Health: HEALTH_OK
- OSDs: 6 osds: 6 up, 6 in
- PGs: All PGs active+clean
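As an optional check (a sketch), verify the pool's replication settings and application tag match what was configured above:
# Expect size 2, min_size 1, and the rbd application enabled
ceph osd pool get cephtb4 size
ceph osd pool get cephtb4 min_size
ceph osd pool application get cephtb4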
# Test write performance:
rados -p cephtb4 bench 10 write --no-cleanup -b 4M -t 16
# Test read performance:
rados -p cephtb4 bench 10 rand -t 16
# Clean up test data:
rados -p cephtb4 cleanup

Expected Results:
- Write Performance: ~1,300 MB/s average, 2,000+ MB/s peak
- Read Performance: ~1,760 MB/s average, 2,400+ MB/s peak
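The 4M benchmarks above measure streaming bandwidth. If you also want a rough small-block latency number (optional, a sketch; not part of the original results), rados bench can be rerun with a 4 KiB block size and a single thread:
# Single-threaded 4 KiB writes highlight per-op latency rather than bandwidth
rados -p cephtb4 bench 10 write --no-cleanup -b 4096 -t 1
# Remove the benchmark objects afterwards
rados -p cephtb4 cleanup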
# Network tuning:
echo 'net.core.rmem_max = 268435456' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 268435456' >> /etc/sysctl.conf
echo 'net.core.netdev_max_backlog = 30000' >> /etc/sysctl.conf
# Memory tuning:
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
echo 'vm.min_free_kbytes = 4194304' >> /etc/sysctl.conf
# Apply settings:
sysctl -p

Problem: TB4 interfaces not coming up after reboot
Quick Fix: Manually bring up interfaces:
ip link set en05 up mtu 65520
ip link set en06 up mtu 65520
ifreload -a

Permanent Fix: Check systemd service:
systemctl status thunderbolt-interfaces.service
# Check if scripts are corrupted:
wc -l /usr/local/bin/pve-en*.sh
# Check for shebang errors (the first line of each script should be #!/bin/bash):
head -1 /usr/local/bin/pve-en0*.sh /usr/local/bin/thunderbolt-startup.sh
# Fix shebang if corrupted:
sed -i '1s/#\\!/#!/' /usr/local/bin/thunderbolt-startup.sh
sed -i '1s/#\\!/#!/' /usr/local/bin/pve-en05.sh
sed -i '1s/#\\!/#!/' /usr/local/bin/pve-en06.sh

Problem: Mesh connectivity fails between nodes
# Check interface status:
ip addr show | grep -E '(en05|en06|10\.100\.0\.)'
# Verify FRR routing service:
systemctl status frr

Problem: OSDs going down after creation
# Restart OSD services after fixing mesh:
systemctl restart ceph-osd@*.service

Problem: Ceph cluster shows OSDs down after reboot
# 1. Bring up TB4 interfaces:
/usr/local/bin/pve-en05.sh
/usr/local/bin/pve-en06.sh
# 2. Wait for interfaces to stabilize:
sleep 10
# 3. Restart Ceph OSDs:
systemctl restart ceph-osd@*.service
# 4. Monitor recovery:
watch ceph -s

Problem: Inactive PGs or slow performance
# Check cluster status:
ceph -s
# Verify optimizations are applied:
ceph config dump | grep -E '(memory_target|cache_size|compression)'
# Check network binding:
ceph config get osd cluster_network
ceph config get osd public_network

Changes from the original guide:
- Cleaned up commands for direct terminal execution (no SSH wrappers)
- Fixed formatting and organization
- Updated Ceph version references (Nautilus → Reef)
- Clarified step-by-step execution flow