Hardware used: 3× Minisforum MS-01
With the release of Proxmox 9 and its newer kernel, Thunderbolt interfaces on the MS-01 come up automatically out of the box, which makes setup much easier.
However, I found an issue in a full mesh topology, for example:
- PVE1 connects to PVE2 and PVE3
- PVE2 connects to PVE1 and PVE3
- PVE3 connects to PVE1 and PVE2
When I restart PVE2, the Thunderbolt interfaces on that node do not come up automatically. Even running `ifup thunderbolt#` locally doesn't restore them; the only way to bring the links back is to run `ifup thunderbolt#` on one of the remote hosts.
To work around this, I created a small script and a systemd service that watch for the Thunderbolt device reconnecting and automatically bring the interface up once the link is UP and stable.
- All hypervisor nodes are running a fresh install of Proxmox 9.
- The hypervisors are connected in a full mesh topology.
- Thunderbolt connections are in place and working.
- You’ve followed Full Mesh Network for Ceph Server – Using SDN Fabrics.
I use this approach because it’s supported in the Proxmox UI, it’s simple to set up, and it provides redundancy with a full mesh.
1. Edit `/etc/network/interfaces` and configure the Thunderbolt interfaces with hotplug support and jumbo frames (recommended for Ceph performance):

   ```
   allow-hotplug thunderbolt0
   iface thunderbolt0 inet manual
       mtu 9000

   allow-hotplug thunderbolt1
   iface thunderbolt1 inet manual
       mtu 9000
   ```
2. Save the systemd service unit file `tbnet-bringup@.service` to `/etc/systemd/system/tbnet-bringup@.service`.
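For orientation, a minimal template unit along these lines would match the journal output shown later in the post. Treat it as a sketch rather than the exact file: the `Description` string is taken from the logs, everything else is an assumption.

```ini
[Unit]
Description=Thunderbolt net bring-up for %i (with retries)

[Service]
# One-shot job: runs the retry script for the instance interface (%i),
# then deactivates, matching the "Deactivated successfully" log line.
Type=oneshot
ExecStart=/usr/local/sbin/tbnet-bringup.sh %i
```

Because the unit is started on demand (by udev, see step 4), it needs no `[Install]` section.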
3. Save the `tbnet-bringup.sh` script to `/usr/local/sbin/tbnet-bringup.sh`.
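The script body itself isn't inlined in this section, but its core retry loop can be reconstructed from the log messages shown later. The sketch below is my reconstruction under those assumptions; the function name, attempt counts, and delays are guesses tuned to match the logs, not the original file.

```shell
#!/bin/bash
# Sketch of the bring-up logic for one Thunderbolt interface, reconstructed
# from the journal output in this post. Environment overrides exist only to
# make the sketch easy to exercise; the real script may differ.
bring_up_iface() {
    local iface="$1"
    local max="${MAX_ATTEMPTS:-10}"   # "Attempt 1/10" in the logs
    local retry="${RETRY:-5}"         # "retry in 5s" in the logs
    local settle="${SETTLE:-6}"       # "verifying stable for 6s" in the logs
    local i state carrier
    for ((i=1; i<=max; i++)); do
        state=$(cat "/sys/class/net/$iface/operstate" 2>/dev/null || echo down)
        carrier=$(cat "/sys/class/net/$iface/carrier" 2>/dev/null || echo 0)
        if [[ "$state" == "up" && "$carrier" == "1" ]]; then
            echo "[$iface] UP detected (attempt $i); verifying stable for ${settle}s"
            sleep "$settle"
            # Re-check after the settle period before declaring the link stable.
            if [[ "$(cat "/sys/class/net/$iface/carrier" 2>/dev/null)" == "1" ]]; then
                echo "[$iface] Stable UP. Running ifup."
                # (Per the logs, the real script also applies offload tweaks here.)
                ifup "$iface"
                return 0
            fi
        fi
        echo "[$iface] Attempt $i/$max: state=$state carrier=$carrier; retry in ${retry}s"
        sleep "$retry"
    done
    return 1
}

# Run only when executed directly with an interface name
# (the systemd template unit passes %i as the argument).
if [[ "${BASH_SOURCE[0]}" == "$0" && -n "${1:-}" ]]; then
    bring_up_iface "$1"
fi
```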
4. Save the udev rules file `99-thunderbolt-net.rules` to `/etc/udev/rules.d/99-thunderbolt-net.rules`.
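The rules file is likewise only referenced by name here. A common pattern for this kind of job is a udev rule that tags the Thunderbolt network interface so systemd starts the templated unit whenever the interface (re)appears. A sketch under that assumption; the match keys in the actual file may differ:

```
# Start tbnet-bringup@<iface>.service when a thunderbolt* net interface appears
ACTION=="add|move", SUBSYSTEM=="net", KERNEL=="thunderbolt*", \
  TAG+="systemd", ENV{SYSTEMD_WANTS}+="tbnet-bringup@%k.service"
```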
5. Make the script executable: `chmod +x /usr/local/sbin/tbnet-bringup.sh`
6. Reload the udev rules and the systemd manager configuration:

```bash
udevadm control --reload-rules
udevadm trigger -s net
systemctl daemon-reload
```
- Repeat for each host in the cluster.
- I would suggest opening two SSH connections to a neighbouring host (let's say PVE1): run `watch -c -n 1 "ip -br -c a | grep thunder"` in one window to monitor when the interface is restored, and `journalctl -f` in a second window to follow the logs.
- Then reboot a neighbouring node, e.g. PVE2.
- The script logs to the journal as soon as it sees the device connection restored:
```
15:02:59 pve1 kernel: thunderbolt 0-0:1.1: retimer disconnected
15:03:00 pve1 kernel: thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
15:03:05 pve1 kernel: thunderbolt 0-1: new host found, vendor=0x8086 device=0x1
15:03:05 pve1 kernel: thunderbolt 0-1: Intel Corp. (none)
15:03:07 pve1 tbnet-bringup[127399]: [thunderbolt1] Attempt 1/10: state=down carrier=0; retry in 5s
15:03:12 pve1 tbnet-bringup[127429]: [thunderbolt1] Attempt 2/10: state=down carrier=0; retry in 5s
15:03:12 pve1 fabricd[1253]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
15:03:13 pve1 fabricd[1253]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
15:03:17 pve1 tbnet-bringup[127494]: [thunderbolt1] UP detected (attempt 3); verifying stable for 6s…
15:03:27 pve1 fabricd[1253]: [GNY7F-C4R79] ISIS-Adj (ceph): Rcvd P2P IIH from (thunderbolt1) with invalid pdu length 8997
15:03:27 pve1 tbnet-bringup[127586]: [thunderbolt1] Stable UP for 10s. Applying offload tweaks and ifup.
15:03:27 pve1 systemd[1]: tbnet-bringup@thunderbolt1.service: Deactivated successfully.
15:03:27 pve1 systemd[1]: Finished tbnet-bringup@thunderbolt1.service - Thunderbolt net bring-up for thunderbolt1 (with retries).
```
After you have tested and validated that the networks are restored, you're ready to proceed with the Ceph installation and use the Fabric subnet as your Ceph network.