I've never fully understood how NAT traversal works, so I went on a deep dive today.
Run whatismyip.py on a public server. This plays the role of a STUN server.
Run easynat.py on a laptop behind a home router.
You could also run hardnat.py on a laptop behind a home router, but not behind the same router as easynat.py, because
"hairpinning" (a NAT looping traffic addressed to its own public IP back to an inside host) is almost always broken.
You could also run it on a public server; it should work there too. But to really push it, put it in a hostile environment,
like phone data: run it in Termux, or hotspot a second laptop to your phone and run it there. Cellphone data is almost always carrier-grade NATed, sometimes multiple times over.
If this can get you a p2p connection between the cellphone and the laptop, you're doing pretty well.
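The STUN-like role that whatismyip.py plays can be sketched roughly as below. This is a minimal illustration, not the actual script: the names `serve_whatismyip` and `ask_whatismyip` and the plain-text `ip:port` reply format are assumptions made up for this example (real STUN uses a binary message format).

```python
import socket
import threading


def serve_whatismyip(sock, max_requests):
    """Answer each datagram with the sender's address as this server sees it.

    Behind a NAT, that address is the NAT's external ip:port mapping,
    which the client cannot otherwise observe.
    """
    for _ in range(max_requests):
        _data, (ip, port) = sock.recvfrom(1024)
        sock.sendto(f"{ip}:{port}".encode(), (ip, port))


def ask_whatismyip(sock, server, timeout=2.0):
    """Ask the server which (ip, port) our datagrams appear to come from."""
    sock.settimeout(timeout)
    sock.sendto(b"?", server)
    reply, _addr = sock.recvfrom(1024)
    ip, port = reply.decode().rsplit(":", 1)
    return ip, int(port)


if __name__ == "__main__":
    # Loopback demo: with no NAT in between, the reported address is
    # simply the client's own local address.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    threading.Thread(
        target=serve_whatismyip, args=(server, 1), daemon=True
    ).start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print(ask_whatismyip(client, server.getsockname()))
```

The client should ask from the same socket it later uses for the peer connection, since the reported mapping belongs to that socket's local port.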
This implements the birthday paradox port scan described by Tailscale. It spawns many sockets behind one NAT, then tries to connect to many ports from the other side, without knowing which external ports the NAT assigned, in hopes of stumbling across a port that was chosen. I found I had to bump from their recommended 2048 to roughly 2458 (20% more) to really make it reliable.
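To get a feel for why this works at all, here is a small sketch of the odds. It assumes an idealized NAT that picks external ports uniformly at random from 1..65535 and a prober that never repeats a guess; real NATs are not that tidy, which may be part of why more probes were needed in practice. The function names and the 256-socket figure in the demo are illustrative assumptions, not values taken from the scripts.

```python
import random

PORT_SPACE = 65535  # assumption: the NAT may pick any port in 1..65535


def hit_probability(open_ports, probes, port_space=PORT_SPACE):
    """Chance that at least one of `probes` distinct random guesses
    lands on one of `open_ports` randomly assigned external ports."""
    miss = 1.0
    for i in range(probes):
        unprobed = port_space - i            # ports not guessed yet
        unprobed_closed = unprobed - open_ports
        if unprobed_closed <= 0:
            return 1.0                       # only open ports are left
        miss *= unprobed_closed / unprobed   # this guess misses too
    return 1.0 - miss


def simulate_once(open_ports, probes, rng, port_space=PORT_SPACE):
    """One Monte Carlo trial of the same experiment."""
    all_ports = range(1, port_space + 1)
    assigned = set(rng.sample(all_ports, open_ports))
    guesses = rng.sample(all_ports, probes)
    return any(port in assigned for port in guesses)


if __name__ == "__main__":
    for probes in (256, 1024, 2048):
        print(probes, round(hit_probability(256, probes), 4))
```

Under these assumptions, 256 sockets and 1024 probes give about a 98% chance of a hit, and 2048 probes push it above 99.9%.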
This code is not good. But I got a difficult technique working in a few hours thanks to Python, and I'm sharing it in hopes it helps you on your journey to the mooooon.
It's more reliable to launch easynat before hardnat.
There's an asymmetry: easynat is meant to run behind a "cone NAT", one where a given inner port keeps the same outer port no matter which destination it talks to (so the port the middle server reports is the port the peer can reach). This isn't always available. It's apparently spec'd (RFC 4787 calls it "endpoint-independent mapping") that UDP NATs should be "coned", but who knows if any particular router actually respects that. I wonder if the asymmetry is necessary. Maybe both sides should open many ports, and each port should scan many ports.
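The two asymmetric roles can be sketched roughly like this. It is an illustration under assumptions, not the code in easynat.py or hardnat.py: the function names are invented here, and both sides are assumed to have already learned each other's public address through the middle server.

```python
import random
import select
import socket


def open_hard_side(count, easy_addr, bind_ip="0.0.0.0"):
    """Hard side: open many sockets and send one packet from each to the
    easy side, so the NAT creates one (unknown) external mapping per socket."""
    socks = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind((bind_ip, 0))
        s.setblocking(False)
        s.sendto(b"punch", easy_addr)
        socks.append(s)
    return socks


def random_candidates(probes, rng, low=1024, high=65535):
    """Easy side: distinct random guesses at the hard NAT's external ports."""
    return rng.sample(range(low, high + 1), probes)


def probe_from_easy_side(sock, hard_ip, candidate_ports):
    """Easy side: fire one probe from a single socket at every candidate."""
    for port in candidate_ports:
        sock.sendto(b"probe", (hard_ip, port))


def wait_for_probe(socks, timeout):
    """Hard side: return (socket, sender) for the first probe that got
    through, or None if nothing arrives before the timeout."""
    readable, _, _ = select.select(socks, [], [], timeout)
    for s in readable:
        _data, sender = s.recvfrom(1024)
        return s, sender
    return None
```

The symmetric variant wondered about above would have both peers call something like `open_hard_side` and then probe from every one of those sockets, at the cost of many more packets.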
I also wonder if the middle server can help more; perhaps it can give start/stop commands to synchronize the sides better.
Refs: