The user encountered an issue where they could no longer connect to their Talos cluster due to expired certificates. This manifested in two ways:
- Unable to use `kubectl`:

  ```
  $ kubectl get nodes
  E1006 19:24:27.025594 1695958 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server has asked for the client to provide credentials"
  error: You must be logged in to the server (the server has asked for the client to provide credentials)
  ```

- Unable to use `talosctl`:

  ```
  $ talosctl kubeconfig -n ... -e ... --talosconfig ./talosconfig
  error copying: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: remote error: tls: bad certificate"
  ```
The root cause was that two sets of client certificates had expired: the Kubernetes client certificate in the kubeconfig and the Talos API client certificate in the `talosconfig`. In Talos, these client certificates are typically valid for one year by default.
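The `talosconfig` and kubeconfig files store these client certificates as base64-encoded PEM. In a `talosconfig` context the field is `crt`; in a kubeconfig user entry it is `client-certificate-data`. The following is a minimal sketch for printing the validity window of such a value. It assumes GNU coreutils `base64` and OpenSSL are installed, and the function name `b64_cert_dates` is hypothetical:

```bash
#!/usr/bin/env bash
# Print the notBefore/notAfter dates of a certificate given as a base64-encoded
# PEM string, the form used by talosconfig (crt) and kubeconfig
# (client-certificate-data). The function name is illustrative only.
b64_cert_dates() {
  local b64=$1
  # Strip the base64 layer to get the PEM back, then let OpenSSL read the dates
  printf '%s' "$b64" | base64 -d | openssl x509 -noout -startdate -enddate
}
```

For example, `b64_cert_dates "$(yq -r '.contexts.home.crt' talosconfig)"` checks a context named `home`, and `b64_cert_dates "$(kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}')"` checks the first user in the kubeconfig. Both the context name and the user index are assumptions to adjust to your files.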
To resolve this issue, follow these steps:
- Extract the CA certificate and key from the control plane machine configuration:

  ```bash
  yq -r .machine.ca.crt controlplane.yaml | base64 -d > ca.crt
  yq -r .machine.ca.key controlplane.yaml | base64 -d > ca.key
  ```
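- Generate fresh credentials: a new admin key, a certificate signing request, and a certificate signed by the cluster CA. Here `--ca ca` is the CA's base name, so `talosctl gen crt` reads `ca.crt` and `ca.key`. Without `--hours`, the new certificate is valid for only 24 hours, so request a longer lifetime explicitly (8760 hours is one year):

  ```bash
  talosctl gen key --name admin
  talosctl gen csr --key admin.key --ip 127.0.0.1
  talosctl gen crt --ca ca --csr admin.csr --name admin --hours 8760
  ```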
- Update the `talosconfig` file. In your context, replace the `ca`, `crt`, and `key` values with the contents of the new base64-encoded files:

  ```bash
  # Generate single-line base64-encoded values
  base64 -w0 ca.crt > ca.crt.b64
  base64 -w0 admin.crt > admin.crt.b64
  base64 -w0 admin.key > admin.key.b64

  # Update talosconfig (use a text editor such as vim)
  vim talosconfig
  ```
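- Refresh the Kubernetes configuration. Without a path argument, `talosctl kubeconfig` merges the freshly issued admin kubeconfig into your default kubeconfig (`~/.kube/config`):

  ```bash
  talosctl kubeconfig -n <node-ip> -e <node-ip> --talosconfig ./talosconfig
  ```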
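Before relying on the files produced by the steps above (`ca.crt`, `ca.key`, `admin.crt`), a quick sanity check can catch two likely mistakes. The first is a `yq` lookup that found nothing: `yq` prints `null`, and `base64 -d` turns that into a few bytes of garbage rather than a PEM file. The second is a certificate that was not signed by the cluster CA. The following is a minimal sketch using OpenSSL (1.1.1 or newer is assumed, for Ed25519 support). The function names `is_pem` and `check_client_cert` are hypothetical:

```bash
#!/usr/bin/env bash
# Sanity checks for the recovery files (ca.crt and admin.crt by default;
# pass other paths if yours differ). Function names are illustrative only.

# Succeed if the file is non-empty and its first line is a PEM header.
# This catches a yq lookup that printed "null", which base64 -d turns into
# a few bytes of garbage instead of a PEM file.
is_pem() {
  [ -s "$1" ] && head -n 1 "$1" | grep -q '^-----BEGIN '
}

# Succeed if cert_file is signed by ca_file and is currently within its
# validity period, then print the certificate's subject and expiry date.
check_client_cert() {
  local ca_file=${1:-ca.crt} cert_file=${2:-admin.crt}
  if ! is_pem "$ca_file" || ! is_pem "$cert_file"; then
    echo "not a PEM file: $ca_file or $cert_file" >&2
    return 1
  fi
  # openssl verify checks both the signature chain and the validity dates
  openssl verify -CAfile "$ca_file" "$cert_file" >/dev/null || return 1
  openssl x509 -in "$cert_file" -noout -subject -enddate
}
```

A typical call is `is_pem ca.key && check_client_cert ca.crt admin.crt`. For a certificate made with the default `talosctl gen csr` roles, the printed subject should show `os:admin` as its organization (`O`).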
To prevent this issue in the future:
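- Set up a reminder to renew both client certificates before they expire (typically one year after they were issued).
- Consider automating certificate renewal if your Talos version supports it. Renewal is much simpler while the current credentials are still valid than after they have expired.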
- Keep your Talos version updated, as newer versions may have improved certificate management features.
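
Additional notes:

- The Talos API CA certificate and key are stored in the node's machine configuration (`machine.ca` in `controlplane.yaml`), which is why the recovery steps start there.
- The CA certificate usually has a much longer validity period (e.g., 10 years) than the client certificates.
- There is currently no `-k`/`--insecure` flag for `talosctl` that bypasses certificate verification on a configured node, which makes recovery more challenging. The `--insecure` flag of commands such as `talosctl apply-config` only works against nodes in maintenance mode.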
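The renewal reminder suggested above can be automated with OpenSSL's `-checkend` option. This option succeeds only if the certificate will still be valid the given number of seconds from now. The following is a minimal sketch. It assumes the certificate has already been decoded to a PEM file, for example `admin.crt` from the recovery steps. The function name `warn_if_expiring` and its 30-day default are illustrative choices:

```bash
#!/usr/bin/env bash
# Return 0 if the certificate is still valid N days from now (default 30);
# otherwise print a warning on stderr and return 1. An unreadable file also
# returns 1, because openssl fails. The function name is illustrative only.
warn_if_expiring() {
  local cert_file=$1 days=${2:-30}
  # -checkend takes seconds; its "will (not) expire" message is discarded
  if openssl x509 -in "$cert_file" -noout -checkend $(( days * 86400 )) >/dev/null; then
    return 0
  fi
  echo "WARNING: $cert_file expires within $days days (or could not be read)" >&2
  return 1
}
```

Run it from cron or a CI job, for example `warn_if_expiring admin.crt 30`, and alert on a non-zero exit status.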
If you encounter issues:
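- Verify the expiration dates of your certificates:

  ```bash
  openssl x509 -in <certificate-file> -text -noout
  # or print only the validity window
  openssl x509 -in <certificate-file> -noout -startdate -enddate
  ```

- Check the certificate presented by the Talos API server on port 50000:

  ```bash
  openssl s_client -connect <node-ip>:50000 -showcerts </dev/null
  ```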
The following "one-liner" performs all of the recovery steps above. It assumes your `talosconfig` context is named `home`; substitute your own context name (listed by `talosctl config contexts`):

```bash
yq -r .machine.ca.crt controlplane.yaml | base64 -d > ca.crt && \
yq -r .machine.ca.key controlplane.yaml | base64 -d > ca.key && \
talosctl gen key --name admin && \
talosctl gen csr --key admin.key --ip 127.0.0.1 && \
talosctl gen crt --ca ca --csr admin.csr --name admin --hours 8760 && \
yq eval '.contexts.home.ca = "'"$(base64 -w0 ca.crt)"'" | .contexts.home.crt = "'"$(base64 -w0 admin.crt)"'" | .contexts.home.key = "'"$(base64 -w0 admin.key)"'"' -i talosconfig
```
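The `s_client` check listed above prints the full handshake output. When only the server certificate's subject, issuer, and validity dates matter, that output can be piped into `openssl x509`, which reads the first PEM block (the server certificate). The following is a minimal sketch. The function name `talos_server_cert_info` is hypothetical, and the default port is the Talos API port 50000 used above:

```bash
#!/usr/bin/env bash
# Print subject, issuer, and validity dates of the certificate served on a
# Talos API endpoint (port 50000 unless another port is given).
# The function name is illustrative only.
talos_server_cert_info() {
  local host=$1 port=${2:-50000}
  # </dev/null lets s_client exit after the handshake instead of waiting for
  # input; openssl x509 picks the first PEM block out of the s_client output.
  openssl s_client -connect "$host:$port" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -startdate -enddate
}
```

For example, `talos_server_cert_info <node-ip>` compares the server-side dates with those of your client certificate.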