If you're running a Kubernetes cluster and pulling Docker images from Amazon Elastic Container Registry (ECR), you've likely encountered this frustrating problem: ECR authentication tokens expire every 12 hours.
This means your pods can't pull images once the token expires, leading to ImagePullBackOff errors and failed deployments. While the expiry is a security feature by design, it creates an operational headache for teams running production Kubernetes clusters.
Several approaches exist to solve this problem:
- Manual token refresh — Not practical for production environments
- AWS EKS with IRSA — Great if you're on EKS, but what about self-hosted clusters?
- Third-party tools — Adds another dependency to manage
- Custom operators — Overkill for a relatively simple problem
For my self-hosted Kubernetes cluster (running on Hetzner Cloud with Talos Linux), I needed a simple, reliable solution that would work outside of AWS infrastructure.
I built an automated ECR credential refresh system using Terraform that:
- ✅ Runs every 6 hours (well before the 12-hour expiration)
- ✅ Updates credentials across all namespaces automatically
- ✅ Requires minimal resources (50m CPU, 64Mi RAM)
- ✅ Is fully declarative and version-controlled with Terraform
- ✅ Works with any Kubernetes cluster (not just EKS)
The solution consists of several components:
- IAM User — Dedicated AWS user for ECR read-only access
- Kubernetes Namespace — Isolated namespace for the credential updater
- Service Account + RBAC — Cluster-wide permissions to update secrets
- CronJob — Scheduled task that refreshes credentials
- AWS Credentials Secret — Securely stored AWS access keys
Here's how they work together:
┌──────────────────┐
│     CronJob      │  (Runs every 6 hours)
│   (alpine/k8s)   │
└────────┬─────────┘
         │
         ├─> Fetches ECR token from AWS
         │
         ├─> Discovers all namespaces
         │
         └─> Creates/Updates docker-registry secret
             in each namespace
First, create a dedicated IAM user with read-only ECR permissions:
resource "aws_iam_user" "ecr_k8s_user" {
  name = "${var.APP_NAME}-ecr-k8s-user"
  path = "/system/"
}
resource "aws_iam_user_policy" "ecr_k8s_policy" {
  name = "${var.APP_NAME}-ecr-readonly"
  user = aws_iam_user.ecr_k8s_user.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ecr:GetAuthorizationToken",
          "ecr:BatchCheckLayerAvailability",
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:DescribeRepositories",
          "ecr:DescribeImages",
          "ecr:ListImages"
        ]
        Resource = "*"
      }
    ]
  })
}
resource "aws_iam_access_key" "ecr_k8s_key" {
  user = aws_iam_user.ecr_k8s_user.name
}

Security Note: These are read-only permissions. The IAM user can pull images but cannot push to or modify your ECR repositories.
Create a dedicated namespace to isolate the credential updater:
resource "kubernetes_namespace" "ecr_updater" {
  metadata {
    name = "ecr-updater"

    labels = {
      name = "ecr-updater"
    }
  }
}

resource "kubernetes_secret" "aws_credentials" {
  metadata {
    name      = "aws-ecr-credentials"
    namespace = kubernetes_namespace.ecr_updater.metadata[0].name
  }

  data = {
    AWS_ACCESS_KEY_ID     = aws_iam_access_key.ecr_k8s_key.id
    AWS_SECRET_ACCESS_KEY = aws_iam_access_key.ecr_k8s_key.secret
    AWS_REGION            = var.AWS_REGION
    AWS_ACCOUNT_ID        = data.aws_caller_identity.current.account_id
  }

  type = "Opaque"
}

The CronJob needs cluster-wide permissions to update secrets in all namespaces:
resource "kubernetes_service_account" "ecr_updater" {
  metadata {
    name      = "ecr-credential-updater"
    namespace = kubernetes_namespace.ecr_updater.metadata[0].name
  }
}

resource "kubernetes_cluster_role" "ecr_updater" {
  metadata {
    name = "ecr-credential-updater"
  }

  rule {
    api_groups = [""]
    resources  = ["secrets"]
    verbs      = ["get", "create", "patch", "update"]
  }

  rule {
    api_groups = [""]
    resources  = ["namespaces"]
    verbs      = ["get", "list"]
  }
}

resource "kubernetes_cluster_role_binding" "ecr_updater" {
  metadata {
    name = "ecr-credential-updater"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = kubernetes_cluster_role.ecr_updater.metadata[0].name
  }

  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.ecr_updater.metadata[0].name
    namespace = kubernetes_namespace.ecr_updater.metadata[0].name
  }
}

The heart of the solution is a CronJob that runs every 6 hours:
resource "kubernetes_cron_job_v1" "ecr_credential_refresh" {
  metadata {
    name      = "ecr-credential-refresh"
    namespace = kubernetes_namespace.ecr_updater.metadata[0].name
  }

  spec {
    schedule                      = "0 */6 * * *" # Every 6 hours
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3

    job_template {
      metadata {
        name = "ecr-credential-refresh"
      }

      spec {
        template {
          metadata {
            labels = {
              app = "ecr-credential-refresh"
            }
          }

          spec {
            service_account_name = kubernetes_service_account.ecr_updater.metadata[0].name
            restart_policy       = "OnFailure"

            container {
              name    = "ecr-credential-updater"
              image   = "alpine/k8s:1.30.7"
              command = ["/bin/sh", "-c"]
              args = [
                <<-EOT
                #!/bin/sh
                set -e

                # Install AWS CLI
                echo "Installing AWS CLI..."
                apk add --no-cache aws-cli

                echo "Fetching ECR authorization token..."
                TOKEN=$(aws ecr get-login-password --region $AWS_REGION)

                # Get all namespaces and update secrets in each
                echo "Discovering all namespaces..."
                NAMESPACES=$(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}')
                echo "Found namespaces: $NAMESPACES"

                for NAMESPACE in $NAMESPACES; do
                  echo "Updating secret in namespace: $NAMESPACE"

                  # Create or update the secret
                  kubectl create secret docker-registry ecr-registry-credentials \
                    --docker-server=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password="$TOKEN" \
                    --namespace=$NAMESPACE \
                    --dry-run=client -o yaml | kubectl apply -f -

                  echo "Secret updated successfully in $NAMESPACE"
                done

                echo "ECR credentials refresh completed successfully!"
                EOT
              ]

              env {
                name = "AWS_ACCESS_KEY_ID"
                value_from {
                  secret_key_ref {
                    name = kubernetes_secret.aws_credentials.metadata[0].name
                    key  = "AWS_ACCESS_KEY_ID"
                  }
                }
              }

              env {
                name = "AWS_SECRET_ACCESS_KEY"
                value_from {
                  secret_key_ref {
                    name = kubernetes_secret.aws_credentials.metadata[0].name
                    key  = "AWS_SECRET_ACCESS_KEY"
                  }
                }
              }

              env {
                name = "AWS_REGION"
                value_from {
                  secret_key_ref {
                    name = kubernetes_secret.aws_credentials.metadata[0].name
                    key  = "AWS_REGION"
                  }
                }
              }

              env {
                name = "AWS_ACCOUNT_ID"
                value_from {
                  secret_key_ref {
                    name = kubernetes_secret.aws_credentials.metadata[0].name
                    key  = "AWS_ACCOUNT_ID"
                  }
                }
              }

              resources {
                limits = {
                  cpu    = "100m"
                  memory = "128Mi"
                }
                requests = {
                  cpu    = "50m"
                  memory = "64Mi"
                }
              }
            }
          }
        }
      }
    }
  }
}

The shell script in the CronJob performs these steps:
- Installs AWS CLI in the Alpine container
- Fetches a fresh ECR token using AWS credentials
- Discovers all namespaces in the cluster
- Creates or updates a docker-registry secret named ecr-registry-credentials in each namespace
- Logs progress for debugging and monitoring
The key insight is using kubectl apply with --dry-run=client -o yaml, which allows us to create the secret if it doesn't exist or update it if it does — all in one command.
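Here is that pattern in isolation, with an illustrative secret name and registry host:

```sh
# kubectl create alone fails with "AlreadyExists" on the second run.
# Rendering the manifest client-side and piping it into kubectl apply
# turns the operation into an upsert, so the same command works every time.
kubectl create secret docker-registry demo-ecr-secret \
  --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$TOKEN" \
  --dry-run=client -o yaml | kubectl apply -f -
```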
Once deployed, reference the secret in your pod specifications:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  template:
    spec:
      imagePullSecrets:
        - name: ecr-registry-credentials # This is automatically created/updated
      containers:
        - name: my-app
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest

Why this approach works well:

- Fully declarative — Everything is defined in Terraform, making it reproducible and version-controlled.
- Cloud-agnostic — Not tied to EKS or any specific cloud provider. Works with self-hosted clusters on Hetzner, DigitalOcean, bare metal, etc.
- Lightweight — The CronJob requests just 50m CPU and 64Mi RAM, so the overhead is negligible.
- Set and forget — Once deployed, it runs automatically every 6 hours. No manual intervention needed.
- Multi-namespace — Automatically discovers and updates credentials in all namespaces, including new ones.
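As an optional variation on the usage above (not part of my setup), the secret can be attached to a namespace's default ServiceAccount so pods in that namespace pull from ECR without declaring imagePullSecrets at all; the namespace here is illustrative:

```sh
# Pods using the "default" ServiceAccount in "production" will now
# pick up ecr-registry-credentials implicitly on image pulls.
kubectl patch serviceaccount default -n production \
  -p '{"imagePullSecrets": [{"name": "ecr-registry-credentials"}]}'
```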
Logs are straightforward to read. Check job history with:

kubectl get jobs -n ecr-updater
kubectl logs -n ecr-updater job/ecr-credential-refresh-xxxxx

Check the CronJob and its recent jobs:

kubectl get cronjob -n ecr-updater
kubectl get jobs -n ecr-updater

# Get the latest job
kubectl get jobs -n ecr-updater --sort-by=.metadata.creationTimestamp

# View logs
kubectl logs -n ecr-updater job/ecr-credential-refresh-12345678

# Check if secrets exist in all namespaces
kubectl get secrets --all-namespaces | grep ecr-registry-credentials

To trigger a refresh manually, run:

kubectl create job -n ecr-updater manual-refresh --from=cronjob/ecr-credential-refresh
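You can also confirm the RBAC wiring before the first scheduled run using kubectl's impersonation support (a quick check, not required for operation):

```sh
# Both commands should print "yes" if the ClusterRoleBinding is correct.
kubectl auth can-i create secrets --all-namespaces \
  --as=system:serviceaccount:ecr-updater:ecr-credential-updater
kubectl auth can-i list namespaces \
  --as=system:serviceaccount:ecr-updater:ecr-credential-updater
```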
Security considerations:

- IAM Permissions: The IAM user has read-only access to ECR. It cannot push images or modify repositories.
- Kubernetes RBAC: The ServiceAccount can only manage secrets and list namespaces. It has no other cluster permissions.
- Secret Management: AWS credentials are stored as Kubernetes secrets. Consider using external secret management (such as AWS Secrets Manager with the External Secrets Operator) for enhanced security.
- Network Policies: Consider adding network policies to restrict the CronJob's network access to only AWS ECR endpoints; a sketch follows.
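A sketch of that last point, assuming your CNI enforces NetworkPolicy. ECR endpoints resolve to changing IPs, so an IP allowlist is fragile; a pragmatic middle ground is to allow the updater pods only DNS and outbound HTTPS:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ecr-updater-egress
  namespace: ecr-updater
spec:
  podSelector:
    matchLabels:
      app: ecr-credential-refresh   # matches the CronJob's pod label
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 53                  # DNS lookups
        - protocol: TCP
          port: 443                 # ECR and the Kubernetes API
          # add port 6443 here if your API server listens there
```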
Modify the cron schedule to run more or less frequently:
schedule = "0 */4 * * *" # Every 4 hours
schedule = "0 */12 * * *" # Every 12 hours (risky - token expires every 12h)

Modify the shell script to only update specific namespaces:

NAMESPACES="production staging development"

Change ecr-registry-credentials to match your existing deployments:
kubectl create secret docker-registry my-custom-secret-name \
  # ... rest of command

If you want to populate secrets immediately upon deployment (instead of waiting for the first CronJob run), add this Kubernetes Job:
resource "kubernetes_job_v1" "ecr_credential_initial" {
  metadata {
    name      = "ecr-credential-initial"
    namespace = kubernetes_namespace.ecr_updater.metadata[0].name
  }

  spec {
    template {
      metadata {}
      spec {
        # ... (same spec as the CronJob)
      }
    }
  }

  wait_for_completion = true

  timeouts {
    create = "5m"
    update = "5m"
  }
}

The script iterates through all namespaces, so run time scales linearly with their count. In my testing:
- 10 namespaces: ~15 seconds
- 50 namespaces: ~1 minute
- 100 namespaces: ~2 minutes
For clusters with hundreds of namespaces, consider:
- Filtering to only production namespaces
- Running parallel updates (a sketch combining this with namespace filtering follows)
- Increasing resource limits
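A hedged sketch of the first two ideas combined. It assumes you label the relevant namespaces yourself (ecr-pull=enabled is an invented label), that TOKEN, AWS_ACCOUNT_ID, and AWS_REGION are set as in the CronJob script, and that GNU xargs is available (busybox builds vary):

```sh
# Select only labelled namespaces instead of all of them.
NAMESPACES=$(kubectl get namespaces -l ecr-pull=enabled \
  -o jsonpath='{.items[*].metadata.name}')

# Update up to 8 namespaces in parallel; xargs substitutes {} before the
# inner shell runs, and the exported variables come from the environment.
export TOKEN AWS_ACCOUNT_ID AWS_REGION
echo "$NAMESPACES" | tr ' ' '\n' | xargs -P 8 -I {} sh -c '
  kubectl create secret docker-registry ecr-registry-credentials \
    --docker-server="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com" \
    --docker-username=AWS \
    --docker-password="$TOKEN" \
    --namespace={} \
    --dry-run=client -o yaml | kubectl apply -f -
'
```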
This Terraform-based ECR credential refresh solution has been running in my production Kubernetes cluster for months without issues. It's simple, reliable, and works across any Kubernetes distribution.
The key advantages are:
- Fully declarative infrastructure as code
- No vendor lock-in (works outside EKS)
- Minimal resource footprint
- Automatic multi-namespace support
- Easy to customize and debug
If you're running a self-hosted Kubernetes cluster and pulling from ECR, this approach can save you from expired credential headaches.
The complete Terraform code is available in my infrastructure repository. Simply copy the IAM, RBAC, and CronJob resources into your existing Terraform configuration.
Key variables needed:
- var.APP_NAME — Your application prefix
- var.AWS_REGION — Your AWS region
- var.HCLOUD_TOKEN — Only if using Hetzner Cloud
- data.aws_caller_identity.current.account_id — Your AWS account ID
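For completeness, a sketch of the matching declarations for the AWS pieces (the descriptions are mine, not from the repository):

```hcl
variable "APP_NAME" {
  description = "Prefix used for the IAM user and policy names"
  type        = string
}

variable "AWS_REGION" {
  description = "AWS region that hosts the ECR repositories"
  type        = string
}

# Resolves the account ID referenced above as
# data.aws_caller_identity.current.account_id.
data "aws_caller_identity" "current" {}
```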
Have you solved this problem differently? Found ways to improve this solution? I'd love to hear your thoughts in the comments!
About the Author: Building production infrastructure for cryptocurrency data feeds at scale. Passionate about Kubernetes, Terraform, and making DevOps simple and reliable.
Tags: #Kubernetes #AWS #ECR #Terraform #DevOps #InfrastructureAsCode #CloudNative #Docker