When a Copilot coding agent opens a PR in any org repo and the Docker image build succeeds, automatically deploy that image to the repo's designated testing cluster by generating a service orders folder in the corresponding deployment repo. When the PR is closed or merged, automatically clean up the deployment.
Key constraints:
- Only testing clusters — never production
- Org-level setup, minimal per-repo configuration
- No personal access tokens — GitHub Apps only
- Source repos never hold deployment write credentials
┌─────────────────────────────────────────────────────────────────────┐
│ SOURCE REPO │
│ (e.g. glg/apollo-admin) │
│ │
│ .deploy.yml Existing CI Workflow │
│ ┌────────────┐ ┌─────────────────────────────────────────┐ │
│ │ cluster: │ │ 1. PR opened by Copilot agent │ │
│ │ i22 │ │ 2. Build Docker image → pr-42-abc1234 │ │
│ │ service: │ │ 3. Push to registry │ │
│ │ apollo- │ │ 4. Generate glg-deploy-dispatcher token │ │
│ │ admin │ │ 5. repository_dispatch → deploy-auto │ │
│ └────────────┘ │ payload: {repo, pr#, tag, sha} │ │
│ └────────────────────┬────────────────────┘ │
│ │ │
│ Secrets available: │ │
│ DISPATCHER_APP_ID (org secret) │ │
│ DISPATCHER_PRIVATE_KEY (org secret) │ │
└────────────────────────────────────────────┼────────────────────────┘
│ repository_dispatch
▼
┌─────────────────────────────────────────────────────────────────────┐
│ glg/deploy-automation │
│ (central orchestration repo) │
│ │
│ Workflow: on repository_dispatch │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ VALIDATION PHASE │ │
│ │ a. Fetch PR from source repo API → verify user.type == Bot │ │
│ │ b. Verify pr_author in strict actor allowlist │ │
│ │ c. Fetch .deploy.yml from DEFAULT BRANCH of source repo │ │
│ │ d. Fetch clusters.yml from glg/deploy-config default branch │ │
│ │ e. Verify cluster is in allowed_clusters list │ │
│ │ f. Derive deployment_repo from cluster ID (naming convention) │ │
│ │ g. Validate image_tag matches ^pr-\d+-[a-f0-9]{7,40}$ │ │
│ │ h. Check active PR deployment count < threshold (e.g. 3) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ DEPLOYMENT PHASE │ │
│ │ i. Generate glg-deploy-bot token (contents:write on │ │
│ │ deployment repos) │ │
│ │ j. Clone deployment repo │ │
│ │ k. Generate orders folder for {service}-pr-{number} │ │
│ │ (copy + modify existing service, or from template) │ │
│ │ l. Commit to main with message: │ │
│ │ "deploy: {service} pr-{number} from {source_repo}#{pr}" │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ Secrets (repo-level only): │
│ DEPLOY_BOT_APP_ID │
│ DEPLOY_BOT_PRIVATE_KEY │
│ │
│ Also has: │
│ Scheduled GC workflow (cron) │
│ Cleanup handler (on cleanup-pr dispatch) │
└─────────────────────────────┬───────────────────────────────────────┘
│ git push (via deploy-bot token)
▼
┌─────────────────────────────────────────────────────────────────────┐
│ DEPLOYMENT REPO │
│ (e.g. glg/gds.clusterconfig.i22) │
│ │
│ services/ │
│ apollo-admin/ ← existing production-like deploy │
│ orders │
│ ... │
│ apollo-admin-pr-42/ ← created by automation │
│ orders ← dockerdeploy .../apollo-admin/ │
│ ... pr-42-abc1234 │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ glg/deploy-config │
│ (locked-down config repo) │
│ │
│ clusters.yml │
│ ┌──────────────────────────────────────────┐ │
│ │ allowed_clusters: │ │
│ │ - i22 │ │
│ │ - i25 │ │
│ │ │ │
│ │ # Repo is derived from cluster ID: │ │
│ │ # glg/gds.clusterconfig.{cluster_id} │ │
│ └──────────────────────────────────────────┘ │
│ │
│ actor_allowlist.yml │
│ ┌──────────────────────────────────────────┐ │
│ │ allowed_actors: │ │
│ │ - copilot-swe-agent[bot] │ │
│ │ - github-actions[bot] │ │
│ └──────────────────────────────────────────┘ │
│ │
│ Branch protection: require 2 reviewers │
│ CODEOWNERS: @glg/platform-team │
└─────────────────────────────────────────────────────────────────────┘

Two GitHub Apps provide clean separation of privileges:

| | glg-deploy-dispatcher | glg-deploy-bot |
|---|---|---|
| Purpose | Source repos dispatch events to deploy-automation | deploy-automation writes to deployment repos |
| Permissions | contents: read, metadata: read | contents: write, metadata: read |
| Installed on | All source repos + deploy-automation + deploy-config | gds.clusterconfig.* deployment repos only |
| Secrets stored in | Org secrets, scoped to source repos only | Repo secrets on deploy-automation only |
| Blast radius if compromised | Can read source code and dispatch events. Cannot write to any repo. | Can write to gds.clusterconfig.* deployment repos. But key is only in deploy-automation, not exposed to source repos. |

One point to verify before rollout: GitHub's REST endpoint for creating a repository_dispatch event requires Contents: write on the target repository. A dispatcher app that holds only contents: read may therefore be unable to dispatch to deploy-automation, and granting it write access would widen the dispatcher key's blast radius to every repo the app is installed on.

glg/deploy-config is a dedicated repo with strict access controls, owned by the platform/security team.

clusters.yml is the allowlist of testing clusters:

    allowed_clusters:
      - i22
      - i25
      - i30
    # Deployment repo is derived from cluster ID: glg/gds.clusterconfig.{cluster_id}
    # No explicit mapping needed; the naming convention is enforced by the workflow.

actor_allowlist.yml is the strict list of bot actors allowed to trigger deployments:

    allowed_actors:
      - copilot-swe-agent[bot]
      - github-actions[bot]

Access controls:

- Branch protection on main, require 2 reviewers
- CODEOWNERS: @glg/platform-team
- No direct pushes

.deploy.yml lives in each source repo's root on the default branch. The workflow always reads this file from the default branch, never the PR branch.

    cluster: i22
    service_path: services/apollo-admin

Note: There is no deployment_repo field. The cluster ID is used to derive the deployment repo name via the convention glg/gds.clusterconfig.{cluster_id}. The cluster ID is validated against the allowlist in clusters.yml. This prevents a malicious .deploy.yml from targeting production clusters or arbitrary repos.
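
The derivation described in the note can be sketched as a small shell function. This is a minimal illustration rather than the workflow's actual code: `deploy_repo_for_cluster` and `ALLOWED` are hypothetical names, the allowlist is assumed to arrive as one cluster ID per line (for example from `yq '.allowed_clusters[]'`), and cluster IDs are assumed to contain only letters, digits, `_` and `-`.

```shell
# Hypothetical helper: print the deployment repo for an allowlisted cluster.
#   $1 = cluster ID read from .deploy.yml
#   $2 = allowlist, one cluster ID per line (assumed input format)
deploy_repo_for_cluster() (
  cluster=$1
  allowed=$2
  # Reject empty IDs and any character outside the assumed ID alphabet.
  [ -n "$cluster" ] || exit 1
  case $cluster in *[!A-Za-z0-9_-]*) exit 1 ;; esac
  # -x matches whole lines and -F matches literally, so "i2" does not pass for "i22".
  if ! printf '%s\n' "$allowed" | grep -qxF -- "$cluster"; then
    echo "cluster '$cluster' is not in the allowlist" >&2
    exit 1
  fi
  # The repo name follows the convention glg/gds.clusterconfig.{cluster_id}.
  printf 'glg/gds.clusterconfig.%s\n' "$cluster"
)

ALLOWED='i22
i25
i30'

deploy_repo_for_cluster i22 "$ALLOWED"                      # prints glg/gds.clusterconfig.i22
deploy_repo_for_cluster prod "$ALLOWED" || echo "rejected"  # prints rejected
```

Because the repo name is computed only after the allowlist check passes, there is no code path in this sketch that turns an unlisted cluster into a repo name.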

The glg/deploy-automation repo contains all deployment logic:

- deploy-pr dispatch handler workflow
- cleanup-pr dispatch handler workflow
- Scheduled garbage collection workflow

Each source repo adds two small jobs to its existing Docker build workflow. This is the only per-repo setup required beyond .deploy.yml:

    # Added to the existing docker-build.yml workflow
    on:
      pull_request:
        types: [opened, synchronize, reopened, closed]

    jobs:
      build:
        if: github.event.action != 'closed'
        runs-on: ubuntu-latest
        outputs:
          image_tag: ${{ steps.tag.outputs.image_tag }}
        steps:
          # ... existing Docker build steps ...
          - name: Set image tag
            id: tag
            run: |
              SHORT_SHA=$(echo "${{ github.sha }}" | cut -c1-7)
              echo "image_tag=pr-${{ github.event.pull_request.number }}-${SHORT_SHA}" >> "$GITHUB_OUTPUT"
          # ... push to registry ...

      trigger-deploy:
        needs: build
        if: |
          github.event.action != 'closed'
          && github.event.pull_request.user.type == 'Bot'
        runs-on: ubuntu-latest
        steps:
          - name: Generate dispatcher token
            id: app-token
            uses: actions/create-github-app-token@v1 # pin to SHA in practice
            with:
              app-id: ${{ secrets.DISPATCHER_APP_ID }}
              private-key: ${{ secrets.DISPATCHER_PRIVATE_KEY }}
              owner: glg
              repositories: deploy-automation
          - name: Trigger deployment
            uses: peter-evans/repository-dispatch@v3 # pin to SHA in practice
            with:
              token: ${{ steps.app-token.outputs.token }}
              repository: glg/deploy-automation
              event-type: deploy-pr
              client-payload: >-
                {
                  "source_repo": "${{ github.repository }}",
                  "pr_number": ${{ github.event.pull_request.number }},
                  "pr_author": "${{ github.event.pull_request.user.login }}",
                  "image_tag": "${{ needs.build.outputs.image_tag }}",
                  "sha": "${{ github.sha }}",
                  "default_branch": "${{ github.event.repository.default_branch }}"
                }

      trigger-cleanup:
        if: |
          github.event.action == 'closed'
          && github.event.pull_request.user.type == 'Bot'
        runs-on: ubuntu-latest
        steps:
          - name: Generate dispatcher token
            id: app-token
            uses: actions/create-github-app-token@v1
            with:
              app-id: ${{ secrets.DISPATCHER_APP_ID }}
              private-key: ${{ secrets.DISPATCHER_PRIVATE_KEY }}
              owner: glg
              repositories: deploy-automation
          - name: Trigger cleanup
            uses: peter-evans/repository-dispatch@v3
            with:
              token: ${{ steps.app-token.outputs.token }}
              repository: glg/deploy-automation
              event-type: cleanup-pr
              client-payload: >-
                {
                  "source_repo": "${{ github.repository }}",
                  "pr_number": ${{ github.event.pull_request.number }}
                }

The deploy handler lives at glg/deploy-automation/.github/workflows/deploy-pr.yml:


    name: Deploy PR to Testing Cluster

    on:
      repository_dispatch:
        types: [deploy-pr]

    env:
      MAX_PR_DEPLOYMENTS: 3
      # Payload values reach scripts only as environment variables. Expanding
      # workflow expressions inside run blocks would allow shell injection.
      SOURCE_REPO: ${{ github.event.client_payload.source_repo }}
      PR_NUMBER: ${{ github.event.client_payload.pr_number }}
      IMAGE_TAG: ${{ github.event.client_payload.image_tag }}

    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          # --- VALIDATION PHASE ---
          - name: Validate image tag format
            run: |
              if [[ ! "$IMAGE_TAG" =~ ^pr-[0-9]+-[a-f0-9]{7,40}$ ]]; then
                echo "::error::Invalid image tag format"
                exit 1
              fi

          - name: Validate source repo and PR number
            run: |
              # Anchor both ends so values such as glg/../other are rejected
              if [[ ! "$SOURCE_REPO" =~ ^glg/[A-Za-z0-9._-]+$ ]]; then
                echo "::error::Source repo is not a valid glg repo name"
                exit 1
              fi
              if [[ ! "$PR_NUMBER" =~ ^[0-9]+$ ]]; then
                echo "::error::PR number is not numeric"
                exit 1
              fi

          - name: Generate dispatcher token (for reading configs)
            id: dispatcher-token
            uses: actions/create-github-app-token@v1
            with:
              app-id: ${{ secrets.DISPATCHER_APP_ID }}
              private-key: ${{ secrets.DISPATCHER_PRIVATE_KEY }}
              owner: glg

          - name: Validate PR is open and authored by a bot
            id: pr
            env:
              GH_TOKEN: ${{ steps.dispatcher-token.outputs.token }}
            run: |
              PR_DATA=$(gh api "repos/${SOURCE_REPO}/pulls/${PR_NUMBER}" --jq '{type: .user.type, login: .user.login, state: .state}')
              USER_TYPE=$(echo "$PR_DATA" | jq -r '.type')
              USER_LOGIN=$(echo "$PR_DATA" | jq -r '.login')
              PR_STATE=$(echo "$PR_DATA" | jq -r '.state')
              if [[ "$PR_STATE" != "open" ]]; then
                echo "::error::PR #${PR_NUMBER} is not open (state: ${PR_STATE})"
                exit 1
              fi
              if [[ "$USER_TYPE" != "Bot" ]]; then
                echo "::error::PR author is not a bot (type: ${USER_TYPE})"
                exit 1
              fi
              # Export the API-verified login; the payload's pr_author is not trusted
              echo "login=${USER_LOGIN}" >> "$GITHUB_OUTPUT"

          - name: Check PR author against actor allowlist
            env:
              GH_TOKEN: ${{ steps.dispatcher-token.outputs.token }}
              AUTHOR: ${{ steps.pr.outputs.login }}
            run: |
              ALLOWLIST=$(gh api "repos/glg/deploy-config/contents/actor_allowlist.yml" --jq '.content' | base64 -d)
              # Parse the YAML list instead of matching on indentation
              if ! echo "$ALLOWLIST" | yq '.allowed_actors[]' | grep -qxF -- "$AUTHOR"; then
                echo "::error::Actor '${AUTHOR}' is not in the allowlist"
                exit 1
              fi
              echo "Actor '${AUTHOR}' is in the allowlist"

          - name: Fetch .deploy.yml from default branch
            id: deploy-config
            env:
              GH_TOKEN: ${{ steps.dispatcher-token.outputs.token }}
            run: |
              # Look up the default branch via the API; the payload's value is not trusted
              BRANCH=$(gh api "repos/${SOURCE_REPO}" --jq '.default_branch')
              CONFIG=$(gh api "repos/${SOURCE_REPO}/contents/.deploy.yml?ref=${BRANCH}" --jq '.content' | base64 -d)
              CLUSTER=$(echo "$CONFIG" | yq '.cluster')
              SERVICE_PATH=$(echo "$CONFIG" | yq '.service_path')
              if [[ -z "$CLUSTER" || "$CLUSTER" == "null" ]]; then
                echo "::error::.deploy.yml is missing 'cluster' field"
                exit 1
              fi
              if [[ -z "$SERVICE_PATH" || "$SERVICE_PATH" == "null" ]]; then
                echo "::error::.deploy.yml is missing 'service_path' field"
                exit 1
              fi
              # service_path is later used with cp and rm -rf, so keep it relative and free of ".."
              if [[ ! "$SERVICE_PATH" =~ ^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)*$ || "$SERVICE_PATH" == *..* ]]; then
                echo "::error::.deploy.yml has an invalid 'service_path'"
                exit 1
              fi
              echo "cluster=${CLUSTER}" >> "$GITHUB_OUTPUT"
              echo "service_path=${SERVICE_PATH}" >> "$GITHUB_OUTPUT"

          - name: Validate cluster and resolve deployment repo
            id: cluster
            env:
              GH_TOKEN: ${{ steps.dispatcher-token.outputs.token }}
              CLUSTER: ${{ steps.deploy-config.outputs.cluster }}
            run: |
              CLUSTERS_CONFIG=$(gh api "repos/glg/deploy-config/contents/clusters.yml" --jq '.content' | base64 -d)
              # Check cluster is in allowlist
              if ! echo "$CLUSTERS_CONFIG" | yq '.allowed_clusters[]' | grep -qxF -- "$CLUSTER"; then
                echo "::error::Cluster '${CLUSTER}' is not in the allowed clusters list"
                exit 1
              fi
              # Derive deployment repo from cluster ID (enforced naming convention)
              echo "deploy_repo=glg/gds.clusterconfig.${CLUSTER}" >> "$GITHUB_OUTPUT"
              # The token action takes repository names without the owner prefix
              echo "deploy_repo_name=gds.clusterconfig.${CLUSTER}" >> "$GITHUB_OUTPUT"

          - name: Generate deploy-bot token
            id: deploy-token
            uses: actions/create-github-app-token@v1
            with:
              app-id: ${{ secrets.DEPLOY_BOT_APP_ID }}
              private-key: ${{ secrets.DEPLOY_BOT_PRIVATE_KEY }}
              owner: glg
              repositories: ${{ steps.cluster.outputs.deploy_repo_name }}

          - name: Check PR deployment count
            env:
              # The dispatcher app is not installed on deployment repos,
              # so this read uses the deploy-bot token.
              GH_TOKEN: ${{ steps.deploy-token.outputs.token }}
              DEPLOY_REPO: ${{ steps.cluster.outputs.deploy_repo }}
              SERVICE_PATH: ${{ steps.deploy-config.outputs.service_path }}
            run: |
              SERVICE_NAME=$(basename "$SERVICE_PATH")
              # Fail closed: an API error aborts the job instead of counting as zero
              NAMES=$(gh api "repos/${DEPLOY_REPO}/contents/$(dirname "$SERVICE_PATH")" --jq '.[].name')
              # Count other PR folders for this service; a redeploy of this PR
              # replaces its own folder and must not count against the limit
              EXISTING=$(echo "$NAMES" | grep -E "^${SERVICE_NAME}-pr-[0-9]+$" | grep -cvxF -- "${SERVICE_NAME}-pr-${PR_NUMBER}" || true)
              if [[ "$EXISTING" -ge "$MAX_PR_DEPLOYMENTS" ]]; then
                echo "::error::Service '${SERVICE_NAME}' already has ${EXISTING} PR deployments (max: ${MAX_PR_DEPLOYMENTS})"
                exit 1
              fi
              echo "Current PR deployments for ${SERVICE_NAME}: ${EXISTING}"

          # --- DEPLOYMENT PHASE ---
          - name: Generate orders folder and deploy
            env:
              GH_TOKEN: ${{ steps.deploy-token.outputs.token }}
              DEPLOY_REPO: ${{ steps.cluster.outputs.deploy_repo }}
              SERVICE_PATH: ${{ steps.deploy-config.outputs.service_path }}
            run: |
              SERVICE_NAME=$(basename "$SERVICE_PATH")
              SERVICE_DIR=$(dirname "$SERVICE_PATH")
              PR_FOLDER="${SERVICE_NAME}-pr-${PR_NUMBER}"

              # Clone deployment repo
              git clone "https://x-access-token:${GH_TOKEN}@github.com/${DEPLOY_REPO}.git" deploy-repo
              cd deploy-repo
              git config user.name "glg-deploy-bot[bot]"
              git config user.email "glg-deploy-bot[bot]@users.noreply.github.com"

              # The existing service folder is the base; fail if it doesn't exist
              if [[ ! -d "${SERVICE_PATH}" ]]; then
                echo "::error::Service path '${SERVICE_PATH}' does not exist in ${DEPLOY_REPO}"
                exit 1
              fi

              # Remove existing PR folder if it exists (update scenario)
              rm -rf "${SERVICE_DIR}/${PR_FOLDER}"

              # Copy and modify
              cp -r "${SERVICE_PATH}" "${SERVICE_DIR}/${PR_FOLDER}"

              # Update the dockerdeploy line in the orders file
              ORDERS_FILE="${SERVICE_DIR}/${PR_FOLDER}/orders"
              if [[ ! -f "$ORDERS_FILE" ]]; then
                echo "::error::No orders file found at ${ORDERS_FILE}"
                exit 1
              fi

              # Replace the dockerdeploy line's tag portion
              # Original: dockerdeploy github/glg/apollo-admin/main:latest
              # Updated:  dockerdeploy github/glg/apollo-admin/main:pr-42-abc1234
              sed -i.bak -E "s|(dockerdeploy [^:]+):.*|\1:${IMAGE_TAG}|" "$ORDERS_FILE"
              rm -f "${ORDERS_FILE}.bak"

              git add -A
              # A re-run with an unchanged tag leaves nothing to commit
              if git diff --cached --quiet; then
                echo "No changes for ${PR_FOLDER}, nothing to deploy"
                exit 0
              fi
              git commit -m "deploy: ${SERVICE_NAME} pr-${PR_NUMBER} from ${SOURCE_REPO}#${PR_NUMBER}" \
                -m "Source: ${SOURCE_REPO}#${PR_NUMBER}" \
                -m "Image tag: ${IMAGE_TAG}" \
                -m "Automated by glg/deploy-automation"

              # Push with retry for concurrent pushes
              MAX_RETRIES=3
              for i in $(seq 1 $MAX_RETRIES); do
                if git push origin main; then
                  echo "Successfully deployed ${PR_FOLDER}"
                  break
                fi
                if [[ $i -eq $MAX_RETRIES ]]; then
                  echo "::error::Failed to push after ${MAX_RETRIES} retries"
                  exit 1
                fi
                echo "Push failed, retrying (attempt $((i+1))/${MAX_RETRIES})..."
                git pull --rebase origin main
              done

The cleanup handler lives at glg/deploy-automation/.github/workflows/cleanup-pr.yml:


    name: Cleanup PR Deployment

    on:
      repository_dispatch:
        types: [cleanup-pr]

    env:
      # Payload values reach scripts only as environment variables
      SOURCE_REPO: ${{ github.event.client_payload.source_repo }}
      PR_NUMBER: ${{ github.event.client_payload.pr_number }}

    jobs:
      cleanup:
        runs-on: ubuntu-latest
        steps:
          - name: Validate source repo and PR number
            run: |
              if [[ ! "$SOURCE_REPO" =~ ^glg/[A-Za-z0-9._-]+$ ]]; then
                echo "::error::Source repo is not a valid glg repo name"
                exit 1
              fi
              if [[ ! "$PR_NUMBER" =~ ^[0-9]+$ ]]; then
                echo "::error::PR number is not numeric"
                exit 1
              fi

          - name: Generate dispatcher token
            id: dispatcher-token
            uses: actions/create-github-app-token@v1
            with:
              app-id: ${{ secrets.DISPATCHER_APP_ID }}
              private-key: ${{ secrets.DISPATCHER_PRIVATE_KEY }}
              owner: glg

          - name: Fetch .deploy.yml from default branch
            id: deploy-config
            env:
              GH_TOKEN: ${{ steps.dispatcher-token.outputs.token }}
            run: |
              # Get default branch
              DEFAULT_BRANCH=$(gh api "repos/${SOURCE_REPO}" --jq '.default_branch')
              CONFIG=$(gh api "repos/${SOURCE_REPO}/contents/.deploy.yml?ref=${DEFAULT_BRANCH}" --jq '.content' | base64 -d)
              CLUSTER=$(echo "$CONFIG" | yq '.cluster')
              SERVICE_PATH=$(echo "$CONFIG" | yq '.service_path')
              # service_path feeds rm -rf below, so keep it relative and free of ".."
              if [[ ! "$SERVICE_PATH" =~ ^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)*$ || "$SERVICE_PATH" == *..* ]]; then
                echo "::error::.deploy.yml has a missing or invalid 'service_path'"
                exit 1
              fi
              echo "cluster=${CLUSTER}" >> "$GITHUB_OUTPUT"
              echo "service_path=${SERVICE_PATH}" >> "$GITHUB_OUTPUT"

          - name: Resolve deployment repo
            id: cluster
            env:
              GH_TOKEN: ${{ steps.dispatcher-token.outputs.token }}
              CLUSTER: ${{ steps.deploy-config.outputs.cluster }}
            run: |
              CLUSTERS_CONFIG=$(gh api "repos/glg/deploy-config/contents/clusters.yml" --jq '.content' | base64 -d)
              # Validate cluster is in allowlist
              if ! echo "$CLUSTERS_CONFIG" | yq '.allowed_clusters[]' | grep -qxF -- "$CLUSTER"; then
                echo "::error::Cluster '${CLUSTER}' is not in the allowed clusters list"
                exit 1
              fi
              # Derive deployment repo from cluster ID
              echo "deploy_repo=glg/gds.clusterconfig.${CLUSTER}" >> "$GITHUB_OUTPUT"
              # The token action takes repository names without the owner prefix
              echo "deploy_repo_name=gds.clusterconfig.${CLUSTER}" >> "$GITHUB_OUTPUT"

          - name: Generate deploy-bot token
            id: deploy-token
            uses: actions/create-github-app-token@v1
            with:
              app-id: ${{ secrets.DEPLOY_BOT_APP_ID }}
              private-key: ${{ secrets.DEPLOY_BOT_PRIVATE_KEY }}
              owner: glg
              repositories: ${{ steps.cluster.outputs.deploy_repo_name }}

          - name: Remove PR deployment folder
            env:
              GH_TOKEN: ${{ steps.deploy-token.outputs.token }}
              DEPLOY_REPO: ${{ steps.cluster.outputs.deploy_repo }}
              SERVICE_PATH: ${{ steps.deploy-config.outputs.service_path }}
            run: |
              SERVICE_NAME=$(basename "$SERVICE_PATH")
              SERVICE_DIR=$(dirname "$SERVICE_PATH")
              PR_FOLDER="${SERVICE_NAME}-pr-${PR_NUMBER}"

              git clone "https://x-access-token:${GH_TOKEN}@github.com/${DEPLOY_REPO}.git" deploy-repo
              cd deploy-repo
              git config user.name "glg-deploy-bot[bot]"
              git config user.email "glg-deploy-bot[bot]@users.noreply.github.com"

              TARGET="${SERVICE_DIR}/${PR_FOLDER}"
              if [[ ! -d "$TARGET" ]]; then
                echo "PR deployment folder '${TARGET}' does not exist, nothing to clean up"
                exit 0
              fi

              rm -rf "$TARGET"
              git add -A
              git commit -m "cleanup: remove ${PR_FOLDER} (${SOURCE_REPO}#${PR_NUMBER} closed)" \
                -m "Source: ${SOURCE_REPO}#${PR_NUMBER}" \
                -m "Automated by glg/deploy-automation"

              MAX_RETRIES=3
              for i in $(seq 1 $MAX_RETRIES); do
                if git push origin main; then
                  echo "Successfully cleaned up ${PR_FOLDER}"
                  break
                fi
                if [[ $i -eq $MAX_RETRIES ]]; then
                  echo "::error::Failed to push after ${MAX_RETRIES} retries"
                  exit 1
                fi
                echo "Push failed, retrying..."
                git pull --rebase origin main
              done

The scheduled garbage collector lives at glg/deploy-automation/.github/workflows/gc.yml:


    name: Garbage Collect Stale PR Deployments

    on:
      schedule:
        - cron: '0 6 * * *'  # Daily at 6am UTC
      workflow_dispatch: {}   # Allow manual trigger

    jobs:
      gc:
        runs-on: ubuntu-latest
        steps:
          - name: Generate dispatcher token
            id: dispatcher-token
            uses: actions/create-github-app-token@v1
            with:
              app-id: ${{ secrets.DISPATCHER_APP_ID }}
              private-key: ${{ secrets.DISPATCHER_PRIVATE_KEY }}
              owner: glg

          - name: Generate deploy-bot token
            id: deploy-token
            uses: actions/create-github-app-token@v1
            with:
              app-id: ${{ secrets.DEPLOY_BOT_APP_ID }}
              private-key: ${{ secrets.DEPLOY_BOT_PRIVATE_KEY }}
              owner: glg

          - name: Fetch cluster config
            id: config
            env:
              GH_TOKEN: ${{ steps.dispatcher-token.outputs.token }}
            run: |
              gh api "repos/glg/deploy-config/contents/clusters.yml" --jq '.content' | base64 -d > clusters.yml

          - name: Scan and clean stale deployments
            env:
              # The dispatcher token reads PR state in source repos. The deploy-bot
              # token reads and writes deployment repos, where the dispatcher app
              # is not installed.
              GH_TOKEN_READ: ${{ steps.dispatcher-token.outputs.token }}
              GH_TOKEN_WRITE: ${{ steps.deploy-token.outputs.token }}
            run: |
              ORPHANS_FOUND=0

              # Iterate over each allowed cluster and derive deployment repo
              for CLUSTER_ID in $(yq '.allowed_clusters[]' clusters.yml); do
                DEPLOY_REPO="glg/gds.clusterconfig.${CLUSTER_ID}"
                echo "Scanning ${DEPLOY_REPO}..."

                # List all directories that match the *-pr-<number> pattern.
                # This is a simplified scan; adjust it to the actual directory structure.
                # "--" keeps grep from parsing the pattern's leading "-" as an option.
                DIRS=$(GH_TOKEN="$GH_TOKEN_WRITE" gh api "repos/${DEPLOY_REPO}/git/trees/main?recursive=1" \
                  --jq '.tree[] | select(.type == "tree") | .path' \
                  | grep -E -- '-pr-[0-9]+$' || true)

                for DIR in $DIRS; do
                  # Extract PR number from folder name
                  FOLDER_NAME=$(basename "$DIR")
                  PR_NUM=$(echo "$FOLDER_NAME" | grep -oE 'pr-[0-9]+$' | sed 's/pr-//')
                  if [[ -z "$PR_NUM" ]]; then
                    continue
                  fi

                  # Find the source repo from the last commit message on this folder
                  COMMIT_MSG=$(GH_TOKEN="$GH_TOKEN_WRITE" gh api "repos/${DEPLOY_REPO}/commits?path=${DIR}&per_page=1" \
                    --jq '.[0].commit.message' 2>/dev/null || true)
                  SOURCE_REPO=$(echo "$COMMIT_MSG" | grep -oE 'glg/[^ #]+' | head -1 || true)
                  if [[ -z "$SOURCE_REPO" ]]; then
                    echo "  WARNING: Could not determine source repo for ${DIR}, skipping"
                    continue
                  fi

                  # Check if the PR is still open
                  PR_STATE=$(GH_TOKEN="$GH_TOKEN_READ" gh api "repos/${SOURCE_REPO}/pulls/${PR_NUM}" \
                    --jq '.state' 2>/dev/null || echo "not_found")
                  if [[ "$PR_STATE" == "open" ]]; then
                    echo "  ${DIR}: PR #${PR_NUM} still open, keeping"
                    continue
                  fi

                  echo "  ${DIR}: PR #${PR_NUM} is ${PR_STATE}, removing"
                  ORPHANS_FOUND=$((ORPHANS_FOUND + 1))

                  # Clone, remove, commit, push; the subshell keeps the cd local
                  TEMP_DIR=$(mktemp -d)
                  git clone "https://x-access-token:${GH_TOKEN_WRITE}@github.com/${DEPLOY_REPO}.git" "$TEMP_DIR"
                  (
                    cd "$TEMP_DIR"
                    git config user.name "glg-deploy-bot[bot]"
                    git config user.email "glg-deploy-bot[bot]@users.noreply.github.com"
                    rm -rf "$DIR"
                    git add -A
                    git commit -m "gc: remove stale deployment ${FOLDER_NAME} (${SOURCE_REPO}#${PR_NUM} ${PR_STATE})" \
                      -m "Automated garbage collection by glg/deploy-automation"
                    for i in 1 2 3; do
                      if git push origin main; then
                        break
                      fi
                      git pull --rebase origin main
                    done
                  )
                  rm -rf "$TEMP_DIR"
                done
              done

              echo "Garbage collection complete. Orphans removed: ${ORPHANS_FOUND}"
              if [[ "$ORPHANS_FOUND" -gt 0 ]]; then
                echo "::warning::Removed ${ORPHANS_FOUND} stale PR deployment(s)"
              fi

| # | Threat | Severity | Mitigation |
|---|---|---|---|
| 1 | App key compromise | CRITICAL | Two-app architecture. Source repos only hold the dispatcher key (read-only). Deploy-bot key lives only in deploy-automation repo. Even if dispatcher key leaks, attacker cannot write to deployment repos. |
| 2 | Bot actor spoofing | HIGH | Double validation: user.type == "Bot" (GitHub-controlled field) AND exact-match against actor_allowlist.yml in locked-down config repo. |
| 3 | Malicious .deploy.yml | HIGH | Always read from the default branch, never the PR branch. Cluster validated against allowlist. Deployment repo derived from the validated cluster ID by naming convention, never taken from .deploy.yml. |
| 4 | Deployment flooding / DoS | MEDIUM | Max 3 active PR deployments per service. Enforced in validation phase before any write occurs. |
| 5 | Command injection via PR content | MEDIUM | Image tag validated against strict regex ^pr-\d+-[a-f0-9]{7,40}$. All PR-derived values passed through environment variables, not string interpolation. |
| 6 | Race conditions in deployment repo | LOW-MED | Retry loop with git pull --rebase on push failure (up to 3 attempts). |
| 7 | GitHub App over-permissioning | MEDIUM | Deploy-bot installed only on gds.clusterconfig.* deployment repos. Dispatcher installed on source repos + config repos. Neither has more access than needed. |
| 8 | Stale deployments from cleanup failures | LOW-MED | Daily cron GC scans all deployment repos, cross-references with PR state, removes orphaned folders. Warns via GitHub Actions annotations. |
| 9 | Compromised shared workflow | HIGH | Mitigated by the dispatch pattern: there is no reusable workflow called by source repos. All logic lives in deploy-automation which is protected by branch protection and CODEOWNERS. Source repos only send a dispatch event. |
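
The flooding mitigation in row 4 comes down to counting folders named after the service and comparing the count to a limit. The sketch below illustrates that count on a fixed list of names; `count_pr_deployments` is a hypothetical helper, and the folder names are made-up sample input rather than real repo contents.

```shell
# Hypothetical helper: count folders named "<service>-pr-<number>".
#   $1 = service name; folder names are read from stdin, one per line.
count_pr_deployments() {
  awk -v prefix="$1-pr-" '
    # Literal prefix comparison, then a digits-only suffix.
    index($0, prefix) == 1 && substr($0, length(prefix) + 1) ~ /^[0-9]+$/ { n++ }
    END { print n + 0 }
  '
}

MAX_PR_DEPLOYMENTS=3

# Sample folder listing (illustrative only).
ACTIVE=$(printf '%s\n' apollo-admin apollo-admin-pr-42 apollo-admin-pr-7 apollo-admin-proxy other-pr-1 \
  | count_pr_deployments apollo-admin)

if [ "$ACTIVE" -lt "$MAX_PR_DEPLOYMENTS" ]; then
  echo "ok: $ACTIVE active PR deployment(s)"   # prints ok: 2 active PR deployment(s)
else
  echo "limit reached"
fi
```

Matching the prefix literally avoids treating characters in the service name as regex syntax, and the digits-only suffix keeps a sibling service such as apollo-admin-proxy out of the count.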

- Source repos never hold deployment write credentials: they only have the dispatcher app key, which can read and dispatch, never write
- .deploy.yml is read from the default branch: PR authors cannot tamper with cluster targeting
- Cluster allowlist is in a separate locked-down repo: only the platform team can modify what clusters are targetable
- Actor allowlist is centrally managed: adding a new bot type requires platform team review
- All validation happens in deploy-automation: source repos have no say in what gets deployed where beyond their merged .deploy.yml
- Rate limited: max 3 concurrent PR deployments per service
- Self-healing: scheduled GC catches any cleanup failures
- No reusable workflow to compromise: the dispatch pattern means source repos never reference or run deploy-automation code directly

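
The self-healing property depends on the GC recovering a PR number from a folder name and a source repo from the deploy commit message. A minimal sketch of those two extractions follows, assuming the folder and commit-message formats shown in this document; both function names are hypothetical.

```shell
# Hypothetical helpers mirroring how the GC recovers its inputs.

# Print the PR number from a folder name such as "apollo-admin-pr-42";
# print nothing when the name does not end in "-pr-<number>".
pr_number_from_folder() {
  printf '%s\n' "$1" | sed -nE 's/^.*-pr-([0-9]+)$/\1/p'
}

# Print the first "glg/<repo>" reference found in a commit message.
source_repo_from_message() {
  printf '%s\n' "$1" | grep -oE 'glg/[A-Za-z0-9._-]+' | head -n 1
}

pr_number_from_folder "apollo-admin-pr-42"   # prints 42
source_repo_from_message "deploy: apollo-admin pr-42 from glg/apollo-admin#42"   # prints glg/apollo-admin
```

Because the source repo is recovered from a commit message, a manual commit that touches a PR folder can hide it from the GC; the GC workflow skips such folders with a warning rather than guessing.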

Every deployment must pass ALL of these checks:

| # | Check | Prevents |
|---|---|---|
| 1 | PR exists and is open | Stale/invalid dispatch payloads |
| 2 | pr.user.type == "Bot" | Human PRs triggering deploys |
| 3 | pr.user.login in actor_allowlist.yml | Unknown bots triggering deploys |
| 4 | .deploy.yml read from default branch | PR branch tampering with config |
| 5 | cluster in allowed_clusters | Deploying to production |
| 6 | deployment_repo derived from glg/gds.clusterconfig.{cluster_id} convention | Arbitrary repo targeting |
| 7 | image_tag matches ^pr-\d+-[a-f0-9]{7,40}$ | Command injection via tag |
| 8 | Active PR deployments for service < 3 | Resource exhaustion / flooding |
| 9 | Source repo belongs to the org (^glg/) | Cross-org abuse |
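
Checks 7 and 9 operate on the dispatch payload alone, so they can be illustrated without any API calls. The sketch below is one possible shape, not the workflow's actual code: `validate_payload` is a hypothetical helper, the numeric PR-number check and the repo-name character set are added assumptions, and POSIX `[0-9]` stands in for `\d`.

```shell
# Hypothetical helper covering checks 7 and 9, plus a numeric PR number.
#   $1 = source_repo, $2 = pr_number, $3 = image_tag
validate_payload() (
  nl='
'
  # grep matches line by line, so reject multi-line values first.
  case "$1$2$3" in
    *"$nl"*) echo "multi-line value rejected" >&2; exit 1 ;;
  esac
  # Check 9: repo must be in the glg org (name characters are an assumption).
  printf '%s\n' "$1" | grep -Eq '^glg/[A-Za-z0-9._-]+$' \
    || { echo "source_repo is not in the glg org" >&2; exit 1; }
  # Added assumption: the PR number is purely numeric.
  printf '%s\n' "$2" | grep -Eq '^[0-9]+$' \
    || { echo "pr_number is not numeric" >&2; exit 1; }
  # Check 7: image tag format.
  printf '%s\n' "$3" | grep -Eq '^pr-[0-9]+-[a-f0-9]{7,40}$' \
    || { echo "image_tag has an invalid format" >&2; exit 1; }
)

validate_payload "glg/apollo-admin" 42 "pr-42-abc1234" && echo "payload ok"      # prints payload ok
validate_payload "evil/repo" 42 "pr-42-abc1234" 2>/dev/null || echo "rejected"   # prints rejected
```

The remaining checks need GitHub API lookups (PR state, author type, allowlists, folder counts) and are left out of this sketch.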

- Create glg-deploy-dispatcher GitHub App
  - Permissions: contents: read, metadata: read
  - Install on: all source repos + deploy-automation + deploy-config
- Create glg-deploy-bot GitHub App
  - Permissions: contents: write, metadata: read
  - Install on: gds.clusterconfig.* deployment repos only
- Create glg/deploy-automation repo
  - Add repo secrets: DEPLOY_BOT_APP_ID, DEPLOY_BOT_PRIVATE_KEY, DISPATCHER_APP_ID, DISPATCHER_PRIVATE_KEY
  - Add the three workflows: deploy-pr.yml, cleanup-pr.yml, gc.yml
  - Enable branch protection on main
- Create glg/deploy-config repo
  - Add clusters.yml and actor_allowlist.yml
  - Enable branch protection: require 2 reviewers
  - Add CODEOWNERS: @glg/platform-team
- Add org secrets scoped to source repos:
  - DISPATCHER_APP_ID
  - DISPATCHER_PRIVATE_KEY

Then, in each source repo:

- Add .deploy.yml to the repo's default branch
- Add trigger-deploy and trigger-cleanup jobs to existing Docker build workflow