Team: Agentic Backend
Author: Engineering
Date: 2026-03-07
Status: Implementation planned — [plan at docs/plans/2026-03-07-cli-skill-management.md]
We currently manage 7 custom Claude skills by manually uploading zip files to platform.claude.com, then updating environment variables (PPT_SKILL_ID, etc.) in Pydantic settings or k8s config. This causes:
- No git tracking: skill content is opaque, so nobody can read, diff, or review it
- Outages on ID mismatch: wrong env var = broken skill = broken document generation
- Version coupling failures: different k8s clusters run different image tags, but all share `"version": "latest"`, so a skill update for one cluster breaks another
- No admin recoverability: if a skill breaks, you can't inspect or fix it in platform.claude.com
| Env Var | Service | Skill Type |
|---|---|---|
| `PPT_SKILL_ID` | PowerPoint generation | custom |
| `EY_PPT_SKILL_ID` | EY-branded PowerPoint | custom |
| `XLSX_SKILL_ID` | Spreadsheet generation | custom |
| `DOCX_SKILL_ID` | Word document generation | custom |
| `ORG_CHART_SKILL_ID` | Org chart / network graph | custom |
| `DATA_ANALYSIS_SKILL_ID` | Data analysis | custom |
| `D3_DATA_VIZ_SKILL_ID` | D3 visualizations | custom |
Store skill source files in git and deploy them via the Anthropic Skills API (POST /v1/skills/{id}/versions), pinning versions per build.
Anthropic exposes a full Skills CRUD API in the Python SDK:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Upload a new version of an existing skill
client.beta.skills.versions.create(skill_id=..., files=[...], betas=["skills-2025-10-02"])

# List a skill's versions
client.beta.skills.versions.list(skill_id=..., betas=["skills-2025-10-02"])
```

This means we can automate the whole flow; no more manual uploads.
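A minimal sketch of how `deploy-skills` could wrap the `versions.create` call for one `skills/<name>/` directory. The helper names are hypothetical, and two details are assumptions to check against the SDK reference: that `files` accepts `(path, bytes)` tuples whose paths start with the skill's directory name, and that the returned object exposes a `version` attribute. The client is passed in so the function can be exercised with a fake.

```python
from pathlib import Path
from typing import Any

SKILLS_BETA = "skills-2025-10-02"  # beta flag used in the SDK calls above


def collect_skill_files(skill_dir: Path) -> list[tuple[str, bytes]]:
    """Read every file under skill_dir as (path, content) pairs, sorted by path.

    Paths keep the skill directory as their first segment (e.g. "pptx/SKILL.md").
    ASSUMPTION: the SDK's `files` parameter accepts (filename, bytes) tuples.
    """
    files = []
    for path in skill_dir.rglob("*"):
        if path.is_file():
            files.append((path.relative_to(skill_dir.parent).as_posix(), path.read_bytes()))
    return sorted(files)


def deploy_skill(client: Any, skill_id: str, skill_dir: Path) -> str:
    """Upload skill_dir as a new version of skill_id; return the new version string."""
    files = collect_skill_files(skill_dir)
    # SKILL.md is the required entry point, so refuse to upload without it.
    if f"{skill_dir.name}/SKILL.md" not in {name for name, _ in files}:
        raise FileNotFoundError(f"{skill_dir / 'SKILL.md'} is required")
    created = client.beta.skills.versions.create(
        skill_id=skill_id,
        files=files,
        betas=[SKILLS_BETA],
    )
    # ASSUMPTION: the returned version object exposes a `version` attribute.
    return str(created.version)
```

With a real client this would be called as `deploy_skill(anthropic.Anthropic(), skill_id, Path("skills/pptx"))`, where `skill_id` comes from settings (e.g. `PPT_SKILL_ID`).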
Current workflow:

```mermaid
flowchart LR
Engineer["Engineer"]
Platform["platform.claude.com"]
EnvVar["K8s Env Vars"]
App["Agentic Backend"]
Engineer -->|"1. manually upload zip"| Platform
Platform -->|"2. returns skill_id"| Engineer
Engineer -->|"3. manually update env var"| EnvVar
EnvVar -->|"4. app reads skill_id"| App
App -->|"5. uses skill_id + version=latest"| Platform
style Engineer fill:#f99,stroke:#c00
style Platform fill:#fcc,stroke:#c00
style EnvVar fill:#fcc,stroke:#c00
```
Proposed workflow:

```mermaid
flowchart LR
Git["Git Repo<br/>skills/ directory"]
CI["CI/CD Pipeline"]
API["Anthropic Skills API"]
K8s["K8s ConfigMap"]
App["Agentic Backend"]
Git -->|"1. push to branch"| CI
CI -->|"2. deploy-skills CLI"| API
API -->|"3. returns version ID"| CI
CI -->|"4. inject VERSION env var"| K8s
K8s -->|"5. app reads pinned version"| App
App -->|"6. uses skill_id + version=pinned"| API
style Git fill:#9f9,stroke:#0a0
style CI fill:#9f9,stroke:#0a0
style API fill:#ccf,stroke:#00c
```
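Step 4 could be as small as turning the `PPT_SKILL_VERSION=1741318800`-style lines printed by `deploy-skills` into a ConfigMap that the Deployment loads with `envFrom`. A minimal sketch, assuming the CI job captures those lines; the function name and the ConfigMap name are hypothetical.

```python
import json


def build_skill_configmap(env_lines: list[str], name: str) -> dict:
    """Turn KEY=VALUE lines printed by deploy-skills into a ConfigMap manifest."""
    data: dict[str, str] = {}
    for raw in env_lines:
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        # Only accept the version pins; IDs stay wherever they are configured today.
        if not sep or not key.endswith("_SKILL_VERSION") or not value:
            raise ValueError(f"unexpected line: {line!r}")
        data[key] = value  # ConfigMap data values must be strings
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }


if __name__ == "__main__":
    manifest = build_skill_configmap(
        ["PPT_SKILL_VERSION=1741318800", "XLSX_SKILL_VERSION=1741318801"],
        name="agentic-backend-skill-versions",  # hypothetical ConfigMap name
    )
    # JSON is valid manifest input, e.g. piped into `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Pods read ConfigMap-backed env vars only at startup, so new versions take effect on the next rollout of each cluster.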
Per-cluster version pinning:

```mermaid
flowchart TB
subgraph "Git Tags"
v320["v3.2.0<br/>skills/ @ commit abc"]
v342["v3.4.2<br/>skills/ @ commit xyz"]
end
subgraph "Anthropic Skills API"
sv1["Skill Version<br/>1738000000"]
sv2["Skill Version<br/>1741318800"]
end
subgraph "K8s Clusters"
A["Cluster A<br/>image: v3.2.0<br/>PPT_SKILL_VERSION=1738000000"]
B["Cluster B<br/>image: v3.4.2<br/>PPT_SKILL_VERSION=1741318800"]
end
v320 --> sv1
v342 --> sv2
sv1 --> A
sv2 --> B
style A fill:#ffe,stroke:#cc0
style B fill:#efe,stroke:#0a0
```
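In code, per-cluster pinning reduces to each pod reading its own `*_SKILL_VERSION` and falling back to `"latest"` when it is unset. The real change lives in the Pydantic settings in `src/core/settings.py`; this stdlib sketch only mimics that behaviour. `SkillRef` and `load_skill_ref` are hypothetical names, and the `container_entry` shape (`type`, `skill_id`, `version`) is assumed from the Messages API `container` skills parameter.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

DEFAULT_SKILL_VERSION = "latest"  # today's unpinned behaviour


@dataclass(frozen=True)
class SkillRef:
    skill_id: str
    version: str

    def container_entry(self) -> dict[str, str]:
        # ASSUMPTION: entry shape for container={"skills": [...]} in a Messages request.
        return {"type": "custom", "skill_id": self.skill_id, "version": self.version}


def load_skill_ref(prefix: str, env: Optional[Mapping[str, str]] = None) -> SkillRef:
    """Read <PREFIX>_SKILL_ID and <PREFIX>_SKILL_VERSION, e.g. prefix="PPT"."""
    env = os.environ if env is None else env
    skill_id = env.get(f"{prefix}_SKILL_ID")
    if not skill_id:
        raise KeyError(f"{prefix}_SKILL_ID is not set")
    # A cluster that has not pinned a version keeps using "latest".
    version = env.get(f"{prefix}_SKILL_VERSION") or DEFAULT_SKILL_VERSION
    return SkillRef(skill_id=skill_id, version=version)
```

With Cluster A's environment, `load_skill_ref("PPT")` yields version `1738000000`; a cluster that never sets `PPT_SKILL_VERSION` behaves exactly as it does today.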
```text
agentic-backend/
├── skills/                      # NEW: git-tracked skill source
│   ├── README.md
│   ├── pptx/
│   │   ├── SKILL.md             # Required entry point
│   │   ├── scripts/             # Optional executable code
│   │   └── assets/              # Optional templates
│   ├── ey-pptx/
│   │   └── SKILL.md
│   ├── xlsx/
│   │   └── SKILL.md
│   ├── docx/
│   │   └── SKILL.md
│   ├── org-chart/
│   │   └── SKILL.md
│   ├── data-analysis/
│   │   └── SKILL.md
│   └── d3-data-viz/
│       └── SKILL.md
├── src/
│   ├── core/
│   │   └── settings.py          # MODIFIED: adds *_SKILL_VERSION fields
│   ├── cli.py                   # MODIFIED: adds deploy-skills, list-skill-versions
│   └── services/
│       └── claude_skills/
│           ├── registry.py      # NEW: skill name -> settings mapping
│           └── config.py        # UNCHANGED: already has skill_version field
```
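A minimal sketch of what `registry.py` might hold: each `skills/` directory name mapped to the settings prefix from the env-var table, plus a structural check that a dry run could use. `SKILL_PREFIXES`, `env_vars_for`, and `check_skills_dir` are hypothetical names, and deriving the `*_SKILL_VERSION` name from the ID prefix is an assumption consistent with the `PPT_SKILL_VERSION` and `XLSX_SKILL_VERSION` names used in this RFC.

```python
from pathlib import Path

# Hypothetical registry: skill directory name -> settings/env var prefix.
SKILL_PREFIXES: dict[str, str] = {
    "pptx": "PPT",
    "ey-pptx": "EY_PPT",
    "xlsx": "XLSX",
    "docx": "DOCX",
    "org-chart": "ORG_CHART",
    "data-analysis": "DATA_ANALYSIS",
    "d3-data-viz": "D3_DATA_VIZ",
}


def env_vars_for(skill: str) -> tuple[str, str]:
    """Return the (ID, VERSION) env var names for a skill directory name."""
    prefix = SKILL_PREFIXES[skill]
    return f"{prefix}_SKILL_ID", f"{prefix}_SKILL_VERSION"


def check_skills_dir(root: Path) -> list[str]:
    """Return a list of problems; an empty list means skills/ looks deployable."""
    problems = []
    # Every registered skill needs its SKILL.md entry point.
    for skill in SKILL_PREFIXES:
        if not (root / skill / "SKILL.md").is_file():
            problems.append(f"{skill}: missing SKILL.md")
    # Directories nobody registered would never be deployed, so flag them.
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        if child.name not in SKILL_PREFIXES:
            problems.append(f"{child.name}: directory not in registry")
    return problems
```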
```bash
# Deploy all skills
just deploy-skills

# Deploy one skill
just deploy-skill pptx

# Dry run (validate only)
just deploy-skills --dry-run
```

Output:
```text
Deploying pptx...
Deploying xlsx...
...
┌─────────────┬──────────────────────────────┬─────────────┬──────────────────────────────────┐
│ Skill       │ Skill ID                     │ New Version │ Env Var                          │
├─────────────┼──────────────────────────────┼─────────────┼──────────────────────────────────┤
│ pptx        │ skill_01PU1cCLSDx4Ao8g3ke9m… │ 1741318800  │ PPT_SKILL_VERSION=1741318800     │
│ xlsx        │ skill_014vWE9w1c9CVRmoie7np… │ 1741318801  │ XLSX_SKILL_VERSION=1741318801    │
│ ...         │ ...                          │ ...         │ ...                              │
└─────────────┴──────────────────────────────┴─────────────┴──────────────────────────────────┘
Set these version values as env vars in your k8s config to pin this deployment.
```
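The CLI framework in `src/cli.py` isn't specified here, so this is an `argparse` sketch of the argument handling behind the three `just` recipes, assuming they forward skill names and `--dry-run` unchanged. `run`, `validate`, and `deploy` are hypothetical; injecting the last two keeps the dry-run path testable without calling the API.

```python
import argparse
from typing import Callable, Sequence

# Same names as the skills/ subdirectories.
ALL_SKILLS = ["pptx", "ey-pptx", "xlsx", "docx", "org-chart", "data-analysis", "d3-data-viz"]


def run(
    argv: Sequence[str],
    validate: Callable[[str], None],
    deploy: Callable[[str], str],
) -> dict[str, str]:
    """Handle `deploy-skills [SKILL ...] [--dry-run]`; return {skill: new_version}."""
    parser = argparse.ArgumentParser(prog="deploy-skills")
    parser.add_argument("skills", nargs="*", help="skills to deploy (default: all)")
    parser.add_argument("--dry-run", action="store_true", help="validate only, upload nothing")
    args = parser.parse_args(list(argv))

    selected = args.skills or ALL_SKILLS
    unknown = [s for s in selected if s not in ALL_SKILLS]
    if unknown:
        parser.error(f"unknown skill(s): {', '.join(unknown)}")  # exits with status 2

    # Validate the whole selection before uploading anything.
    for skill in selected:
        validate(skill)
    if args.dry_run:
        return {}

    versions: dict[str, str] = {}
    for skill in selected:
        print(f"Deploying {skill}...")
        versions[skill] = deploy(skill)
    return versions
```

Validating everything before the first upload means a broken `SKILL.md` in one skill fails the run without publishing new versions for the others.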
To list the published versions of a skill:

```bash
just skill-versions pptx
```

Migration plan:

```mermaid
flowchart TD
A["1. Extract content from existing<br/>skill zips on platform.claude.com"] --> B["2. Place files in skills/<name>/<br/>with SKILL.md at root"]
B --> C["3. Run: just deploy-skills<br/>(uploads via API, prints versions)"]
C --> D["4. Set *_SKILL_VERSION env vars<br/>in k8s configmaps per cluster"]
D --> E["5. Deploy: app uses<br/>pinned versions instead of 'latest'"]
E --> F["6. Remove manual upload<br/>workflow from team process"]
style A fill:#fcc
style F fill:#9f9
```
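For steps 1 and 2, a sketch of unpacking one exported zip into `skills/<name>/` so that `SKILL.md` lands at the directory root. It assumes the zips are already available locally (see the open question on who has them) and that each contains exactly one `SKILL.md`; `extract_skill_zip` is a hypothetical helper.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill_zip(zip_path: Path, skills_root: Path, name: str) -> Path:
    """Unpack an exported skill zip into skills_root/name with SKILL.md at the root."""
    dest = skills_root / name
    with zipfile.ZipFile(zip_path) as zf:
        members = [m for m in zf.namelist() if not m.endswith("/")]
        skill_mds = [m for m in members if PurePosixPath(m).name == "SKILL.md"]
        if len(skill_mds) != 1:
            raise ValueError(f"{zip_path}: expected one SKILL.md, found {len(skill_mds)}")
        # Exports are often wrapped in a folder; treat SKILL.md's folder as the root.
        base = PurePosixPath(skill_mds[0]).parent.parts
        for member in members:
            parts = PurePosixPath(member).parts
            if parts[: len(base)] != base:
                continue  # outside the skill folder, e.g. __MACOSX/ metadata
            rel = parts[len(base):]
            # Never write outside dest (absolute paths or ".." segments).
            if PurePosixPath(member).is_absolute() or ".." in rel:
                raise ValueError(f"refusing unsafe path {member!r}")
            target = dest.joinpath(*rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(member))
    return dest
```

After extraction, `git diff` on `skills/<name>/` is the first time the team can review the skill's content, which is the main point of step 2.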
- Skill IDs are stable: `PPT_SKILL_ID` etc. don't change; they're permanent identifiers
- Default is backward-compatible: `*_SKILL_VERSION` defaults to `"latest"`, so existing deployments keep working
- No API contract changes: the Anthropic container config just gets a pinned version string instead of `"latest"`
| # | Action | Owner | Status |
|---|---|---|---|
| 1 | Review this RFC and the implementation plan | Team | Pending |
| 2 | Extract existing skill zip contents into `skills/` directories | Skill owners | Not started |
| 3 | Implement the 10 tasks in the plan (settings, wiring, registry, CLI, tests) | Assignee TBD | Not started |
| 4 | Add `deploy-skills` step to CI/CD pipeline | DevOps | Not started |
| 5 | Update k8s manifests to include `*_SKILL_VERSION` env vars | DevOps | Not started |
| 6 | Deprecate manual platform.claude.com upload process | Team | After rollout |
- Who has the existing skill zips? We need to extract their contents into the `skills/` directories.
- CI/CD integration timing: should we add the deploy step to the existing release pipeline, or keep it manual via `just deploy-skills` initially?
- Rollout strategy: suggest starting with one skill (e.g., `pptx`) to validate, then rolling out to all 7.