K8S Vulnerability Management

Diagram for a Trivy Container Image Scanner and CI/CD Pipeline

I've done Vulnerability Management as a Security Engineer for the past 8 years, across AWS EC2, Azure VM, GCP VM, and Docker / Kubernetes Container Images. Here is my best advice to any organization that wants to do it.

1. Don't Buy a Scanner Initially; Build It with Open-source

Build it, then buy it.

It's tempting to just buy a security vendor product, and that can be a good idea if there is budget and willingness within the organization to adopt a new tool. However, even with a vendor tool, the following steps still need to be implemented to be successful. I've seen multiple projects that adopted a vendor tool but failed because they did not follow all of these steps, and they faced a lot of pushback from developer teams and executives.

Container Scanners

  • Trivy
  • Clair
  • Grype / Anchore
  • Dagda
  • OpenSCAP
  • Dockle
  • Tern
  • Prisma Cloud
  • Wiz

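To make the open-source route concrete, here is a minimal sketch of driving a scan with Trivy, the first option in the list above. It assumes the trivy CLI is installed on the PATH, the image name is a placeholder, and the JSON fields follow Trivy's image-scan report layout.

```python
# Minimal sketch: run Trivy against one image and summarize findings by severity.
# Assumes the trivy CLI is installed; the image name below is a placeholder.
import json
import subprocess
from collections import Counter

IMAGE = "registry.example.com/base-images/python:20260101-abcd1234-dev"  # hypothetical

def scan_image(image: str) -> dict:
    """Run 'trivy image' in JSON mode and return the parsed report."""
    result = subprocess.run(
        ["trivy", "image", "--format", "json", image],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def severity_counts(report: dict) -> Counter:
    """Count vulnerabilities per severity across all scan targets in the report."""
    counts = Counter()
    for target in report.get("Results", []):
        for vuln in target.get("Vulnerabilities", []) or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts

if __name__ == "__main__":
    print(severity_counts(scan_image(IMAGE)))
```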

2. Create Dedicated Git Repos for Each Image

You cannot Git Version / Tag a folder in a Git Repo.

A common approach is to create a Git Monorepo containing all or many Container Images, but that prevents fine-grained Git Versioning / Tagging for each individual Container Image. If Image A and Image B live in the same Git Repo and you only change Image A, the new Git Version / Tag still applies to every folder and file in the Git Repo, which implies that Image B was also updated. It is better to put Image A and Image B into dedicated Git Repos, so that they can be updated, tested, and deployed independently. Ideally, all Container Images should be rebuilt every 7 days to fulfill compliance with security standards such as FedRAMP, with nightly builds for emergency security updates for Critical severity vulnerabilities.

3. Establish Image Ownership

Every Container Image should have one Owners Team.

I've heard countless times: "Everyone uses this image, so everyone owns it." That's not good enough! A specific, singular Developer Team needs to be responsible for maintaining, upgrading, fixing, and deploying each Container Image. The best way to do this is to require every Git Repo within the organization to have an Owners.txt file. You can automate verifying this with a Python script that scans every Git Repo, every morning, for the existence and contents of this file. Even better is to include specific Developer Team Manager names, emails, and Slack channels, which are also verified to still be active in case someone leaves the organization.
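A minimal sketch of that morning check, assuming GitHub-hosted repos, a token in the GITHUB_TOKEN environment variable, and a hypothetical organization name; it only verifies that Owners.txt exists and is non-empty, and leaves the manager / Slack validation as a follow-up.

```python
# Minimal sketch: flag every repo in a GitHub org that is missing a non-empty Owners.txt.
# Assumes a personal access token in GITHUB_TOKEN; "example-org" is a placeholder.
import os
import requests

ORG = "example-org"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def org_repos(org: str):
    """Yield repo names for the organization, following pagination."""
    page = 1
    while True:
        resp = requests.get(
            f"https://api.github.com/orgs/{org}/repos",
            headers=HEADERS, params={"per_page": 100, "page": page}, timeout=30,
        )
        resp.raise_for_status()
        repos = resp.json()
        if not repos:
            return
        yield from (r["name"] for r in repos)
        page += 1

def has_owners_file(org: str, repo: str) -> bool:
    """True if Owners.txt exists in the default branch and is non-empty."""
    resp = requests.get(
        f"https://api.github.com/repos/{org}/{repo}/contents/Owners.txt",
        headers=HEADERS, timeout=30,
    )
    return resp.status_code == 200 and resp.json().get("size", 0) > 0

if __name__ == "__main__":
    for repo in org_repos(ORG):
        if not has_owners_file(ORG, repo):
            print(f"MISSING OWNER: {ORG}/{repo}")
```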

4. Establish Consistent Tagging / Versioning

Inconsistent Container Image Tagging makes accurate analysis impossible.

I've worked with multiple companies where each Developer Team has its own standard for tagging, e.g.: "1.0.1", "1.0.1-dev", "1.0.1-abcd1234", "1.0.1-abcd1234-dev", "2026.01.01-abcd1234", "2026.01.01-abcd1234-dev", "20260101-abcd1234", "20260101-abcd1234-dev". The organization needs every Developer Team to use the same Container Image Tagging / Versioning. I suggest the last example: "<date>-<git_hash>-<environment>".
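A small sketch of what generating and enforcing that format could look like; the allowed environment names and the 8-digit date / short-hash conventions are assumptions, not a standard.

```python
# Minimal sketch: build and validate "<date>-<git_hash>-<environment>" image tags.
# The allowed environments and the YYYYMMDD / short-hash conventions are assumptions.
import re
import subprocess
from datetime import date, datetime

TAG_PATTERN = re.compile(r"^\d{8}-[0-9a-f]{7,40}-(dev|stage|prod)$")

def build_tag(environment: str) -> str:
    """Produce a tag like 20260101-abcd1234-dev from today's date and the current commit."""
    git_hash = subprocess.run(
        ["git", "rev-parse", "--short=8", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return f"{date.today():%Y%m%d}-{git_hash}-{environment}"

def validate_tag(tag: str) -> bool:
    """True if the tag matches the agreed format; raises if the date part is invalid."""
    if not TAG_PATTERN.match(tag):
        return False
    datetime.strptime(tag.split("-", 1)[0], "%Y%m%d")  # raises on a bad date
    return True

if __name__ == "__main__":
    tag = build_tag("dev")
    print(tag, validate_tag(tag))
```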

5. Stop Using Publicly Available Images

Even the most popular Container Images, for things as common as Python, are only rebuilt every 3-4 weeks, which is not fast enough to integrate security fixes for Critical and High severity vulnerabilities. The latest FedRAMP standard requires remediation within 3 days for Critical, 7 days for High, 21 days for Moderate, and 180 days for Low.

6. Use Shared Secure Base Images

While migrating away from Public Images, it's important to establish organization-wide usage of Shared Secure Base Images, often Ubuntu / Debian, Amazon Linux 2 / RHEL, and optionally Windows Server, or some variation. Ideally it would be only RHEL or only Debian, but in practice the licensing costs for RHEL become "too expensive" at scale, which is a discussion to be had between the Sales, Security, Platform, Legal, and Finance Teams. If you don't want to build your own Shared Secure Base Images, investigate purchasing them from Chainguard.

7. Create a Standardized CI/CD Pipeline

Technically the Shared Secure Base Images can be built manually, but the process should be fully automated with a CI/CD Pipeline. I suggest using GitHub Actions, HashiCorp Packer, and Ansible. GitHub Actions has been replacing Jenkins and other CI/CD Pipeline Tools for the past few years. HashiCorp Packer is a CLI tool for easily building both Cloud VM and Container Images. Ansible can be used to automate hardening of the Container Images, and multiple open-source CIS Benchmark Ansible Playbooks are available.
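As a rough illustration of the build step those tools perform, here is a sketch of the kind of Python wrapper a GitHub Actions job might call; the Packer template path is a placeholder, and Ansible is assumed to run inside the template via Packer's ansible provisioner.

```python
# Minimal sketch of a build step a GitHub Actions job could call: Packer builds
# the image, invoking Ansible hardening via its provisioner inside the template.
# The template file name is a placeholder.
import subprocess
import sys

TEMPLATE = "base-image.pkr.hcl"  # hypothetical Packer template

def run(*cmd: str) -> None:
    """Run a command, echoing it first, and stop the build on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    try:
        run("packer", "init", ".")
        run("packer", "validate", TEMPLATE)
        run("packer", "build", TEMPLATE)
    except subprocess.CalledProcessError as exc:
        sys.exit(exc.returncode)
```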

8. Test Everything

There are no silver bullets in IT, except having a robust test suite with 90%+ code coverage. Any organization that is running more than 10 Container Images needs automated testing suites. Manual testing in 2026 is not good enough. See my gist below for CI/CD Pipeline Tool suggestions.
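A minimal pytest sketch of what image-level smoke tests can look like, assuming Docker is available on the build agent; the image tag, the expected base OS, and the expected Python version are all placeholders.

```python
# Minimal pytest sketch: smoke-test a freshly built image before it is pushed.
# Assumes Docker is available; the image tag and expected values are placeholders.
import subprocess

IMAGE = "base-images/ubuntu:20260101-abcd1234-dev"  # hypothetical tag

def run_in_image(*cmd: str) -> str:
    """Run a command inside the image and return its stdout."""
    result = subprocess.run(
        ["docker", "run", "--rm", IMAGE, *cmd],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def test_expected_base_os():
    # Fail fast if someone swapped the base image without updating the tests.
    assert "Ubuntu" in run_in_image("cat", "/etc/os-release")

def test_runs_as_non_root_user():
    # Hardened base images should not default to uid 0.
    assert run_in_image("id", "-u").strip() != "0"

def test_expected_python_version():
    # Pin the runtime version so upgrades are deliberate, not accidental.
    assert run_in_image("python3", "--version").startswith("Python 3.12")
```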

9. Scan Everything

The targets to scan are:

  • AWS EKS / ArgoCD
  • AWS ECR
  • JFrog Artifactory
  • CI/CD Build Logs

What is running? What is stored? What are the results of the latest CI/CD Builds?

Ideally, both AWS ECR and JFrog Artifactory should be purged of any Container Image that is older than 6 months. After that initial change has been made, the Retention Policy can be shortened to 3 months, 1 month, or even just 7 days. The results of all these scans and logs should be stored in a database, whether SQL such as MySQL or Postgres, or NoSQL such as Elasticsearch, MongoDB, or similar.
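A minimal sketch of the ingestion side, assuming a Postgres database, the psycopg2 driver, and a Trivy JSON report on disk; the connection string and table layout are assumptions chosen to support the KPIs in the next step.

```python
# Minimal sketch: load one Trivy JSON report into Postgres so findings can be queried.
# Assumes psycopg2 and a reachable database; the DSN and table layout are assumptions.
import json
import sys
from datetime import datetime, timezone

import psycopg2

DSN = "postgresql://vulnmgmt:secret@localhost:5432/vulnmgmt"  # placeholder

SCHEMA = """
CREATE TABLE IF NOT EXISTS findings (
    scanned_at  TIMESTAMPTZ NOT NULL,
    image       TEXT        NOT NULL,
    cve_id      TEXT        NOT NULL,
    severity    TEXT        NOT NULL,
    package     TEXT        NOT NULL
);
"""

def load_report(path: str, conn) -> int:
    """Insert every vulnerability from a Trivy JSON report; return the row count."""
    report = json.load(open(path))
    now = datetime.now(timezone.utc)
    image = report.get("ArtifactName", "unknown")
    rows = [
        (now, image, v["VulnerabilityID"], v.get("Severity", "UNKNOWN"), v.get("PkgName", ""))
        for target in report.get("Results", [])
        for v in (target.get("Vulnerabilities") or [])
    ]
    with conn, conn.cursor() as cur:
        cur.execute(SCHEMA)
        cur.executemany(
            "INSERT INTO findings (scanned_at, image, cve_id, severity, package) "
            "VALUES (%s, %s, %s, %s, %s)", rows,
        )
    return len(rows)

if __name__ == "__main__":
    connection = psycopg2.connect(DSN)
    print(load_report(sys.argv[1], connection), "findings loaded")
```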

10. Build Reporting

Now that everything has been scanned and is stored in a database that can be queried, it's possible to create reporting. This can be as simple as a Python script that creates a CSV / Excel file and is run once a week on the latest findings, producing both point-in-time global summaries and changes over time. Generally, executives only care about the changes over time, and expect them to be improving. If you skipped any of the earlier steps in this document, then you will have a lot of difficulty explaining to executives: "What needs to be fixed?" and "Who is responsible for fixing it?"

Suggested KPIs (a query sketch follows this list):

  • Total Count, of All Vulnerabilities
  • Total Count, of All Vulnerabilities, Group By Severity
  • Total Count, of All Vulnerabilities, Group By Severity and Team (Most Vulnerable Team)
  • Total Count, of Top 5 Vulnerabilities, by Occurrences (Most Common Vulnerabilities)
  • Container Images, Order By Vulnerability Count (Most Vulnerable Images)
  • Container Images, Order By Age Desc (Oldest Images)
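A minimal reporting sketch against the findings table assumed in the previous step; it covers the first few KPIs above and writes one CSV per query, which is enough to seed a weekly report.

```python
# Minimal sketch: run a few of the KPI queries above against the findings table
# from the previous step and write each result to its own CSV file.
import csv

import psycopg2

DSN = "postgresql://vulnmgmt:secret@localhost:5432/vulnmgmt"  # placeholder

KPI_QUERIES = {
    "total_vulnerabilities":
        "SELECT COUNT(*) AS total FROM findings",
    "vulnerabilities_by_severity":
        "SELECT severity, COUNT(*) AS total FROM findings "
        "GROUP BY severity ORDER BY total DESC",
    "most_common_vulnerabilities":
        "SELECT cve_id, COUNT(*) AS occurrences FROM findings "
        "GROUP BY cve_id ORDER BY occurrences DESC LIMIT 5",
    "most_vulnerable_images":
        "SELECT image, COUNT(*) AS total FROM findings "
        "GROUP BY image ORDER BY total DESC",
}

def export_kpis(conn) -> None:
    """Write one CSV per KPI query, with a header row taken from the cursor."""
    with conn.cursor() as cur:
        for name, query in KPI_QUERIES.items():
            cur.execute(query)
            with open(f"{name}.csv", "w", newline="") as fh:
                writer = csv.writer(fh)
                writer.writerow(col.name for col in cur.description)
                writer.writerows(cur.fetchall())

if __name__ == "__main__":
    export_kpis(psycopg2.connect(DSN))
```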

11. Build Dashboards

Executives like to visualize data, and a lot of Developer Team Managers don't know how to write SQL queries. I prefer to give everyone read-only SQL access so that they can self-serve analysis with the data. I've used countless visualization tools.

12. Block Insecure Builds

After a few weeks or months of fundamentally changing how the organization operates, you can start to block insecure builds within the CI/CD Pipelines. At this point of technical maturity, the Developer Teams will have all of the information and tools they need to be successful.
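A minimal sketch of that gate, assuming the Trivy CLI is available in the pipeline; Trivy's own --exit-code flag does the blocking, so the wrapper only decides which severities fail the build.

```python
# Minimal sketch of a CI/CD gate: fail the pipeline when Trivy finds
# CRITICAL or HIGH vulnerabilities in the freshly built image.
# Assumes the trivy CLI is installed; the image tag is passed in from the pipeline.
import subprocess
import sys

BLOCKING_SEVERITIES = "CRITICAL,HIGH"

def gate(image: str) -> int:
    """Return 0 if the image is clean at the blocking severities, non-zero otherwise."""
    result = subprocess.run([
        "trivy", "image",
        "--severity", BLOCKING_SEVERITIES,
        "--exit-code", "1",        # non-zero exit when findings exist
        "--ignore-unfixed",        # only block on vulnerabilities that have a fix
        image,
    ])
    return result.returncode

if __name__ == "__main__":
    sys.exit(gate(sys.argv[1]))
```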

13. Buy an Enterprise Tool

After doing the earlier steps for 3-6 months, consider buying an enterprise vendor tool. If you chose a tool like Trivy, you can swap in the enterprise version from Aqua Security. The source data for Reporting and Dashboards will change, but the business process workflow will stay the same.
