Complete PoC: Agent Platform with KubeVirt VMs and ISO Boot

This is a comprehensive PoC for using actual KubeVirt VirtualMachine resources with the HyperShift Agent platform, including full ISO boot support.

Table of Contents

  1. Architecture Overview
  2. Prerequisites
  3. Phase 1: Management Cluster Setup
  4. Phase 2: OpenShift Virtualization
  5. Phase 3: KubeVirtBMC Deployment
  6. Phase 4: HyperShift and Agent Platform
  7. Phase 5: KubeVirt Worker VMs
  8. Phase 6: ISO Boot Configuration
  9. Phase 7: BareMetalHost Integration
  10. Phase 8: NodePool Creation
  11. Verification and Testing
  12. Troubleshooting
  13. Clean Up

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                    Management Cluster (OCP)                      │
│                                                                   │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ OpenShift Virtualization (KubeVirt)                        │ │
│  │                                                              │ │
│  │  ┌──────────────────┐  ┌──────────────────┐               │ │
│  │  │ VirtualMachine   │  │ VirtualMachine   │               │ │
│  │  │  worker-0        │  │  worker-1        │               │ │
│  │  │                  │  │                  │               │ │
│  │  │ ┌──────────────┐ │  │ ┌──────────────┐ │               │ │
│  │  │ │ Discovery ISO│ │  │ │ Discovery ISO│ │               │ │
│  │  │ │ (CD-ROM)     │ │  │ │ (CD-ROM)     │ │               │ │
│  │  │ │ bootOrder: 1 │ │  │ │ bootOrder: 1 │ │               │ │
│  │  │ └──────────────┘ │  │ └──────────────┘ │               │ │
│  │  │ ┌──────────────┐ │  │ ┌──────────────┐ │               │ │
│  │  │ │ OS Disk      │ │  │ │ OS Disk      │ │               │ │
│  │  │ │ (120GB)      │ │  │ │ (120GB)      │ │               │ │
│  │  │ │ bootOrder: 2 │ │  │ │ bootOrder: 2 │ │               │ │
│  │  │ └──────────────┘ │  │ └──────────────┘ │               │ │
│  │  └──────────────────┘  └──────────────────┘               │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                   │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ KubeVirtBMC (Virtual BMC Emulation)                        │ │
│  │                                                              │ │
│  │  ┌────────────────┐  ┌────────────────┐                   │ │
│  │  │ VirtualMachine │  │ VirtualMachine │                   │ │
│  │  │ BMC (Redfish)  │  │ BMC (Redfish)  │                   │ │
│  │  │ worker-0       │  │ worker-1       │                   │ │
│  │  └────────────────┘  └────────────────┘                   │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                   │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Metal3 + Assisted Service + HyperShift                     │ │
│  │                                                              │ │
│  │  ┌────────────────┐  ┌────────────────┐                   │ │
│  │  │ BareMetalHost  │  │ BareMetalHost  │                   │ │
│  │  │ worker-0       │  │ worker-1       │                   │ │
│  │  │ (points to BMC)│  │ (points to BMC)│                   │ │
│  │  └────────────────┘  └────────────────┘                   │ │
│  │                                                              │ │
│  │  ┌────────────────┐  ┌────────────────┐                   │ │
│  │  │ Agent          │  │ Agent          │                   │ │
│  │  │ worker-0       │  │ worker-1       │                   │ │
│  │  └────────────────┘  └────────────────┘                   │ │
│  │                                                              │ │
│  │  ┌──────────────────────────────────────┐                 │ │
│  │  │ NodePool (Agent Platform)            │                 │ │
│  │  │ - agentLabelSelector                 │                 │ │
│  │  │ - replicas: 2                        │                 │ │
│  │  └──────────────────────────────────────┘                 │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                   │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Hosted Control Plane (HCP)                                 │ │
│  │ - etcd, kube-apiserver, etc.                               │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Prerequisites

Hardware Requirements

  • Hypervisor/Host:
    • 128GB RAM minimum (64GB for management cluster, 64GB for worker VMs)
    • 16+ CPU cores
    • 1TB disk space
    • Nested virtualization enabled
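
A quick way to sanity-check the host against these minimums (a sketch; adjust the disk path to wherever your VM storage lives):

free -g | awk '/^Mem:/ {print "Total RAM (GB):", $2}'
echo "CPU cores: $(nproc)"
df -h /var/lib/libvirt/images 2>/dev/null || df -h /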

Software Requirements

# Check nested virtualization
cat /sys/module/kvm_intel/parameters/nested  # Should show 'Y'
# or for AMD
cat /sys/module/kvm_amd/parameters/nested

# Enable if disabled
echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm.conf
sudo modprobe -r kvm_intel
sudo modprobe kvm_intel

Required Tools

# Install kcli
curl -s https://raw.githubusercontent.com/karmab/kcli/main/install.sh | bash

# Verify installation
kcli version

Required Files

  1. OpenShift Pull Secret: Get from https://console.redhat.com/openshift/install/pull-secret
  2. SSH Public Key: For accessing VMs
# Save pull secret
cat > openshift_pull.json << 'EOF'
{your-pull-secret-json-here}
EOF

# Generate SSH key if needed
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""

Phase 1: Management Cluster Setup

1.1 Create Management Cluster Plan

Create mgmt-cluster.yaml:

plan: mgmt-cluster
force: true
version: stable
tag: "4.17"
cluster: "mgmt-cluster"
domain: hypershiftbm.lab
api_ip: 192.168.125.10
ingress_ip: 192.168.125.11
dualstack: false
disk_size: 200
extra_disks: [200]
memory: 64000  # 64GB for management cluster
numcpus: 16
ctlplanes: 3
workers: 0
metal3: true  # CRITICAL: Enable Metal3
network: ipv4
metallb_pool: ipv4-virtual-network
metallb_ranges:
- 192.168.125.150-192.168.125.190
metallb_autoassign: true
apps:
- lvms-operator
- metallb-operator

1.2 Deploy Management Cluster

# Deploy
kcli create cluster openshift --pf mgmt-cluster.yaml

# This will take approximately 45 minutes
# Monitor progress
kcli list cluster

1.3 Configure Kubeconfig

# Export kubeconfig
export KUBECONFIG=~/.kcli/clusters/mgmt-cluster/auth/kubeconfig

# Verify cluster
oc get nodes
oc get co  # Wait for all operators to be Available

Phase 2: OpenShift Virtualization

2.1 Install OpenShift Virtualization Operator

# Create namespace
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
EOF

# Create OperatorGroup
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-cnv-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
  - openshift-cnv
EOF

# Subscribe to OpenShift Virtualization
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  channel: stable
  name: kubevirt-hyperconverged
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF

# Wait for operator installation
echo "Waiting for OpenShift Virtualization operator..."
# Subscriptions have no Ready condition; wait for the operator CSV to reach Succeeded instead
until oc get csv -n openshift-cnv 2>/dev/null | grep kubevirt-hyperconverged | grep -q Succeeded; do
  sleep 10
done

# Check CSV (ClusterServiceVersion)
oc get csv -n openshift-cnv

2.2 Create HyperConverged Instance

cat <<EOF | oc apply -f -
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  featureGates:
    enableCommonBootImageImport: true
    deployKubeSecondaryDNS: false
EOF

# Wait for HyperConverged to be ready (this may take 5-10 minutes)
echo "Waiting for HyperConverged deployment..."
oc wait --for=condition=Available -n openshift-cnv hyperconverged/kubevirt-hyperconverged --timeout=900s

# Verify KubeVirt is running
oc get kubevirt -n openshift-cnv
oc get pods -n openshift-cnv

2.3 Verify OpenShift Virtualization

# Check that virt-api is running
oc get pods -n openshift-cnv | grep virt-api

# Check that virt-controller is running
oc get pods -n openshift-cnv | grep virt-controller

# Check that virt-handler is running on all nodes
oc get pods -n openshift-cnv | grep virt-handler

# Test virtctl (optional)
curl -L -o /tmp/virtctl https://github.com/kubevirt/kubevirt/releases/download/v1.1.0/virtctl-v1.1.0-linux-amd64
chmod +x /tmp/virtctl
sudo mv /tmp/virtctl /usr/local/bin/virtctl

Phase 3: KubeVirtBMC Deployment

3.1 Clone and Deploy KubeVirtBMC

# Clone the repository
git clone https://github.com/starbops/kubevirtbmc.git
cd kubevirtbmc

# Deploy CRDs
oc apply -f config/crd/bases/virtualmachinebmc.bmc.tinkerbell.org_virtualmachinebmcs.yaml

# Deploy RBAC
oc apply -f config/rbac/role.yaml
oc apply -f config/rbac/role_binding.yaml
oc apply -f config/rbac/service_account.yaml

# Deploy manager
oc apply -f config/manager/manager.yaml

# Verify deployment
oc get pods -n kubevirtbmc-system
oc wait --for=condition=Ready -n kubevirtbmc-system pod -l control-plane=controller-manager --timeout=300s

Alternative: Deploy using Kustomize

cd kubevirtbmc
oc apply -k config/default

# Verify
oc get all -n kubevirtbmc-system

3.2 Verify KubeVirtBMC

# Check CRD is installed
oc get crd virtualmachinebmcs.virtualmachinebmc.bmc.tinkerbell.org

# Check operator logs
oc logs -n kubevirtbmc-system deployment/kubevirtbmc-controller-manager -f

Phase 4: HyperShift and Agent Platform

4.1 Configure Metal3 for Multi-Namespace

# Allow Metal3 to watch all namespaces (required for BMH outside openshift-machine-api)
oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true}}'

# Wait for metal3 pod to restart
echo "Waiting for metal3 pod to restart..."
sleep 10

# Wait for metal3 to be ready
until oc wait -n openshift-machine-api \
  $(oc get pods -n openshift-machine-api -l baremetal.openshift.io/cluster-baremetal-operator=metal3-state -o name) \
  --for=condition=Ready --timeout=10s >/dev/null 2>&1; do
  echo "Waiting for metal3 pod..."
  sleep 5
done

echo "Metal3 is ready!"

4.2 Install Assisted Service and Hive

Option A: Using tasty

# Install tasty
curl -s -L https://github.com/karmab/tasty/releases/download/v0.4.0/tasty-linux-amd64 > /tmp/tasty
chmod +x /tmp/tasty
sudo mv /tmp/tasty /usr/local/bin/tasty

# Install operators
tasty install assisted-service-operator hive-operator

# Wait for operators
# Subscriptions have no Ready condition; wait for the operator CSVs to reach Succeeded
# (the target namespace can vary depending on where tasty installed the operators)
until oc get csv -A 2>/dev/null | grep assisted-service-operator | grep -q Succeeded; do sleep 10; done
until oc get csv -A 2>/dev/null | grep hive-operator | grep -q Succeeded; do sleep 10; done

Option B: Manual Installation via OperatorHub

# Install MultiCluster Engine (includes Assisted Service and Hive)
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: multicluster-engine
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: multicluster-engine-og
  namespace: multicluster-engine
spec:
  targetNamespaces:
  - multicluster-engine
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: multicluster-engine
  namespace: multicluster-engine
spec:
  channel: stable-2.7
  name: multicluster-engine
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF

# Wait for installation
# Subscriptions have no Ready condition; wait for the MCE CSV to reach Succeeded instead
until oc get csv -n multicluster-engine 2>/dev/null | grep multicluster-engine | grep -q Succeeded; do
  echo "Waiting for multicluster-engine CSV..."
  sleep 10
done

4.3 Configure AgentServiceConfig

export DB_VOLUME_SIZE="10Gi"
export FS_VOLUME_SIZE="10Gi"
export OCP_VERSION="4.17.0"
export OCP_MAJMIN=${OCP_VERSION%.*}
export ARCH="x86_64"
export OCP_RELEASE_VERSION=$(curl -s https://mirror.openshift.com/pub/openshift-v4/${ARCH}/clients/ocp/${OCP_VERSION}/release.txt | awk '/machine-os / { print $2 }')
export ISO_URL="https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/${OCP_MAJMIN}/${OCP_VERSION}/rhcos-${OCP_VERSION}-${ARCH}-live.${ARCH}.iso"
export ROOT_FS_URL="https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/${OCP_MAJMIN}/${OCP_VERSION}/rhcos-${OCP_VERSION}-${ARCH}-live-rootfs.${ARCH}.img"

envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
  namespace: multicluster-engine
spec:
  databaseStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: ${DB_VOLUME_SIZE}
  filesystemStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: ${FS_VOLUME_SIZE}
  osImages:
    - openshiftVersion: "${OCP_VERSION}"
      version: "${OCP_RELEASE_VERSION}"
      url: "${ISO_URL}"
      rootFSUrl: "${ROOT_FS_URL}"
      cpuArchitecture: "${ARCH}"
EOF

# Wait for AgentServiceConfig to be ready
oc wait --for=condition=DeploymentsHealthy -n multicluster-engine agentserviceconfig/agent --timeout=600s

# Verify assisted-service is running
oc get pods -n multicluster-engine | grep assisted

4.4 Install HyperShift Operator

# Get HyperShift CLI
export HYPERSHIFT_RELEASE=4.17
podman cp $(podman create --name hypershift --rm --pull always \
  quay.io/hypershift/hypershift-operator:${HYPERSHIFT_RELEASE}):/usr/bin/hypershift /tmp/hypershift
podman rm -f hypershift 2>/dev/null || true
sudo install -m 0755 -o root -g root /tmp/hypershift /usr/local/bin/hypershift

# Verify CLI
hypershift version

# Install HyperShift operator
hypershift install \
  --hypershift-image quay.io/hypershift/hypershift-operator:${HYPERSHIFT_RELEASE} \
  --enable-defaulting-webhook=false

# Verify installation
oc get pods -n hypershift
oc wait --for=condition=Ready -n hypershift pod -l app=operator --timeout=300s

Phase 5: KubeVirt Worker VMs

5.1 Create Namespace for Worker VMs

export WORKER_NAMESPACE="hosted-workers"
oc create namespace ${WORKER_NAMESPACE}

5.2 Create Storage Class (if needed)

Check existing storage classes:

oc get storageclass

# If using LVMS from management cluster setup
export STORAGE_CLASS="lvms-vg1"

# Or use hostpath provisioner for testing
# Note: For production, use a proper storage class
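
If you are unsure which storage class to pick, the cluster default (when one is set) can be detected automatically (a sketch):

# oc marks the default StorageClass with "(default)" in its NAME column
export STORAGE_CLASS=$(oc get storageclass | awk '/\(default\)/ {print $1}')
echo "Using storage class: ${STORAGE_CLASS}"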

5.3 Create Worker VM Template

Save this as kubevirt-worker-template.yaml:

# This is a template - we'll substitute values for each worker
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ${WORKER_NAME}
  namespace: ${WORKER_NAMESPACE}
  labels:
    app: hosted-cluster-worker
    worker-role: ${WORKER_ROLE}
    worker-zone: ${WORKER_ZONE}
spec:
  running: false  # Metal3 will control this via BMC
  template:
    metadata:
      labels:
        kubevirt.io/vm: ${WORKER_NAME}
        app: hosted-cluster-worker
    spec:
      domain:
        cpu:
          cores: 4
          sockets: 1
          threads: 1
        devices:
          disks:
          # Discovery ISO - boot first
          - name: discovery-iso
            bootOrder: 1
            cdrom:
              bus: sata
              readonly: true
          # OS disk - boot second (after ISO installation)
          - name: os-disk
            bootOrder: 2
            disk:
              bus: virtio
          # Cloud-init for network configuration
          - name: cloudinitdisk
            disk:
              bus: virtio
          interfaces:
          - name: default
            bridge: {}
            macAddress: "${WORKER_MAC}"
          networkInterfaceMultiqueue: true
        firmware:
          bootloader:
            bios:
              useSerial: true
        machine:
          type: q35
        resources:
          requests:
            memory: 16Gi
      networks:
      - name: default
        pod: {}
      volumes:
      # Discovery ISO (will be created later from InfraEnv)
      - name: discovery-iso
        persistentVolumeClaim:
          claimName: agent-discovery-iso
      # OS disk
      - name: os-disk
        dataVolume:
          name: ${WORKER_NAME}-os
      # Cloud-init for static IP configuration
      - name: cloudinitdisk
        cloudInitNoCloud:
          networkData: |
            version: 2
            ethernets:
              eth0:
                match:
                  macaddress: "${WORKER_MAC}"
                addresses:
                  - ${WORKER_IP}/24
                gateway4: 192.168.125.1
                nameservers:
                  addresses:
                    - 8.8.8.8
                    - 1.1.1.1
          userData: |
            #cloud-config
            ssh_authorized_keys:
              - ${SSH_PUB_KEY}
---
# DataVolume for OS disk (empty, will be populated by installer)
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ${WORKER_NAME}-os
  namespace: ${WORKER_NAMESPACE}
spec:
  source:
    blank: {}
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 120Gi
    storageClassName: ${STORAGE_CLASS}
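
Before wiring this template into Phase 6, it can be rendered once with throwaway values and validated client-side (a sketch; the values below are placeholders only):

export WORKER_NAMESPACE="hosted-workers" WORKER_NAME="render-test" WORKER_ROLE="compute" \
       WORKER_ZONE="zone-a" WORKER_MAC="52:54:00:00:00:ff" WORKER_IP="192.168.125.250" \
       SSH_PUB_KEY="ssh-rsa AAAA...test" STORAGE_CLASS="lvms-vg1"
envsubst < kubevirt-worker-template.yaml | oc apply --dry-run=client -f -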

Phase 6: ISO Boot Configuration

6.1 Create HostedCluster

export CLUSTERS_NAMESPACE="clusters"
export HOSTED_CLUSTER_NAME="agent-cluster"
export HOSTED_CONTROL_PLANE_NAMESPACE="${CLUSTERS_NAMESPACE}-${HOSTED_CLUSTER_NAME}"
export BASEDOMAIN="hypershiftbm.lab"
export PULL_SECRET_FILE=$PWD/openshift_pull.json
export OCP_RELEASE="4.17.0"

# Create namespace
oc create ns ${HOSTED_CONTROL_PLANE_NAMESPACE}

# Create hosted cluster
hypershift create cluster agent \
    --name=${HOSTED_CLUSTER_NAME} \
    --pull-secret=${PULL_SECRET_FILE} \
    --agent-namespace=${HOSTED_CONTROL_PLANE_NAMESPACE} \
    --base-domain=${BASEDOMAIN} \
    --api-server-address=api.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN} \
    --release-image=quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE}-x86_64 \
    --ssh-key=$HOME/.ssh/id_rsa.pub

# Wait for control plane pods
echo "Waiting for Hosted Control Plane pods..."
oc wait --for=condition=Ready -n ${HOSTED_CONTROL_PLANE_NAMESPACE} pod -l app=kube-apiserver --timeout=900s

6.2 Configure DNS

# Get the LoadBalancer IPs from services
export API_IP=$(oc get svc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} kube-apiserver -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export APPS_IP=$(oc get svc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} router-default -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Add DNS entries. For testing, the specific hostnames can go in /etc/hosts on your
# workstation; the *.apps wildcard needs a real DNS server (or dnsmasq, see below).
echo "Add these entries to your DNS server (or /etc/hosts for the non-wildcard names):"
echo "${API_IP} api.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN}"
echo "${API_IP} api-int.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN}"
echo "${APPS_IP} *.apps.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN}"

# On Linux/Mac workstation:
sudo tee -a /etc/hosts <<EOF
${API_IP} api.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN}
${API_IP} api-int.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN}
${APPS_IP} console-openshift-console.apps.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN}
${APPS_IP} oauth-openshift.apps.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN}
EOF
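
The *.apps wildcard cannot be expressed in /etc/hosts. If your workstation runs NetworkManager with its dnsmasq plugin enabled, a small drop-in covers both the wildcard and the API records (a sketch, assuming that setup):

sudo tee /etc/NetworkManager/dnsmasq.d/hosted-cluster.conf <<EOF
# Wildcard for the hosted cluster's apps routes
address=/apps.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN}/${APPS_IP}
# API records
host-record=api.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN},${API_IP}
host-record=api-int.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN},${API_IP}
EOF
sudo systemctl reload NetworkManager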

6.3 Create InfraEnv and Get ISO URL

export SSH_PUB_KEY=$(cat $HOME/.ssh/id_rsa.pub)

envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: ${HOSTED_CLUSTER_NAME}
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
spec:
  pullSecretRef:
    name: pull-secret
  sshAuthorizedKey: ${SSH_PUB_KEY}
  nmStateConfigLabelSelector:
    matchLabels: {}
EOF

# Wait for ISO to be generated
echo "Waiting for discovery ISO to be generated..."
oc wait --for=condition=ImageCreated -n ${HOSTED_CONTROL_PLANE_NAMESPACE} infraenv/${HOSTED_CLUSTER_NAME} --timeout=600s

# Get ISO URL
export ISO_URL=$(oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get infraenv ${HOSTED_CLUSTER_NAME} -o jsonpath='{.status.isoDownloadURL}')
echo "Discovery ISO URL: ${ISO_URL}"

6.4 Create DataVolume for Discovery ISO

# Create DataVolume to import discovery ISO
cat <<EOF | oc apply -f -
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: agent-discovery-iso
  namespace: ${WORKER_NAMESPACE}
spec:
  source:
    http:
      url: ${ISO_URL}
  pvc:
    accessModes:
      - ReadOnlyMany  # Can be shared by multiple VMs
    resources:
      requests:
        storage: 2Gi
    storageClassName: ${STORAGE_CLASS}
EOF

# Wait for ISO import to complete (may take 5-10 minutes)
echo "Importing discovery ISO (this may take a few minutes)..."
oc wait --for=condition=Ready -n ${WORKER_NAMESPACE} dv/agent-discovery-iso --timeout=900s

# Verify PVC was created
oc get pvc -n ${WORKER_NAMESPACE} agent-discovery-iso

6.5 Create Worker VMs with ISO Boot

Now create the actual worker VMs using the template:

export STORAGE_CLASS="lvms-vg1"  # Adjust to your storage class
export SSH_PUB_KEY=$(cat $HOME/.ssh/id_rsa.pub)

# Worker 0
export WORKER_NAME="hosted-worker-0"
export WORKER_MAC="52:54:00:aa:bb:01"
export WORKER_IP="192.168.125.201"
export WORKER_ROLE="database"
export WORKER_ZONE="zone-a"

envsubst < kubevirt-worker-template.yaml | oc apply -f -

# Worker 1
export WORKER_NAME="hosted-worker-1"
export WORKER_MAC="52:54:00:aa:bb:02"
export WORKER_IP="192.168.125.202"
export WORKER_ROLE="compute"
export WORKER_ZONE="zone-b"

envsubst < kubevirt-worker-template.yaml | oc apply -f -

# Worker 2
export WORKER_NAME="hosted-worker-2"
export WORKER_MAC="52:54:00:aa:bb:03"
export WORKER_IP="192.168.125.203"
export WORKER_ROLE="compute"
export WORKER_ZONE="zone-c"

envsubst < kubevirt-worker-template.yaml | oc apply -f -

# Wait for DataVolumes to be ready
echo "Waiting for worker OS disks to be provisioned..."
oc wait --for=condition=Ready -n ${WORKER_NAMESPACE} dv/hosted-worker-0-os --timeout=300s
oc wait --for=condition=Ready -n ${WORKER_NAMESPACE} dv/hosted-worker-1-os --timeout=300s
oc wait --for=condition=Ready -n ${WORKER_NAMESPACE} dv/hosted-worker-2-os --timeout=300s

# Verify VMs are created (but not running)
oc get vm -n ${WORKER_NAMESPACE}
oc get vmi -n ${WORKER_NAMESPACE}  # Should be empty (VMs not started yet)
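
The three per-worker blocks above can also be driven by a short loop (a sketch using bash arrays and the same template and variable names):

ROLES=(database compute compute)
ZONES=(zone-a zone-b zone-c)
for i in 0 1 2; do
  export WORKER_NAME="hosted-worker-${i}"
  export WORKER_MAC="52:54:00:aa:bb:0$((i+1))"
  export WORKER_IP="192.168.125.20$((i+1))"
  export WORKER_ROLE="${ROLES[$i]}"
  export WORKER_ZONE="${ZONES[$i]}"
  envsubst < kubevirt-worker-template.yaml | oc apply -f -
done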

Phase 7: BareMetalHost Integration

7.1 Create VirtualMachineBMC Resources

For each worker VM, create a BMC:

# Worker 0
cat <<EOF | oc apply -f -
apiVersion: virtualmachinebmc.bmc.tinkerbell.org/v1alpha1
kind: VirtualMachineBMC
metadata:
  name: hosted-worker-0-bmc
  namespace: ${WORKER_NAMESPACE}
spec:
  virtualMachineName: hosted-worker-0
  virtualMachineNamespace: ${WORKER_NAMESPACE}
  protocol: redfish
  credentials:
    username: admin
    password: password
EOF

# Worker 1
cat <<EOF | oc apply -f -
apiVersion: virtualmachinebmc.bmc.tinkerbell.org/v1alpha1
kind: VirtualMachineBMC
metadata:
  name: hosted-worker-1-bmc
  namespace: ${WORKER_NAMESPACE}
spec:
  virtualMachineName: hosted-worker-1
  virtualMachineNamespace: ${WORKER_NAMESPACE}
  protocol: redfish
  credentials:
    username: admin
    password: password
EOF

# Worker 2
cat <<EOF | oc apply -f -
apiVersion: virtualmachinebmc.bmc.tinkerbell.org/v1alpha1
kind: VirtualMachineBMC
metadata:
  name: hosted-worker-2-bmc
  namespace: ${WORKER_NAMESPACE}
spec:
  virtualMachineName: hosted-worker-2
  virtualMachineNamespace: ${WORKER_NAMESPACE}
  protocol: redfish
  credentials:
    username: admin
    password: password
EOF

# Verify BMC services are created
oc get svc -n ${WORKER_NAMESPACE} -l virtualmachinebmc.bmc.tinkerbell.org/name

7.2 Get BMC Endpoints

# Function to get BMC endpoint
get_bmc_endpoint() {
  local worker_name=$1
  local svc_name=$(oc get svc -n ${WORKER_NAMESPACE} \
    -l virtualmachinebmc.bmc.tinkerbell.org/name=${worker_name}-bmc \
    -o jsonpath='{.items[0].metadata.name}')
  local bmc_ip=$(oc get svc -n ${WORKER_NAMESPACE} ${svc_name} \
    -o jsonpath='{.spec.clusterIP}')
  local bmc_port=$(oc get svc -n ${WORKER_NAMESPACE} ${svc_name} \
    -o jsonpath='{.spec.ports[0].port}')
  echo "redfish://${bmc_ip}:${bmc_port}/redfish/v1/Systems/1"
}

# Get endpoints for all workers
export BMC_WORKER_0=$(get_bmc_endpoint hosted-worker-0)
export BMC_WORKER_1=$(get_bmc_endpoint hosted-worker-1)
export BMC_WORKER_2=$(get_bmc_endpoint hosted-worker-2)

echo "Worker 0 BMC: ${BMC_WORKER_0}"
echo "Worker 1 BMC: ${BMC_WORKER_1}"
echo "Worker 2 BMC: ${BMC_WORKER_2}"

# Test a BMC endpoint (optional); curl needs an http:// URL, not the redfish:// form Metal3 uses
curl -k -u admin:password "http://${BMC_WORKER_0#redfish://}" | jq

7.3 Create BareMetalHost Resources

export BMC_USERNAME=$(echo -n "admin" | base64 -w0)
export BMC_PASSWORD=$(echo -n "password" | base64 -w0)

# Worker 0
export WORKER_NAME="hosted-worker-0"
export WORKER_MAC="52:54:00:aa:bb:01"
export BMC_ENDPOINT="${BMC_WORKER_0}"

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: ${WORKER_NAME}-bmc-secret
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
type: Opaque
data:
  username: ${BMC_USERNAME}
  password: ${BMC_PASSWORD}
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${WORKER_NAME}
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
  labels:
    infraenvs.agent-install.openshift.io: ${HOSTED_CLUSTER_NAME}
    worker-role: database
    worker-zone: zone-a
  annotations:
    inspect.metal3.io: disabled
    bmac.agent-install.openshift.io/hostname: ${WORKER_NAME}.${BASEDOMAIN}
spec:
  automatedCleaningMode: disabled
  online: true
  bootMACAddress: "${WORKER_MAC}"
  bmc:
    address: ${BMC_ENDPOINT}
    credentialsName: ${WORKER_NAME}-bmc-secret
    disableCertificateVerification: true
EOF

# Worker 1
export WORKER_NAME="hosted-worker-1"
export WORKER_MAC="52:54:00:aa:bb:02"
export BMC_ENDPOINT="${BMC_WORKER_1}"

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: ${WORKER_NAME}-bmc-secret
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
type: Opaque
data:
  username: ${BMC_USERNAME}
  password: ${BMC_PASSWORD}
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${WORKER_NAME}
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
  labels:
    infraenvs.agent-install.openshift.io: ${HOSTED_CLUSTER_NAME}
    worker-role: compute
    worker-zone: zone-b
  annotations:
    inspect.metal3.io: disabled
    bmac.agent-install.openshift.io/hostname: ${WORKER_NAME}.${BASEDOMAIN}
spec:
  automatedCleaningMode: disabled
  online: true
  bootMACAddress: "${WORKER_MAC}"
  bmc:
    address: ${BMC_ENDPOINT}
    credentialsName: ${WORKER_NAME}-bmc-secret
    disableCertificateVerification: true
EOF

# Worker 2
export WORKER_NAME="hosted-worker-2"
export WORKER_MAC="52:54:00:aa:bb:03"
export BMC_ENDPOINT="${BMC_WORKER_2}"

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: ${WORKER_NAME}-bmc-secret
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
type: Opaque
data:
  username: ${BMC_USERNAME}
  password: ${BMC_PASSWORD}
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${WORKER_NAME}
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
  labels:
    infraenvs.agent-install.openshift.io: ${HOSTED_CLUSTER_NAME}
    worker-role: compute
    worker-zone: zone-c
  annotations:
    inspect.metal3.io: disabled
    bmac.agent-install.openshift.io/hostname: ${WORKER_NAME}.${BASEDOMAIN}
spec:
  automatedCleaningMode: disabled
  online: true
  bootMACAddress: "${WORKER_MAC}"
  bmc:
    address: ${BMC_ENDPOINT}
    credentialsName: ${WORKER_NAME}-bmc-secret
    disableCertificateVerification: true
EOF

7.4 Monitor BareMetalHost and Agent Registration

# Watch BareMetalHosts
watch -n 5 "oc get bmh -n ${HOSTED_CONTROL_PLANE_NAMESPACE}"

# Expected progression:
# - registering → provisioning → provisioned

# In another terminal, watch KubeVirt VMs
watch -n 5 "oc get vm -n ${WORKER_NAMESPACE}"

# VMs should start (running: true) when Metal3 powers them on

# Watch for Agents to appear
watch -n 5 "oc get agents -n ${HOSTED_CONTROL_PLANE_NAMESPACE}"

# Check Agent details with BMH mapping
oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -o jsonpath='{range .items[*]}BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} Agent: {@.metadata.name} State: {@.status.debugInfo.state} Approved: {@.spec.approved}{"\n"}{end}'
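
BMAC normally approves Agents that are tied to a BareMetalHost automatically; if any Agents still show Approved: false in the output above, approve them in bulk (a sketch, using the same patch as in Troubleshooting below):

for a in $(oc get agents -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -o name); do
  oc patch "${a}" -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -p '{"spec":{"approved":true}}' --type merge
done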

Phase 8: NodePool Creation

8.1 Create NodePool - All Workers

cat <<EOF | oc apply -f -
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: ${HOSTED_CLUSTER_NAME}-workers
  namespace: ${CLUSTERS_NAMESPACE}
spec:
  clusterName: ${HOSTED_CLUSTER_NAME}
  replicas: 3
  management:
    autoRepair: false
    upgradeType: InPlace
  platform:
    type: Agent
    agent:
      agentLabelSelector:
        matchLabels: {}  # Select any available agent
  release:
    image: quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE}-x86_64
EOF

8.2 Create NodePool - Label-Based Selection

Alternative: Create multiple NodePools targeting specific workers:

# NodePool for database workers (zone-a)
cat <<EOF | oc apply -f -
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: ${HOSTED_CLUSTER_NAME}-db
  namespace: ${CLUSTERS_NAMESPACE}
spec:
  clusterName: ${HOSTED_CLUSTER_NAME}
  replicas: 1
  management:
    autoRepair: false
    upgradeType: InPlace
  platform:
    type: Agent
    agent:
      agentLabelSelector:
        matchLabels:
          worker-role: database
          worker-zone: zone-a
  release:
    image: quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE}-x86_64
EOF

# NodePool for compute workers (zones b and c)
cat <<EOF | oc apply -f -
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: ${HOSTED_CLUSTER_NAME}-compute
  namespace: ${CLUSTERS_NAMESPACE}
spec:
  clusterName: ${HOSTED_CLUSTER_NAME}
  replicas: 2
  management:
    autoRepair: false
    upgradeType: InPlace
  platform:
    type: Agent
    agent:
      agentLabelSelector:
        matchLabels:
          worker-role: compute
  release:
    image: quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE}-x86_64
EOF

8.3 Monitor NodePool and Installation

# Watch NodePool status
watch -n 5 "oc get nodepool -n ${CLUSTERS_NAMESPACE}"

# Watch Agents binding to Machines
watch -n 5 "oc get agents -n ${HOSTED_CONTROL_PLANE_NAMESPACE}"

# Check Machine creation
watch -n 5 "oc get machines -n ${HOSTED_CONTROL_PLANE_NAMESPACE}"

# Expected Agent states progression:
# insufficient → known-unbound → binding → installing → installing-in-progress → added-to-existing-cluster

# Check detailed agent state
oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -o jsonpath='{range .items[*]}Agent: {@.metadata.name} BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} State: {@.status.debugInfo.state} Progress: {@.status.progress.progressInfo}{"\n"}{end}'

Verification and Testing

Get Hosted Cluster Kubeconfig

# Generate kubeconfig for hosted cluster
hypershift create kubeconfig --name=${HOSTED_CLUSTER_NAME} > ${HOSTED_CLUSTER_NAME}-kubeconfig

# Use hosted cluster kubeconfig
export KUBECONFIG=${HOSTED_CLUSTER_NAME}-kubeconfig

# Wait for nodes to appear
watch -n 5 "oc get nodes"

Verify Nodes

# Check nodes
oc get nodes -o wide

# Verify node labels from BareMetalHost
oc get nodes --show-labels | grep worker-

# Check that static IPs are assigned
oc get nodes -o custom-columns=NAME:.metadata.name,IP:.status.addresses[0].address

# Verify nodes have correct zone labels
oc get nodes -L worker-zone,worker-role

Run Test Workload

# Create test deployment
oc create deployment nginx --image=nginxinc/nginx-unprivileged:latest --replicas=3

# Wait for pods
oc wait --for=condition=Ready pod -l app=nginx --timeout=300s

# Check pod distribution across workers
oc get pods -o wide -l app=nginx

# Verify pods are using the worker nodes
oc get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName -l app=nginx

# Clean up
oc delete deployment nginx

Verify ISO Boot Worked

# Check VirtualMachine boot order
oc get vm -n ${WORKER_NAMESPACE} hosted-worker-0 -o yaml | grep -A 10 bootOrder

# Check that VMs booted from ISO then disk
# After installation, VMs should be booting from disk (bootOrder: 2)

# Check VM console logs (optional)
virtctl console hosted-worker-0 -n ${WORKER_NAMESPACE}
# Press Ctrl+] to exit

Troubleshooting

Issue: Discovery ISO Import Fails

# Check DataVolume status
oc describe dv agent-discovery-iso -n ${WORKER_NAMESPACE}

# Check CDI importer pod logs
oc logs -n ${WORKER_NAMESPACE} $(oc get pods -n ${WORKER_NAMESPACE} -l app=containerized-data-importer -o name | head -1)

# Common issue: ISO URL not accessible from pod
# Solution: Verify ISO URL is accessible
curl -I ${ISO_URL}

# If URL is not accessible, manually download and upload
curl -L ${ISO_URL} -o /tmp/discovery.iso
virtctl image-upload dv agent-discovery-iso \
  --image-path=/tmp/discovery.iso \
  --size=2Gi \
  --storage-class=${STORAGE_CLASS} \
  -n ${WORKER_NAMESPACE}

Issue: VMs Not Starting via BMC

# Check KubeVirtBMC logs
oc logs -n kubevirtbmc-system deployment/kubevirtbmc-controller-manager -f

# Check VirtualMachineBMC status
oc describe virtualmachinebmc -n ${WORKER_NAMESPACE}

# Manually test BMC endpoint
export BMC_IP=$(oc get svc -n ${WORKER_NAMESPACE} -l virtualmachinebmc.bmc.tinkerbell.org/name=hosted-worker-0-bmc -o jsonpath='{.items[0].spec.clusterIP}')
export BMC_PORT=$(oc get svc -n ${WORKER_NAMESPACE} -l virtualmachinebmc.bmc.tinkerbell.org/name=hosted-worker-0-bmc -o jsonpath='{.items[0].spec.ports[0].port}')

# Test Redfish API
curl -k -u admin:password http://${BMC_IP}:${BMC_PORT}/redfish/v1/Systems/1 | jq

# Test power on
curl -k -u admin:password -X POST \
  -H "Content-Type: application/json" \
  -d '{"ResetType":"On"}' \
  http://${BMC_IP}:${BMC_PORT}/redfish/v1/Systems/1/Actions/ComputerSystem.Reset

# Check if VM started
oc get vm -n ${WORKER_NAMESPACE}
oc get vmi -n ${WORKER_NAMESPACE}

Issue: Agents Not Appearing

# Check InfraEnv status
oc describe infraenv ${HOSTED_CLUSTER_NAME} -n ${HOSTED_CONTROL_PLANE_NAMESPACE}

# Check BareMetalHost status
oc describe bmh -n ${HOSTED_CONTROL_PLANE_NAMESPACE}

# Check Metal3 logs
oc logs -n openshift-machine-api deployment/metal3

# Check if VMs are actually running
oc get vmi -n ${WORKER_NAMESPACE}

# Access VM console to see boot process
virtctl console hosted-worker-0 -n ${WORKER_NAMESPACE}

# Check VM is booting from ISO
# You should see the discovery agent starting

Issue: Agents Stuck in "insufficient" State

# Check Agent validation errors
oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -o yaml | grep -A 20 validationsInfo

# Common issues:
# - Insufficient memory (need 16GB)
# - Insufficient CPU (need 4 cores)
# - No installation disk

# Verify VM resources
oc get vm -n ${WORKER_NAMESPACE} hosted-worker-0 -o jsonpath='{.spec.template.spec.domain.resources}'

# Check disk availability
oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.inventory.disks}{"\n"}{end}'

Issue: Agents Not Binding to NodePool

# Check Agent labels match NodePool selector
oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} --show-labels

# Check NodePool selector
oc get nodepool -n ${CLUSTERS_NAMESPACE} ${HOSTED_CLUSTER_NAME}-workers -o yaml | grep -A 5 agentLabelSelector

# Manually label Agents if needed
oc label agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} <agent-name> worker-role=database

# Check if Agents are approved
oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -o custom-columns=NAME:.metadata.name,APPROVED:.spec.approved,STATE:.status.debugInfo.state

# Approve Agents if needed
oc patch agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} <agent-name> -p '{"spec":{"approved":true}}' --type merge

Issue: Installation Failing

# Check Agent installation progress
oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.progress.progressInfo}{"\n"}{end}'

# Check assisted-service logs
oc logs -n multicluster-engine deployment/assisted-service -f

# Check installer pod logs (if agent reached installing state)
oc get pods -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -l app=assisted-installer

# Access VM console to see installation
virtctl console hosted-worker-0 -n ${WORKER_NAMESPACE}

Issue: VM Not Booting from Disk After Installation

# Check VM boot order
oc get vm -n ${WORKER_NAMESPACE} hosted-worker-0 -o yaml | grep -A 20 bootOrder

# Verify OS disk has content
oc get pvc -n ${WORKER_NAMESPACE} | grep os

# Check VM is running
oc get vmi -n ${WORKER_NAMESPACE}

# Access console
virtctl console hosted-worker-0 -n ${WORKER_NAMESPACE}

# If stuck on ISO, try ejecting CD-ROM
# This is a KubeVirt limitation - may need to update VM spec to remove ISO after installation
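
# One way to do that (a sketch; the indexes assume the disk/volume ordering from the
# worker template in Phase 5, where the discovery ISO is the first disk and first volume).
# The change takes effect on the next VM restart.
oc patch vm hosted-worker-0 -n ${WORKER_NAMESPACE} --type=json -p='[
  {"op":"remove","path":"/spec/template/spec/domain/devices/disks/0"},
  {"op":"remove","path":"/spec/template/spec/volumes/0"}
]'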

Issue: Static IP Not Applied

# Check cloud-init in VM spec
oc get vm -n ${WORKER_NAMESPACE} hosted-worker-0 -o yaml | grep -A 30 cloudInitNoCloud

# Check inside VM (via console)
virtctl console hosted-worker-0 -n ${WORKER_NAMESPACE}
# Once logged in:
ip addr show
cat /etc/sysconfig/network-scripts/ifcfg-eth0  # RHEL/CentOS
networkctl status  # systemd-networkd

Debug Commands Reference

# Management cluster resources
oc get hostedcluster -n ${CLUSTERS_NAMESPACE}
oc get nodepool -n ${CLUSTERS_NAMESPACE}
oc get bmh -n ${HOSTED_CONTROL_PLANE_NAMESPACE}
oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE}
oc get infraenv -n ${HOSTED_CONTROL_PLANE_NAMESPACE}
oc get machines -n ${HOSTED_CONTROL_PLANE_NAMESPACE}

# KubeVirt resources
oc get vm -n ${WORKER_NAMESPACE}
oc get vmi -n ${WORKER_NAMESPACE}
oc get dv -n ${WORKER_NAMESPACE}
oc get pvc -n ${WORKER_NAMESPACE}
oc get virtualmachinebmc -n ${WORKER_NAMESPACE}

# Logs
oc logs -n hypershift deployment/operator
oc logs -n ${HOSTED_CONTROL_PLANE_NAMESPACE} deployment/capi-provider
oc logs -n openshift-machine-api deployment/metal3
oc logs -n multicluster-engine deployment/assisted-service
oc logs -n kubevirtbmc-system deployment/kubevirtbmc-controller-manager

Clean Up

Delete Everything

# Delete NodePool
oc delete nodepool -n ${CLUSTERS_NAMESPACE} ${HOSTED_CLUSTER_NAME}-workers

# Delete HostedCluster
hypershift destroy cluster agent \
  --name=${HOSTED_CLUSTER_NAME} \
  --namespace=${CLUSTERS_NAMESPACE}

# Delete BareMetalHosts
oc delete bmh -n ${HOSTED_CONTROL_PLANE_NAMESPACE} --all

# Delete KubeVirt VMs
oc delete vm -n ${WORKER_NAMESPACE} --all

# Delete DataVolumes
oc delete dv -n ${WORKER_NAMESPACE} --all

# Delete VirtualMachineBMC
oc delete virtualmachinebmc -n ${WORKER_NAMESPACE} --all

# Delete namespaces
oc delete namespace ${WORKER_NAMESPACE}
oc delete namespace ${HOSTED_CONTROL_PLANE_NAMESPACE}

# Delete management cluster (if needed)
kcli delete cluster mgmt-cluster

Summary and Key Takeaways

This PoC demonstrates:

  • ✅ True KubeVirt Integration - VirtualMachine CRs, not libvirt VMs
  • ✅ ISO Boot Support - Discovery ISO mounted as CD-ROM with proper boot order
  • ✅ Virtual BMC Control - KubeVirtBMC provides Redfish API for Metal3
  • ✅ Static IP Configuration - Cloud-init for network pre-configuration
  • ✅ Label-Based Selection - Target specific VMs via Agent labels
  • ✅ Kubernetes-Native - Everything managed via Kubernetes APIs

Boot Flow Summary

  1. VM Created → running: false, ISO mounted as CD-ROM (bootOrder: 1)
  2. Metal3/BMC Powers On → VM starts, boots from ISO
  3. Discovery Agent Runs → Registers as Agent in Kubernetes
  4. NodePool Selects Agent → Based on labels
  5. Installation Begins → OS written to disk (bootOrder: 2)
  6. VM Reboots → Boots from disk (installed OS)
  7. Node Joins Cluster → Worker node ready
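
At any point, a worker's position in this flow can be read off its VMI, BareMetalHost, and Agent (a sketch; the Agent label is the same one used in Phase 7.4):

W=hosted-worker-0
echo "VMI phase:   $(oc get vmi -n ${WORKER_NAMESPACE} ${W} -o jsonpath='{.status.phase}' 2>/dev/null)"
echo "BMH state:   $(oc get bmh -n ${HOSTED_CONTROL_PLANE_NAMESPACE} ${W} -o jsonpath='{.status.provisioning.state}')"
echo "Agent state: $(oc get agent -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -l agent-install.openshift.io/bmh=${W} -o jsonpath='{.items[0].status.debugInfo.state}')"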

Advantages Over Libvirt Approach

Aspect           Libvirt VMs         KubeVirt VMs
Management       virsh, kcli         kubectl/oc
Declarative      Limited             Full GitOps
RBAC             Host-level          Kubernetes RBAC
Storage          Host filesystem     PVCs, CSI
Networking       Libvirt networks    Pod networking, Multus
Live Migration   Manual              KubeVirt native
Integration      External to K8s     Native K8s resources

This approach provides a repeatable, Kubernetes-native pattern for using KubeVirt VMs as Agent platform workers in HyperShift.

