@ShyamsundarR
Created November 11, 2025 00:14

DRCluster Custom Resource Definition

Overview

DRCluster is a cluster-scoped custom resource that represents a managed cluster participating in disaster recovery (DR) operations. It defines the DR characteristics of a managed cluster, including its region, S3 profile for metadata storage, network CIDRs for fencing operations, and the desired fencing state. The DRCluster controller validates cluster connectivity, manages cluster fencing/unfencing operations, deploys DR operator components, and handles maintenance modes during failover operations.

API Group and Version

  • Group: ramendr.openshift.io
  • Version: v1alpha1
  • Kind: DRCluster
  • Scope: Cluster

Resource Structure

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: <cluster-name>
spec:
  # DRCluster specification
status:
  # DRCluster observed state

Spec Fields

region (Required)

  • Type: string
  • Description: Identifies the DR group for the managed cluster. All managed clusters in the same region are considered part of a synchronous replication group for Metro DR scenarios.
  • Immutable: Yes
  • Validation: Cannot be changed after creation

Example:

spec:
  region: "us-east-1"

s3ProfileName (Required)

  • Type: string
  • Description: Name of the S3 profile (defined in the Ramen operator configuration) used to store and restore persistent volume (PV) related cluster state during recovery or relocate actions. This S3 profile must be available for workloads to be moved successfully to this cluster. For applications active on this cluster, their PV-related state is stored in the S3 profiles of all other DRClusters in the same DRPolicy, so that those applications can be recovered or relocated to those clusters.
  • Immutable: Yes
  • Validation: Must reference a valid S3 profile in the Ramen configuration

Example:

spec:
  s3ProfileName: "s3-profile-east"

cidrs (Optional)

  • Type: []string
  • Description: List of CIDR strings representing the network ranges used or potentially used by nodes in this managed cluster. These CIDRs are used for cluster fencing operations in sync/Metro DR scenarios to block network access during failover.
  • Validation: Each CIDR must be in valid format (e.g., "192.168.1.0/24")

Example:

spec:
  cidrs:
    - "192.168.1.0/24"
    - "10.0.0.0/16"

clusterFence (Optional)

  • Type: ClusterFenceState (enum)
  • Description: Determines the desired fencing state of the cluster
  • Valid Values:
    • Unfenced: Cluster is not fenced and is operational
    • Fenced: Cluster should be fenced (network access blocked)
    • ManuallyFenced: Cluster has been manually fenced by administrator
    • ManuallyUnfenced: Cluster has been manually unfenced by administrator

Example:

spec:
  clusterFence: Unfenced
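
The four accepted values form a closed enumeration. Two of them (Fenced, Unfenced) ask the controller to act. The two Manually* values only record a state applied outside Ramen. The Go sketch below mirrors that split. The type, constants, and helper names are a simplified, hypothetical re-declaration for illustration, not Ramen's actual API.

```go
package main

import "fmt"

// ClusterFenceState mirrors the spec.clusterFence enum (hypothetical re-declaration).
type ClusterFenceState string

const (
	Unfenced         ClusterFenceState = "Unfenced"
	Fenced           ClusterFenceState = "Fenced"
	ManuallyFenced   ClusterFenceState = "ManuallyFenced"
	ManuallyUnfenced ClusterFenceState = "ManuallyUnfenced"
)

// IsValid reports whether s is one of the four enum values.
// spec.clusterFence is optional; an omitted (empty) value is not an enum
// member, so this helper reports it as invalid and callers handle it separately.
func (s ClusterFenceState) IsValid() bool {
	switch s {
	case Unfenced, Fenced, ManuallyFenced, ManuallyUnfenced:
		return true
	}
	return false
}

// IsAutomated reports whether the controller is expected to act on this
// state itself (creating or updating a NetworkFence as needed). The
// Manually* states only track fencing done outside of Ramen.
func (s ClusterFenceState) IsAutomated() bool {
	return s == Fenced || s == Unfenced
}

func main() {
	for _, s := range []ClusterFenceState{Fenced, ManuallyFenced, "Quarantined"} {
		fmt.Printf("%-16s valid=%v automated=%v\n", s, s.IsValid(), s.IsAutomated())
	}
}
```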

Status Fields

phase

  • Type: DRClusterPhase
  • Description: Current lifecycle phase of the DRCluster
  • Possible Values:
    • Available: DRCluster is validated and available for use
    • Starting: Initial reconciliation in progress
    • Fencing: Fencing operation in progress
    • Fenced: Cluster has been successfully fenced
    • Unfencing: Unfencing operation in progress
    • Unfenced: Cluster has been successfully unfenced

conditions

  • Type: []metav1.Condition
  • Description: Standard Kubernetes conditions reflecting the current state

Condition Types:

  1. Validated

    • Indicates whether the DRCluster has been validated
    • Reasons:
      • Succeeded: Cluster successfully validated
      • Initializing: Validation in progress
      • ConfigMapGetFailed: Failed to get configuration
      • DrClustersDeployFailed: Failed to deploy DR components
      • s3ConnectionFailed: S3 connection validation failed
      • s3ListFailed: S3 list operation failed
  2. Fenced

    • Indicates the fencing state of the cluster
    • Reasons:
      • Fencing: Fencing operation in progress
      • Fenced: Successfully fenced
      • Unfencing: Unfencing operation in progress
      • Unfenced: Successfully unfenced
      • FenceError: Fencing operation failed
      • UnfenceError: Unfencing operation failed
  3. Clean

    • Indicates whether NetworkFence resources exist for this cluster
    • Reasons:
      • Clean: No fencing CRs present
      • Fencing/Unfencing/Cleaning: Operations in progress
      • CleanError: Cleanup operation failed
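
Clients usually read these conditions by type. The sketch below uses a trimmed stand-in for metav1.Condition (only Type, Status, and Reason) so that it runs without Kubernetes dependencies. In real code, `meta.FindStatusCondition` from `k8s.io/apimachinery/pkg/api/meta` does the lookup. The status values in `main` are hypothetical sample data.

```go
package main

import "fmt"

// Condition is a minimal, hypothetical stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
	Reason string
}

// findCondition returns a pointer to the condition with the given type, or nil.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// isValidated reports whether the Validated condition exists and is True.
func isValidated(conds []Condition) bool {
	c := findCondition(conds, "Validated")
	return c != nil && c.Status == "True"
}

func main() {
	// Hypothetical status of a DRCluster whose S3 store is unreachable.
	status := []Condition{
		{Type: "Validated", Status: "False", Reason: "s3ConnectionFailed"},
		{Type: "Clean", Status: "True", Reason: "Clean"},
	}
	if !isValidated(status) {
		fmt.Println("DRCluster not validated, reason:", findCondition(status, "Validated").Reason)
	}
}
```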

maintenanceModes

  • Type: []ClusterMaintenanceMode
  • Description: List of active maintenance modes on the cluster, typically used during regional DR failover operations

ClusterMaintenanceMode Fields:

  • storageProvisioner (string): Type of storage provisioner
  • targetID (string): Storage or replication instance identifier
  • state (MModeState): Current state of the maintenance mode
  • conditions ([]metav1.Condition): Conditions from the MaintenanceMode resource

Annotations

The DRCluster controller uses annotations for storage-specific configuration:

  • drcluster.ramendr.openshift.io/storage-secret-name: Name of storage secret
  • drcluster.ramendr.openshift.io/storage-secret-namespace: Namespace of storage secret
  • drcluster.ramendr.openshift.io/storage-clusterid: Storage cluster identifier
  • drcluster.ramendr.openshift.io/storage-driver: Storage driver name (e.g., CSI driver)
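
These annotations supply the storage-specific fields a NetworkFence needs. The sketch below maps them, together with spec.cidrs, onto a simplified NetworkFence spec. The field mapping follows the NetworkFence example under Related Resources: driver, secret name/namespace, and `parameters.clusterID`. The struct and function names are hypothetical. Rejecting a DRCluster with missing annotations is a choice made for this sketch, not documented controller behavior.

```go
package main

import "fmt"

// Annotation keys documented above.
const (
	annSecretName      = "drcluster.ramendr.openshift.io/storage-secret-name"
	annSecretNamespace = "drcluster.ramendr.openshift.io/storage-secret-namespace"
	annClusterID       = "drcluster.ramendr.openshift.io/storage-clusterid"
	annDriver          = "drcluster.ramendr.openshift.io/storage-driver"
)

// NetworkFenceSpec is a simplified, hypothetical view of the NetworkFence fields.
type NetworkFenceSpec struct {
	Driver          string
	FenceState      string
	CIDRs           []string
	SecretName      string
	SecretNamespace string
	Parameters      map[string]string
}

// buildNetworkFence derives a NetworkFence spec from DRCluster annotations and
// CIDRs, returning an error that lists any missing or empty annotation.
func buildNetworkFence(ann map[string]string, cidrs []string, state string) (NetworkFenceSpec, error) {
	var missing []string
	for _, k := range []string{annSecretName, annSecretNamespace, annClusterID, annDriver} {
		if ann[k] == "" {
			missing = append(missing, k)
		}
	}
	if len(missing) > 0 {
		return NetworkFenceSpec{}, fmt.Errorf("missing storage annotations: %v", missing)
	}
	return NetworkFenceSpec{
		Driver:          ann[annDriver],
		FenceState:      state,
		CIDRs:           cidrs,
		SecretName:      ann[annSecretName],
		SecretNamespace: ann[annSecretNamespace],
		Parameters:      map[string]string{"clusterID": ann[annClusterID]},
	}, nil
}

func main() {
	ann := map[string]string{
		annDriver:          "openshift-storage.rbd.csi.ceph.com",
		annSecretName:      "rook-csi-rbd-provisioner",
		annSecretNamespace: "openshift-storage",
		annClusterID:       "openshift-storage",
	}
	nf, err := buildNetworkFence(ann, []string{"10.0.0.0/16", "10.1.0.0/16"}, "Fenced")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", nf)
}
```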

Labels

The controller automatically adds the following labels:

  • cluster.open-cluster-management.io/backup: Marks the DRCluster for inclusion in Open Cluster Management (OCM) hub backups

Finalizers

  • drclusters.ramendr.openshift.io/ramen: Ensures proper cleanup on deletion

Usage Examples

Basic DRCluster Definition

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: cluster1
spec:
  region: "east"
  s3ProfileName: "s3-profile-east-1"
  cidrs:
    - "192.168.1.0/24"

DRCluster with Storage Annotations

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: cluster2
  annotations:
    drcluster.ramendr.openshift.io/storage-driver: "openshift-storage.rbd.csi.ceph.com"
    drcluster.ramendr.openshift.io/storage-secret-name: "rook-csi-rbd-provisioner"
    drcluster.ramendr.openshift.io/storage-secret-namespace: "openshift-storage"
    drcluster.ramendr.openshift.io/storage-clusterid: "openshift-storage"
spec:
  region: "west"
  s3ProfileName: "s3-profile-west-1"
  cidrs:
    - "10.0.0.0/16"
    - "10.1.0.0/16"
  clusterFence: Unfenced

Fencing a Cluster

To fence a cluster during a disaster scenario:

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: cluster1
spec:
  region: "east"
  s3ProfileName: "s3-profile-east-1"
  cidrs:
    - "192.168.1.0/24"
  clusterFence: Fenced  # Change to Fenced

Unfencing a Cluster

To unfence a previously fenced cluster:

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: cluster1
spec:
  region: "east"
  s3ProfileName: "s3-profile-east-1"
  cidrs:
    - "192.168.1.0/24"
  clusterFence: Unfenced  # Change to Unfenced

Lifecycle Management

Cluster Fencing

When a cluster needs to be fenced (e.g., during failover):

  1. Admin sets spec.clusterFence to Fenced
  2. Controller identifies peer cluster(s) in the same region (via DRPolicy)
  3. Controller creates a NetworkFence ManifestWork on the peer cluster
  4. NetworkFence resource blocks network traffic from the fenced cluster's CIDRs
  5. Status transitions: Available → Fencing → Fenced
  6. Conditions updated to reflect fencing state
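
Step 2 (choosing where to create the NetworkFence) can be sketched as a filter over the clusters listed in a DRPolicy. The filter keeps every cluster other than the one being fenced that shares its region. The `regions` lookup map (DRCluster name to spec.region) and the function name are assumptions for illustration. The real controller may apply further checks.

```go
package main

import "fmt"

// fencingPeers returns the clusters in a DRPolicy's drClusters list that share
// the region of the cluster being fenced, excluding that cluster itself.
// regions is a hypothetical map from DRCluster name to spec.region.
func fencingPeers(target string, policyClusters []string, regions map[string]string) []string {
	region, ok := regions[target]
	if !ok {
		return nil // unknown target: no peers can be determined
	}
	var peers []string
	for _, c := range policyClusters {
		if c != target && regions[c] == region {
			peers = append(peers, c)
		}
	}
	return peers
}

func main() {
	regions := map[string]string{"cluster1": "east", "cluster2": "east", "cluster3": "west"}
	fmt.Println(fencingPeers("cluster1", []string{"cluster1", "cluster2", "cluster3"}, regions))
}
```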

Cluster Unfencing

To restore a fenced cluster:

  1. Admin sets spec.clusterFence to Unfenced
  2. Controller updates the NetworkFence ManifestWork with unfenced state
  3. Network traffic is restored
  4. Status transitions: Fenced → Unfencing → Unfenced
  5. NetworkFence resources are cleaned up
  6. Conditions updated to reflect clean state
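
The phase sequences in the fencing and unfencing steps (Available → Fencing → Fenced, and Fenced → Unfencing → Unfenced) fit in a small transition table. This Go sketch is a hypothetical model for reasoning about status.phase. It is not the controller's reconciliation logic and models only the sequences listed above.

```go
package main

import "fmt"

// DRClusterPhase mirrors the status.phase values (simplified re-declaration).
type DRClusterPhase string

const (
	Available DRClusterPhase = "Available"
	Fencing   DRClusterPhase = "Fencing"
	Fenced    DRClusterPhase = "Fenced"
	Unfencing DRClusterPhase = "Unfencing"
	Unfenced  DRClusterPhase = "Unfenced"
)

// transitions lists only the forward moves from the documented lifecycle steps;
// other paths (for example, through Starting) are deliberately not modeled.
var transitions = map[DRClusterPhase][]DRClusterPhase{
	Available: {Fencing},
	Fencing:   {Fenced},
	Fenced:    {Unfencing},
	Unfencing: {Unfenced},
}

// canTransition reports whether from -> to matches a documented sequence.
func canTransition(from, to DRClusterPhase) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Available, Fencing)) // true
	fmt.Println(canTransition(Available, Fenced))  // false: must pass through Fencing
}
```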

Manual Fencing States

For clusters fenced through external mechanisms:

  • ManuallyFenced: Use when cluster is fenced outside of Ramen control
  • ManuallyUnfenced: Use when manually unfencing an externally fenced cluster

These states allow DRCluster to track fencing state without attempting automated fencing operations.

Validation

The DRCluster controller performs the following validations:

  1. S3 Profile Validation

    • Verifies S3 profile exists in Ramen configuration
    • Tests connectivity to S3 store
    • Validates list operation on S3 bucket
  2. CIDR Format Validation

    • Ensures all CIDRs are in valid format
    • Uses standard Go net.ParseCIDR validation
  3. Region Immutability

    • Prevents changes to region after creation
  4. S3ProfileName Immutability

    • Prevents changes to S3 profile after creation
  5. Deployment Validation

    • Verifies DR operator components are deployed via ManifestWork
    • Checks ManifestWork applied status
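
Checks 2 through 4 can be sketched in Go. CIDR format is checked with `net.ParseCIDR`, as stated above. Immutability is checked by comparing old and new spec values. `DRClusterSpec` here is a trimmed, hypothetical stand-in. The sketch does not show where Ramen actually enforces immutability (API validation or the controller).

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// DRClusterSpec is a trimmed, hypothetical stand-in for the real spec type.
type DRClusterSpec struct {
	Region        string
	S3ProfileName string
	CIDRs         []string
}

// validateCIDRs returns an error for the first entry net.ParseCIDR rejects.
func validateCIDRs(cidrs []string) error {
	for _, c := range cidrs {
		if _, _, err := net.ParseCIDR(c); err != nil {
			return fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
	}
	return nil
}

// validateUpdate rejects changes to region and s3ProfileName and checks CIDR
// format. errors.Join (Go 1.20+) returns nil when no errors were collected.
func validateUpdate(oldSpec, newSpec DRClusterSpec) error {
	var errs []error
	if oldSpec.Region != newSpec.Region {
		errs = append(errs, errors.New("region is immutable"))
	}
	if oldSpec.S3ProfileName != newSpec.S3ProfileName {
		errs = append(errs, errors.New("s3ProfileName is immutable"))
	}
	if err := validateCIDRs(newSpec.CIDRs); err != nil {
		errs = append(errs, err)
	}
	return errors.Join(errs...)
}

func main() {
	oldSpec := DRClusterSpec{Region: "east", S3ProfileName: "s3-profile-east-1", CIDRs: []string{"192.168.1.0/24"}}
	newSpec := oldSpec
	newSpec.Region = "west"                         // violates immutability
	newSpec.CIDRs = []string{"192.168.1.0/33"}      // invalid prefix length
	fmt.Println(validateUpdate(oldSpec, newSpec))
}
```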

Related Resources

DRPolicy

DRClusters are referenced in DRPolicy resources to define disaster recovery relationships:

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPolicy
metadata:
  name: dr-policy-east-west
spec:
  drClusters:
    - cluster1  # References DRCluster
    - cluster2
  schedulingInterval: "5m"

DRClusterConfig

The controller automatically creates DRClusterConfig resources on managed clusters containing:

  • Cluster ID
  • Replication schedules from associated DRPolicies

NetworkFence

During fencing operations, the controller creates NetworkFence resources on peer clusters:

apiVersion: csiaddons.openshift.io/v1alpha1
kind: NetworkFence
metadata:
  name: network-fence-cluster1
spec:
  driver: "openshift-storage.rbd.csi.ceph.com"
  fenceState: Fenced
  cidrs:
    - "192.168.1.0/24"
  secret:
    name: rook-csi-rbd-provisioner
    namespace: openshift-storage
  parameters:
    clusterID: "openshift-storage"

Maintenance Modes

During regional DR failover operations, DRCluster manages maintenance modes for storage systems:

MaintenanceMode Activation

When a DRPC (DRPlacementControl) performs failover to this cluster:

  1. Controller detects failover to this cluster
  2. Analyzes VRGs (VolumeReplicationGroups) for required storage identifiers
  3. Creates MaintenanceMode ManifestWorks for each storage provisioner
  4. Updates status.maintenanceModes with activation details
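
Step 3 implies one MaintenanceMode per distinct storage provisioner and target ID across the VRGs being failed over. A minimal Go sketch of that grouping follows, assuming a simplified per-VRG record. The `StorageIdentifiers` type is hypothetical, and this document does not say which VRG status fields carry these values. Entries without a target ID are skipped in this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// StorageIdentifiers is a hypothetical, simplified record of the storage a VRG reports.
type StorageIdentifiers struct {
	StorageProvisioner string
	TargetID           string
}

// requiredMModes returns the distinct (provisioner, targetID) pairs across all
// VRGs, sorted for stable output; each pair would need one MaintenanceMode.
func requiredMModes(vrgs [][]StorageIdentifiers) []StorageIdentifiers {
	seen := map[StorageIdentifiers]bool{}
	var out []StorageIdentifiers
	for _, ids := range vrgs {
		for _, id := range ids {
			if id.TargetID == "" || seen[id] {
				continue // skip entries lacking a target ID, and duplicates
			}
			seen[id] = true
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].StorageProvisioner != out[j].StorageProvisioner {
			return out[i].StorageProvisioner < out[j].StorageProvisioner
		}
		return out[i].TargetID < out[j].TargetID
	})
	return out
}

func main() {
	vrgs := [][]StorageIdentifiers{
		{{"openshift-storage.rbd.csi.ceph.com", "replication-id-123"}},
		{{"openshift-storage.rbd.csi.ceph.com", "replication-id-123"}},
		{{"openshift-storage.rbd.csi.ceph.com", ""}},
	}
	for _, m := range requiredMModes(vrgs) {
		fmt.Printf("MaintenanceMode for %s/%s\n", m.StorageProvisioner, m.TargetID)
	}
}
```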

MaintenanceMode Status

Status includes information about active maintenance modes:

status:
  maintenanceModes:
    - storageProvisioner: "openshift-storage.rbd.csi.ceph.com"
      targetID: "replication-id-123"
      state: Activated
      conditions:
        - type: Available
          status: "True"
          reason: Activated

MaintenanceMode Deactivation

After failover completes:

  1. Controller detects no active failovers requiring maintenance mode
  2. Prunes inactive MaintenanceMode ManifestWorks
  3. Cleans up associated ManagedClusterViews
  4. Updates status to remove deactivated modes
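
Pruning in step 2 is a set difference: any existing MaintenanceMode ManifestWork whose key is no longer required is removed. The sketch below assumes each work is identified by a hypothetical `<provisioner>/<targetID>` string key. The actual ManifestWork naming scheme is not specified here.

```go
package main

import (
	"fmt"
	"sort"
)

// toPrune returns keys present in existing but absent from required, sorted.
// Keys are hypothetical "<provisioner>/<targetID>" identifiers of
// MaintenanceMode ManifestWorks.
func toPrune(existing, required []string) []string {
	keep := make(map[string]struct{}, len(required))
	for _, k := range required {
		keep[k] = struct{}{}
	}
	var prune []string
	for _, k := range existing {
		if _, ok := keep[k]; !ok {
			prune = append(prune, k)
		}
	}
	sort.Strings(prune)
	return prune
}

func main() {
	existing := []string{
		"openshift-storage.rbd.csi.ceph.com/replication-id-123",
		"openshift-storage.rbd.csi.ceph.com/replication-id-456",
	}
	required := []string{"openshift-storage.rbd.csi.ceph.com/replication-id-456"}
	fmt.Println(toPrune(existing, required))
}
```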

Operator Deployment

When DeploymentAutomationEnabled is configured, the controller automatically:

  1. Creates namespace for DR cluster operator
  2. Deploys OLM OperatorGroup
  3. Creates Subscription for ramen-dr-cluster-operator
  4. Deploys VolSync to the managed cluster
  5. Creates/updates DRCluster operator ConfigMap

Best Practices

  1. Region Design

    • Use meaningful region names that reflect geographic locations or availability zones
    • Place clusters that share synchronous (Metro DR) storage replication in the same region
  2. S3 Profile Configuration

    • Ensure S3 profiles are configured before creating DRClusters
    • Test S3 connectivity independently before creating the DRCluster
    • Use separate S3 buckets or prefixes for different clusters
  3. CIDR Management

    • Include all current and planned node network CIDRs
    • Update CIDRs before adding new node networks
    • Ensure a cluster's CIDRs don't overlap with those of other DR clusters, especially its fencing peers, since fencing blocks every listed range
  4. Fencing Operations

    • Test fencing in non-production environments first
    • Ensure peer cluster is healthy before fencing operations
    • Monitor NetworkFence status on peer clusters
    • Verify application failover before unfencing
  5. Monitoring

    • Watch Validated condition for deployment issues
    • Monitor phase field for operational state
    • Check maintenanceModes during failover operations
    • Review conditions for error details
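
The CIDR overlap guideline under CIDR Management can be checked with the Go standard library alone. Aligned CIDR blocks are either disjoint or nested, so two blocks overlap exactly when one contains the other's network address. A minimal sketch (the function name is illustrative):

```go
package main

import (
	"fmt"
	"net"
)

// overlaps reports whether any CIDR in a overlaps any CIDR in b. Because CIDR
// blocks are either disjoint or nested, checking whether either network
// contains the other's base address is sufficient.
func overlaps(a, b []string) (bool, error) {
	parse := func(list []string) ([]*net.IPNet, error) {
		nets := make([]*net.IPNet, 0, len(list))
		for _, s := range list {
			_, n, err := net.ParseCIDR(s)
			if err != nil {
				return nil, err
			}
			nets = append(nets, n)
		}
		return nets, nil
	}
	na, err := parse(a)
	if err != nil {
		return false, err
	}
	nb, err := parse(b)
	if err != nil {
		return false, err
	}
	for _, x := range na {
		for _, y := range nb {
			if x.Contains(y.IP) || y.Contains(x.IP) {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	ok, err := overlaps([]string{"10.0.0.0/16"}, []string{"10.0.128.0/17", "192.168.1.0/24"})
	fmt.Println(ok, err)
}
```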

Troubleshooting

DRCluster Not Validating

Symptoms: Validated condition is False

Common Causes:

  • S3 profile misconfiguration
  • S3 connectivity issues
  • Invalid CIDR format
  • ManifestWork deployment failures

Resolution:

  1. Check condition reason in status
  2. Verify S3 profile configuration in Ramen ConfigMap
  3. Test S3 connectivity from hub cluster
  4. Validate CIDR formats
  5. Check ManifestWork status on managed cluster

Fencing Operation Stuck

Symptoms: Phase remains in Fencing or Unfencing

Common Causes:

  • Peer cluster unreachable
  • NetworkFence CRD not installed on peer cluster
  • Storage driver not responding
  • Invalid storage annotations

Resolution:

  1. Verify peer cluster is healthy
  2. Check NetworkFence ManifestWork status
  3. Verify storage annotations on DRCluster
  4. Check NetworkFence status on peer cluster
  5. Review CSI driver logs on peer cluster

Maintenance Mode Issues

Symptoms: MaintenanceModes not activating during failover

Common Causes:

  • Storage identifiers not available in VRG
  • MaintenanceMode ManifestWork not applied
  • ManagedClusterView failures

Resolution:

  1. Check VRG status on source cluster
  2. Verify ManifestWork for MaintenanceMode
  3. Check ManagedClusterView for errors
  4. Review DRPC status for failover state

RBAC Requirements

The DRCluster controller requires the following permissions:

On Hub Cluster:

- apiGroups: ["ramendr.openshift.io"]
  resources: ["drclusters", "drclusters/status", "drclusters/finalizers"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

- apiGroups: ["ramendr.openshift.io"]
  resources: ["drplacementcontrols", "drpolicies"]
  verbs: ["get", "list", "watch"]

- apiGroups: ["work.open-cluster-management.io"]
  resources: ["manifestworks"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

- apiGroups: ["view.open-cluster-management.io"]
  resources: ["managedclusterviews"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

- apiGroups: ["cluster.open-cluster-management.io"]
  resources: ["managedclusters"]
  verbs: ["get", "list", "watch"]

- apiGroups: [""]
  resources: ["secrets", "configmaps"]
  verbs: ["list", "watch"]

References

API Compatibility

  • Kubernetes: v1.21+
  • Open Cluster Management: v0.9+
  • CSI Addons: v0.5+ (for fencing operations)

Change History

  • v1alpha1: Initial API version

Note: This is an alpha API and may change in future releases. Fields marked as immutable cannot be changed after the resource is created; updates that modify them are rejected during validation.
