ShyamsundarR/drcluster-cr.md

DRCluster Custom Resource Definition

Overview

DRCluster is a cluster-scoped custom resource that represents a managed cluster
participating in disaster recovery (DR) operations. It defines the DR characteristics
of a managed cluster, including its region, S3 profile for metadata storage, network
CIDRs for fencing operations, and the desired fencing state. The DRCluster controller
validates cluster connectivity, manages cluster fencing/unfencing operations, deploys
DR operator components, and handles maintenance modes during failover operations.

API Group and Version


Group: ramendr.openshift.io
Version: v1alpha1
Kind: DRCluster
Scope: Cluster

Resource Structure

```yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: <cluster-name>
spec:
  # DRCluster specification
status:
  # DRCluster observed state
```

Spec Fields

region (Required)


Type: string
Description: Identifies the DR group for the managed cluster. All managed
clusters in the same region are considered part of a synchronous replication
group for Metro DR scenarios.
Immutable: Yes
Validation: Cannot be changed after creation

Example:
```yaml
spec:
  region: "us-east-1"
```

s3ProfileName (Required)


Type: string
Description: Name of the S3 profile (defined in Ramen operator configuration)
used to store and restore persistent volume (PV) related cluster state during
recovery or relocate actions. This S3 profile must be available for successful
workload migration to this cluster. For applications active on this cluster,
their PV-related state is stored to S3 profiles of all other DRClusters in the
same DRPolicy.
Immutable: Yes
Validation: Must reference a valid S3 profile in the Ramen configuration

Example:
```yaml
spec:
  s3ProfileName: "s3-profile-east"
```

cidrs (Optional)


Type: []string
Description: List of CIDR strings representing the network ranges used or
potentially used by nodes in this managed cluster. These CIDRs are used for
cluster fencing operations in sync/Metro DR scenarios to block network access
during failover.
Validation: Each CIDR must be in valid format (e.g., "192.168.1.0/24")

Example:
```yaml
spec:
  cidrs:
    - "192.168.1.0/24"
    - "10.0.0.0/16"
```
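
The CIDR format check described above can be sketched with Go's standard `net.ParseCIDR`. This is a minimal illustration, not the controller's actual code; the helper name `invalidCIDRs` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// invalidCIDRs returns the entries that net.ParseCIDR rejects.
// The helper name is hypothetical and not taken from the Ramen code base.
func invalidCIDRs(cidrs []string) []string {
	var bad []string
	for _, c := range cidrs {
		if _, _, err := net.ParseCIDR(c); err != nil {
			bad = append(bad, c)
		}
	}
	return bad
}

func main() {
	cidrs := []string{"192.168.1.0/24", "10.0.0.0/16", "10.0.0.0", "300.1.1.0/24"}
	// The last two entries are malformed: one lacks a prefix length,
	// the other has an out-of-range octet.
	fmt.Println(invalidCIDRs(cidrs))
}
```

Note that `net.ParseCIDR` accepts an address with host bits set (for example `192.168.1.5/24`), so a check like this catches malformed strings but not imprecise network addresses.
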
clusterFence (Optional)


Type: ClusterFenceState (enum)
Description: Determines the desired fencing state of the cluster
Valid Values:

Unfenced: Cluster is not fenced and is operational
Fenced: Cluster should be fenced (network access blocked)
ManuallyFenced: Cluster has been manually fenced by administrator
ManuallyUnfenced: Cluster has been manually unfenced by administrator


Example:
```yaml
spec:
  clusterFence: Unfenced
```

Status Fields

phase


Type: DRClusterPhase
Description: Current lifecycle phase of the DRCluster
Possible Values:

Available: DRCluster is validated and available for use
Starting: Initial reconciliation in progress
Fencing: Fencing operation in progress
Fenced: Cluster has been successfully fenced
Unfencing: Unfencing operation in progress
Unfenced: Cluster has been successfully unfenced


conditions


Type: []metav1.Condition
Description: Standard Kubernetes conditions reflecting the current state

Condition Types:


Validated

Indicates whether the DRCluster has been validated
Reasons:

Succeeded: Cluster successfully validated
Initializing: Validation in progress
ConfigMapGetFailed: Failed to get configuration
DrClustersDeployFailed: Failed to deploy DR components
s3ConnectionFailed: S3 connection validation failed
s3ListFailed: S3 list operation failed


Fenced

Indicates the fencing state of the cluster
Reasons:

Fencing: Fencing operation in progress
Fenced: Successfully fenced
Unfencing: Unfencing operation in progress
Unfenced: Successfully unfenced
FenceError: Fencing operation failed
UnfenceError: Unfencing operation failed


Clean

Indicates whether NetworkFence resources exist for this cluster
Reasons:

Clean: No fencing CRs present
Fencing/Unfencing/Cleaning: Operations in progress
CleanError: Cleanup operation failed
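
A client that reads these conditions typically looks one up by type and inspects its status and reason. The sketch below uses a local `Condition` struct that mirrors a few `metav1.Condition` fields so that it builds with only the standard library; the function names are hypothetical.

```go
package main

import "fmt"

// Condition mirrors the metav1.Condition fields used in this sketch.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// findCondition returns the condition of the given type, or nil if absent.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// validated reports whether the Validated condition is True, plus its reason.
// A missing condition is treated as not validated with an empty reason.
func validated(conds []Condition) (bool, string) {
	c := findCondition(conds, "Validated")
	if c == nil {
		return false, ""
	}
	return c.Status == "True", c.Reason
}

func main() {
	conds := []Condition{
		{Type: "Validated", Status: "False", Reason: "s3ConnectionFailed"},
		{Type: "Clean", Status: "True", Reason: "Clean"},
	}
	ok, reason := validated(conds)
	fmt.Printf("validated=%v reason=%s\n", ok, reason)
}
```

With the real API types, `meta.FindStatusCondition` from `k8s.io/apimachinery/pkg/api/meta` provides the same lookup.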


maintenanceModes


Type: []ClusterMaintenanceMode
Description: List of active maintenance modes on the cluster, typically
used during regional DR failover operations

ClusterMaintenanceMode Fields:

storageProvisioner (string): Type of storage provisioner
targetID (string): Storage or replication instance identifier
state (MModeState): Current state of the maintenance mode
conditions ([]metav1.Condition): Conditions from the MaintenanceMode resource

Annotations

The DRCluster controller uses annotations for storage-specific configuration:

drcluster.ramendr.openshift.io/storage-secret-name: Name of storage secret
drcluster.ramendr.openshift.io/storage-secret-namespace: Namespace of
storage secret
drcluster.ramendr.openshift.io/storage-clusterid: Storage cluster identifier
drcluster.ramendr.openshift.io/storage-driver: Storage driver name
(e.g., CSI driver)

Labels

The controller automatically adds the following labels:

cluster.open-cluster-management.io/backup: Set to appropriate value for
OCM backup integration

Finalizers


drclusters.ramendr.openshift.io/ramen: Ensures proper cleanup on deletion

Usage Examples

Basic DRCluster Definition

```yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: cluster1
spec:
  region: "east"
  s3ProfileName: "s3-profile-east-1"
  cidrs:
    - "192.168.1.0/24"
```

DRCluster with Storage Annotations

```yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: cluster2
  annotations:
    drcluster.ramendr.openshift.io/storage-driver: "openshift-storage.rbd.csi.ceph.com"
    drcluster.ramendr.openshift.io/storage-secret-name: "rook-csi-rbd-provisioner"
    drcluster.ramendr.openshift.io/storage-secret-namespace: "openshift-storage"
    drcluster.ramendr.openshift.io/storage-clusterid: "openshift-storage"
spec:
  region: "west"
  s3ProfileName: "s3-profile-west-1"
  cidrs:
    - "10.0.0.0/16"
    - "10.1.0.0/16"
  clusterFence: Unfenced
```

Fencing a Cluster

To fence a cluster during a disaster scenario:
```yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: cluster1
spec:
  region: "east"
  s3ProfileName: "s3-profile-east-1"
  cidrs:
    - "192.168.1.0/24"
  clusterFence: Fenced  # Change to Fenced
```

Unfencing a Cluster

To unfence a previously fenced cluster:
```yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: cluster1
spec:
  region: "east"
  s3ProfileName: "s3-profile-east-1"
  cidrs:
    - "192.168.1.0/24"
  clusterFence: Unfenced  # Change to Unfenced
```

Lifecycle Management

Cluster Fencing

When a cluster needs to be fenced (e.g., during failover):

Admin sets spec.clusterFence to Fenced
Controller identifies peer cluster(s) in the same region (via DRPolicy)
Controller creates a NetworkFence ManifestWork on the peer cluster
NetworkFence resource blocks network traffic from the fenced cluster's CIDRs
Status transitions: Available → Fencing → Fenced
Conditions updated to reflect fencing state

Cluster Unfencing

To restore a fenced cluster:

Admin sets spec.clusterFence to Unfenced
Controller updates the NetworkFence ManifestWork with unfenced state
Network traffic is restored
Status transitions: Fenced → Unfencing → Unfenced
NetworkFence resources are cleaned up
Conditions updated to reflect clean state
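
The two phase sequences above (Available → Fencing → Fenced and Fenced → Unfencing → Unfenced) can be encoded as a small transition table, for example to assert in a test that an observed status history is plausible. This sketch covers only the transitions this document lists; the controller may permit others, and all names here are hypothetical.

```go
package main

import "fmt"

// Phase is a local stand-in for the DRClusterPhase values listed in this document.
type Phase string

const (
	Available Phase = "Available"
	Fencing   Phase = "Fencing"
	Fenced    Phase = "Fenced"
	Unfencing Phase = "Unfencing"
	Unfenced  Phase = "Unfenced"
)

// documentedNext holds only the transitions spelled out in the fencing and
// unfencing sequences; it is not an exhaustive model of the controller.
var documentedNext = map[Phase]Phase{
	Available: Fencing,
	Fencing:   Fenced,
	Fenced:    Unfencing,
	Unfencing: Unfenced,
}

// followsDocumentedPath reports whether each observed phase is the
// documented successor of the one before it.
func followsDocumentedPath(observed []Phase) bool {
	for i := 1; i < len(observed); i++ {
		if documentedNext[observed[i-1]] != observed[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(followsDocumentedPath([]Phase{Available, Fencing, Fenced}))
	fmt.Println(followsDocumentedPath([]Phase{Fenced, Unfenced})) // skips Unfencing
}
```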

Manual Fencing States

For clusters fenced through external mechanisms:

ManuallyFenced: Use when cluster is fenced outside of Ramen control
ManuallyUnfenced: Use when manually unfencing an externally fenced cluster

These states allow DRCluster to track fencing state without attempting automated
fencing operations.

Validation

The DRCluster controller performs the following validations:


S3 Profile Validation

Verifies S3 profile exists in Ramen configuration
Tests connectivity to S3 store
Validates list operation on S3 bucket


CIDR Format Validation

Ensures all CIDRs are in valid format
Uses standard Go net.ParseCIDR validation


Region Immutability

Prevents changes to region after creation


S3ProfileName Immutability

Prevents changes to S3 profile after creation


Deployment Validation

Verifies DR operator components are deployed via ManifestWork
Checks ManifestWork applied status
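
The immutability rules for region and s3ProfileName amount to comparing the stored spec with the proposed update. The following is a minimal sketch under that assumption; the `Spec` struct and `immutableViolations` helper are hypothetical and do not reflect how Ramen actually enforces the rule.

```go
package main

import "fmt"

// Spec is a hypothetical, trimmed-down stand-in for the DRCluster spec.
type Spec struct {
	Region        string
	S3ProfileName string
	ClusterFence  string
}

// immutableViolations lists the immutable fields that differ between the
// stored spec and the proposed update. Mutable fields such as ClusterFence
// are deliberately ignored.
func immutableViolations(old, updated Spec) []string {
	var violations []string
	if old.Region != updated.Region {
		violations = append(violations, "region is immutable")
	}
	if old.S3ProfileName != updated.S3ProfileName {
		violations = append(violations, "s3ProfileName is immutable")
	}
	return violations
}

func main() {
	old := Spec{Region: "east", S3ProfileName: "s3-profile-east-1", ClusterFence: "Unfenced"}
	updated := Spec{Region: "west", S3ProfileName: "s3-profile-east-1", ClusterFence: "Fenced"}
	fmt.Println(immutableViolations(old, updated))
}
```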


Related Resources

DRPolicy

DRClusters are referenced in DRPolicy resources to define disaster recovery relationships:
```yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPolicy
metadata:
  name: dr-policy-east-west
spec:
  drClusters:
    - cluster1  # References DRCluster
    - cluster2
  schedulingInterval: "5m"
```

DRClusterConfig

The controller automatically creates DRClusterConfig resources on managed
clusters containing:

Cluster ID
Replication schedules from associated DRPolicies

NetworkFence

During fencing operations, the controller creates NetworkFence resources on
peer clusters:
```yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: NetworkFence
metadata:
  name: network-fence-cluster1
spec:
  driver: "openshift-storage.rbd.csi.ceph.com"
  fenceState: Fenced
  cidrs:
    - "192.168.1.0/24"
  secret:
    name: rook-csi-rbd-provisioner
    namespace: openshift-storage
  parameters:
    clusterID: "openshift-storage"
```
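
Comparing this NetworkFence with the annotated DRCluster example suggests that the storage annotations supply the driver, secret, and clusterID fields, while spec.cidrs supplies the CIDR list. The sketch below illustrates that mapping. It is inferred from the examples in this document, and the `FenceParams` struct, the `fenceParams` helper, and the assumption that all four annotations are required are hypothetical.

```go
package main

import "fmt"

const annotationPrefix = "drcluster.ramendr.openshift.io/"

// FenceParams is a hypothetical holder for the values a NetworkFence needs.
type FenceParams struct {
	Driver          string
	SecretName      string
	SecretNamespace string
	ClusterID       string
	CIDRs           []string
}

// fenceParams collects NetworkFence inputs from DRCluster annotations and CIDRs.
// It returns an error if any annotation is missing or no CIDRs are given.
func fenceParams(annotations map[string]string, cidrs []string) (FenceParams, error) {
	keys := []string{"storage-driver", "storage-secret-name", "storage-secret-namespace", "storage-clusterid"}
	for _, k := range keys {
		if annotations[annotationPrefix+k] == "" {
			return FenceParams{}, fmt.Errorf("missing annotation %s%s", annotationPrefix, k)
		}
	}
	if len(cidrs) == 0 {
		return FenceParams{}, fmt.Errorf("no CIDRs configured")
	}
	return FenceParams{
		Driver:          annotations[annotationPrefix+"storage-driver"],
		SecretName:      annotations[annotationPrefix+"storage-secret-name"],
		SecretNamespace: annotations[annotationPrefix+"storage-secret-namespace"],
		ClusterID:       annotations[annotationPrefix+"storage-clusterid"],
		CIDRs:           cidrs,
	}, nil
}

func main() {
	ann := map[string]string{
		annotationPrefix + "storage-driver":           "openshift-storage.rbd.csi.ceph.com",
		annotationPrefix + "storage-secret-name":      "rook-csi-rbd-provisioner",
		annotationPrefix + "storage-secret-namespace": "openshift-storage",
		annotationPrefix + "storage-clusterid":        "openshift-storage",
	}
	p, err := fenceParams(ann, []string{"192.168.1.0/24"})
	fmt.Println(p, err)
}
```
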
Maintenance Modes

During regional DR failover operations, DRCluster manages maintenance modes for
storage systems:

MaintenanceMode Activation

When a DRPC (DRPlacementControl) performs failover to this cluster:

Controller detects failover to this cluster
Analyzes VRGs (VolumeReplicationGroups) for required storage identifiers
Creates MaintenanceMode ManifestWorks for each storage provisioner
Updates status.maintenanceModes with activation details

MaintenanceMode Status

Status includes information about active maintenance modes:
```yaml
status:
  maintenanceModes:
    - storageProvisioner: "openshift-storage.rbd.csi.ceph.com"
      targetID: "replication-id-123"
      state: Activated
      conditions:
        - type: Available
          status: "True"
          reason: Activated
```

MaintenanceMode Deactivation

After failover completes:

Controller detects no active failovers requiring maintenance mode
Prunes inactive MaintenanceMode ManifestWorks
Cleans up associated ManagedClusterViews
Updates status to remove deactivated modes
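
The pruning step can be viewed as a set difference: any existing maintenance mode that no active failover still requires is a candidate for removal. The sketch below assumes a mode is identified by its storage provisioner and target ID, matching the status fields above; `ModeKey` and `modesToPrune` are hypothetical names.

```go
package main

import "fmt"

// ModeKey identifies a maintenance mode by the two status fields shown above.
type ModeKey struct {
	StorageProvisioner string
	TargetID           string
}

// modesToPrune returns the existing modes that are absent from required,
// preserving the order of existing.
func modesToPrune(existing, required []ModeKey) []ModeKey {
	needed := make(map[ModeKey]bool, len(required))
	for _, k := range required {
		needed[k] = true
	}
	var prune []ModeKey
	for _, k := range existing {
		if !needed[k] {
			prune = append(prune, k)
		}
	}
	return prune
}

func main() {
	existing := []ModeKey{
		{"openshift-storage.rbd.csi.ceph.com", "replication-id-123"},
		{"openshift-storage.rbd.csi.ceph.com", "replication-id-456"},
	}
	// Only one failover is still in progress.
	required := []ModeKey{{"openshift-storage.rbd.csi.ceph.com", "replication-id-456"}}
	fmt.Println(modesToPrune(existing, required))
}
```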

Operator Deployment

When DeploymentAutomationEnabled is configured, the controller automatically:

Creates namespace for DR cluster operator
Deploys OLM OperatorGroup
Creates Subscription for ramen-dr-cluster-operator
Deploys VolSync to the managed cluster
Creates/updates DRCluster operator ConfigMap

Best Practices


Region Design

Use meaningful region names that reflect geographic or availability zones
Group clusters that share storage replication in the same region


S3 Profile Configuration

Ensure S3 profiles are configured before creating DRClusters
Test S3 connectivity independently before cluster creation
Use separate S3 buckets or prefixes for different clusters


CIDR Management

Include all current and planned node network CIDRs
Update CIDRs before adding new node networks
Ensure CIDRs don't overlap between clusters in different regions


Fencing Operations

Test fencing in non-production environments first
Ensure peer cluster is healthy before fencing operations
Monitor NetworkFence status on peer clusters
Verify application failover before unfencing


Monitoring

Watch Validated condition for deployment issues
Monitor phase field for operational state
Check maintenanceModes during failover operations
Review conditions for error details


Troubleshooting

DRCluster Not Validating

Symptoms: Validated condition is False
Common Causes:

S3 profile misconfiguration
S3 connectivity issues
Invalid CIDR format
ManifestWork deployment failures

Resolution:

Check condition reason in status
Verify S3 profile configuration in Ramen ConfigMap
Test S3 connectivity from hub cluster
Validate CIDR formats
Check ManifestWork status on managed cluster

Fencing Operation Stuck

Symptoms: Phase remains in Fencing or Unfencing
Common Causes:

Peer cluster unreachable
NetworkFence CRD not installed on peer cluster
Storage driver not responding
Invalid storage annotations

Resolution:

Verify peer cluster is healthy
Check NetworkFence ManifestWork status
Verify storage annotations on DRCluster
Check NetworkFence status on peer cluster
Review CSI driver logs on peer cluster

Maintenance Mode Issues

Symptoms: MaintenanceModes not activating during failover
Common Causes:

Storage identifiers not available in VRG
MaintenanceMode ManifestWork not applied
ManagedClusterView failures

Resolution:

Check VRG status on source cluster
Verify ManifestWork for MaintenanceMode
Check ManagedClusterView for errors
Review DRPC status for failover state

RBAC Requirements

The DRCluster controller requires the following permissions:
On Hub Cluster:
```yaml
- apiGroups: ["ramendr.openshift.io"]
  resources: ["drclusters", "drclusters/status", "drclusters/finalizers"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

- apiGroups: ["ramendr.openshift.io"]
  resources: ["drplacementcontrols", "drpolicies"]
  verbs: ["get", "list", "watch"]

- apiGroups: ["work.open-cluster-management.io"]
  resources: ["manifestworks"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

- apiGroups: ["view.open-cluster-management.io"]
  resources: ["managedclusterviews"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

- apiGroups: ["cluster.open-cluster-management.io"]
  resources: ["managedclusters"]
  verbs: ["get", "list", "watch"]

- apiGroups: [""]
  resources: ["secrets", "configmaps"]
  verbs: ["list", "watch"]
```

References


DRPolicy CRD Documentation
DRPlacementControl CRD Documentation
Ramen Operator Configuration
Cluster Fencing Guide
Regional DR Failover

API Compatibility


Kubernetes: v1.21+
Open Cluster Management: v0.9+
CSI Addons: v0.5+ (for fencing operations)

Change History


| Version | Changes |
| --- | --- |
| v1alpha1 | Initial API version |


Note: This is an alpha API and may change in future releases. Fields marked
as immutable cannot be changed after resource creation; updates that attempt
to change them are rejected by validation webhooks.