Microsoft SQL Server High Availability


Microsoft SQL Server Failover Cluster

When protecting an MSSQL Failover Cluster with Zerto, the key consideration is the consistency of the database RDMs and of the cluster itself.


Note: When protecting a Failover Cluster to DRaaS (through a ZCC), the cluster witness must be a file share witness; disk-based witnesses are NOT supported.
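
The witness configuration can be checked, and switched to a file share witness if required, using the built-in FailoverClusters PowerShell module. The sketch below is illustrative only; \\fs01\ClusterWitness is a placeholder for an SMB share that both cluster nodes can reach with read/write access.

    # Requires the FailoverClusters module (installed with the Failover Clustering feature).
    Import-Module FailoverClusters

    # Review the current quorum/witness configuration.
    Get-ClusterQuorum

    # Switch to a file share witness. \\fs01\ClusterWitness is a placeholder share
    # that both nodes must be able to read and write.
    Set-ClusterQuorum -FileShareWitness "\\fs01\ClusterWitness"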


Only the primary Active Node in an Active/Passive cluster should be protected by Zerto. Protecting both nodes is not recommended for the following reasons:


  1. This requires double the storage space, as each replicated node's shared disks are converted to non-shared VMDKs.
  2. If the Active Node role is rarely switched, a large amount of data will require replication after switching to the passive node.
  3. A failover operation will be complex: the recovered active node will use non-shared VMDKs and will therefore run as a non-clustered instance with no HA functionality, requiring manual intervention to evict the disks from the cluster instance and bring them online as standalone disks in Disk Management.


If the Active Node role is switched to a non-Zerto-protected node, any changes made to the RDMs are not replicated. Once the Active Node role is moved back to the Zerto-protected node, the cluster will be in an inconsistent state, as the source RDMs contain changes that Zerto did not replicate to the target. Performing a Force-Sync operation on the cluster VPG will return the cluster to a protected and consistent state.


This operation scans both the source and target RDMs/VMDKs, then replicates any changes and inconsistencies found. The Force-Sync operation can be initiated manually to maintain cluster consistency during maintenance. For example, during cluster maintenance, when the administrator moves the Active Node role back to the Zerto-protected node, their final action should be to select Force-Sync in the Zerto GUI.
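
Force-Sync can also be triggered outside the GUI against the ZVM REST API. The sketch below is illustrative only: the port (9669), the session and VPG endpoint paths, the x-zerto-session header and the forcesync action are assumptions based on the v1 REST API, and the VPG name is a placeholder; verify the exact calls against your ZVM's API documentation before using this.

    # Illustrative sketch only: endpoint paths, port, and header names are assumptions
    # based on the Zerto v1 REST API and may differ in your environment.
    $zvm  = "zvm01.example.local"                  # placeholder ZVM address
    $base = "https://$($zvm):9669/v1"
    $cred = Get-Credential                         # ZVM user with rights to manage VPGs

    # Authenticate and capture the session token returned in the response headers.
    $pair       = "$($cred.UserName):$($cred.GetNetworkCredential().Password)"
    $authHeader = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($pair)) }
    $session    = Invoke-WebRequest -Uri "$base/session/add" -Method Post -Headers $authHeader
    $apiHeader  = @{ "x-zerto-session" = $session.Headers["x-zerto-session"] }

    # Find the cluster VPG by name (placeholder) and request a Force-Sync.
    $vpg = (Invoke-RestMethod -Uri "$base/vpgs" -Headers $apiHeader) |
           Where-Object { $_.VpgName -eq "SQL-Cluster-ActiveNode" }
    Invoke-RestMethod -Uri "$base/vpgs/$($vpg.VpgIdentifier)/forcesync" -Method Post -Headers $apiHeader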

Note: Performing a Force-Sync operation will attempt to preserve the journal in some capacity; however, it may reset the journal on the VPG, removing the ability to recover to a point in time before the operation was performed. It is therefore recommended to perform this operation outside of working hours if the journal still contains consistent checkpoints that are required.


Automating Active Node Change by Script

An alternative to manually performing Force-Sync operations to maintain consistency in shared disk clusters is to automate the process using the Zerto PowerShell SQL Cluster script found in the Zerto Tech Marketing GitHub repository: https://github.com/Zerto-Tech-Marketing/MSSQL_MSCS_Failover.


Note: No script is required for MSSQL clustering support; the script simply automates the manual process of maintaining consistency after cluster failover/failback operations.


The first step to utilizing this script is to create two VPGs: one protecting the current active node only and a second protecting the current passive node only.


Note: As both nodes will be protected, this requires double the storage space, because each replicated node's shared disks are converted to non-shared VMDKs.


The script should then be scheduled to run directly on both SQL nodes every minute. Its purpose is to check that the active SQL node is the node protected by Zerto; if this ever changes, it automatically un-pauses the relevant VPG and performs a Force-Sync. The script also pauses the formerly active SQL node's VPG to clearly indicate that the now-passive SQL node is not being replicated. Further information is provided in the comments at the beginning of the script.
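
A minimal sketch of the detection logic such a scheduled check performs is shown below. It assumes the FailoverClusters module is available on the SQL nodes; the role and node names are placeholders, and the actual Zerto pause/resume and Force-Sync calls are left as comments, as they are implemented in the referenced GitHub script.

    # Per-node check; assumes the FailoverClusters module is installed on each SQL node.
    Import-Module FailoverClusters

    $sqlRole       = "SQL Server (MSSQLSERVER)"   # placeholder: the SQL Server cluster role name
    $protectedNode = "SQLNODE01"                  # placeholder: the node protected by the un-paused VPG

    # Determine which node currently owns the SQL Server cluster role.
    $activeNode = (Get-ClusterGroup -Name $sqlRole).OwnerNode.Name

    if ($activeNode -ne $protectedNode) {
        # The role has moved. At this point the referenced script:
        #   1. un-pauses the VPG protecting $activeNode,
        #   2. triggers a Force-Sync on that VPG,
        #   3. pauses the VPG protecting the formerly active node.
        Write-Warning "Active node is now $activeNode; VPG protection must be switched and force-synced."
    }

The check can then be registered as a scheduled task on each node with a one-minute repetition interval.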


Post-Failover Configuration

Only the active node VPG should be recovered in a failover scenario. This brings the cluster up as a single-node instance using non-shared VMDKs; it will therefore run as a non-clustered instance with no HA functionality and will need manual intervention to complete the following steps (a scripted sketch follows the list):


  1. Evict the original RDM disks (which are now non-shared VMDKs) from the cluster, releasing the disk reservation.
  2. In Disk Management, bring any offline disks online in read/write mode.
  3. Confirm that the original RDM disk partitions and data files are now visible.
  4. Bring the MSSQL Cluster role online.
  5. Verify that the role starts correctly and that the MSSQL instance and databases are accessible as expected.
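
The same steps can be performed from an elevated PowerShell session on the recovered node. This is a hedged sketch: the cluster disk and role names are placeholders, and the final check assumes the SqlServer module is installed.

    # Run in an elevated PowerShell session on the recovered node.
    Import-Module FailoverClusters

    # 1. Evict the original RDM disks (now non-shared VMDKs) from the cluster to release the reservation.
    #    "Cluster Disk 1" is a placeholder; list the disk resources first to find the real names.
    Get-ClusterResource | Where-Object ResourceType -eq "Physical Disk"
    Remove-ClusterResource -Name "Cluster Disk 1" -Force

    # 2. Bring any offline disks online in read/write mode (the Disk Management equivalent).
    Get-Disk | Where-Object IsOffline | Set-Disk -IsOffline $false
    Get-Disk | Where-Object IsReadOnly | Set-Disk -IsReadOnly $false

    # 3. Confirm the original partitions and volumes are visible.
    Get-Partition | Get-Volume

    # 4. Bring the MSSQL cluster role online ("SQL Server (MSSQLSERVER)" is a placeholder name).
    Start-ClusterGroup -Name "SQL Server (MSSQLSERVER)"

    # 5. Verify the instance and databases are accessible (requires the SqlServer module).
    Invoke-Sqlcmd -ServerInstance $env:COMPUTERNAME -Query "SELECT name, state_desc FROM sys.databases"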


Zerto can automatically change the IP address of VMs as part of a failover or failover test operation. However, if an MSSQL cluster requires a new IP address on the target site, this feature should not be used. IP changes on clusters can cause issues that require manual intervention as part of a failover operation, significantly increasing the RTO and complexity.


Failback

TBC....



Failover Testing

To successfully perform a non-disruptive failover test of a Microsoft SQL virtual machine configured as an RDM-based Failover Cluster Instance, Active Directory and DNS services must be online in the isolated failover test network. Therefore, Zerto recommends protecting an Active Directory Domain Controller configured as a Global Catalog and as the primary or secondary DNS server for the SQL Server virtual machine. Zerto can then be used to bring an up-to-date copy of Active Directory online with ease for failover testing.


Note: The Active Directory virtual machine should never be recovered to a previous point in time in a production/live failover. Therefore, Zerto recommends placing the Active Directory virtual machine in its own VPG and assigning both the failover and failover test network adapters of the virtual machine to an isolated test network. Zerto recommends adhering to Microsoft best practices for Active Directory for production/live failovers.


Note: When booting Active Directory in an isolated test network, allow a minimum of five minutes for Active Directory services to come fully online before the cluster services can start.
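
Before starting the cluster role in the isolated network, the readiness of the recovered Domain Controller can be checked from the SQL node. A minimal sketch, assuming contoso.local is the domain (a placeholder):

    # Run on the recovered SQL node inside the isolated test network.
    $domain = "contoso.local"   # placeholder domain name

    # DNS: confirm the recovered DC answers domain SRV lookups.
    Resolve-DnsName -Name "_ldap._tcp.dc._msdcs.$domain" -Type SRV

    # Active Directory: confirm a domain controller can be located and the secure channel is healthy.
    nltest "/dsgetdc:$domain"
    Test-ComputerSecureChannel -Verbose

    # Once these succeed, the cluster service and SQL Server role can be started.
    Start-Service -Name ClusSvc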


Limitations


Zerto MSSQL Failover Cluster support is not compatible with the following:

  1. Active/Active clusters – all SQL instances must run on the same node.
  2. Protecting cluster VMs that use in-guest iSCSI initiators to access shared cluster disks.