Veeam Partial-Site Failovers Failing During Second VM Boot

As a disaster recovery service provider, Managecast uses Veeam to provide virtual machine failover into its VMware Cloud Director (VCD) DR environment. Recently, however, we’ve run into an issue during partial-site failovers using Veeam and the Network Extension Appliance (NEA).

When you initiate a partial-site failover, Veeam brings up the NEA and configures a VPN tunnel and ARP proxy so that the NEA can forward traffic to and from the powered-on replica VM in the DR environment. This typically works well, even if you select multiple VMs and perform a failover now on all of them at the same time. However, an issue can occur when you select multiple VMs and initiate a planned failover, or when you start a failover now on multiple VMs one after another rather than all at once.

This seems to be caused by the fact that VCD reconfigures all VMs in a vApp whenever one of the VMs in it is reconfigured. Veeam configures the NEA at the start of the failover and attaches vCenter networks that VCD is not aware of. When the second failover VM is powered on, VCD reconfigures the NEA as well. Because VCD is not aware of the vCenter network, it disconnects that network from the NEA VM, and the failover fails due to loss of connectivity.
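To make the sequence of events easier to follow, here is a minimal toy model of the behavior described above. It does not call Veeam or VCD; the class names, the network names (`org-net`, `vcenter-net`), and the `power_on` method are all hypothetical and exist only to illustrate the suspected mechanism.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class VM:
    name: str
    networks: Set[str] = field(default_factory=set)


@dataclass
class VApp:
    name: str
    known_networks: Set[str]  # networks the modelled VCD inventory knows about
    vms: List[VM] = field(default_factory=list)

    def power_on(self, vm_name: str) -> None:
        # Modelled behaviour (an assumption based on the description above):
        # powering on one VM reconfigures every VM in the vApp, and any
        # network that VCD does not know about is disconnected.
        # vm_name is kept only to mirror the triggering event.
        for vm in self.vms:
            vm.networks &= self.known_networks


def second_failover_breaks_nea() -> Set[str]:
    nea = VM("nea", {"org-net"})
    replica_1 = VM("replica-1", {"org-net"})
    replica_2 = VM("replica-2", {"org-net"})
    vapp = VApp("cloud-connect", {"org-net"}, [nea, replica_1, replica_2])

    # The first failover is under way: Veeam has attached a vCenter-level
    # network to the NEA that VCD has no record of.
    nea.networks.add("vcenter-net")

    # The second replica is powered on, so the whole vApp is reconfigured.
    vapp.power_on("replica-2")
    return nea.networks


if __name__ == "__main__":
    print(second_failover_breaks_nea())
```

In this sketch the NEA ends up with only the network VCD knows about, which mirrors the loss of connectivity we saw during the second VM boot.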

Veeam support confirmed this behavior and, fortunately, has identified the issue; it has been resolved in VMware Cloud Director version 10.3.2. Veeam support has provided a hotfix for this version, but it will not work for earlier versions.

If you are unable to upgrade to VCD version 10.3.2, there is also a workaround. When an NEA is deployed through Veeam, it is placed into the default Cloud Connect vApp in VCD. The workaround is to move the NEA into a separate vApp that does not contain any of the DR replica VMs. This way, when subsequent replica failovers start, the NEA will not be reconfigured.
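If you keep an inventory of which VMs live in which vApp, a small check like the following can confirm that the workaround is in place. This is only a sketch: the inventory is a plain dictionary you would have to populate yourself, and the function name and the vApp and VM names in the example are hypothetical.

```python
from typing import Dict, List, Set


def vapps_shared_with_nea(
    vapps: Dict[str, List[str]],
    nea_name: str,
    replica_names: Set[str],
) -> List[str]:
    """Return the vApps that contain both the NEA and at least one replica VM."""
    shared = []
    for vapp_name, vm_names in vapps.items():
        members = set(vm_names)
        if nea_name in members and members & replica_names:
            shared.append(vapp_name)
    return shared


if __name__ == "__main__":
    # Hypothetical inventory: the NEA still sits in the default vApp.
    before = {"cloud-connect": ["nea", "replica-1", "replica-2"]}
    # Hypothetical inventory after moving the NEA into its own vApp.
    after = {"cloud-connect": ["replica-1", "replica-2"], "nea-only": ["nea"]}

    replicas = {"replica-1", "replica-2"}
    print(vapps_shared_with_nea(before, "nea", replicas))
    print(vapps_shared_with_nea(after, "nea", replicas))
```

An empty result means the NEA does not share a vApp with any replica, which is the state the workaround aims for.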
