Friday, June 21, 2013

Failed Cluster While Backing Up Windows Server 2012 Hyper-V CSVs

Last week I had the opportunity to do something other than SCOM and get my hands dirty with a new Windows Server 2012 Hyper-V cluster project for one of our customers.

The build went well. I used the excellent Hyper-V Installation and Configuration Guide book, along with this blog post on Windows Server 2012 Hyper-V Best Practices (In Easy Checklist Form), to double and then triple check that everything was deployed according to best practice recommendations.


After we P2V'd a number of the customer's physical machines and configured DPM 2012 SP1 to perform host-level backups (i.e. backing up the entire virtual machine while it's still running, using the agent deployed to the Hyper-V host), we started seeing performance problems on the cluster nodes and then ultimately inside each virtual machine running on the cluster.

Leaving the cluster in this state resulted in both hosts becoming unresponsive and everything grinding to a halt. We had to power cycle the SAN (FC connected HP) and both hosts to get everything back online - not cool :(




So, after a small bit of searching for an answer, we came across the following two recently released hotfixes from Microsoft that needed to be applied to the Hyper-V hosts:

When I spoke with some of my MVP buddies about this particular issue, I was pointed in the direction of this excellent script from Hans Vredevoort over at Hyper-V.nu:

Updated: Windows Server 2012 Hyper-V and Cluster Hotfixes and Updates

This script can be run against your Windows Server 2012 hosts to report on any missing hotfixes or updates that should be applied. Although we thought we had deployed all the relevant updates to our customer's cluster, once we ran the script it was apparent that a few more still needed to be applied.
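
To illustrate the general idea behind this kind of check (this is not Hans' script, just a minimal sketch of the comparison it boils down to), here is a short Python example. It assumes you have already exported the list of installed KB IDs from each host, for example from the output of the PowerShell Get-HotFix cmdlet. The function names and the KB numbers below are placeholders made up for illustration, not real hotfix IDs.

```python
def normalize_kb(kb_id: str) -> str:
    """Normalize variants such as 'kb123' or 'KB 123' to the form 'KB123'."""
    digits = "".join(ch for ch in kb_id if ch.isdigit())
    return f"KB{digits}"


def missing_hotfixes(installed, required):
    """Return the required KB IDs that are not in the installed list.

    The order of the 'required' list is preserved in the result.
    """
    installed_set = {normalize_kb(kb) for kb in installed}
    return [
        kb
        for kb in (normalize_kb(r) for r in required)
        if kb not in installed_set
    ]


# Placeholder data: these KB numbers are hypothetical, not real hotfixes.
installed_on_host = ["KB0000001", "kb0000003"]
recommended = ["KB0000001", "KB0000002", "KB0000003"]

print(missing_hotfixes(installed_on_host, recommended))  # ['KB0000002']
```

Running the same comparison for every node in the cluster gives a quick per-host view of what still needs to be installed.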

Edit: Microsoft's Cristian Edwards has mentioned in his comment below that there are some additional and updated scripts that you can also use to ensure you keep your Hyper-V hosts up to date. Check out his updated blog post here:

http://blogs.technet.com/b/cedward/archive/2013/05/31/validating-hyper-v-2012-and-failover-clustering-hotfixes-with-powershell-part-2.aspx

Conclusion

If you're running a Hyper-V 2012 cluster and are using ANY backup application (not just DPM 2012 SP1) that performs host-level backups of virtual machines located on Cluster Shared Volumes (CSVs), then I'd highly recommend you install at least the above two hotfixes and then also run Hans' script to see what else you need to get deployed.

With the hotfixes applied to the environment, it works perfectly now!

4 comments:

  1. Great post, I felt your pain whilst waiting for the hotfixes to be released. It's worth having a quick look at whether your storage device supports ODX too, as this can cause issues with backup runs in the same situation if the device doesn't support it. ODX is now turned on by default.

    1. Good call and thanks for the comment - I've already checked but have seen ODX mentioned several times in relation to the cluster fail issues.

      Kevin.

    2. Please check http://blogs.technet.com/cedward for latest version of the script. You will find two different scripts. One for standalone hosts and one for clusters. The second one only requires the name of the cluster instead of each node name. Regards, cedwardpfe

    3. Thanks Cristian, updated the post with a reference to yours now.

      Kevin.
