I've just seen this one land in my inbox via RSS from the official System Center Operations Manager Engineering Blog. Microsoft's J.C. Hornbeck has posted about a hotfix that you may (or may not) have deployed to your SCOM/OpsMgr management servers. It essentially brings them to a deadlock, resulting in heartbeat failures, grey states and heaps of Event ID 2115 entries in your Windows Event logs.
The KB2775511 hotfix rollup for Windows 7 SP1 and Windows Server 2008 R2 SP1 is the culprit, and if you've deployed it onto your SCOM management servers, you'll need to remove it ASAP.
Update 15th November 2013: I've just seen a comment on this post below informing me that Microsoft have released a hotfix for this exact problem. Check it out here:
SCOM 2012 or SCOM 2007 R2 throws a "Heartbeat Failure" message and then goes into a greyed out state in Windows Server 2008 R2 SP1
Here's what the OpsMgr engineering team have to say:
"Removal of KB2775511 will correct the issues introduced. The OpsMgr team recommends that Operations Manager users refrain from installing KB2775511 until this deadlock issue is resolved. New information will be posted as it becomes available."
Better to be safe than sorry with this one, so check your servers today to make sure this hotfix rollup hasn't been installed. It's also worthwhile forwarding this information on to any customers or colleagues that have their own deployments of SCOM so they can check for themselves.
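If you have a lot of management servers to get through, a small script can save some clicking. The following is only a rough sketch, not an official Microsoft tool: it assumes Python 3.7+ is available on the server and that the `wmic` command-line utility is present (it is on Windows Server 2008 R2, but it is deprecated on newer Windows releases). The function names are my own, made up for illustration.

```python
import subprocess

# The rollup named in the OpsMgr engineering team's warning.
HOTFIX_ID = "KB2775511"


def hotfix_listed(qfe_output: str, hotfix_id: str = HOTFIX_ID) -> bool:
    """Return True if hotfix_id appears as a whole token in a hotfix listing.

    Matching whole whitespace-separated tokens avoids false positives on
    longer IDs that merely contain the same digits.
    """
    wanted = hotfix_id.upper()
    return any(token.upper() == wanted for token in qfe_output.split())


def installed_hotfix_listing() -> str:
    """Return the installed hotfix IDs as text (Windows only).

    Assumption: 'wmic qfe get HotFixID' is available on this server.
    """
    result = subprocess.run(
        ["wmic", "qfe", "get", "HotFixID"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a management server you would call `hotfix_listed(installed_hotfix_listing())` and act on a `True` result. Running `wmic qfe get HotFixID` by hand, or checking Installed Updates in Control Panel, gets you the same answer.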
I have heard that a hotfix for KB2775511 is now available: http://support.microsoft.com/kb/2878378
Thanks - I've updated the post now :)