This week I had a very strange issue with a Hyper-V Cluster managed by Virtual Machine Manager.
Seemingly at random, different cluster nodes failed, and I wasn’t able to start Failover Cluster Manager on one of the cluster nodes. On the affected node itself, I wasn’t able to open Hyper-V Manager or Server Manager either.
After a lot of research I found a post from the Windows Server Core Team that pointed me to the solution.
Unable to launch Cluster Failover Manager on any node of a 2012/2012R2 Cluster
When Failover Cluster Manager is opened to manage a Cluster, it will contact all the nodes and retrieve Cluster configuration information using WMI calls. If any one of the nodes in the Cluster does not have the cluster namespace “root\mscluster” in WMI, Failover Cluster Manager will fail with an error.
Unfortunately, it does not give any indication of which node is missing the WMI namespace. One of the ways you can check to see which one has it missing is to run the below command on each node of the Cluster.
Get-WmiObject -namespace "root\mscluster" -class MSCluster_Resource
It can be a bit tedious and time consuming if you have quite a few nodes, say like 64 of them. The below script can be run on one of the nodes that will connect to all the other nodes and check to see if the namespace is present. If it is, it will succeed. If the namespace does not exist, it will fail.
# Load the FailoverClusters module if it is not already loaded
if (-not (Get-Module FailoverClusters)) { Import-Module FailoverClusters }
Write-Host "Imported Cluster module"
Write-Host "Getting the cluster nodes..." -NoNewline
$nodes = Get-ClusterNode
Write-Host "Found the below nodes "
Write-Host " "
$nodes
Write-Host " "
Write-Host "Running the WMI query...."
Write-Host " "
ForEach ($Node in $nodes) {
    Write-Host -NoNewline $Node
    if ($Node.State -eq "Down") {
        Write-Host -ForegroundColor White " : Node down skipping"
    }
    else {
        try {
            # The query throws if root\MSCluster is missing or unreachable on this node
            $result = (Get-WmiObject -Class "MSCluster_CLUSTER" -Namespace "root\MSCluster" -Authentication PacketPrivacy -ComputerName $Node.Name -ErrorAction Stop).__SERVER
            Write-Host -ForegroundColor Green " : WMI query succeeded "
        }
        catch {
            Write-Host -ForegroundColor Red " : WMI Query failed // $($_.Exception.Message)"
        }
    }
}
In the output, any node that is missing the namespace is reported in red as “WMI Query failed”.
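If you want to act on the result instead of reading colored console output, the same check can be wrapped in a function that returns one object per node. This is only a sketch: the function name Test-ClusterWmiNamespace and its -Probe parameter are my own invention and not part of the FailoverClusters module. The default probe simply reuses the Get-WmiObject call from the script above, so it assumes Windows PowerShell on a machine that can reach the nodes.

```powershell
# Hypothetical helper, not part of the FailoverClusters module.
function Test-ClusterWmiNamespace {
    param(
        # Node names to check, e.g. (Get-ClusterNode).Name
        [Parameter(Mandatory = $true)]
        [string[]]$ComputerName,

        # Script block that queries one node and throws on failure.
        # The default reuses the WMI query from the script above.
        [scriptblock]$Probe = {
            param($Target)
            Get-WmiObject -Class "MSCluster_CLUSTER" -Namespace "root\MSCluster" `
                -Authentication PacketPrivacy -ComputerName $Target -ErrorAction Stop
        }
    )

    foreach ($node in $ComputerName) {
        try {
            # Discard the query result; only success or failure matters here
            & $Probe $node | Out-Null
            [pscustomobject]@{ Node = $node; NamespaceOk = $true; Error = $null }
        }
        catch {
            # Keep the exception text so the failing node can be diagnosed later
            [pscustomobject]@{ Node = $node; NamespaceOk = $false; Error = $_.Exception.Message }
        }
    }
}
```

To list only the broken nodes, something like Test-ClusterWmiNamespace -ComputerName (Get-ClusterNode).Name | Where-Object { -not $_.NamespaceOk } should do. Note that, unlike the script above, this sketch does not skip nodes that are down; they simply show up as failed.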
To correct the problem, you would need to run the below from an administrative command prompt on the “failed” node(s).
cd c:\windows\system32\wbem
mofcomp.exe cluswmi.mof
Once the Cluster WMI has been added back, you can successfully open Failover Cluster Management. There is no restart of the machine or the Cluster Service needed.
Quote: Microsoft Ask the Core Team Blog
In my case the fix wasn’t that easy, because the server vendor (Fujitsu, for those interested) hooks a WMI provider directly into the BMC via an agent. While the cluster WMI provider was being recompiled, all of the server’s network interfaces and the BMC failed.
So my fix:
- shut down the server
- disconnect it from power completely
- start it again
- check the cluster (everything fine)
- uninstall the (fucking) agent
Since then it has worked.