How do you get your System Center Virtual Machine Manager really highly available

Sometimes a customer invites me to “optimize their highly available Virtual Machine Manager”, and I usually find the following configuration.

VMMHA01

When I ask why they call it highly available, they normally tell me that they can move the machine from one host to another. My next question is usually: “And what happens when you need to patch the SQL DB, VMM or Windows Server, or when the storage fails?”

This is the point where most people realize that high availability means more than moving services from A to B.

So let us think about what we need to make our VMM server highly available.

On the VMM side we need the following parts:

  • two VMM Management Servers running in a Cluster
  • two Database Servers running in a Cluster
  • two Fileservers running in a Cluster as Library servers
  • two Hyper-V Hosts for VM Placement
  • two Storage systems with Storage Replication

VMMHA02

For a very big Hyper-V and VMM environment, I would suggest running your management systems in a separate Hyper-V cluster. That keeps your management VMs running even when you need to put your fabric cluster into maintenance mode.

VMMHA03

How to plan redundancy for Scale out Fileserver

Hey everybody,

after I posted some of my thoughts on Hyper-V redundancy, today I want to show you some examples of how you could plan redundancy for Scale out Fileserver.

When to choose a redundancy where only one or two cluster nodes can fail?

That is the most common and easiest way to get node redundancy in a cluster. It means you have enough nodes in your cluster to cover one or two node failures. You would choose that cluster config when all of your nodes are in one datacenter or server room and you do not need a geo-redundant storage solution. Please note that a JBOD based Scale out Fileserver needs at least three JBODs. For a converged Scale out Fileserver with Windows Server 2016 you will need four identical Scale out Fileserver systems.

Sofs01

Traditional Scale out Fileserver with Storage Spaces and JBODs

sofs02

Traditional Scale out Fileserver with SAN Storage Backend

sofs03

Scale out Fileserver with Storage Spaces Direct in Windows Server 2016

When to choose a redundancy where you can lose half of the nodes?

In this scenario you can lose half of your nodes, but you need to fulfill some more requirements like storage replication or direct WAN links. You would normally use this if you want to keep your services alive when one datacenter or server room fails.

sofs04

With Storage Spaces Direct in Windows Server 2016 and RDMA RoCE

sofs05

Scale out Fileserver with classic SAN storage replication

How to plan redundancy for Hyper-V Cluster

Hi everybody,

today again a post out of my daily business. When I’m out in the field planning a new cluster, I also need to decide how much and what type of cluster redundancy I need to implement. For that I have something like a blueprint or decision matrix in my mind.

Today I want to give you a small view into this matrix. 🙂

When to choose a redundancy where only one or two cluster nodes can fail?

That is the most common and easiest way to get node redundancy in a cluster. It means you have enough nodes in your cluster to cover one or two node failures. You would choose that cluster config when all of your nodes are in one datacenter or server room and you have no additional space for, or no need to replicate, your virtual machines.
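To make "enough nodes" a bit more concrete, here is a minimal Python sketch of an N+1 or N+2 capacity check. It is only an illustration under simple assumptions: the function name and all numbers are made up, every node is identical, and memory is treated as the only limit (CPU, storage and network are ignored).

```python
def survives_node_failures(node_count, node_memory_gb, total_vm_memory_gb, failures):
    """Return True if all VMs still fit in memory after `failures` nodes are lost.

    Assumption: identical nodes, memory is the only resource that matters.
    """
    remaining_nodes = node_count - failures
    if remaining_nodes <= 0:
        return False
    return total_vm_memory_gb <= remaining_nodes * node_memory_gb


# Hypothetical cluster: 4 nodes with 256 GB each, VMs need 700 GB in total.
print(survives_node_failures(4, 256, 700, failures=1))  # True: 3 * 256 = 768 GB remain
print(survives_node_failures(4, 256, 700, failures=2))  # False: 2 * 256 = 512 GB remain
```

In this example the cluster covers one node failure but not two, so it would need a fifth node or smaller VMs to cover two.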

fail02

Cluster operating with one storage

fail01

Cluster operating with two storage systems

fail03

Hyper-V Hyperconverged with Windows Server 2016

When to choose a redundancy where you can lose half of the nodes?

In this scenario you can lose half of your nodes, but you need to fulfill some more requirements like storage replication or direct WAN links. You would normally use this if you want to keep your services alive when one datacenter, server room or blade center fails.
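One of the extra requirements is that the surviving half still needs a majority of the cluster votes. The following Python sketch shows the plain majority rule. It is a simplified model, not how Windows Server calculates it internally: every node is assumed to have exactly one vote, the witness is assumed to stay reachable from the surviving site, and dynamic quorum (which adjusts votes at runtime) is ignored. The function name is made up for the example.

```python
def has_quorum(total_nodes, failed_nodes, witness=False):
    """Static majority check: the surviving votes must be more than half of all votes.

    Assumption: one vote per node, one vote for the witness (if any),
    and the witness itself is not affected by the failure.
    """
    total_votes = total_nodes + (1 if witness else 0)
    surviving_votes = total_votes - failed_nodes
    return surviving_votes > total_votes / 2


# Two nodes per site, one site fails completely.
print(has_quorum(4, 2))                # False: 2 of 4 votes is not a majority
print(has_quorum(4, 2, witness=True))  # True: 3 of 5 votes is a majority
```

Under these assumptions, a cluster stretched evenly across two sites only survives the loss of one site if a witness outside the failed site provides the deciding vote.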

fail04

Datacenter redundancy with storage

fail05

Redundancy with compute and storage blades

fail06

Different locations with Hyperconverged Hyper-V in Windows Server 2016

When to choose replication?

I normally use Hyper-V replication only as a warm standby option. It can be an option, for example, when you want to protect your datacenter but have no storage replication, so that you can restart your virtual machines on other hardware.

Replication is no replacement for a cluster, and I would not recommend replicating databases, Exchange servers, domain controllers or other applications that come with their own vendor-supported replication.

Short checklist what you should do after you configured a Microsoft Failover Cluster

Hi everybody,

today I will provide a short checklist of what I do after I have configured a Microsoft Failover Cluster.

I need to say, this blog post is inspired by some consultants who think they are so gifted with fucking awesomeness that they can install a (mal)functioning Hyper-V Cluster, including System Center Virtual Machine Manager with all components and Software Defined Networking, in only 6 hours in total, without even knowing what a VLAN or an IP subnet is.

So then, let us start.

We are now at the point where you have successfully installed your failover cluster.

  1. You need to configure the cluster quorum and witness for your cluster. I would suggest using a witness type that matches the storage you use. So if you use SMB file based storage, you should use a file share witness or, with Server 2016, an Azure Cloud Witness. If you use block storage, you should use a disk witness on the storage that hosts your LUNs. Mixing different types of storage and witness in a cluster can sometimes cause trouble. Best practice is to use a disk witness if possible. When you use a file share witness, never place the file share on a host within your cluster or on a virtual machine that runs on the cluster. That could result in issues or even a split brain situation during maintenance or failure scenarios.
  2. After you have configured the quorum, you should configure the communication of your cluster heartbeat. For that you can use the following small script.
  3. Configure the firewalls of your cluster nodes. DO NOT DISABLE THE FIREWALL; configure it as needed for your service. First, disabling it removes a security layer and opens the gates for attacks from within the network. Second, some Windows services and applications may not work properly with the firewall disabled.
  4. Afterwards you need to configure the Active Directory Organizational Unit delegation so that the cluster service can create and change objects within Active Directory. That is needed to create the Cluster-Aware Updating role or new cluster roles. See: Delegation of Cluster Machine Accounts with Active Directory
  5. If you need or wish to configure Kerberos constrained delegation, now is the time to do so for your cluster.
  6. Configure Cluster-Aware Updating for your cluster. See: Starting with Cluster-Aware Updating: Self-Updating
  7. Configure your backup
  8. Run failover tests for all cluster nodes, cluster roles and services, and test your backup and the recovery
  9. Last but not least: DOCUMENTATION. Document what you have done, so that your coworkers can also see how awesome you are 😉
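The witness rules from step 1 can be written down as a small decision helper. The Python sketch below is only an illustration of that rule of thumb, not a Microsoft tool; the function names, the storage type labels `"smb"` and `"block"`, and the host names are all made up for the example.

```python
def suggest_witness(storage_type, server_2016=False):
    """Map the storage type to the witness types suggested in step 1."""
    if storage_type == "smb":
        options = ["file share witness"]
        if server_2016:
            options.append("cloud witness")
        return options
    if storage_type == "block":
        return ["disk witness"]
    raise ValueError(f"unknown storage type: {storage_type!r}")


def share_host_is_safe(witness_host, cluster_nodes, cluster_vms):
    """A file share witness must not live on a cluster node or on a VM running on the cluster."""
    forbidden = {name.lower() for name in list(cluster_nodes) + list(cluster_vms)}
    return witness_host.lower() not in forbidden


# Hypothetical names, for illustration only.
print(suggest_witness("smb", server_2016=True))  # ['file share witness', 'cloud witness']
print(suggest_witness("block"))                  # ['disk witness']
print(share_host_is_safe("FS01", ["HV01", "HV02"], ["VM-SQL01"]))  # True
print(share_host_is_safe("hv01", ["HV01", "HV02"], ["VM-SQL01"]))  # False
```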

I hope that helps you a bit in your daily work.

Cheers,

Flo

Updating permission setting for folder ‘ ‘ failed when I install a high available SQL Instance on cluster shared volume

Hey everybody,

the following post is more of a reminder to myself about a mistake I make very often. 😉

When I install a SQL Failover Cluster with a highly available instance on a cluster shared volume, I get the error “Updating permission setting for folder ‘ ‘ failed”.

IMG-20150725-WA0002

There are several posts on how to solve the issue, some more complex than others:

SQL Installation Error updating permisson setting for folder

SQL Server 2008 installation will fail if the setup account does not have certain user rights

Permission error installing Failover Cluster instance

In my case the solution was pretty easy. I usually forget to create a subdirectory for the SQL databases and files on the cluster shared volume. For example:

Wrong:

C:\ClusterStorage\SQL-Backup\ <- will give you the error

Right:

C:\ClusterStorage\SQL-Backup\Files\ <- will work fine

So the easy solution for the error: create a subfolder and use that path during the installation.
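If you want to catch this before you start the setup, a small check like the following Python sketch can help. It only encodes the rule from this post (the path must be a subfolder inside the volume, not the volume root itself). The function name is made up, and it assumes the cluster shared volumes are mounted under C:\ClusterStorage as in the example above.

```python
from pathlib import PureWindowsPath

# Assumption: the CSV mount root used in the example above.
CSV_ROOT = PureWindowsPath("C:\\ClusterStorage")


def is_valid_sql_data_dir(path):
    """True if `path` is a subfolder inside a CSV volume, not the volume root itself."""
    try:
        relative = PureWindowsPath(path).relative_to(CSV_ROOT)
    except ValueError:
        return False  # not below the CSV mount root at all
    # The first part is the volume folder (e.g. SQL-Backup); at least one more level is needed.
    return len(relative.parts) >= 2


print(is_valid_sql_data_dir("C:\\ClusterStorage\\SQL-Backup\\"))         # False
print(is_valid_sql_data_dir("C:\\ClusterStorage\\SQL-Backup\\Files\\"))  # True
```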