How To Use VMware DRS for Resource Optimization
In the article on using vMotion, we explained how to enable vMotion on ESXi hosts and manually move machines between hosts. Although vMotion can be scheduled like almost any other task in vSphere, system admins cannot predict future workloads, nor can they constantly watch for spikes, dips and imbalanced load distribution across a virtual infrastructure.
DRS is VMware's solution to this problem. It aims to give every virtual machine the resources it needs, not necessarily to equalize utilization across all hosts in the DRS cluster. In a lightly loaded new cluster, one host can end up with all the VMs while the other host has nothing to do, depending on how aggressive you allow the DRS algorithm to be, as long as the resource requirements of all VMs are satisfied.
VMware offers this automation feature as part of the Enterprise and Enterprise Plus editions of vSphere.
Start by right-clicking your datacenter and picking “New Cluster.” Alternatively, you can “Create a Cluster” from the “Getting Started” tab of a datacenter.
Give your cluster a descriptive name and “Turn ON” DRS for this cluster. Notice that there are other features that can be turned on for the same (or another) cluster like HA and EVC, but those are the subject of other articles.
At this point, you can start adding hosts to your cluster by moving them to it.
After creating a cluster, you can always edit and customize its settings. In fact, more advanced options are available when you edit an existing cluster than when you first create it, and the descriptions of the options are more detailed.
One of the most important settings is the automation level.
The manual level does not do anything on its own and only provides the administrator with suggestions. Admins who do not trust automation (or do not fully understand how DRS works yet) may go with this option and observe the recommendations for a while to gain the familiarity they need before handing their precious little VMs to DRS to babysit.
The “partially automated” setting, on the other hand, takes charge of placement when powering on new VMs, but only offers suggestions regarding already running VMs.
The fully automated level not only keeps an eye on all running VMs to make sure that their need for resources is satisfied, but also acts on its own recommendations, within the limits of how aggressive you allow it to be.
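To make the difference between the three levels concrete, here is a minimal Python sketch. It is only a model of the behavior described above, not VMware code: the names `AutomationLevel`, `Event` and `drs_behavior` are my own, chosen for illustration.

```python
from enum import Enum


class AutomationLevel(Enum):
    MANUAL = "manual"
    PARTIALLY_AUTOMATED = "partially automated"
    FULLY_AUTOMATED = "fully automated"


class Event(Enum):
    INITIAL_PLACEMENT = "power-on placement"
    LOAD_BALANCING = "migration of a running VM"


def drs_behavior(level: AutomationLevel, event: Event) -> str:
    """Return 'apply' if DRS acts on its own, 'recommend' if it only suggests."""
    if level is AutomationLevel.FULLY_AUTOMATED:
        return "apply"
    if level is AutomationLevel.PARTIALLY_AUTOMATED and event is Event.INITIAL_PLACEMENT:
        return "apply"
    # Manual level, or partially automated with an already running VM.
    return "recommend"


# Print the full behavior matrix, one row per (level, event) pair.
for level in AutomationLevel:
    for event in Event:
        print(f"{level.value:>20} | {event.value:<26} | {drs_behavior(level, event)}")
```

Reading the printed matrix row by row shows that the only difference between partial and full automation is what happens to VMs that are already running.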
My strong recommendation is to trust DRS to work in full automation mode. After all, this is the sort of feature that you paid a lot of money for when you bought the higher-level Enterprise and Enterprise Plus licenses. In addition, regardless of how fast or accurate you think you are, you cannot keep track of resources manually at this level.
During one of the most useful sessions of VMworld 2011, vExpert Greg Shields compared DRS to a high-top table where each side of the table represents a host in the DRS cluster. DRS relocates VMs to ensure the table stays balanced. He then went on at length explaining the equations that DRS uses to calculate the priority of recommendations used by the migration threshold in the screenshot below.
Anyway, for the purposes of this article you need to know that priority 1 migrations have a greater impact on rebalancing the cluster than priority 3 migrations, while priority 5 migrations are only for perfectionists who want the best balance possible.
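The relationship between priorities and the migration threshold can be sketched in a few lines of Python. This is an illustration under an assumption: that a threshold of N lets through recommendations of priority 1 through N, so 1 is the most conservative setting and 5 the most aggressive. The `Recommendation` class, the `applicable` function and the VM and host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    vm: str
    target_host: str
    priority: int  # 1 = greatest impact on balance, 5 = smallest improvement


def applicable(recs, threshold):
    """Keep only recommendations whose priority falls within the threshold."""
    if not 1 <= threshold <= 5:
        raise ValueError("threshold must be between 1 and 5")
    return [r for r in recs if r.priority <= threshold]


recs = [
    Recommendation("db01", "esx2", priority=1),
    Recommendation("web03", "esx1", priority=3),
    Recommendation("test07", "esx2", priority=5),
]

# A middle-of-the-road threshold acts on priorities 1 to 3 only.
for rec in applicable(recs, threshold=3):
    print(rec.vm, "->", rec.target_host)
```

With the threshold at 3, the priority 5 move of `test07` is left alone; raising the threshold to 5 would include it.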
The DRS algorithm weighs the benefits of a migration against its costs before carrying it out. The costs are the CPU reserved on the target host during the migration, the memory consumed by the shadow VM on the target host during vMotion, and the brief VM downtime during the vMotion. The benefits are more resources available to the migrating VM in its new location, and more resources left behind for the VMs that stay on the source host.
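The weighing described above can be pictured with a toy model. To be clear, this is not the equation DRS uses; the real formulas are not covered in this article. The sketch simply assumes that every cost and benefit can be expressed in the same arbitrary unit and added up, and all names in it are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MigrationEstimate:
    # Benefits (arbitrary units, assumed comparable with the costs).
    gain_for_moved_vm: float       # extra resources the VM gets on the target
    gain_for_source_host: float    # resources freed for the VMs left behind
    # Costs listed in the paragraph above.
    cpu_reserved_on_target: float  # CPU set aside for the migration itself
    shadow_vm_memory: float        # memory held by the shadow VM during vMotion
    downtime_penalty: float        # stand-in for the brief switchover pause

    def benefit(self) -> float:
        return self.gain_for_moved_vm + self.gain_for_source_host

    def cost(self) -> float:
        return self.cpu_reserved_on_target + self.shadow_vm_memory + self.downtime_penalty

    def worthwhile(self) -> bool:
        """Migrate only when the expected benefit outweighs the cost."""
        return self.benefit() > self.cost()


busy = MigrationEstimate(8.0, 4.0, 2.0, 3.0, 1.0)  # benefit 12 vs. cost 6
idle = MigrationEstimate(1.0, 0.5, 2.0, 3.0, 1.0)  # benefit 1.5 vs. cost 6
print(busy.worthwhile(), idle.worthwhile())
```

The point of the model is only that a migration with little to gain is skipped, because the move itself is never free.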
You can override cluster-wide settings for certain VMs from the Manage tab of the cluster. On the Settings sub-tab you will find the VM overrides. This may be useful if you prefer that some machines do not move often while other machines move out of their way.
Distributed Power Management
The next set of settings belongs to a feature in its own right called DPM (Distributed Power Management).
The concept is very simple: in contrast with DRS, which balances the load to give each VM the resources it needs, DPM gathers as many VMs as it can on as few hosts as possible at times when the cluster is lightly loaded.
This leaves some hosts without any workload; those hosts can be shut down to save power. When the load picks up, DPM needs to power the hosts back on using Wake-on-LAN, IPMI or iLO (the latter two are remote server management methods that, among other things, can turn on servers remotely).
Graph from VMware showing DPM
As with DRS, DPM can be set to just provide recommendations or actually vMotion VMs off hosts to shut them down. Similarly, it can be aggressive or conservative depending on your needs and priorities.
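The consolidation idea behind DPM can be sketched as a simple packing problem. The Python below uses first-fit decreasing, a textbook bin-packing heuristic that I am swapping in for illustration; it is not VMware's DPM algorithm, which also has to consider headroom and the cost of powering hosts off and on. The function name, VM names and numbers are all hypothetical.

```python
def consolidate(vm_demand, host_capacity, host_count):
    """Pack VMs onto as few identical hosts as possible (first-fit decreasing).

    Returns (placement, idle_hosts): a dict of VM name -> host index, and the
    list of host indexes left without any VM, which are candidates for standby.
    """
    free = [host_capacity] * host_count
    placement = {}
    # Place the largest VMs first, each on the first host with enough room.
    for vm, demand in sorted(vm_demand.items(), key=lambda kv: kv[1], reverse=True):
        for host in range(host_count):
            if demand <= free[host]:
                free[host] -= demand
                placement[vm] = host
                break
        else:
            raise ValueError(f"no host has room for {vm}")
    used = set(placement.values())
    idle_hosts = [h for h in range(host_count) if h not in used]
    return placement, idle_hosts


# A lightly loaded three-host cluster, demand and capacity in the same unit.
vm_demand = {"web": 4, "db": 6, "cache": 2, "batch": 3}
placement, idle_hosts = consolidate(vm_demand, host_capacity=10, host_count=3)
print(placement)
print("hosts that could be powered off:", idle_hosts)
```

In this example the four VMs fit on two hosts, so the third host carries nothing and could be put in standby until the load picks up again.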
There is also a place to set advanced options, for example when VMware customer support recommends one to resolve an issue. So far, I have not needed to set any advanced options for DRS or for DPM, even after searching for some interesting ones to try. Still, it is good to know where you can set them if you ever need to.
Enhanced vMotion Compatibility (EVC)
To be able to move VMs from one host to another while they are running, a reasonable degree of CPU feature compatibility must exist between the hosts. The CPUs do not need to run at exactly the same speed, but they at least need to be members of the same CPU family (or generation) from the same vendor.
EVC masks the newer CPU features to make all hosts offer a lowest common denominator to the VMs. In essence, this means that all the hosts in your cluster will only offer CPU features that the oldest member of the cluster can offer.
For example, an AMD Opteron Generation 4 cluster can have CPUs of the Bulldozer family and newer. It cannot have members of an earlier generation, as those were not capable of SSE4.1 and a number of other features. Similarly, an Opteron “Piledriver” host will not be able to offer newer features like FMA, TBM, BMI1 and F16C as long as it is a member of a Generation 4 cluster.
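The lowest common denominator idea boils down to a set intersection, as the short Python sketch below shows. It is an illustration only: in practice you pick a predefined EVC mode rather than having one computed for you, the feature sets are deliberately incomplete (just the names mentioned above), and the host names are made up.

```python
# Hypothetical hosts with a small, incomplete subset of their CPU features.
host_features = {
    "esx-bulldozer": {"SSE4.1"},
    "esx-piledriver": {"SSE4.1", "FMA", "TBM", "BMI1", "F16C"},
}


def common_baseline(hosts):
    """Features that every host in the cluster can offer."""
    return set.intersection(*hosts.values())


baseline = common_baseline(host_features)

# Features each host has but must hide from VMs to match the baseline.
masked = {name: sorted(feats - baseline) for name, feats in host_features.items()}

print("baseline:", sorted(baseline))
for name, hidden in masked.items():
    print(f"{name} hides: {hidden}")
```

The Bulldozer host hides nothing, while the Piledriver host hides its four newer features so that a VM sees the same CPU on either side of a vMotion.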
Although EVC is usually discussed together with DRS clusters, if you have a mixed-generation datacenter and do not want to (or cannot) enable DRS, you can still create a cluster with only EVC enabled and have more flexibility when using manual vMotion.
DRS and DPM are two great advanced features that help optimize the performance and resource usage of your VMware cluster. They take the vMotion technology we discussed earlier to the next level by adding automation and intelligence.
One of the most important uses of vMotion is to avoid a large share of the planned downtime that used to be required for hardware maintenance or host patching. Although a DRS/DPM cluster can be very efficient at balancing the workload, it does not protect you against unplanned downtime. For that, you may use VMware High Availability and Fault Tolerance, which will be the subject of other articles.