VMware DRS (Distributed Resource Scheduler) is a feature within the vSphere software platform that manages virtual machine (VM) distribution between ESXi hosts in the same cluster. Specifically, DRS automatically balances virtual machine workloads between hosts so that virtual machines do not contend for host resources. Additionally, DRS helps ensure that hosts do not run out of CPU and memory.

This functionality is necessary because, over time, ESXi hosts may become overloaded with CPU, network, and memory demands from the virtual machines running on top of them.

To help you better understand DRS, this article takes a closer look at its functionality, key concepts, and best practices, and provides a configuration walkthrough.

Core DRS Functionality

There are various configurations available for DRS. We’ve outlined the most commonly used options in the table below:

Feature                    | Options
Automation Level           | Full, Partial, Manual
Migration Threshold        | Conservative, Aggressive
Predictive DRS             | Enable, Disable
Virtual Machine Automation | Enable, Disable
VM Distribution            | Enable, Disable
CPU Over-Commitment        | Enable, Disable

VMware DRS Definitions

While the concept of DRS is relatively straightforward, some of the DRS options can be confusing. Below, we’ll take a look at the definitions of some of the most important options.

Automation Level

The DRS automation level determines which DRS actions are automated. Setting the automation level to manual does two things:

  • When an administrator powers on a virtual machine, DRS recommends a host for initial placement.
  • When the cluster becomes imbalanced, DRS recommends migrations to other hosts but does not perform them.

However, an administrator can ignore both of these recommendations.

When the automation level setting is partial, DRS will automatically choose the best ESXi host to run a virtual machine when it powers on. With this setting, DRS will not automatically move virtual machines between hosts. However, it will create migration recommendations that you can execute manually from the vSphere client.

When the automation level setting is full (VMware’s suggested best practice), two things happen:

  • During power-on, virtual machines are automatically placed on the best-suited host.
  • Virtual machines are automatically moved to other hosts within the same cluster if a host becomes busy or virtual machines suffer from contention.
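The three automation levels can be summarized as a small lookup. The sketch below is only an illustration of the behavior described above; the `AutomationLevel` enum and `drs_actions` function are hypothetical names, not part of any VMware API.

```python
from enum import Enum


class AutomationLevel(Enum):
    """Hypothetical labels for the three DRS automation levels."""
    MANUAL = "manual"
    PARTIAL = "partial"
    FULL = "full"


def drs_actions(level: AutomationLevel) -> dict[str, str]:
    """Return how each decision is handled: 'automatic' or 'recommend'."""
    # Initial placement is automatic in every level except manual.
    placement = "recommend" if level is AutomationLevel.MANUAL else "automatic"
    # Migrations are only automatic when the level is full.
    migration = "automatic" if level is AutomationLevel.FULL else "recommend"
    return {"initial_placement": placement, "migration": migration}


for level in AutomationLevel:
    print(level.value, drs_actions(level))
```

Reading the output row by row makes the difference clear: partial automation only removes the placement decision from the administrator, while full automation removes both.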

Migration Threshold

The DRS migration threshold controls how aggressively DRS moves virtual machines within the cluster. An aggressive setting moves virtual machines even if the benefits are slight. In contrast, a conservative setting does not move virtual machines to other hosts within the cluster unless significant benefits exist.

VMware’s best practice is to leave the setting as the default level of 3, between conservative (1) and aggressive (5).
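One way to picture the threshold is as a filter on recommendation priority. The sketch below assumes the model in which DRS rates each recommendation from priority 1 (most important) to 5 (least important) and a threshold of N applies priorities 1 through N. The function name and the sample recommendations are hypothetical and only serve as an illustration.

```python
def applied_recommendations(recommendations, threshold=3):
    """Keep the recommendations whose priority is within the threshold.

    Priority 1 is the most important and 5 the least, so a higher
    threshold lets more (and less beneficial) migrations through.
    """
    if not 1 <= threshold <= 5:
        raise ValueError("threshold must be between 1 and 5")
    return [r for r in recommendations if r["priority"] <= threshold]


# Hypothetical recommendations, for illustration only.
recs = [
    {"vm": "web-01", "priority": 1},
    {"vm": "db-02", "priority": 3},
    {"vm": "test-07", "priority": 5},
]

print([r["vm"] for r in applied_recommendations(recs)])               # default level 3
print([r["vm"] for r in applied_recommendations(recs, threshold=5)])  # aggressive
```

With the default level of 3, the low-benefit move for `test-07` is skipped; at level 5 it would be applied as well.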

Predictive DRS

This feature is disabled by default because it requires vRealize Operations Manager (vROps). If enabled, the Predictive DRS feature will proactively move workloads between hosts based on previous information and forecasts from vRealize Operations Manager. Predictive DRS is beneficial for workloads such as VDI (Virtual Desktop Infrastructure) since those typically have time-based demand changes that vROps is aware of. If a spike in usage is expected based on the forecasted information from vROps, DRS will proactively react and balance the cluster before the demand spikes occur.

Virtual Machine Automation

When Virtual Machine Automation is enabled, it allows for per-VM DRS settings. These settings override global DRS settings for the specified virtual machines.

VM Distribution

VM Distribution attempts to spread the number of virtual machines evenly between all hosts in a cluster.

This option is available because DRS balances on resource utilization, which can vary widely between VMs, rather than on VM count. For example, if a few hosts are kept busy by CPU-intensive VMs, the remaining hosts can end up running most of the VMs in the cluster, leaving the VM count per host unbalanced.

VMware DRS Requirements

VMware defines these requirements for DRS to function correctly:

  • Shared storage. Required so that virtual machines continue to access their storage when their compute resources migrate to different hosts within the cluster.
  • Two or more hosts. Since DRS is a cluster-level feature, it is only applicable when there is more than one host in a vSphere cluster.
  • A vMotion Network. vMotion is required to move the virtual machines between hosts. Therefore all the standard vMotion prerequisites apply to DRS.
  • DRS configuration. As you’d expect, DRS needs to be enabled. The default DRS settings are VMware’s recommended configurations. Therefore, enabling DRS will give you a good level of resource balancing by default.
  • Licensing requirements. DRS is not available with the vSphere Standard license. You need a vSphere Enterprise Plus license to enable DRS from your vSphere client.
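The requirements above work well as a pre-flight checklist. The following is a minimal sketch of such a check; `ClusterFacts` and `drs_blockers` are hypothetical names, and you would need to fill in the values yourself because nothing here queries vCenter.

```python
from dataclasses import dataclass


@dataclass
class ClusterFacts:
    """Hypothetical summary of a cluster; not a vSphere API object."""
    host_count: int
    shared_storage: bool
    vmotion_network: bool
    license_edition: str  # for example "Standard" or "Enterprise Plus"


def drs_blockers(cluster: ClusterFacts) -> list[str]:
    """Return the unmet DRS requirements, or an empty list if none."""
    blockers = []
    if cluster.host_count < 2:
        blockers.append("cluster needs more than one host")
    if not cluster.shared_storage:
        blockers.append("hosts need shared storage")
    if not cluster.vmotion_network:
        blockers.append("hosts need a vMotion network")
    if cluster.license_edition != "Enterprise Plus":
        blockers.append("license must be Enterprise Plus")
    return blockers


# A single-host lab on a Standard license fails two of the checks.
lab = ClusterFacts(host_count=1, shared_storage=True,
                   vmotion_network=True, license_edition="Standard")
print(drs_blockers(lab))
```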

vMotion and its role in DRS

DRS relies heavily on vMotion to move virtual machines between hosts. vMotion itself is one of the most important components of the vSphere suite. vMotion uses an IP network to move a running virtual machine from one ESXi host to another. This migration includes the virtual machine’s CPU state and memory contents. The process is usually completely transparent and unnoticeable to the virtual machine and end-users.

[Figure: vMotion between two physical ESXi hosts]

Shared Storage

As discussed, DRS and vMotion require each host within a cluster to access a shared storage solution. Shared storage gives every ESXi host in the cluster a common location for virtual machine files. When DRS triggers vMotion to move virtual machines to another host, both the source and destination host already have access to the virtual machine files. As a result, the destination host can take ownership of the virtual machine’s files without moving any storage.

vMotion memory transfer

When triggered, vMotion will take a copy of a virtual machine’s memory and rapidly transfer it over the vMotion network to the destination host. A buffer tracks memory changes during this rapid transfer. Then, synchronization initiates to keep the virtual machine memory copy up to date while vMotion attempts to switch the virtual machine to the destination host.

Once the new host is ready to take ownership of the virtual machine, vMotion checks whether the memory changes in the buffer are small enough to transfer quickly. If they are, the final sync completes. Then, the new host takes ownership of the virtual machine. Since only the virtual machine memory transfers between the two hosts, vMotion is usually very quick. However, the speed depends on how quickly memory changes on the virtual machine and on how fast your vMotion network is.
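The copy-then-resync behavior described above can be approximated with a simple iterative model. This is a toy simulation, not VMware's actual algorithm: the function name, the 64 MB switchover cutoff, and the sample figures are all assumptions chosen for illustration. Each round sends the memory that is currently outstanding, and whatever the VM changes during that time becomes the next round's work.

```python
def precopy_rounds(memory_mb, link_mb_per_s, dirty_mb_per_s,
                   switchover_mb=64.0, max_rounds=30):
    """Return the amount of memory (MB) sent in each copy round.

    The loop stops once the outstanding changes are small enough to
    send during the final switchover (an assumed cutoff).
    """
    if dirty_mb_per_s >= link_mb_per_s:
        raise ValueError("memory changes faster than the link can copy it")
    sent = []
    remaining = float(memory_mb)
    while remaining > switchover_mb and len(sent) < max_rounds:
        sent.append(remaining)
        seconds = remaining / link_mb_per_s    # time to send this round
        remaining = dirty_mb_per_s * seconds   # changes made meanwhile
    sent.append(remaining)                     # final sync at switchover
    return sent


# Hypothetical VM: 8 GB of memory, 1000 MB/s link, 100 MB/s of changes.
rounds = precopy_rounds(8192, 1000, 100)
print([round(mb, 1) for mb in rounds])
```

The model shows why both factors mentioned above matter: a faster link shortens each round, and a lower rate of memory change shrinks every following round, so the final sync is reached sooner.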

Where possible, VMware recommends a dedicated vMotion network to ensure a fast and consistent speed for virtual machine migration.

[Figure: DRS uses vMotion to move VMs between hosts in a cluster, thus balancing workloads for best performance]

DRS Best Practices

Keeping in mind DRS is used to ensure good VM performance and reduce overloaded ESXi hosts, let’s explore some important VMware DRS best practices.

    1. Set DRS to be Fully Automated

Typically, when admins deploy new virtual machines, an ESXi host needs to be selected. Admins must manually check for the best host to place the virtual machine. Manual checks can be challenging since hosts often vary in how busy they are from a memory and CPU perspective. Additionally, compute utilization can change dramatically throughout the day.
DRS’s fully automated option allows an admin to select a cluster for running a virtual machine. DRS will then automatically place the virtual machine on the best available host.

   

    2. Dedicate network uplinks for vMotion activity

When DRS needs to move many virtual machines simultaneously, vMotion will queue them up depending on how fast your uplinks are to your switches. When you have many workloads with varying compute demands, the frequency of DRS-triggered vMotions can be high.
To solve this issue, you can dedicate uplinks to vMotion traffic only. Dedicated uplinks help speed up vMotion activity and result in fewer virtual machines queuing for a long time. You can configure these uplinks on standard or distributed switches if you have enough uplinks and switch ports on your data center switches.

     

     3. Increase vMotion Bandwidth

To help with vMotion queuing issues where there is no option to dedicate uplinks for vMotion, you can swap your uplinks for ones with higher bandwidth.
For example, a 1Gbps network supports a maximum of 4 concurrent vMotions per host. With a 10Gbps network, you can perform up to 8 vMotions at the same time. How does that help DRS? The more concurrent vMotions you can achieve, the faster DRS can react to overloaded hosts or VMs contending for resources.
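To put those concurrency limits in rough numbers, the sketch below counts how many sequential waves of migrations are needed to move a given number of VMs off a host. It only uses the two limits quoted above, and the function name is a hypothetical one. It ignores how long each wave takes, which also improves with bandwidth.

```python
import math

# Concurrent vMotion limits quoted in the text above.
CONCURRENT_VMOTIONS = {"1Gbps": 4, "10Gbps": 8}


def migration_waves(vm_count: int, link: str) -> int:
    """Number of sequential waves needed to move vm_count VMs."""
    return math.ceil(vm_count / CONCURRENT_VMOTIONS[link])


print(migration_waves(20, "1Gbps"))   # 5
print(migration_waves(20, "10Gbps"))  # 3
```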

   

    4. Ensure CPU compatibility

When adding new hosts to your DRS-enabled cluster, it’s imperative to ensure they conform to the vMotion requirements for DRS to operate correctly. DRS will not move virtual machines to the new host if it is incompatible with other hosts in the cluster.
You need to ensure that you’re adding new hosts with the same CPU generation as your existing hosts in the cluster or that your cluster has a suitable EVC mode enabled before adding the new host. Compatible CPUs help ensure that vMotion will be supported between all hosts in the EVC cluster, resulting in a well-balanced and healthy environment for your workloads.
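The idea behind an EVC baseline can be illustrated with CPU feature sets. This is a simplified sketch under stated assumptions: real EVC modes are predefined baselines per CPU generation rather than computed intersections, and the host names and feature lists below are made up for the example.

```python
def common_baseline(host_features: dict[str, set[str]]) -> set[str]:
    """Features present on every host: what a VM can rely on anywhere."""
    if not host_features:
        return set()
    return set.intersection(*host_features.values())


def can_run_on(vm_features: set[str], host: set[str]) -> bool:
    """A VM can move to a host only if the host offers every feature it uses."""
    return vm_features <= host


# Hypothetical cluster with one newer host.
hosts = {
    "esxi-01": {"sse4.2", "avx", "avx2"},
    "esxi-02": {"sse4.2", "avx", "avx2"},
    "esxi-03": {"sse4.2", "avx", "avx2", "avx512f"},
}

baseline = common_baseline(hosts)
print(sorted(baseline))

# A VM started on esxi-03 with every feature exposed cannot move back.
print(can_run_on(hosts["esxi-03"], hosts["esxi-01"]))  # False
# A VM limited to the baseline can run on any host in the cluster.
print(all(can_run_on(baseline, h) for h in hosts.values()))  # True
```

This is why a cluster with mixed CPU generations needs EVC: limiting every VM to the common baseline keeps all hosts valid migration targets for DRS.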

   

    5. Refrain from changing the DRS migration threshold

Unless you have a good reason, it’s a good idea to keep the DRS migration threshold set to the default value of 3.
If you are experiencing too many DRS migrations, then it’s time to review your hosts’ CPU and memory usage to see if they are overloaded. Likewise, if you rarely see DRS-triggered migrations, then your hosts might be underutilized. In that case, there could be an opportunity to collapse and combine multiple clusters to save on hardware and licensing costs.
While changing the DRS migration threshold does help in some circumstances, you don’t want to be too aggressive with this setting as it can cause virtual machines to “stun” more often than necessary. It will also consume more host CPU and networking resources during the additional vMotion activity.

 

How to set up DRS

After configuring vMotion, you can take the following steps from the vSphere client to set up DRS:

  • From the vSphere client, right-click your cluster and select Settings.
  • Under Services, select vSphere DRS, then Edit.
  • Enable DRS by switching the vSphere DRS toggle to on.
  • Unless otherwise required, consider the following best-practice settings:

Automation

  • Automation level: Fully Automated
  • Migration Threshold: Level 3

Additional Options

  • VM Distribution: Enable

 
