vSAN Design Considerations

I have gone through the vSAN 6.2 Design and Sizing Guide, the vSAN 6.2 Network Design Guide and the vSAN 6.2 Stretched Cluster & 2 Node Guide (230+ pages) and aggregated all of the design decisions, design considerations and best practices into one page of bullet points that is easy to consume. If you read any other articles on my blog, you will realise that I am very fond of condensed lists and summarized bullet points. Here is a comprehensive list of resources for all things vSAN.

If you are interested in Nutanix HCI design, I also have my design decision summary for ESXi and AHV.

This is an aggregated and summarised list; if you want the context and explanation for each point, refer to the white papers above and search for the text “Design Decision”, “Design Consideration”, “Best Practice” or “VMware Recommends”.

Design Decisions and Design Considerations

  • 4 or more nodes provide more availability options than 3 node configurations. Ensure there is enough storage capacity to meet the availability requirements and to allow for a rebuild of the components after a failure. Consider designing clusters with a minimum of 4 nodes where possible.
  • If the design goal is to deploy a certain number of virtual machines, ensure that there are enough ESXi hosts in the cluster to support the design.
  • Ensure there are enough physical devices in the capacity layer to accommodate a desired stripe width requirement.
  • Ensure there are enough hosts (and fault domains) in the cluster to accommodate a desired NumberOfFailuresToTolerate requirement.
  • Consider whether the introduction of jumbo frames in a vSAN environment is worth the operational risks, when the gains are negligible for the most part.
  • Decide between a single large disk group configuration and multiple smaller disk group configurations.
  • Consider if a workload requires PCIe performance or if the performance from SSD is sufficient. Consider if a design should have one large disk group with one large flash device, or multiple disk groups with multiple smaller flash devices. The latter design reduces the failure domain, and may also improve performance, but may be more expensive.
  • For all flash configurations, a cache to capacity ratio of 10% is still recommended. Ensure that flash endurance is included as a consideration when choosing devices for the cache layer. Endurance figures are included on the VCG.
  • Design for growth. Consider purchasing large enough flash devices that allow the capacity layer to be scaled simply over time.
  • Design with additional flash cache to allow easier scale up of the capacity layer. Alternatively, scaling up cache and capacity at the same time by adding new disk groups is also easier than trying to simply replace the existing flash cache device in an existing disk group.
  • The number of magnetic disks matters in hybrid configurations, so choose them wisely. Having more, smaller magnetic disks will often give better performance than fewer, larger ones in hybrid configurations.
  • Choose a standard disk model/type across all nodes in the cluster. Do not mix drive models/types.
  • Always include the NumberOfFailuresToTolerate setting when designing vSAN capacity.
  • If the requirement is to rebuild components after a failure, the design should be sized so that there is a free host's worth of capacity to tolerate each failure. To rebuild components after one failure or during maintenance, there needs to be one full host's worth of capacity free. To rebuild components after a second failure, there needs to be two full hosts' worth of capacity free.
  • Include formatting overhead in capacity calculations.
  • There are other considerations to take into account apart from NumberOfFailuresToTolerate and formatting overhead.
  • As a rule of thumb, VMware recommends leaving approximately 30% free space available in the cluster capacity.
  • If virtual machine snapshots are used heavily in a hybrid design, consider increasing the cache-to-capacity ratio from 10% to 15%.
  • Multiple storage I/O controllers per host can reduce the failure domain, and can also improve performance.
  • Choose storage I/O controllers that have as large a queue depth as possible. While 256 is the minimum, the recommendation is to choose a controller with a much larger queue depth where possible.
  • From an operations perspective, drives behind storage I/O controllers in RAID-0 mode typically take longer to install and replace than pass-through drives.
  • When choosing a storage I/O controller, verify that it is on the VCG, ensure cache is disabled, and ensure any third party acceleration features are disabled. If the controller offers both RAID-0 and pass-through support, consider using pass-through as this makes maintenance tasks such as disk replacement much easier.
  • Multiple disk groups typically mean better performance and smaller fault domains, but may sometimes come at a cost and consume additional disk slots.
  • VMware recommends that approximately 30% of free capacity should be kept to avoid unnecessary rebuilding/rebalancing activity. To have components rebuilt in the event of a failure, a design should also include at least one free host worth of capacity. If a design needs to rebuild components after multiple failures, then additional free hosts worth of capacity needs to be included.
  • Realistically, the metadata overhead incurred by creating components on vSAN is negligible and doesn’t need to be included in the overall capacity.
  • Realistically, the overhead incurred by creating witnesses on vSAN is negligible and doesn’t need to be included in the overall capacity.
  • The virtual machine memory snapshot size needs to be considered when sizing the vSAN datastore, if there is a desire to use virtual machine snapshots and capture the virtual machine’s memory in the snapshot.
  • Use FlashReadCacheReservation with caution. A misconfiguration or miscalculation can very easily over-allocate read cache to some virtual machines while starving others.
  • While the creation of replicas is taken into account when the capacity of the vSAN datastore is calculated, thin provisioning over-commitment should also be considered in the sizing calculations when provisioning virtual machines on vSAN.
  • When designing very large vSAN clusters, consider using fault domains as a way of avoiding single rack failures impacting all replicas belonging to a virtual machine. Also consider the additional resource and capacity requirements needed to rebuild components in the event of a failure.
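The capacity points above (FTT replicas, a free host's worth of capacity per failure to rebuild against, ~30% slack, formatting overhead) combine into a simple sizing calculation. Here is a minimal Python sketch; the cluster numbers are illustrative inputs of my own choosing, not figures from the guides:

```python
# Illustrative vSAN capacity sizing sketch. Assumptions: RAID-1 mirroring
# consumes FTT+1 copies of each object; ~30% slack is kept free; formatting
# overhead and the rebuild reserve follow the bullets above. The example
# cluster (6 hosts, 5 x 1.2TB capacity disks) is hypothetical.

def usable_capacity_gb(hosts, disks_per_host, disk_gb,
                       ftt=1, slack=0.30, fmt_overhead=0.01, rebuild_hosts=1):
    raw = hosts * disks_per_host * disk_gb
    # Reserve one full host's worth of capacity per tolerated failure for rebuilds
    raw = raw - rebuild_hosts * disks_per_host * disk_gb
    raw = raw * (1 - fmt_overhead)   # on-disk format overhead
    raw = raw * (1 - slack)          # ~30% slack space kept free
    # RAID-1 mirroring writes FTT+1 replicas of every object
    return raw / (ftt + 1)

print(round(usable_capacity_gb(hosts=6, disks_per_host=5, disk_gb=1200)))  # → 10395
```

With 36TB raw, roughly 10.4TB remains usable once rebuild reserve, overheads, slack and FTT=1 mirroring are accounted for, which is why sizing only from raw capacity badly overestimates what can be provisioned.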

Best Practices

  • Design for growth.
  • Ensure there is enough cache to meet the design requirements. The recommendation for cache is 10% of the anticipated consumed storage capacity before the NumberOfFailuresToTolerate is considered.
  • Enable vSphere HA on the vSAN cluster for the highest level of availability.
  • Check the VCG and ensure that the flash devices are (a) supported and (b) provide the endurance characteristics that are required for the vSAN design.
  • Allow 30% slack space when designing capacity.
  • Try to maintain at least 30% free capacity across the cluster to accommodate the remediation of components when a failure occurs or a maintenance task is required. This best practice will also avoid any unnecessary rebalancing activity.
  • Check if any virtual machines are non-compliant due to a lack of resources before adding new resources. This will explain why new resources are being consumed immediately by vSAN. Also check if there are non-compliant VMs due to force provisioning before doing a full data migration.
  • In vSAN 5.5, always deploy virtual machines with a policy. Do not use the default policy if at all possible. This is not a concern for vSAN 6.0+, where the default policy has settings for all capabilities.
  • Use uniformly configured hosts for vSAN deployments. While compute-only hosts can exist in a vSAN environment and consume storage from other hosts in the cluster, VMware does not recommend unbalanced cluster configurations.
  • Enable HA with vSAN for the highest possible level of availability. However, any design will need to include additional capacity for rebuilding components.
  • Consider using layer 2 multicast for simplicity of configuration and operations.
  • Minimize oversubscription to reduce opportunities for congestion during host rebuilds or high throughput operations.
  • Deploy all hosts within a fault domain to a low latency, wire speed switch or switch stack. When multiple switches are used, pay attention to the throughput of the links between switches. Deployments with limited or heavily oversubscribed inter-switch throughput should be carefully considered.
  • Disable flow control for vSAN traffic.
  • Deploy a VDS for use with VMware vSAN.
  • Use Load Based Teaming for load balancing, and ensure appropriate spanning tree port configurations are taken into account.
  • Isolate each vSAN cluster’s traffic to its own VLAN when using multiple clusters.
  • Use the existing MTU/frame size you would otherwise be using in your environment.
  • Enable LLDP or CDP in both send and receive mode.
  • Use redundant uplinks for vSAN and all other traffic.
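The 10% cache rule above can be expressed as a quick calculation. A Python sketch; the VM count, VMDK size and expected utilisation below are hypothetical inputs, not figures from the guides:

```python
# Sketch of the cache sizing rule from the text: flash cache tier of roughly
# 10% of the anticipated *consumed* capacity, measured before
# NumberOfFailuresToTolerate replicas are applied. All inputs are examples.

def required_cache_gb(vm_count, vmdk_gb_per_vm, expected_utilization=0.5,
                      cache_ratio=0.10):
    # Consumed capacity before FTT replication (thin provisioning assumed,
    # hence the utilisation factor)
    consumed_gb = vm_count * vmdk_gb_per_vm * expected_utilization
    return consumed_gb * cache_ratio

# e.g. 100 VMs with 100GB VMDKs, expected to be ~50% utilised:
print(round(required_cache_gb(100, 100), 1))  # → 500.0 GB of cache cluster-wide
```

Because the rule is based on consumed (not provisioned) capacity before FTT, doubling the replica count does not change the cache requirement, but higher real utilisation does.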

Stretched vSAN Cluster & 2 Node vSAN Cluster

  • Upgrade the on-disk format to v2 for improved performance and scalability, as well as stretched cluster support. In vSAN 6.2 clusters, the v3 on-disk format enables additional features specific to 6.2.
  • vSAN communication between the data sites should be over stretched L2. vSAN communication between the data sites and the witness site is routed over L3.
  • For most workloads, VMware recommends a minimum of 10Gbps or greater bandwidth between sites. In use cases such as 2 Node configurations for Remote Office/Branch Office deployments, dedicated 1Gbps bandwidth can be sufficient with less than 10 Virtual Machines.
  • The latency to the witness is dependent on the number of objects in the cluster. VMware recommends that on vSAN Stretched Cluster configurations up to 10+10+1, a latency of less than or equal to 200 milliseconds is acceptable, although if possible, a latency of less than or equal to 100 milliseconds is preferred. For configurations greater than 10+10+1, VMware requires a latency of less than or equal to 100 milliseconds.
  • VMware recommends that customers run their hosts at 50% of the maximum number of virtual machines supported in a standard vSAN cluster, to accommodate a full site failure. In the event of a full site failure, the virtual machines on the failed site can be restarted on the hosts in the surviving site.
  • If using a physical ESXi host as the witness, a single physical disk can support a maximum of 21,000 components. Each witness component in a vSAN Stretched Cluster requires 16MB of storage. To support 21,000 components on a magnetic disk, VMware recommends a disk of approximately 350GB in size. To accommodate the full 45,000 components on the witness host, VMware recommends three magnetic disks of approximately 350GB each, keeping the limit of 21,000 components per disk in mind.
  • VMware recommends that the flash device capacity (e.g. SSD) on the witness host be approximately 10GB in size to support the maximum of 45,000 components. In the witness appliance, one of the VMDKs is tagged as a flash device; there is no requirement for an actual flash device.
  • For full availability, VMware recommends that customers run at 50% of resource consumption across the vSAN Stretched Cluster. In the event of a complete site failure, all of the virtual machines could be run on the surviving site (aka fault domain).
  • VMware recommends the following network types for a vSAN Stretched Cluster:
    • Management network: L2 stretched or L3 (routed) between all sites. Either option should work fine. The choice is left up to the customer.
    • VM network: VMware recommends L2 stretched between data sites. In the event of a failure, the VMs will not require a new IP to work on the remote site.
    • vMotion network: L2 stretched or L3 (routed) between data sites should work fine. The choice is left up to the customer.
    • vSAN network: VMware recommends L2 stretched between the two data sites and L3 (routed) network between the data sites and the witness site. L3 support for the vSAN network was introduced in vSAN 6.0.
  • To avoid the situation previously outlined, and to ensure that data traffic is not routed through the witness site, VMware recommends the following network topology:
    • Between Site 1 and Site 2, implement either a stretched L2 (same subnet) or a L3 (routed) configuration.
    • Between Site 1 and Witness Site 3, implement a L3 (routed) configuration.
    • Between Site 2 and Witness Site 3, implement a L3 (routed) configuration.
    • In the event of a failure on either of the data sites network, this configuration will prevent any traffic from Site 1 being routed to Site 2 via Witness Site 3, and thus avoid any performance degradation.
  • The data multiplier comprises overhead for vSAN metadata traffic and miscellaneous related operations. VMware recommends a data multiplier of 1.4.
  • While it might be possible to use 1Gbps connectivity for very small vSAN Stretched Cluster implementations, the majority of implementations will require 10Gbps connectivity between sites. Therefore, VMware recommends a minimum of 10Gbps network connectivity between sites for optimal performance and for possible future expansion of the cluster.
  • When vSphere HA is configured on a vSAN Stretched Cluster, VMware recommends the following:
    • Enabling vSphere HA Admission Control
    • Configuring the admission control policy to 50 percent for both memory and CPU.
    • Leaving VM Component Protection (VMCP) disabled.
    • Disabling datastore heartbeating.
    • Setting the response for host isolation to power off and restart VMs.
    • Enabling host isolation response and specifying an isolation response address that is on the vSAN network rather than the management network.
    • Specifying two additional isolation response addresses, each of which should be site specific.
    • Enabling vSphere DRS on vSAN Stretched Clusters where the vSphere edition allows it.
    • Placing DRS in partially automated mode if there is an outage.
    • Enabling vSphere DRS to allow for the creation of Host-VM affinity rules to do initial placement of VMs and to avoid unnecessary vMotion of VMs between sites, and impacting read locality.
    • When configuring DRS with a vSAN Stretched Cluster, VMware recommends creating two VM-Host affinity groups.
    • Implement “should respect rules” in the VM/Host Rules configuration section.
    • When recovering from a failure, especially a site failure, all nodes in the site should be brought back online together to avoid costly resync and reconfiguration overheads.
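The witness sizing figures above follow directly from the 16MB-per-component and 21,000-components-per-disk limits quoted in the guide. A quick Python check of that arithmetic (the function name and 350GB default are my own framing of the rule):

```python
# Witness host sizing arithmetic from the text: each witness component needs
# ~16MB of storage, and a single magnetic disk on a physical witness host
# supports at most 21,000 components.

import math

COMPONENT_MB = 16
MAX_COMPONENTS_PER_DISK = 21000

def witness_disks_needed(total_components, disk_gb=350):
    # Components that fit on one disk by raw space
    space_limit = disk_gb * 1024 // COMPONENT_MB
    # The per-disk component limit caps this regardless of disk size
    per_disk = min(MAX_COMPONENTS_PER_DISK, space_limit)
    return math.ceil(total_components / per_disk)

print(21000 * COMPONENT_MB / 1024)  # 328.125 GB — hence the ~350GB disk recommendation
print(witness_disks_needed(45000))  # 3 disks for the 45,000-component maximum
```

This shows why three ~350GB disks are recommended for the full 45,000 components: the binding constraint is the 21,000-component-per-disk limit, not raw capacity.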


Published by: Chief Enterprise Architect and Strategist, 4xVCDX#133, NPX#8, DECM-EA.
