Nutanix XCP Deep-Dive – Part 12 – ESXi Design Considerations

This is Part 12 of the Nutanix XCP Deep-Dive, covering ESXi design considerations.

This will be a multi-part series, describing how to design, install, configure and troubleshoot an advanced Nutanix XCP solution from start to finish for vSphere, AHV and Hyper-V deployments.

I have aggregated all of the design considerations I could find that need to be assessed in a Nutanix XCP architecture design with VMware vSphere/ESXi.  Brevity and bullet-points are used to keep the information concise and readable.  If you want more information on a concept, use the NPX Link-O-Rama.

This post will be updated with additional information as part of the NPX Link-O-Rama.  If you have content to contribute, post a comment below.

I have separated the design decisions into the areas specified by the NPX blueprint.

Business Goals

  • What are the business goals of the solution?

Requirements/Constraints/Assumptions

  • What are the requirements, constraints and assumptions of the solution?

Risks

  • What are the risks of the solution?

A. Data Center Facility

Logical Design Decisions

  • Single-site or Multi-site Data Center Facilities?
  • Data Center type – “Bricks & Mortar”, Co-location, Pre-Fabricated or Performance Optimised Data Centers?
  • Will the Management & Control Plane be separated from the Data Plane?

Physical Design Decisions

  • Physical location(s) of Data Center Facilities?
  • Distances between Data Centers?
  • Type of Data Center Facilities?
  • Power and Cooling requirements for the solution?
  • Can the Data Center Facility handle high density infrastructure?
  • Rack layouts for the solution?
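
The power, cooling and density questions above can be roughed out with simple arithmetic before a facility survey. The sketch below is only a first-pass estimate: the function names and the per-node wattage and rack budget used in the usage note are hypothetical inputs, not Nutanix figures. The only fixed constant is the conversion of 1 W to roughly 3.412 BTU/hr.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 watt is approximately 3.412 BTU/hr


def rack_power_and_cooling(nodes_per_rack, watts_per_node):
    """Estimate the electrical load and heat output of one rack.

    watts_per_node is a hypothetical planning figure; use the vendor's
    published maximum or measured draw for a real design.
    """
    total_watts = nodes_per_rack * watts_per_node
    return {
        "kw": total_watts / 1000.0,
        "btu_per_hour": total_watts * WATTS_TO_BTU_PER_HOUR,
    }


def fits_rack_budget(nodes_per_rack, watts_per_node, rack_budget_kw):
    """Return True when the estimated load fits the facility's per-rack budget."""
    load = rack_power_and_cooling(nodes_per_rack, watts_per_node)
    return load["kw"] <= rack_budget_kw
```

For example, fits_rack_budget(16, 500, 10) checks whether sixteen nodes drawing an assumed 500 W each fit inside an assumed 10 kW rack budget.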

B. Virtual Infrastructure Management

Logical Design Decisions

  • Number of Pooled Compute, Network and Storage resources?
  • What services are you delivering?
  • Required availability levels of virtualisation management systems?
  • 3rd party integrations: IT Service Management, Infrastructure Management systems, Enterprise services (DNS, LDAP, NTP, PKI, Syslog, SNMP Traps), Vendor Data collection
  • Advanced Operations
  • Hypervisor Workload Protection mechanisms?
  • Hypervisor Workload Resource Balancing mechanisms?

Physical Design Decisions

  • Hypervisor: ESXi and which version? (Hyper-V and AHV have been dropped to align with the Conceptual Model/Logical Design)
  • Dedicated Management Cluster?
  • Standalone or Linked-Mode vCenters?
  • vCenter Server version and installation type?
  • vCenter Server database?
  • vCenter Server protection mechanism?
  • vSphere components that will be used? Only use what you need.
  • Host profiles?
  • Update management of ESXi, VM Hardware version and VMtools?
  • Consider using AOS One-Click upgrades for ESXi.  They are validated by Nutanix before the JSON file is published.
  • Antivirus integration via vShield?  If yes, with vCNS vShield Manager or NSX Manager?
  • vRealize Suite being used for advanced operations and cloud management?
  • vRO for workflows?
  • Enterprise Management solution to integrate with?
  • Prism Central to aggregate clusters?
  • Automated vendor support mechanisms? How will vCenter Support Assistant, Nutanix Pulse and Nutanix Remote Access be used?
  • Any Service Desk or Change Management requirements that must be met?
  • What are the vSphere HA and vSphere DRS requirements?
  • Make sure you understand the mandatory Nutanix configuration settings for vSphere.
  • DNS and NTP integration?
  • Role Based Access Control and LDAP integration?
  • Which vCenter Server User Interface for Administration and Operations?
  • What vSphere and Nutanix licensing is required?
  • 3rd party software licensing considerations? Per physical socket/core or vCPU?  DRS VM-Host rules required?

C. Compute

Logical Design Decisions

  • Traditional Monolithic Compute, Server-Side Flash Cache Acceleration with legacy infrastructure, Converged Infrastructure or Hyper-Converged Infrastructure? Obviously this must align with the Storage section.
  • Minimum number of Hypervisor Hosts per Cluster?
  • Host sizing: Scale Up or Scale Out?
  • Homogeneous or Heterogeneous nodes?
  • Number of Sockets per Host?
  • Host Spanning for Failure Domains?
  • Required CPU Capacity?
  • Required Memory Capacity?

Physical Design Decisions

  • HCI Vendor: Nutanix XCP, Dell XC or Lenovo (all other vendors/technologies have been dropped to align with the Logical Design/Conceptual Model)
  • Processor type: Intel (AMD not supported by Nutanix)
  • Intel CPU Features: VT-x supported, Hyper-threading, Turbo Boost, NUMA enabled?
  • Cluster Hardware and Configuration?
  • Are the Inter-Mix rules being followed?
  • Number of vSphere clusters per Nutanix cluster?
  • Nutanix family and model number?
  • Number of CPU sockets per node?
  • Model of Intel Processor, number of cores and GHz per core?
  • GPU required?
  • Host locations?
  • Single Rack, Multi-Rack with striping?
  • Cluster Availability requirements?
  • Nutanix Redundancy Factor?
  • Nutanix Availability Domains?
  • Align compute availability with storage availability?
  • Future expansion?
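
The capacity, minimum host count and availability questions above combine into one sizing calculation. The following is a minimal sketch, assuming a simple N+x model in which the cluster must carry the full workload after losing a given number of hosts. The function and parameter names are hypothetical, the default minimum of three hosts is an assumption to be checked against the chosen Redundancy Factor, and Controller VM overhead is deliberately ignored.

```python
import math


def hosts_required(cpu_ghz_needed, ram_gb_needed,
                   cpu_ghz_per_host, ram_gb_per_host,
                   host_failures_to_tolerate=1, minimum_hosts=3):
    """Estimate the host count for a cluster under an N+x availability model.

    minimum_hosts=3 is an assumed floor; confirm it against the Redundancy
    Factor and the vendor's current minimum cluster size.
    """
    hosts_for_cpu = math.ceil(cpu_ghz_needed / cpu_ghz_per_host)
    hosts_for_ram = math.ceil(ram_gb_needed / ram_gb_per_host)
    # The scarcer resource decides the working set of hosts,
    # then spare hosts are added for the tolerated failures.
    sized = max(hosts_for_cpu, hosts_for_ram) + host_failures_to_tolerate
    return max(sized, minimum_hosts)
```

With hypothetical inputs of 400 GHz and 3000 GB required on hosts offering 60 GHz and 512 GB each, CPU is the constraint (7 hosts) and one spare brings the total to 8.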

D. Storage

Logical Design Decisions

  • Traditional Monolithic Storage, Server-Side Flash Cache Acceleration with legacy infrastructure, Converged Infrastructure or Hyper-Converged Infrastructure?  Obviously this must align with the Compute section.
  • Block-based or IP-based Storage Access?
  • Homogeneous or Heterogeneous storage nodes?
  • Automated storage management?
  • RDM devices allowed?
  • Hypervisor boot method? DAS, LUN or PXE?
  • Thin or Thick provisioning for Back-end and VMs?
  • Required storage resources (performance and capacity)?
  • Storage replication?

Physical Design Decisions

  • HCI Vendor: Nutanix XCP, Dell XC or Lenovo and AOS version? (all other vendors/technologies have been dropped to align with the Logical Design/Conceptual Model)  Obviously this must align with the Compute section.
  • Usable Storage Calculation, considering Storage Pools, Replication Factor, Usable Capacity and Usable Performance?
  • Number of SSD and HDD drives per Node?
  • Nutanix used to publish the Diagnostics results in the release notes of each NOS version, but has stopped doing this.
  • Also consider Number of Containers, Free-Space Reservations, Deduplication, Compression, Erasure Coding and Acropolis Volumes API.
  • Controller VM Sizing across the cluster?
  • Capacity nodes required for existing or new clusters?
  • Are the Inter-Mix rules being followed?
  • The performance of each release is very subjective and the Diagnostics results are useful as an indicator and benchmark for basic verification.
  • Storage performance should be properly validated during the Test Phase of the Implementation Plan.
  • The Public version of the Nutanix Sizer Tool does not include storage performance, only capacity.  Contact your Nutanix Partner for a cluster design that meets your required performance profile.
  • Active Working Set required for each node?
  • Self-Encrypting Disks?  If yes, consider the KMS requirements.
  • ESXi host boot must be from SATA-DOM (USB); this is a Nutanix constraint.
  • Default Auto-Tiering (ILM) thresholds?
  • VM DirectPath I/O and SR-IOV cannot be used; this is a Nutanix constraint.
  • Datastores per Nutanix cluster?  Ideally, go with one Datastore per vSphere Cluster.
  • Storage DRS and SIOC being used?  This is not required.
  • Different VMDK shares being used?
  • VAAI being used?
  • VASA and VM storage profiles?  VASA is not supported; VM storage profiles could be manually configured for a multi-container cluster with different settings.
  • Asynchronous DR, Metro or Synchronous DR required? (mentioned again in Backup/Recovery and BC/DR sections)
  • Future expansion?
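
The usable storage calculation mentioned above can be approximated by dividing raw capacity by the Replication Factor after setting aside a free-space reserve. This is a deliberately simplified sketch: the function name and the 10% default reserve are assumptions, and it ignores Controller VM and metadata overhead as well as any gains from Deduplication, Compression or Erasure Coding. Treat the Nutanix Sizer Tool as the authoritative source.

```python
def usable_capacity_tib(nodes, raw_tib_per_node,
                        replication_factor=2, free_space_reserve=0.10):
    """Rough usable capacity of a cluster in TiB.

    free_space_reserve is a hypothetical planning margin (fraction of raw
    capacity kept free), not a Nutanix-defined value.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0 <= free_space_reserve < 1:
        raise ValueError("free_space_reserve must be in the range [0, 1)")
    raw_tib = nodes * raw_tib_per_node
    # Every write is stored replication_factor times, so divide by it.
    return raw_tib * (1 - free_space_reserve) / replication_factor
```

For example, four nodes with an assumed 20 TiB raw each at Replication Factor 2 give about 36 TiB usable with a 10% reserve, or 40 TiB with no reserve.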

E. Network

Logical Design Decisions

  • Legacy 3-Tier Switch, Collapsed Core or Clos-type Leaf/Spine?
  • Clustered Physical or Standalone EoR/ToR Switches?
  • Stretched or Per Rack VLANs?
  • Functional traffic types separated with vSwitches or VLANs?
  • Jumbo Frames?
  • Quality of Service?
  • Load Balancing?
  • IP version?
  • Inter-Data Center links, including RTT?
  • Required Network Capacity?
  • Single vNIC or Multi vNIC VMs allowed?

Physical Design Decisions

  • Clos-type Leaf/Spine vendor selection for large installations?
  • Blocking or non-blocking Data Center switch fabric?
  • If blocking, what is the over-subscription ratio?
  • What is the traffic path for North/South and East/West traffic?
  • Where are the Layer 3 gateways for each IP Subnet?
  • Any Dynamic Routing requirements?
  • Is Multi-Cast required?
  • End-to-End Jumbo Frames?
  • Host interfaces: 1GbE and/or 10GbE? How many per node?
  • LAGs or unbonded host interfaces?
  • Management overlay required for KVM and IPMI?
  • Physical LAN Performance?
  • Host interface connectivity matrix?
  • Metro Ethernet required between Data Centers?
  • QoS, Network Control and vSphere Network I/O Control?
  • Edge QoS enforced or End-to-End QoS?
  • vSphere NIOC System and User-Defined Network Resource Pools?
  • Multi-NIC vMotion?
  • VLAN Pruning?
  • Spanning Tree considerations?
  • VM DirectPath I/O and SR-IOV cannot be used; this is a Nutanix constraint.
  • TCP Offload enabled?
  • VSS, VDS or Cisco Nexus 1000V?
  • Separate vSwitches per Cluster or shared?
  • Teaming and Load Balancing?
  • VMkernel ports?
  • Portgroups?
  • VMware NSX-v required?
  • Future Expansion?
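
For the blocking versus non-blocking question above, the over-subscription ratio of a leaf switch is the host-facing bandwidth divided by the uplink bandwidth. The sketch below shows that arithmetic; the function names are hypothetical and the port counts and speeds in the usage note are illustrative, not a recommendation.

```python
def oversubscription_ratio(downlink_ports, downlink_gbps,
                           uplink_ports, uplink_gbps):
    """Ratio of host-facing bandwidth to uplink bandwidth on one leaf switch."""
    downlink_total = downlink_ports * downlink_gbps
    uplink_total = uplink_ports * uplink_gbps
    if uplink_total <= 0:
        raise ValueError("uplink bandwidth must be greater than zero")
    return downlink_total / uplink_total


def is_non_blocking(ratio):
    """A ratio of 1:1 or lower means the uplinks can carry all host traffic."""
    return ratio <= 1.0
```

A leaf with 48 x 10GbE host ports and 4 x 40GbE uplinks gives 480/160, a 3:1 ratio, which is a blocking design.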

F. Backup/Recovery

Logical Design Decisions

  • VM Image Backup Frequency?
  • Application and Database Consistent Backup Frequency?
  • Backup Restore Times?
  • Physical Separation of Operational Data and Backup Data?
  • Required Backup Resources?
  • Required Backup and Restore Performance?

Physical Design Decisions

  • VADP used?
  • Backup/Recovery solution?
  • Backup/Restore mechanism?
  • VM-Centric Snapshots?
  • Async DR Replication of VM-Centric Snapshots to remote cluster/cloud connect (AWS/Azure)?
  • Backup frequency?
  • Retention period?
  • Backup capacity and performance?
  • Fast restore of management cluster direct to host?
  • Future expansion?
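
Backup frequency, retention period, capacity and restore time are linked, and a back-of-the-envelope model helps to sanity-check them together. The sketch below assumes one full backup plus daily incrementals kept for the whole retention period, with no deduplication or compression. That scheme, the function names and the example figures are all assumptions; substitute the behaviour of the chosen backup product.

```python
def backup_capacity_tb(full_backup_tb, daily_change_rate, retention_days):
    """Capacity for one full backup plus daily incrementals over the retention.

    daily_change_rate is the assumed fraction of data changed per day.
    """
    incremental_tb = full_backup_tb * daily_change_rate
    return full_backup_tb + incremental_tb * retention_days


def restore_hours(data_tb, throughput_tb_per_hour):
    """Time to restore data_tb at a sustained restore throughput."""
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput must be greater than zero")
    return data_tb / throughput_tb_per_hour
```

With an assumed 10 TB full backup, a 5% daily change rate and 30 days of retention, the estimate is 25 TB; restoring 10 TB at an assumed 2 TB per hour takes 5 hours, which can then be compared with the required restore time.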

G. Virtual Machine

Logical Design Decisions

  • Standard VM T-shirt sizes?
  • VM CPU and RAM management mechanisms used?
  • Location of VM files?
  • Guest OS standardisation?
  • 64-bit and 32-bit?
  • Templates used?

Physical Design Decisions

  • Standard VMs of what size?
  • vApps and Resource Pools?
  • VM files on shared storage?
  • Standard vDisk setups per VM?
  • Thin provisioned vDisks?
  • Nutanix or vSphere Snapshots allowed?
  • CBT enabled?
  • 64-bit/32-bit Guest OS versions?
  • vSCSI adapters?
  • vNIC adapters?
  • VM Hardware version?
  • VMtools installed and version?
  • VM Options?
  • VM Templates?
  • VM Template Repository?
  • Mission-Critical/Business-Critical Application considerations?
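
Standard VM T-shirt sizes are easiest to enforce when they are written down as data. The sketch below shows one possible representation and a helper that picks the smallest standard size satisfying a request. The size names and the vCPU and RAM values are hypothetical placeholders, not recommended values.

```python
from typing import NamedTuple, Optional, Sequence


class TShirtSize(NamedTuple):
    name: str
    vcpus: int
    ram_gb: int


# Hypothetical catalogue; replace with the sizes agreed in the design.
STANDARD_SIZES = [
    TShirtSize("S", 1, 4),
    TShirtSize("M", 2, 8),
    TShirtSize("L", 4, 16),
    TShirtSize("XL", 8, 32),
]


def smallest_fit(vcpus: int, ram_gb: int,
                 sizes: Sequence[TShirtSize] = STANDARD_SIZES) -> Optional[TShirtSize]:
    """Return the smallest standard size covering the request, or None."""
    for size in sorted(sizes, key=lambda s: (s.vcpus, s.ram_gb)):
        if size.vcpus >= vcpus and size.ram_gb >= ram_gb:
            return size
    return None  # The request exceeds every standard size and needs an exception
```

A request for 2 vCPUs and 6 GB of RAM maps to "M" in this catalogue, while a request that exceeds every standard size returns None and would go through an exception process.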

H. Security

Logical Design Decisions

  • Zones of Trust?
  • Defence-in-Depth?
  • Multi-Vendor?
  • Physical separation requirements?
  • Compliance standards?
  • Virtualisation security requirements?
  • Required Network Security Capacity?

Physical Design Decisions

  • Physical and Virtual Network Zoning?
  • Application-level, Network-level Firewalls?
  • IDS and IPS?
  • SSL and IP-Sec VPNs?
  • Unified Threat Management?
  • Vendor selection?
  • VMware vCNS/NSX-v required?
  • Anti-Virus? Endpoint Protection?
  • Network Security Performance?
  • Security Information & Event Management (SIEM)?
  • Public Key Infrastructure (PKI)?
  • Nutanix Cluster security? STIG?
  • ESXi host security?
  • Network security?
  • Storage security?
  • Backup security?
  • VM security?
  • Future Expansion?

I. BC/DR

Logical Design Decisions

  • Protection Mechanisms?
  • Manual or Automated Run-books?
  • RPO, RTO, WRT and MTD of Mission-Critical, Business-Critical and Non-Critical applications?
  • Global Site Load Balancers?
  • DNS TTL for clients?
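
The recovery targets above constrain each other: the Maximum Tolerable Downtime has to cover both the Recovery Time Objective and the Work Recovery Time, so RTO + WRT should not exceed MTD. The sketch below checks that relationship per application tier. The function names and the tier values are hypothetical examples, not recommendations.

```python
def recovery_targets_consistent(rto_hours, wrt_hours, mtd_hours):
    """Return True when recovery plus work-recovery time fits inside the MTD."""
    return rto_hours + wrt_hours <= mtd_hours


# Hypothetical targets in hours, for illustration only.
EXAMPLE_TIERS = {
    "Mission-Critical": {"rto": 1, "wrt": 1, "mtd": 4},
    "Business-Critical": {"rto": 4, "wrt": 4, "mtd": 12},
    "Non-Critical": {"rto": 24, "wrt": 30, "mtd": 48},
}


def inconsistent_tiers(tiers):
    """List the tier names whose RTO + WRT exceeds the MTD."""
    return [
        name for name, t in tiers.items()
        if not recovery_targets_consistent(t["rto"], t["wrt"], t["mtd"])
    ]
```

In the example data only the Non-Critical tier fails the check, because 24 + 30 hours exceeds its 48-hour MTD.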

Physical Design Decisions

  • DR Automation solution?
  • VMware SRM?
  • GSLB solution?
  • Internal and External DNS servers?
  • Metro or Synchronous DR to remote clusters?
  • Multi-Site Application, Database or Message Queue clustering/replication?

Published by

vcdx133

Chief Enterprise Architect and Strategist, 4xVCDX#133, NPX#8, DECM-EA.