Nutanix XCP Deep-Dive – Part 13 – AHV Design Considerations

This is Part 13 of the Nutanix XCP Deep-Dive, covering AHV design considerations.

This is a multi-part series describing how to design, install, configure and troubleshoot an advanced Nutanix XCP solution from start to finish for vSphere, AHV and Hyper-V deployments.

I have aggregated all of the design considerations I could find that need to be assessed in a Nutanix XCP architecture design with Acropolis Hypervisor. Brevity and bullet points are used to keep the information concise and readable. If you want more information on a concept, use the NPX Link-O-Rama.

NOTE: Clearly understand what AHV is capable of.

This post will be updated with additional information as part of the NPX Link-O-Rama.  If you have content to contribute, post a comment below.

Business Goals

  • What are the business goals of the solution?

Requirements/Constraints/Assumptions

  • What are the requirements, constraints and assumptions of the solution?

Risks

A. Data Center Facility

Logical Design Decisions

  • Single-site or Multi-site Data Center Facilities?
  • Data Center type – “Bricks & Mortar”, Co-location, Pre-Fabricated or Performance Optimised Data Centers (PODs)?
  • Will the Management & Control Plane be separated from the Data Plane?

Physical Design Decisions

  • Physical location(s) of Data Center Facilities?
  • Distances between Data Centers?
  • Type of Data Center Facilities?
  • Power and Cooling requirements for the solution?
  • Can the Data Center Facility handle high density infrastructure?
  • Rack layouts for the solution?

B. Virtual Infrastructure Management

Logical Design Decisions

  • Number of Pooled Compute, Network and Storage resources?
  • What services are you delivering?
  • Required availability levels of virtualisation management systems?
  • 3rd party integrations: IT Service Management, Infrastructure Management systems, Enterprise services (DNS, LDAP, NTP, PKI, Syslog, SNMP Traps), Vendor Data collection
  • Advanced Operations
  • Hypervisor Workload Protection mechanisms?
  • Hypervisor Workload Resource Balancing mechanisms?

Physical Design Decisions

  • Hypervisor: AHV and which version? (Hyper-V and ESXi have been dropped to align with the Conceptual Model/Logical Design)
  • Prism Central to aggregate clusters?
  • AHV/AMF components that will be used? Only use what you need.
  • Enterprise Management solution to integrate with?
  • How will Nutanix Pulse and Nutanix Remote Access be used?
  • Any Service Desk or Change Management requirements that must be met?
  • What are the VM-HA requirements?
  • DNS and NTP integration?
  • Role Based Access Control and LDAP integration?
  • What Nutanix licensing is required?
  • 3rd party software licensing considerations? Per physical socket/core or vCPU? Dedicated clusters required?

C. Compute

Logical Design Decisions

  • Traditional Monolithic Compute, Server-Side Flash Cache Acceleration with legacy infrastructure, Converged Infrastructure or Hyper-Converged Infrastructure? Obviously this must align with the Storage section.
  • Minimum number of Hypervisor Hosts per Cluster
  • Host sizing: Scale Up or Scale Out?
  • Homogeneous or Heterogeneous nodes?
  • Number of Sockets per Host?
  • Host Spanning for Failure Domains?
  • Required CPU Capacity?
  • Required Memory Capacity?

Physical Design Decisions

  • HCI Vendor: Nutanix XCP, Dell XC or Lenovo (all other vendors/technologies have been dropped to align with the Logical Design/Conceptual Model)
  • Processor type: Intel (AMD not supported by Nutanix)
  • Intel CPU Features: VT-x, Hyper-threading, Turbo Boost, NUMA enabled?
  • Cluster Hardware and Configuration?
  • Inter-Mix rules are being followed?
  • Number of Acropolis clusters?
  • Nutanix family and model number?
  • Number of CPU sockets per node?
  • Model of Intel Processor, number of cores and GHz per core?
  • vGPU required? Not supported.
  • Host locations?
  • Single Rack, Multi-Rack with striping?
  • Cluster Availability requirements?
  • Nutanix Redundancy Factor?
  • Nutanix Availability Domains?
  • Align compute availability with storage availability?
  • NUMA boundaries?
  • Future expansion?

D. Storage

Logical Design Decisions

  • Traditional Monolithic Storage, Server-Side Flash Cache Acceleration with legacy infrastructure, Converged Infrastructure or Hyper-Converged Infrastructure?  Obviously this must align with the Compute section.
  • Block-based or IP-based Storage Access?
  • Homogeneous or Heterogeneous storage nodes?
  • Automated storage management?
  • RDM devices allowed?
  • Hypervisor boot method? DAS, LUN or PXE?
  • Thin or Thick provisioning for Back-end and VMs?
  • Required storage resources (performance and capacity)?
  • Storage replication?

Physical Design Decisions

  • HCI Vendor: Nutanix XCP, Dell XC or Lenovo and AOS version? (all other vendors/technologies have been dropped to align with the Logical Design/Conceptual Model)  Obviously this must align with the Compute section.
  • Usable Storage Calculation, considering Storage Pools, Replication Factor, Usable Capacity and Usable Performance?
  • Number of SSD and HDD drives per Node?
  • Nutanix used to publish the Diagnostics results in the release notes of each NOS version, but has stopped doing this.
  • Also consider Number of Containers, Free-Space Reservations, Deduplication, Compression, Erasure Coding and Acropolis Volumes API.
  • Controller VM Sizing across the cluster?
  • Capacity nodes required for existing or new clusters?
  • Inter-Mix rules are being followed?
  • The performance of each release is very subjective and the Diagnostics results are useful as an indicator and benchmark for basic verification.
  • Storage performance should be properly verified during the Test Phase of the Implementation Plan.
  • The Public version of the Nutanix Sizer Tool does not include storage performance, only capacity.  Contact your Nutanix Partner for a cluster design that meets your required performance profile.
  • Active Working Set required for each node?
  • Self-Encrypting Disks?  If yes, consider the KMS requirements.
  • AHV host boot must be from SATA-DOM (USB); this is a Nutanix constraint.
  • Default Auto-Tiering (ILM) thresholds?
  • iSCSI vDisks per Nutanix cluster?
  • Image Service?
  • Asynchronous DR required?
  • Future expansion?
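
For the Usable Storage Calculation item above, a rough first-pass estimate can be sketched in Python. This is only an illustration under simplifying assumptions, not the Nutanix Sizer method: the function name, the 10% free-space reserve and the "hold back one node of capacity" rule are my own placeholders, and CVM overhead, deduplication, compression and erasure coding are deliberately not modelled.

```python
def usable_capacity_tib(nodes, raw_tib_per_node, replication_factor=2,
                        free_space_reserve=0.10, tolerate_node_failures=1):
    """Rough usable-capacity estimate for a hyper-converged cluster.

    Assumptions (hypothetical, for illustration only):
      * every node contributes the same raw capacity
      * the capacity of `tolerate_node_failures` nodes is held back so
        the cluster can re-protect data after losing that many nodes
      * `free_space_reserve` is a fraction of raw capacity kept free
      * each write is stored `replication_factor` times
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0.0 <= free_space_reserve < 1.0:
        raise ValueError("free_space_reserve must be in [0, 1)")
    if nodes <= tolerate_node_failures:
        raise ValueError("cluster needs more nodes than tolerated failures")

    surviving_nodes = nodes - tolerate_node_failures
    raw_tib = surviving_nodes * raw_tib_per_node
    return raw_tib * (1.0 - free_space_reserve) / replication_factor


if __name__ == "__main__":
    # Example: 4 nodes with 10 TiB raw each, RF2, 10% reserve, N+1
    print(round(usable_capacity_tib(4, 10.0), 2))
```

Treat the result as a sanity check on the numbers coming out of a proper sizing exercise; capacity is only half of the question, and usable performance still has to be validated separately.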

E. Network

Logical Design Decisions

  • Legacy 3-Tier Switch, Collapsed Core or Clos-type Leaf/Spine?
  • Clustered Physical or Standalone EoR/ToR Switches?
  • Stretched or Per Rack VLANs?
  • Functional traffic types separated with vSwitches or VLANs?
  • Jumbo Frames?
  • Quality of Service?
  • Load Balancing?
  • IP version?
  • Inter-Data Center links, including RTT?
  • Required Network Capacity?
  • Single vNIC or Multi vNIC VMs allowed?

Physical Design Decisions

  • Clos-type Leaf/Spine vendor selection for large installations?
  • Blocking or non-blocking Data Center switch fabric?
  • If blocking, what is the over-subscription ratio?
  • What is the traffic path for North/South and East/West traffic?
  • Where are the Layer 3 gateways for each IP Subnet?
  • Any Dynamic Routing requirements?
  • Is Multi-Cast required?
  • End-to-End Jumbo Frames?
  • Host interfaces: 1GbE and/or 10GbE? How many per node?
  • LAGs or unbonded host interfaces?
  • Management overlay required for KVM and IPMI?
  • Physical LAN Performance?
  • Host interface connectivity matrix?
  • Metro Ethernet required between Data Centers?
  • QoS and Network Control?
  • Edge QoS is not supported; use Access/Leaf Switches for QoS?
  • VLAN Pruning?
  • Spanning Tree considerations?
  • TCP Offload enabled?
  • Separate OVS Bridges per node?
  • Teaming and Load Balancing (active-backup, balance-slb, balance-tcp)?
  • VLAN Networks with or without IPAM?
  • Future Expansion?
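
For the blocking versus non-blocking question above, the over-subscription ratio of a leaf switch is simply host-facing bandwidth divided by uplink bandwidth. The following Python sketch shows the arithmetic; the function names and the port counts in the example are hypothetical and not taken from any particular switch model.

```python
from fractions import Fraction


def oversubscription_ratio(downlink_ports, downlink_gbps,
                           uplink_ports, uplink_gbps):
    """Return host-facing bandwidth divided by uplink bandwidth.

    All arguments are expected to be integers (port counts and port
    speeds in Gbps) so that the ratio stays exact.
    """
    downlink_total = downlink_ports * downlink_gbps
    uplink_total = uplink_ports * uplink_gbps
    if uplink_total <= 0:
        raise ValueError("uplink bandwidth must be greater than zero")
    return Fraction(downlink_total, uplink_total)


def is_blocking(ratio):
    """A ratio above 1:1 means the leaf can be over-subscribed."""
    return ratio > 1


if __name__ == "__main__":
    # Example leaf: 48 x 10GbE host ports, 4 x 40GbE uplinks
    ratio = oversubscription_ratio(48, 10, 4, 40)
    print(f"{ratio.numerator}:{ratio.denominator}", is_blocking(ratio))
```

A ratio of 1:1 is non-blocking at the leaf; anything higher is a design decision that should be justified against the expected East/West traffic of the cluster.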

F. Backup/Recovery

Logical Design Decisions

  • VM Image Backup Frequency?
  • Application and Database Consistent Backup Frequency?
  • Backup Restore Times?
  • Physical Separation of Operational Data and Backup Data?
  • Required Backup Resources
  • Required Backup and Restore Performance

Physical Design Decisions

  • Backup/Recovery solution?
  • Prism Element REST-API integration with CommVault IntelliSnap?
  • Backup/Restore mechanism?
  • VM-Centric Snapshots?
  • Async DR Replication of VM-Centric Snapshots to remote cluster/cloud connect (AWS/Azure)?
  • Backup frequency?
  • Retention period?
  • Backup capacity and performance?
  • Fast restore of management cluster direct to host?
  • Future expansion?

G. Virtual Machines

Logical Design Decisions

  • Standard VM T-shirt sizes?
  • VM CPU and RAM management mechanisms used?
  • Location of VM files?
  • Guest OS standardisation?
  • 64-bit and 32-bit?
  • Templates used?

Physical Design Decisions

  • Standard VMs of what size?
  • Largest VM fits within NUMA boundaries?
  • vApps and Resource Pools?
  • VM files on shared storage?
  • Standard vDisk setups per VM?
  • Thin provisioned vDisks?
  • Nutanix or vSphere Snapshots allowed?
  • CBT enabled?
  • 64-bit/32-bit Guest OS versions?
  • vSCSI adapters?
  • vNIC adapters?
  • VM Hardware version?
  • VirtIO drivers installed and which version?
  • VM Options?
  • VM Templates?
  • VM Template Repository?
  • Mission-Critical/Business-Critical Application considerations?
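
The "Largest VM fits within NUMA boundaries?" check above can be expressed as a small Python sketch. It assumes one NUMA node per CPU socket, evenly populated memory, and physical cores only (no Hyper-Threading); it also ignores the resources reserved for the Controller VM and the hypervisor, which a real design must subtract. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostNuma:
    """Simplified host description (assumes one NUMA node per socket)."""
    sockets: int
    cores_per_socket: int
    memory_gib: int


def fits_numa_node(host, vm_vcpus, vm_memory_gib):
    """True if the VM fits inside a single NUMA node of the host."""
    memory_per_node_gib = host.memory_gib / host.sockets
    return (vm_vcpus <= host.cores_per_socket
            and vm_memory_gib <= memory_per_node_gib)


if __name__ == "__main__":
    # Example host: 2 sockets, 12 cores per socket, 256 GiB RAM
    host = HostNuma(sockets=2, cores_per_socket=12, memory_gib=256)
    for vcpus, mem in [(8, 96), (16, 64), (8, 200)]:
        print(vcpus, mem, fits_numa_node(host, vcpus, mem))
```

Running the standard T-shirt sizes through a check like this early on highlights which VMs would span NUMA nodes and therefore need extra attention.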

H. Security

Logical Design Decisions

  • Zones of Trust?
  • Defence-in-Depth?
  • Multi-Vendor?
  • Physical separation requirements?
  • Compliance standards?
  • Virtualisation security requirements?
  • Required Network Security Capacity?

Physical Design Decisions

  • Physical and Virtual Network Zoning?
  • Application-level, Network-level Firewalls?
  • IDS and IPS?
  • SSL and IP-Sec VPNs?
  • Unified Threat Management?
  • Vendor selection?
  • Anti-Virus? Endpoint Protection?
  • Network Security Performance?
  • Security Information & Event Management (SIEM)?
  • Public Key Infrastructure (PKI)?
  • Nutanix Cluster security? STIG?
  • AHV host security?
  • Network security?
  • Storage security?
  • KMS for SED?
  • Backup security?
  • VM security?
  • Future Expansion?

I. BC/DR

Logical Design Decisions

  • Protection Mechanisms?
  • Manual or Automated Run-books?
  • RPO, RTO, WRT and MTD of Mission-Critical, Business-Critical and Non-Critical applications?
  • Global Site Load Balancers?
  • DNS TTL for clients?
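
A quick consistency check on the RTO, WRT and MTD values above: a commonly used convention is that the Recovery Time Objective plus the Work Recovery Time must not exceed the Maximum Tolerable Downtime. The Python sketch below applies that convention; the tier names and durations are invented for illustration only.

```python
from datetime import timedelta


def meets_mtd(rto, wrt, mtd):
    """True if recovery (RTO) plus work recovery (WRT) fits within MTD."""
    return rto + wrt <= mtd


if __name__ == "__main__":
    # Hypothetical targets per application tier: (RTO, WRT, MTD)
    tiers = {
        "Mission-Critical": (timedelta(hours=1), timedelta(hours=1),
                             timedelta(hours=4)),
        "Business-Critical": (timedelta(hours=4), timedelta(hours=4),
                              timedelta(hours=6)),
    }
    for name, (rto, wrt, mtd) in tiers.items():
        print(name, meets_mtd(rto, wrt, mtd))
```

Any tier that fails the check needs either a faster protection mechanism or a renegotiated MTD with the business.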

Physical Design Decisions

  • DR Automation solution?
  • GSLB solution?
  • Internal and External DNS servers?
  • Multi-Site Application, Database or Message Queue clustering/replication?

Additional Resources

Published by

vcdx133

Chief Enterprise Architect and Strategist, 4xVCDX#133, NPX#8, DECM-EA.