This is Part 13 of the Nutanix XCP Deep-Dive, covering AHV design considerations.
This is a multi-part series describing how to design, install, configure and troubleshoot an advanced Nutanix XCP solution from start to finish for vSphere, AHV and Hyper-V deployments:
- Nutanix XCP Deep-Dive – Part 1 – Overview
- Nutanix XCP Deep-Dive – Part 2 – Hardware Architecture
- Nutanix XCP Deep-Dive – Part 3 – Platform Installation
- Nutanix XCP Deep-Dive – Part 4 – Building a Nutanix SE Toolkit
- Nutanix XCP Deep-Dive – Part 5 – Installing ESXi Manually with Phoenix
- Nutanix XCP Deep-Dive – Part 6 – Installing ESXi with Foundation
- Nutanix XCP Deep-Dive – Part 7 – Installing AHV Manually
- Nutanix XCP Deep-Dive – Part 8 – Installing AHV with Foundation
- Nutanix XCP Deep-Dive – Part 9 – Installing Hyper-V Manually with Phoenix
- Nutanix XCP Deep-Dive – Part 10 – Installing Hyper-V with Foundation
- Nutanix XCP Deep-Dive – Part 11 – Benchmark Performance Testing
- Nutanix XCP Deep-Dive – Part 12 – ESXi Design Considerations
- Nutanix XCP Deep-Dive – Part 13 – AHV Design Considerations
- Nutanix XCP Deep-Dive – Part 14 – Hyper-V Design Considerations
- Nutanix XCP Deep-Dive – Part 15 – Data Center Facility Design Considerations
- Nutanix XCP Deep-Dive – Part 16 – The Risks
- Nutanix XCP Deep-Dive – Part 17 – CVM Autopathing with ESXi
- Nutanix XCP Deep-Dive – Part 18 – more to come as the series evolves (Cloud Connect to AWS and Azure, Prism Central, APIs, Metro, DR, etc.)
I have aggregated all of the design considerations I could find that need to be assessed in a Nutanix XCP architecture design with Acropolis Hypervisor (AHV). Brevity and bullet points are used to keep the information concise and readable. If you want more information on a concept, use the NPX Link-O-Rama. NOTE: Make sure you clearly understand what AHV is and is not capable of before committing to a design.
This post will be updated with additional information as part of the NPX Link-O-Rama. If you have content to contribute, post a comment below.
Business Goals
- What are the business goals of the solution?
Requirements/Constraints/Assumptions
- What are the requirements, constraints and assumptions of the solution?
Risks
A. Data Center Facility
Logical Design Decisions
- Single-site or Multi-site Data Center Facilities?
- Data Center type – “Bricks & Mortar”, Co-location, Pre-Fabricated or Performance Optimised Data Centers (PODs)?
- Management & Control Plane will be separated from the Data Plane?
Physical Design Decisions
- Physical location(s) of Data Center Facilities?
- Distances between Data Centers?
- Type of Data Center Facilities?
- Power and Cooling requirements for the solution?
- Can the Data Center Facility handle high density infrastructure?
- Rack layouts for the solution?
B. Virtual Infrastructure Management
Logical Design Decisions
- Number of Pooled Compute, Network and Storage resources?
- What services are you delivering?
- Required availability levels of virtualisation management systems?
- 3rd party integrations: IT Service Management, Infrastructure Management systems, Enterprise services (DNS, LDAP, NTP, PKI, Syslog, SNMP Traps), Vendor Data collection
- Advanced Operations
- Hypervisor Workload Protection mechanisms?
- Hypervisor Workload Resource Balancing mechanisms?
Physical Design Decisions
- Hypervisor: AHV and which version? (Hyper-V and ESXi have been dropped to align with the Conceptual Model/Logical Design)
- Prism Central to aggregate clusters?
- AHV/AMF components that will be used? Only use what you need.
- Enterprise Management solution to integrate with?
- How will Nutanix Pulse and Nutanix Remote Access be used?
- Any Service Desk or Change Management requirements that must be met?
- What are the VM-HA requirements?
- DNS and NTP integration?
- Role Based Access Control and LDAP integration?
- What Nutanix licencing is required?
- 3rd party software licencing considerations? Per physical socket/core or vCPU? Dedicated clusters required?
C. Compute
Logical Design Decisions
- Traditional Monolithic Compute, Server-Side Flash Cache Acceleration with legacy infrastructure, Converged Infrastructure or Hyper-Converged Infrastructure? Obviously this must align with the Storage section.
- Minimum number of Hypervisor Hosts per Cluster
- Host sizing: Scale Up or Scale Out?
- Homogeneous or Heterogeneous nodes?
- Number of Sockets per Host?
- Host Spanning for Failure Domains?
- Required CPU Capacity?
- Required Memory Capacity?
Physical Design Decisions
- HCI Vendor: Nutanix XCP, Dell XC or Lenovo (all other vendors/technologies have been dropped to align with the Logical Design/Conceptual Model)
- Processor type: Intel (AMD not supported by Nutanix)
- Intel CPU Features: VT-x, Hyper-threading, Turbo Boost, NUMA enabled?
- Cluster Hardware and Configuration?
- Inter-Mix rules are being followed?
- Number of Acropolis clusters?
- Nutanix family and model number?
- Number of CPU sockets per node?
- Model of Intel Processor, number of cores and GHz per core?
- vGPU required? Not supported.
- Host locations?
- Single Rack, Multi-Rack with striping?
- Cluster Availability requirements?
- Nutanix Redundancy Factor?
- Nutanix Availability Domains?
- Align compute availability with storage availability?
- NUMA boundaries?
- Future expansion?
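To illustrate how the capacity and availability questions above interact, here is a minimal Python sketch that estimates the CPU cores and memory left for VMs once failed hosts and Controller VM reservations are subtracted. The function name, its parameters and the example figures are hypothetical and only for illustration; they are not Nutanix sizing guidance, and a real design should be validated with the Nutanix Sizer.

```python
def usable_capacity(nodes, cores_per_node, ram_gb_per_node,
                    failures_to_tolerate=1, cvm_cores=0, cvm_ram_gb=0):
    """Return (cores, ram_gb) available to VMs with the given number of hosts down.

    cvm_cores and cvm_ram_gb are per-node Controller VM reservations
    (assumed values, supplied by the caller).
    """
    if failures_to_tolerate >= nodes:
        raise ValueError("cannot tolerate the loss of every node")
    surviving = nodes - failures_to_tolerate
    cores = surviving * (cores_per_node - cvm_cores)
    ram_gb = surviving * (ram_gb_per_node - cvm_ram_gb)
    return cores, ram_gb


# Hypothetical 4-node cluster, N+1, with an assumed 8 cores / 32 GB per CVM.
print(usable_capacity(4, 24, 256, failures_to_tolerate=1,
                      cvm_cores=8, cvm_ram_gb=32))  # (48, 672)
```

Comparing the result against the Required CPU Capacity and Required Memory Capacity from the Logical Design shows quickly whether the chosen node count still meets the requirement with a host down.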
D. Storage
Logical Design Decisions
- Traditional Monolithic Storage, Server-Side Flash Cache Acceleration with legacy infrastructure, Converged Infrastructure or Hyper-Converged Infrastructure? Obviously this must align with the Compute section.
- Block-based or IP-based Storage Access?
- Homogeneous or Heterogeneous storage nodes?
- Automated storage management?
- RDM devices allowed?
- Hypervisor boot method? DAS, LUN or PXE?
- Thin or Thick provisioning for Back-end and VMs?
- Required storage resources (performance and capacity)?
- Storage replication?
Physical Design Decisions
- HCI Vendor: Nutanix XCP, Dell XC or Lenovo and AOS version? (all other vendors/technologies have been dropped to align with the Logical Design/Conceptual Model) Obviously this must align with the Compute section.
- Usable Storage Calculation, considering Storage Pools, Replication Factor, Usable Capacity and Usable Performance?
- Number of SSD and HDD drives per Node?
- Nutanix used to publish the Diagnostics results in the release notes of each NOS version, but has stopped doing this.
- Also consider Number of Containers, Free-Space Reservations, Deduplication, Compression, Erasure Coding and Acropolis Volumes API.
- Controller VM Sizing across the cluster?
- Capacity nodes required for existing or new clusters?
- Inter-Mix rules are being followed?
- The performance of each release is very subjective and the Diagnostics results are useful as an indicator and benchmark for basic verification.
- Storage performance should be properly validated during the Test Phase of the Implementation Plan.
- The Public version of the Nutanix Sizer Tool does not include storage performance, only capacity. Contact your Nutanix Partner for a cluster design that meets your required performance profile.
- Active Working Set required for each node?
- Self-Encrypting Disks? If yes, consider the KMS requirements.
- AHV host boot must be from SATA-DOM (USB); this is a Nutanix constraint.
- Default Auto-Tiering (ILM) thresholds?
- iSCSI vDisks per Nutanix cluster?
- Image Service?
- Asynchronous DR required?
- Future expansion?
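As a rough aid for the Usable Storage Calculation item above, the following Python sketch divides raw capacity by the Replication Factor after setting aside enough raw capacity to rebuild from the loss of one or more nodes. This is a deliberately simplified model under stated assumptions: the function name and parameters are hypothetical, and it ignores CVM and metadata overhead, Deduplication, Compression and Erasure Coding, all of which change the real figure.

```python
def usable_storage_tib(nodes, raw_tib_per_node,
                       replication_factor=2, reserve_nodes=1):
    """Estimate usable capacity in TiB for a homogeneous cluster.

    reserve_nodes is the number of nodes' worth of raw capacity kept free
    so data can be re-protected after a node failure (an assumption of
    this sketch, not a Nutanix-defined setting).
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if reserve_nodes >= nodes:
        raise ValueError("reserve_nodes must be smaller than nodes")
    raw_after_reserve = (nodes - reserve_nodes) * raw_tib_per_node
    return raw_after_reserve / replication_factor


# Hypothetical example: 4 nodes with 10 TiB raw each, RF2, one node reserved.
print(usable_storage_tib(4, 10.0, replication_factor=2, reserve_nodes=1))  # 15.0
```

Treat the output as an upper-level sanity check only; capacity and performance still need to be confirmed with the Sizer and during the Test Phase.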
E. Network
Logical Design Decisions
- Legacy 3-Tier Switch, Collapsed Core or Clos-type Leaf/Spine?
- Clustered Physical or Standalone EoR/ToR Switches?
- Stretched or Per Rack VLANs?
- Functional traffic types separated with vSwitches or VLANs?
- Jumbo Frames?
- Quality of Service?
- Load Balancing?
- IP version?
- Inter-Data Center links, including RTT?
- Required Network Capacity?
- Single vNIC or Multi vNIC VMs allowed?
Physical Design Decisions
- Clos-type Leaf/Spine vendor selection for large installations?
- Blocking or non-blocking Data Center switch fabric?
- If blocking, what is the over-subscription ratio?
- What is the traffic path for North/South and East/West traffic?
- Where are the Layer 3 gateways for each IP Subnet?
- Any Dynamic Routing requirements?
- Is Multi-Cast required?
- End-to-End Jumbo Frames?
- Host interfaces: 1GbE and/or 10GbE? How many per node?
- LAGs or unbonded host interfaces?
- Management overlay required for KVM and IPMI?
- Physical LAN Performance?
- Host interface connectivity matrix?
- Metro Ethernet required between Data Centers?
- QoS and Network Control?
- Edge QoS is not supported; use Access/Leaf Switches for QoS?
- VLAN Pruning?
- Spanning Tree considerations?
- TCP Offload enabled?
- Separate OVS Bridges per node?
- Teaming and Load Balancing (OVS bond modes: active-backup, balance-slb, balance-tcp)?
- VLAN Networks with or without IPAM?
- Future Expansion?
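For the blocking versus non-blocking question above, the over-subscription ratio of a leaf switch can be estimated as total host-facing bandwidth divided by total uplink bandwidth. The Python sketch below shows that arithmetic; the function names and the port counts in the example are hypothetical, not a recommendation for any particular switch.

```python
def oversubscription_ratio(downlink_ports, downlink_gbps,
                           uplink_ports, uplink_gbps):
    """Return host-facing bandwidth divided by uplink bandwidth for one leaf."""
    if uplink_ports <= 0 or uplink_gbps <= 0:
        raise ValueError("at least one uplink with positive speed is required")
    return (downlink_ports * downlink_gbps) / (uplink_ports * uplink_gbps)


def is_non_blocking(ratio):
    """A ratio of 1.0 or lower means uplinks can carry all downlink traffic."""
    return ratio <= 1.0


# Hypothetical leaf: 48 x 10GbE host ports and 4 x 40GbE uplinks.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(ratio, is_non_blocking(ratio))  # 3.0 False
```

A result of 3.0 is normally written as 3:1. Whether that is acceptable depends on how much East/West traffic (including storage replication between nodes) leaves the rack.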
F. Backup/Recovery
Logical Design Decisions
- VM Image Backup Frequency?
- Application and Database Consistent Backup Frequency?
- Backup Restore Times?
- Physical Separation of Operational Data and Backup Data?
- Required Backup Resources
- Required Backup and Restore Performance
Physical Design Decisions
- Backup/Recovery solution?
- Prism Element REST-API integration with CommVault IntelliSnap?
- Backup/Restore mechanism?
- VM-Centric Snapshots?
- Async DR Replication of VM-Centric Snapshots to remote cluster/cloud connect (AWS/Azure)?
- Backup frequency?
- Retention period?
- Backup capacity and performance?
- Fast restore of management cluster direct to host?
- Future expansion?
G. Virtual Machines
Logical Design Decisions
- Standard VM T-shirt sizes?
- VM CPU and RAM management mechanisms used?
- Location of VM files?
- Guest OS standardisation?
- 64-bit and 32-bit?
- Templates used?
Physical Design Decisions
- Standard VMs of what size?
- Largest VM fits within NUMA boundaries?
- vApps and Resource Pools?
- VM files on shared storage?
- Standard vDisk setups per VM?
- Thin provisioned vDisks?
- Nutanix or vSphere Snapshots allowed?
- CBT enabled?
- 64-bit/32-bit Guest OS versions?
- vSCSI adapters?
- vNIC adapters?
- VM Hardware version?
- VirtIO drivers installed and which version?
- VM Options?
- VM Templates?
- VM Template Repository?
- Mission-Critical/Business-Critical Application considerations?
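The NUMA question above can be checked with simple arithmetic. The Python sketch below assumes one NUMA node per CPU socket and memory populated evenly across sockets; both are assumptions that should be confirmed against the actual hardware. The function name and example values are hypothetical.

```python
def fits_numa_node(vm_vcpus, vm_ram_gb, cores_per_socket,
                   host_ram_gb, sockets=2):
    """Return True if the VM fits inside a single NUMA node.

    Assumes one NUMA node per socket and evenly distributed host memory.
    Memory used by the hypervisor and Controller VM is not subtracted.
    """
    ram_per_numa_node = host_ram_gb / sockets
    return vm_vcpus <= cores_per_socket and vm_ram_gb <= ram_per_numa_node


# Hypothetical dual-socket host with 12 cores per socket and 256 GB RAM.
print(fits_numa_node(12, 128, cores_per_socket=12, host_ram_gb=256))  # True
print(fits_numa_node(16, 64, cores_per_socket=12, host_ram_gb=256))   # False
```

If the largest standard VM does not fit, either the VM T-shirt sizes or the processor and memory choices in the Compute section need revisiting.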
H. Security
Logical Design Decisions
- Zones of Trust?
- Defence-in-Depth?
- Multi-Vendor?
- Physical separation requirements?
- Compliance standards?
- Virtualisation security requirements?
- Required Network Security Capacity?
Physical Design Decisions
- Physical and Virtual Network Zoning?
- Application-level, Network-level Firewalls?
- IDS and IPS?
- SSL and IP-Sec VPNs?
- Unified Threat Management?
- Vendor selection?
- Anti-Virus? Endpoint Protection?
- Network Security Performance?
- Security Information & Event Management (SIEM)?
- Public Key Infrastructure (PKI)?
- Nutanix Cluster security? STIG?
- AHV host security?
- Network security?
- Storage security?
- KMS for SED?
- Backup security?
- VM security?
- Future Expansion?
I. BC/DR
Logical Design Decisions
- Protection Mechanisms?
- Manual or Automated Run-books?
- RPO, RTO, WRT and MTD of Mission-Critical, Business-Critical and Non-Critical applications?
- Global Site Load Balancers?
- DNS TTL for clients?
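When collecting RPO, RTO, WRT and MTD values, it helps to confirm that they are mutually consistent. The Python sketch below uses the commonly cited relationship that Recovery Time Objective plus Work Recovery Time should not exceed Maximum Tolerable Downtime; the function name and the example values are hypothetical and the hours shown are not recommendations.

```python
def meets_mtd(rto_hours, wrt_hours, mtd_hours):
    """Return True if RTO + WRT fits within the Maximum Tolerable Downtime."""
    return rto_hours + wrt_hours <= mtd_hours


# Hypothetical application tiers: (RTO, WRT, MTD) in hours.
tiers = {
    "Mission-Critical": (1, 1, 4),
    "Business-Critical": (4, 6, 8),
}
for name, (rto, wrt, mtd) in tiers.items():
    print(name, meets_mtd(rto, wrt, mtd))
# Mission-Critical True
# Business-Critical False
```

A tier that fails this check needs either a faster protection mechanism, a shorter verification process, or a renegotiated MTD with the business.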
Physical Design Decisions
- DR Automation solution?
- GSLB solution?
- Internal and External DNS servers?
- Multi-Site Application, Database or Message Queue clustering/replication?
Additional Resources