Nutanix .NEXT 2020 Announcements

This week, Nutanix .NEXT 2020 is being held as a digital event due to COVID-19. This combines the traditional US and EU .NEXT programs into one event.

Nutanix Core is the leader in the HCI market. That said, Nutanix is certainly not resting on its laurels: it continues to innovate in that space with the new Block Store/SPDK and Optane announcements, blazing a trail for the competition to follow. Moving the governance/security module from Xi Beam to Flow in Prism Central is an interesting move; I think consuming this service from Prism Central will increase adoption. VPCs on-prem (along with Flow) are beefing up the Nutanix offering to compete in the Network Virtualization market, which was always a hole in their game.

The announcements:

  • Foundation Central will support 50K VMs and 500 Clusters
  • Self Tuning feature to view and resolve application issues
  • New licensing tier, Prism Ultimate: App Insights and Cost Showback, with metrics to drive business efficiency and AIOps
  • AOS performance improvements with Block Store, SPDK and Optane support
  • Deploy Files & Objects anywhere from Prism Central
  • 60-second RPO support for Files & Objects
  • Cold Data Tier support for Files & Objects
  • Ransomware Protection with Detection, Prevention (immutable snapshots) and Recovery (immutable objects WORM storage)
  • Security Central, with the security module moved from Beam to Flow (in Prism Central)
  • VPCs On-Prem with AHV (Layer 2 extension over Layer 3 networks)
  • Nutanix Central announced (Multi-Cloud DevOps SaaS)
  • Karbon Services PaaS Family announced (Multi-Cloud PaaS)
  • Citrix on Nutanix Clusters announced
  • Nutanix Era multi-cluster support announced
  • Nutanix Clusters on Azure announced
  • Calm-as-a-Service announced
  • Service Providers running Nutanix software

VMware Livefire for Partners and Employees

VMware offers employees and VMware Partners the opportunity to attend VMware Livefire advanced training each year. These are advanced courses where experts collaborate through training, lab exercises and discussions on how to implement enterprise-level VMware solutions.

The value of attending:

  • Hands-on lab access to the latest VMware technologies, including VMC, AWS and Azure services (varies per track).
  • Training is aimed at experts who design and deliver advanced enterprise VMware solutions to the customer.
  • Great way to up-skill rapidly.
  • Normally you are required to attend an on-site class; however, with COVID-19 you can attend virtual sessions, which is a really easy way to learn without the hassle and expense of travel.
  • You get an Acclaim badge for each 4-day Livefire course you attend and complete.

Requirements to be invited to attend:

  • Be a VMware Partner or VMware Employee.
  • Be VCP certified for the track you want to attend.

How it works:

  • VMware Livefire is offered for free, although it has an estimated value of $4K.
  • Each 4-day/1-day session has limited seats (due to lab resources).
  • Courses are offered in three regions: Americas, EMEA & APJ.
  • You must be invited by the VMware Technical Enablement Manager or Partner Business Manager in your region.
  • On the first day of the course, you are given access to the lab environments and manuals for the duration of the course.
  • The Livefire teams assigned to each track (some are VCDX certified) spend a considerable amount of time updating and evolving the lab scenarios, so it is worth your time to attend updated Livefire courses every few years.
  • The Livefire team is not a service delivery function; for delivery, you must use VMware PSO or Partner PS.

Book 4 of the IT Architect Series announced

The IT Architect Series is getting ready to release its 4th book, titled “Stories from the Field – Horror stories and lessons learned from IT Architects, Operations and Project Management.” This effort was led by Matthew Wood, John Arrasjid and Mark Gabryjelski, with a total of 34 stories from 34 contributors. The contributor list is a who’s who of the vCommunity, including vExperts, Nutanix NTCs, and VCDX, NPX and DECM-EA certified individuals. If you want to learn from the mistakes of the best, this book is for you.

Each story is tagged with Topic Codes that assist the reader in locating their areas of interest. The chapters are consistently structured to explain the What, How, Why, etc. for ease of reading.

The impressive cover art was designed by Ioannis Dangerous Age.

When this book is released, you will be able to buy it as an eBook or a printed copy. Keep an eye on the IT Architect Series site for updates.

Nutanix Clusters is Live

Nutanix Clusters on AWS is now live. Formerly known as Xi Clusters, this offering has been talked about for a few years, and it is great to see it has finally arrived. I think the reason for the long incubation period is that Nutanix wanted to get it right. This is a great offering for customers who want to continue their journey to hybrid cloud using Nutanix software.

Using it is quite simple: you subscribe to Nutanix Clusters, link your AWS account and deploy. You can then link your existing Prism Central instance to the AWS-based Nutanix cluster to provide a single management plane. For the budget-constrained, there is also a pause button that saves the state of the cluster and avoids expensive AWS charges.

Performance Considerations when running Nutanix on vSphere

Here are some performance considerations for running Nutanix AOS 5.10 or higher on vSphere 6.7 U3b.

In vSphere 6.7 you may have noticed the introduction of Skyline Health (vSphere Client, vCenter Server object, Monitor, Skyline Health) and the reporting of the Compute Health Checks. You may have also noticed the informational alert in the ESXi summary tab that L1TF is present (vSphere Client, ESXi object, Summary tab). This is the VMware alert to mitigate CVE-2018-3646, a vulnerability in Intel processors; VMware KB 55636 covers it in detail. All of the other Skyline Health Compute Health Check alerts can be mitigated by using vUM to apply the latest ESXi security patches/ESXi driver updates and using Nutanix LCM to apply the latest Firmware updates.

The screenshots below (captured with Nutanix X-Ray) show the Random Write IOPS values (a metric that correlates with CPU performance) for a Nutanix on vSphere cluster with SCAv2 enabled and disabled; if you do the math, it is a 10% performance drop, as advertised in VMware KB 55806. SCAv1 has a 30% CPU performance impact. If your organization deems L1TF to be a vulnerability that must be mitigated, build that impact into your cluster sizing calculations. Also consult Nutanix Support on the correct CVM vCPU sizing, since Nutanix Sizer and Nutanix Foundation do not account for it.
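A throughput drop is not the same as the extra capacity needed to win it back: if a mitigation leaves you with 90% of your CPU, you need 100/90 times the cores, about 11% more, just to stand still (about 43% more for a 30% impact). The minimal Python sketch below shows that arithmetic. It treats the KB percentages as a flat CPU tax, which is a simplification, and the 40-core demand is a hypothetical input, not a measured value.

```python
import math

def percent_drop(baseline: float, mitigated: float) -> float:
    """Throughput lost, in percent, going from the baseline run to the mitigated run."""
    return (baseline - mitigated) * 100.0 / baseline

def extra_capacity_pct(impact_pct: float) -> float:
    """Extra CPU capacity, in percent, needed to offset an impact of impact_pct.
    Usable capacity becomes (100 - impact_pct)%, so you need
    100 / (100 - impact_pct) times the original capacity."""
    return impact_pct * 100.0 / (100.0 - impact_pct)

def cores_required(demand_cores: float, impact_pct: float) -> int:
    """Whole physical cores needed so post-mitigation capacity still covers demand."""
    return math.ceil(demand_cores * 100.0 / (100.0 - impact_pct))

# Hypothetical 40-core demand, for illustration only.
for label, impact in (("SCAv2", 10.0), ("SCAv1", 30.0)):
    print(f"{label}: {impact:.0f}% impact -> {extra_capacity_pct(impact):.1f}% more CPU, "
          f"{cores_required(40, impact)} cores for a 40-core demand")
```

With these inputs, SCAv2 needs 45 cores and SCAv1 needs 58 cores to cover a 40-core demand.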

If you decide to leave CVE-2018-3646 unresolved, you will have to delete the “Warning” Rule from the vSphere Health Alarm Definition (vSphere Client, vCenter Server object, Configure, Alarm Definitions, Filter “vSphere Health”, Edit). This removes the continuous “vSphere Health detected new issues in your environment” warning from vCenter Server, but leaves the “Critical” Rule in play. It is not possible to disable specific Skyline Health checks in vSphere 6.7, although you can disable Skyline Health entirely by leaving the Customer Experience Improvement Program (CEIP).

If you have a node with 6 cores per socket (possibly to reduce application licensing costs), be aware that Nutanix Foundation will deploy an 8-vCPU CVM, which exceeds the NUMA boundary of the 6-core Intel socket. Work with Nutanix Support to configure the “numa.nodeAffinity” setting for each Nutanix CVM.
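The NUMA check itself is simple arithmetic: a VM spans more than one NUMA node as soon as its vCPU count exceeds the physical cores in one socket. The Python sketch below assumes one NUMA node per socket and counts physical cores only (ignoring hyperthreads). The function names are hypothetical, and it only flags the problem; the actual “numa.nodeAffinity” value is something to settle with Nutanix Support.

```python
import math

def numa_nodes_spanned(vcpus: int, cores_per_socket: int) -> int:
    """Minimum number of NUMA nodes a VM's vCPUs must spread across, assuming one
    NUMA node per socket and counting physical cores only (hyperthreads ignored)."""
    if vcpus < 1 or cores_per_socket < 1:
        raise ValueError("vcpus and cores_per_socket must be positive")
    return math.ceil(vcpus / cores_per_socket)

def check_cvm(vcpus: int, cores_per_socket: int) -> str:
    """Human-readable verdict for a CVM of the given size on a given socket."""
    spanned = numa_nodes_spanned(vcpus, cores_per_socket)
    if spanned > 1:
        return (f"{vcpus}-vCPU CVM spans {spanned} NUMA nodes on a "
                f"{cores_per_socket}-core socket: raise numa.nodeAffinity with Nutanix Support")
    return f"{vcpus}-vCPU CVM fits within one {cores_per_socket}-core NUMA node"

# The default 8-vCPU CVM against a few socket sizes.
for cores in (6, 8, 12):
    print(check_cvm(8, cores))
```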

Nutanix on vSphere must use NFSv3 Datastores. Make sure you account for the fact that the NFSv3 client in VMware vSphere 6.7 has a read performance limit per Datastore on each host (approx. 130K Random Read IOPS @ 8K and approx. 2.12 GB/s Sequential Read @ 1M). This can be mitigated by adding a second Datastore and spreading the vDisks of a Monster VM across both Datastores. You can also choose to use Nutanix Volume Groups instead of VMDKs (this requires the Guest OS iSCSI Initiator and a Data Services IP on the Nutanix AOS cluster).
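To turn those ceilings into a Datastore count for a Monster VM, here is a rough Python sketch. It assumes the quoted figures apply per Datastore per host and that each extra Datastore adds roughly another ceiling's worth of headroom. That scaling is an assumption you should validate (for example with Nutanix X-Ray) rather than take on trust. The datastore and vDisk names, and the target workload, are hypothetical.

```python
import math
from itertools import cycle

# Approximate per-Datastore NFSv3 read ceilings quoted above for vSphere 6.7.
RANDOM_READ_IOPS_8K = 130_000
SEQ_READ_GBPS_1M = 2.12

def datastores_needed(target_iops: int, target_gbps: float) -> int:
    """Datastores needed so that neither ceiling is exceeded, assuming
    (unverified) that the ceilings scale linearly with the Datastore count."""
    by_iops = math.ceil(target_iops / RANDOM_READ_IOPS_8K)
    by_bw = math.ceil(target_gbps / SEQ_READ_GBPS_1M)
    return max(1, by_iops, by_bw)

def spread_vdisks(vdisks: list[str], datastores: list[str]) -> dict[str, list[str]]:
    """Round-robin placement of a VM's vDisks across the given Datastores."""
    if not datastores:
        raise ValueError("need at least one datastore")
    placement: dict[str, list[str]] = {ds: [] for ds in datastores}
    for vdisk, ds in zip(vdisks, cycle(datastores)):
        placement[ds].append(vdisk)
    return placement

# Hypothetical Monster VM: 8 vDisks, 200K random-read IOPS and 3 GB/s sequential read.
count = datastores_needed(200_000, 3.0)
names = [f"ds-{i + 1:02d}" for i in range(count)]
print(count, spread_vdisks([f"vdisk{i}" for i in range(8)], names))
```

Round-robin is the simplest even split; if some vDisks are much busier than others, placing them by expected load would balance better.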

Not Quite Right Infrastructure Platforms

Have you worked with infrastructure platforms that were not quite right? Niggling little annoyances that do not impact delivering services but add that extra effort to get your job done? Things like self-signed SSL certificates, local user accounts and naming standards that make no sense.

These things translate into technical debt, the additional friction that makes it harder for an operations team to do its job effectively. Add up the time lost over the years the solution is in service and it amounts to hundreds of man-hours. Fixing these things after an infrastructure platform is in production takes far more effort than taking care of them while the platform is being built.

My message to the delivery architects and delivery engineers out there: as you deploy your solutions, make your infrastructure platforms as easy to own and operate as possible. Considerations such as:

  • SSL certificates from the company Certificate Authority: nothing screams “amateur” more than having to accept self-signed certificates in a Web browser. It only takes a little more effort to complete the CSR and CER import process, and this will save future operators years of mouse clicks on “Add Exception” for “Invalid Security Certificate” messages.
  • All infrastructure Syslog endpoints should point to a central Syslog server: Syslogs that are cached locally are of no use to you when that device is down for the count. A centralized syslog server gives you a time machine for working out, holistically, what happened across your entire infrastructure during a past event. Open Source Syslog servers like syslog-ng are free. If you are running vSphere, get licensed for vRealize Log Insight; the plug-ins for vSphere are built into the product.
  • All infrastructure management interfaces are integrated with AD and use RBAC via AD groups: maintaining a bunch of local accounts with separate passwords for the different components of an infrastructure solution makes no sense. Configure SSO for the entire solution so that operators can log in using their domain credentials. Use AD groups for role-based access control; that way, when a new employee joins the team, they are placed into the same AD group as their colleagues and immediately have the access they need.
  • Common naming standard that is human readable: another pet peeve of mine. Use a naming standard that applies to every facet of the infrastructure solution (App, Compute, Network, Storage, DR, Data Protection, Cloud, etc.), one that someone can read and instantly understand without having to open a spreadsheet to decode an obscure alphanumeric string.
  • Day-2 Lifecycle Management: most platforms now have some type of lifecycle management that allows the automated deployment of patches and updates. Design, build and test it as part of the solution; do not leave this for the operations team to take care of after the fact. Examples include vRealize Suite Lifecycle Manager, vSphere Update Manager and Nutanix Lifecycle Manager. If you are designing a VMware SDDC, look at VCF with vSAN ReadyNodes and VCF on VxRail or, better yet, consider VMC on AWS. If you are going down the Nutanix route, take a look at Nutanix with AHV.
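To make the naming point concrete, here is a hypothetical convention, site-environment-role-number (for example “syd-prd-esx-01”), sketched in Python. It includes a parser that both validates a name and expands it into plain English. The site, environment and role codes are illustrative assumptions, not a recommended standard.

```python
from dataclasses import dataclass

# Hypothetical lookup tables for a <site>-<env>-<role>-<nn> convention.
SITES = {"syd": "Sydney", "mel": "Melbourne"}
ENVS = {"prd": "Production", "tst": "Test", "dev": "Development"}
ROLES = {"esx": "ESXi host", "cvm": "Nutanix CVM", "sql": "SQL Server"}

@dataclass(frozen=True)
class AssetName:
    site: str
    env: str
    role: str
    index: int

    def __str__(self) -> str:
        # Zero-padded two-digit index keeps names sorting in order (01, 02, ... 10).
        return f"{self.site}-{self.env}-{self.role}-{self.index:02d}"

def parse_name(name: str) -> AssetName:
    """Validate a name against the convention and split it into its parts."""
    parts = name.lower().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected <site>-<env>-<role>-<nn>, got {name!r}")
    site, env, role, index = parts
    for code, table, label in ((site, SITES, "site"), (env, ENVS, "environment"), (role, ROLES, "role")):
        if code not in table:
            raise ValueError(f"unknown {label} code {code!r} in {name!r}")
    if not index.isdecimal():
        raise ValueError(f"index must be numeric in {name!r}")
    return AssetName(site, env, role, int(index))

def describe(name: str) -> str:
    """Expand a name into plain English, so nobody needs a decoding spreadsheet."""
    a = parse_name(name)
    return f"{ROLES[a.role]} #{a.index} in {ENVS[a.env]} at {SITES[a.site]}"
```

For example, describe("syd-prd-esx-01") returns “ESXi host #1 in Production at Sydney”, while a name like “srv0042” is rejected with a ValueError.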
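The central Syslog principle also applies to the scripts and tooling delivered alongside the platform. As a minimal sketch using only Python's standard library, a script can send its records to the central collector over UDP instead of to a local file. The function name, the collector host, the port (514 is the traditional syslog port) and the LOCAL0 facility are placeholders to adapt to your own syslog design.

```python
import logging
import logging.handlers

def make_forwarding_logger(name: str, collector_host: str, collector_port: int = 514) -> logging.Logger:
    """Return a logger that sends each record to a central syslog collector over UDP
    (SysLogHandler's default transport) rather than caching it on the local box."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(collector_host, collector_port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    # Tag each message with its source so the collector can tell senders apart.
    handler.setFormatter(logging.Formatter(f"{name}: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Usage would look like make_forwarding_logger("backup-job", "syslog.example.com").info("nightly backup completed"), where “syslog.example.com” stands in for your own collector.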

If you have other “Not Quite Right” examples, feel free to add a comment. Thanks for reading this far!

vUM Scan Cluster Error

This post is applicable to customers using VMware vCenter Server Appliance (vCSA) 6.7 Update 3d with vSphere Update Manager.

Problem:

  1. vCenter Server Appliance 6.7 Update 3d with vSphere Update Manager configured to scan a cluster of ESXi hosts. DNS is fully configured and functioning correctly for ESXi and vCSA.
  2. The Check Compliance action from the vSphere Client returns the error message: “There are errors during the scan operation. Check the events and log files for details.”
  3. In the /var/log/vmware/vmware-updatemgr/vum-server/vmware-vum-server-log4cpp.log file of the vCSA instance, there are “HostUpdateDepotManager” ERROR messages listing “http://<vcsa url>:9048/vum/repository/hostupdate/vmw/vmw-ESXi-6.7.0-metadata.zip – Name or service not known”.
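“Name or service not known” is the resolver's message for a hostname it cannot look up. Even when DNS otherwise looks healthy, one quick check is whether the exact hostname embedded in the depot URL resolves from the machine making the request. The Python sketch below pulls the host out of a URL and tries it against the system resolver. The function names are hypothetical, and “vcsa.example.com” in the usage note is a placeholder for the host in your own log line, not a value taken from this error.

```python
import socket
from urllib.parse import urlsplit

def depot_host(url: str) -> str:
    """Extract the hostname from a depot URL such as the one in the ERROR line."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no hostname in {url!r}")
    return host

def resolves(hostname: str) -> bool:
    """True if the system resolver returns at least one address for hostname.
    A lookup failure here is the kind of error reported as "Name or service not known"."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except socket.gaierror:
        return False

def check_depot_url(url: str) -> str:
    """One-line verdict on whether the depot URL's host resolves from this machine."""
    host = depot_host(url)
    status = "resolves" if resolves(host) else "does NOT resolve"
    return f"{host} {status} from this machine"
```

For example, check_depot_url("http://vcsa.example.com:9048/vum/repository/hostupdate/vmw/vmw-ESXi-6.7.0-metadata.zip") reports whether that placeholder host resolves. Substitute the URL from your own log.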