Have you worked with infrastructure platforms that were not quite right? Niggling little annoyances that do not impact delivering services but add that extra effort to get your job done? Things like self-signed SSL certificates, local user accounts and naming standards that make no sense.
These things translate into technical debt, that additional friction that makes it harder for an operations team to do their jobs effectively. When we add the time lost over the years the solution runs for, this amounts to hundreds of man-hours. The amount of effort to fix these things after an infrastructure platform is in production is so much harder than taking care of it when the platform was being built.
My message to the delivery architects and delivery engineers out there, as you are deploying your solutions, ensure you are making your infrastructure platforms as easy to own and operate as possible. Considerations such as:
- SSL certificates from the company Certificate Authority: nothing screams “amateur” more than having to accept self-signed certificates in a Web browser. It only takes a little more effort to complete the CSR request and CER import process and this will save future operators years of mouse clicks to “Add Exception” for “Invalid Security Certificate” messages.
- All infrastructure Syslog endpoints should point to a central Syslog server: Syslogs that are cached locally are of no use to you when that device is down for the count. A centralized syslog server gives you a time machine into holistically working out what happened with your entire infrastructure for a past event. Open Source Syslog servers like syslog-ng are free. If you are running vSphere, get licensed for vRealize Log Insight, the plug-ins for vSphere are built into the product.
- All infrastructure management interfaces are integrated with AD and use RBAC via AD groups: Maintaining a bunch of local accounts with separate passwords for the different components of an infrastructure solution make no sense. Configure SSO for the entire solution, so that the operators can login using their domain credentials. Use AD groups for role-based access control, that way when a new employee joins the team, they are placed into the same AD group as their colleagues and they immediately have the access they need.
- Common naming standard that is human readable: another pet peeve of mine, use a naming standard that applies to every facet of the infrastructure solution (App, Compute, Network, Storage, DR, Data Protection, Cloud, etc.). One that someone can read and instantly understand what they are looking at and does not require them to open a spreadsheet to decode an obscure alpha-numeric string.
- Day-2 Lifecycle Management: most platforms now have some type of lifecycle management that allows the automated deployment of patches and updates. Design, build and test them as part of the solution. Do not leave this for the operations team to take care of after the fact. Things such as vRealize Suite Lifecycle Manager, vSphere Update Manager, Nutanix Lifecycle Manager. If you are designing a VMware SDDC, look at VCF with vSAN-Ready Nodes and VCF on VxRail or better yet, consider VMC on AWS. If you are going down the Nutanix route, take a look at Nutanix with AHV.
If you have other “Not Quite Right” examples, feel free to add a comment. Thanks for reading this far!