Diskeeper Corporation reached out to me via email last week to let me know that they’ve released a new white paper on optimizing VMs. I’m making the three-page document available for download via the following link:
Archive for June, 2011
VMware ESXi is mainstream. If you’ve ever deployed a VMware vMA appliance to manage ESXi (or heck, even ESX for that matter), you may have noticed the enforcement of a complex password policy for the vi-admin account. For example, setting a password of “password” is denied because it is based on a dictionary word (in addition to other morally obvious reasons).
However, you can bend the complexity rules and force a simple password after the initial deployment using sudo. You’ll still be warned about the violation of the complexity policy but by using sudo, the policy is allowed to be bypassed by a higher authority:
sudo passwd vi-admin
This tip isn’t specific to VMware or the vMA appliance. It is general *nix knowledge. There is ample documentation available which discusses the password complexity mechanism in various versions of *nix. Another approach to bypassing the complexity requirement would be to actually relax the requirement itself but this would impact other local accounts potentially in use on the vMA appliance which may still require complex passwords. Using the sudo command will be faster and leaves the default complex security mechanism in place.
After enabling Local Tech Support Mode on an ESXi host via the DCUI (Direct Console User Interface), a yellow balloon styled warning will be displayed in the vSphere Client:
The Local Tech Support Mode for the host has been enabled
Likewise, if you’ve enabled Remote Tech Support Mode via SSH, you’ll see:
Remote Tech Support Mode(SSH) for the host has been enabled
KB Article 1016205 describes this condition as a security measure. Adhering to the warnings would be a best practice for a production or high risk environment. However, for lab, development, or environments with adequate perimeter security, it may be desirable to have either or both modes enabled, and in that case the warnings throughout the vSphere Client aren’t welcome.
The VMware KB article goes on to say that there is no way to eliminate the warnings while leaving Local or Remote Tech Support Mode enabled.
Disabling Remote Tech Support Mode (SSH) and Local Tech Support Mode is the only way to prevent this warning.
While there may not be an advanced configuration option exposed for this, rebooting the host eliminates the conditional warnings. It has also been reported in the VMware community forums that restarting the hostd service works as well, although as a side effect it will likely disconnect the host from vCenter Server temporarily.
Xangati Packs More Power in Free VMware Management Tool
Expands Functionality of Xangati for ESX with Performance Health Engine for Any Given Host
Cupertino, CA – June 22, 2011 – Xangati, the recognized leader in infrastructure performance management, today announced that it has expanded the capabilities and power offered in its free VMware management tool, Xangati for ESX. Xangati for ESX now includes several features from its recently announced and patent-pending Performance Health Engine – a real-time health index that monitors the health of every object within the virtualized infrastructure and a key component of Xangati’s multi-host Xangati Virtual Infrastructure (VI) and Virtual Desktop Infrastructure (VDI) Dashboards. With the updated Xangati for ESX, virtualization managers now have an even clearer picture of their VM activity, as well as the ability to fully monitor a single ESX host – all at no cost.
“Xangati is continuously looking for ways to improve our infrastructure performance management solutions in order to provide the highest value to virtualization managers – and that objective is absolutely no different for our free Xangati for ESX tool,” said Alan Robin, CEO of Xangati. “The response to our Performance Health Engine – for both our VI and VDI dashboards – inspired us to incorporate some of its capabilities into our free tool, so that everyone can experience and benefit from real-time health analysis – in any stage of their virtualization initiative.”
“By incorporating its health index into the free Xangati for ESX, Xangati allows virtualization managers to create a baseline for the infrastructure,” said David Davis, vExpert and blogger. “When any unusual activity occurs on the infrastructure, the tool alerts you and identifies the problem area. This ability – plus Xangati’s trademark DVR recordings – provide for the most comprehensive troubleshooting available, differentiating Xangati from other virtual performance monitoring tools – all for free.”
New Capabilities Streamline Management and Ensure User Satisfaction
With its new enhancements, Xangati for ESX gives managers deeper insights into any potential problems within virtualized environments by immediately and visually alerting them to any anomalies. Xangati achieves this unique health alert system by comparing real-time data feeds with established performance profiles for up to 10 VMs running on an ESX host supporting virtualized servers or virtual desktops. Its memory-based architecture allows Xangati to compare this data and identify any performance shifts live and continuously – not through intermittent polling intervals – giving managers unparalleled insights for faster troubleshooting. These insights, in turn, provide confidence for the migration of mission-critical applications in the VI and ensure end user satisfaction – the biggest factor in determining the success of VDI initiatives.
Xangati for ESX still includes all of Xangati’s trademark features, including: continuous scroll-bar and drill-down user interface (UI) capabilities for dynamic and real-time navigation; visibility into more than 100 metrics on an ESX/ESXi host and its VMs activity; and automated DVR recordings (triggered by VMware alerts) to capture critical events for replay analysis for precision troubleshooting as opposed to sifting through unstructured log files. Xangati for ESX is also deployed in Open Virtualization Format (OVF) to facilitate a faster and easier installation process. Xangati is committed to continue to incorporate capabilities that add value and help accelerate virtualization initiatives.
Available immediately, the updated Xangati for ESX works with VMware 3.5, 4.0 and 4.1 for ESX and ESXi. Xangati has also created an updated installation video and documentation for additional background about the new features in order to enable virtualization managers to begin using and benefiting from the free tool as quickly as possible. To access the installation video and download a copy of the free Xangati for ESX, go to http://xangati.com/xangati-for-esx-new-features/.
Xangati, the recognized leader in Infrastructure Performance Management (IPM), provides unparalleled performance management for the emerging and transformational data center architectures impacting IT today, including server virtualization, cloud computing and VDI. Its award-winning suite of IPM solutions accelerates cloud computing and virtualization initiatives by providing unprecedented visibility and real-time continuous insights into the entire infrastructure. Leveraging its powerful precision analytics, Xangati’s health performance index provides a new way to view and manage performance – in real-time – at a scale previously not possible.
Founded in 2006, Xangati, Inc. is a privately held company with corporate headquarters based in Cupertino, California. Xangati has been granted numerous technology patents for its unique and comprehensive approach to Infrastructure Performance Management. Xangati is a VMware Technology Alliance Partner and certified Citrix Ready Partner and supports both VMware View and Citrix XenDesktop, as well as other virtualization environments. For more information, visit the company website at http://www.xangati.com.
There is a VMware storage whitepaper available which is titled Scalable Storage Performance. It is an oldie but goodie. In fact, next to VMware’s Configuration Maximums document, it is one of my favorites and I’ve referenced it often. I like it because it is efficient in specifically covering block storage LUN queue depth and SCSI reservations. It was written pre-VAAI but I feel the concepts are still quite relevant in the block storage world.
One of the interrelated components of queue depth on the VMware side is the advanced VMkernel parameter Disk.SchedNumReqOutstanding. This setting determines the maximum number of active storage commands (IO) allowed at any given time at the VMkernel. In essence, this is queue depth at the hypervisor layer. Queue depth can be configured at various points in the path of an IO such as the VMkernel which I already mentioned, in addition to the HBA hardware layer, the kernel module (driver) layer, as well as at the guest OS layer.
Getting back to Disk.SchedNumReqOutstanding, I’ve always lived by the definition I felt was most clear in the Scalable Storage Performance whitepaper. Disk.SchedNumReqOutstanding is the maximum number of active commands (IO) per LUN. Clustered hosts don’t collaborate on this value which implies this queue depth is per host. In other words, each host has its own independent queue depth, again, per LUN. How does Disk.SchedNumReqOutstanding impact multiple VMs living on the same LUN (again, same host)? The whitepaper states each VM will evenly share the queue depth (assuming each VM has identical shares from a storage standpoint).
When virtual machines share a LUN, the total number of outstanding commands permitted from all virtual machines to that LUN is governed by the Disk.SchedNumReqOutstanding configuration parameter that can be set using VirtualCenter. If the total number of outstanding commands from all virtual machines exceeds this parameter, the excess commands are queued in the ESX kernel.
I was recently challenged by a statement agreeing to all of the above but with one critical exception: Disk.SchedNumReqOutstanding provides an independent queue depth for each VM on the LUN. In other words, if Disk.SchedNumReqOutstanding is left at its default value of 32, then VM1 has a queue depth of 32, VM2 has a queue depth of 32, and VM3 has its own independent queue depth of 32. Stack those three VMs and we arrive at a sum total of 96 outstanding IOs on the LUN. A few sources were provided to me to support this:
You can adjust the maximum number of outstanding disk requests with the Disk.SchedNumReqOutstanding parameter in the vSphere Client. When two or more virtual machines are accessing the same LUN, this parameter controls the number of outstanding requests that each virtual machine can issue to the LUN.
You can adjust the maximum number of outstanding disk requests with the Disk.SchedNumReqOutstanding parameter. When two or more virtual machines are accessing the same LUN (logical unit number), this parameter controls the number of outstanding requests each virtual machine can issue to the LUN.
The problem with the two statements above is that I feel they are poorly worded and, as a result, misinterpreted. I understand what each statement is trying to say, but it implies something quite different depending on how a person reads it. Each statement is correct in that Disk.SchedNumReqOutstanding will gate the amount of active IO possible per LUN and ultimately per VM. However, the wording implies that the value assigned to Disk.SchedNumReqOutstanding applies individually to each VM, which is not the case. The reason I’m pointing this out is the number of misinterpretations I’ve subsequently discovered via Google, which I gather are the result of reading one of the two sources quoted above.
The scenario can be quickly proven in the lab. Disk.SchedNumReqOutstanding is configured for the default value of 32 active IOs. Using resxtop, I see my three VMs cranking out IO with IOMETER. Each VM is configured with IOMETER to create 32 active IOs. If what I’m being told by the challenge is true, I should be seeing 96 active IO being generated to the LUN from the combined activity of the three VMs.
But that’s not what’s happening. Instead what I see is approximately 32 ACTV (active) IOs on the LUN, with another 67 IOs waiting in queue (by the way, ESXTOP statistic definitions can be found here). In my opinion, the Scalable Storage Performance whitepaper most accurately and best defines the behavior of the Disk.SchedNumReqOutstanding value.
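The two readings can be compared with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions: it ignores storage shares and any queuing at the HBA, driver, or array, and the function names are hypothetical rather than part of any VMware tool.

```python
def per_lun_limit(vm_demands, limit=32):
    """Whitepaper reading: one limit shared by all VMs on the LUN (per host)."""
    total = sum(vm_demands)
    active = min(total, limit)
    return active, total - active  # (active, queued in the VMkernel)


def per_vm_limit(vm_demands, limit=32):
    """Challenged reading: every VM gets its own independent limit."""
    active = sum(min(demand, limit) for demand in vm_demands)
    return active, sum(vm_demands) - active


demands = [32, 32, 32]  # three VMs, each driven to 32 outstanding IOs
print(per_lun_limit(demands))  # (32, 64)
print(per_vm_limit(demands))   # (96, 0)
```

The resxtop figures from the lab (roughly 32 active with the remainder queued) line up with the first function, not the second. The measured queue count is approximate, so it won't match the model exactly.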
Now going back to the possibility of the Disk.SchedNumReqOutstanding stacking, LUN utilization could get out of hand rapidly with 10, 15, 20, 25 VMs per LUN. We’d quickly exceed the max supported value of Disk.SchedNumReqOutstanding (and all HBAs I’m aware of) which is 256. HBA ports themselves typically support a few thousand IOPS. Stacking the queue depths for each VM could quickly saturate an HBA meaning we’d get a lot less mileage out of those ports as well.
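To put rough numbers on that hypothetical, here is a short sketch of what per-VM stacking would add up to at the default value. The constant and function names are made up for illustration; 256 is the maximum cited above.

```python
DEFAULT_DEPTH = 32  # default Disk.SchedNumReqOutstanding
MAX_DEPTH = 256     # maximum supported value cited above


def stacked_depth(vm_count, depth=DEFAULT_DEPTH):
    """Outstanding IOs per LUN if each VM had its own independent queue."""
    return vm_count * depth


for vms in (10, 15, 20, 25):
    total = stacked_depth(vms)
    print(f"{vms} VMs -> {total} outstanding IOs (exceeds {MAX_DEPTH}: {total > MAX_DEPTH})")
```

Even at 10 VMs per LUN, the stacked figure would be 320, already past the 256 ceiling.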
While having a queue depth discussion, it’s also worth noting the %USD value is at 100% and LOAD is approximately 3. The LOAD statistic corroborates the 3:1 ratio of total IO:queue depth and both figures paint the picture of an oversubscribed LUN from an IO standpoint.
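For reference, both statistics can be reproduced from the counters with simple arithmetic. The sketch below assumes the esxtop definitions as I understand them (%USD is active commands over the LUN queue depth, LOAD is active plus queued commands over the LUN queue depth) and plugs in the approximate lab values; the function names are hypothetical.

```python
def pct_used(actv, dqlen):
    """%USD: share of the LUN queue depth occupied by active commands."""
    return 100.0 * actv / dqlen


def load(actv, qued, dqlen):
    """LOAD: active plus queued commands relative to the LUN queue depth."""
    return (actv + qued) / dqlen


# Approximate values observed in the lab: 32 active, 67 queued, depth 32.
print(pct_used(32, 32))            # 100.0
print(round(load(32, 67, 32), 2))  # 3.09
```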
In conclusion, I’d like to see VMware modify the wording in their documentation to provide better understanding leaving nothing open to interpretation.
Update 6/23/11: Duncan Epping at Yellow Bricks responded with a great followup, “Disk.SchedNumReqOutstanding the story”.
Detect corruption/integrity problems with your .vmx files with the following Service Console command (ESX only):
/usr/bin/vmware-configcheck | grep -v PASS
The command walks through the inventory and checks whether each virtual machine in the inventory matches the set of rules on configuration that is indicated in the rules file on the host: /etc/vmware/configrules
Why would you do this? As stated above, the utility will detect corruption in the construct of a .vmx file. Integrity issues in the .vmx file can cause a failed vMotion and the VM will crash. I’ve seen it happen twice (out of tens of thousands of vMotion operations).
The command appears to have been deprecated and is not included in ESXi.