That’s right. We lose uptime during every VMotion. Relax just a bit – I’m not talking about actual uptime/downtime availability of the guest VM in the datacenter. I’m speaking to the uptime performance metric tracked in the VirtualCenter database. It’s a bug that was introduced in VirtualCenter 2.0 and has remained in the code to this day in VirtualCenter 2.5.0 Update 3. Here’s how it works:
We’ve got a VM named Exchange1. VirtualCenter displays its uptime as 28 days is indicated by the screenshots below:
Now the VM has been VMotioned from host SOLO to host LANDO. Notice what has happened to uptime. It has disappeared from the console:
The real proof in what has ultimately happened is we see from the performance chart the latest uptime metric has been reset from 27.99 days as shown above to 0.0026620 days:
Sometimes the VIC console will show the uptime counter start over at 0 days, then on to 1 days, etc. Other times the uptime counter will remain blank for days or weeks as you can see from my three other VMs in the first screenshot which show no uptime.
This brings us to an interesting discussion. What would you like uptime in the VIC to mean exactly? Following are my observations and thoughts on VMware’s implementation of the uptime metric in VirtualCenter.
In previous versions of VirtualCenter, a soft reboot of the VM inside of the OS would reset the uptime statistic in VirtualCenter. I believe this was a function of VMware Tools that triggered this.
Today in VirtualCenter 2.5.0 Update 3, a soft reboot inside the guest VM does not reset the uptime statistic back to zero.
A VM which has no VMware Tools installed that is soft rebooted inside of the OS (ie. we’re not talking about any VMware console power operation here) does not reset the uptime statistic.
I could see the community take a few different sides on this as there are two variations of the definition of uptime we’re dealing with here. Uptime of the guest VM OS and uptime of the VM’s virtual hardware.
- Should uptime translate into how long the VM virtual hardware has been powered on from a virtual infrastructure standpoint?
- Or should uptime translate into how long the OS inside the VM has been up, tracked by VMware Tools?
The VMware administrator cares about the length of time a VM has been powered on. It is the powered on VM that consumes resources from the four resource food groups and impacts capacity.
The guest VM OS administrator, whether it be Windows or Linux, cares about uptime of the guest OS. The owner of the OS is held to SLAs by the business lines.
My personal opinion is that the intended use of the Virtual Infrastructure Client is for the VMware administrator and thus should reflect virtual infrastructure information. My preference is that the uptime statistic in VirtualCenter tracks power operations of the VM irregardless of any reboot sequences of the OS inside the VM. In other words, uptime is not impacted by VMware Tools heartbeats or reboots inside the guest VM. The uptime statistic should only be reset when the VM is powered off or power reset (including instances where HA has recovered a VM).
At any rate, due to the bug that uptime has in VirtualCenter 2.0 and above, it’s a fairly unreliable performance metric for any virtual infrastructure using VMotion and DRS. Furthermore, the term itself can be misleading depending on the administrators interpretation of uptime versus what’s written in the VirtualCenter code.
I submitted a post in VMware’s Product and Feature Suggestions forum in January of 2007 recording the uptime reset on VMotion issue. As this problem periodically bugs me, I followed up a few times. Once in a follow up post in the thread above, and at least one time externally requesting someone from VMware take a look at it. Admittedly I do not have an SR open.
VMware, can we get this bug fixed? After all, if the hypervisor has become an every day commodity item leaving the management tools as the real meat and potatoes, you should make sure our management tools work properly.
Thank you,
Jas