Uptime lost during VMotion

January 11th, 2009 by jason Leave a reply »

That’s right. We lose uptime during every VMotion. Relax just a bit – I’m not talking about actual uptime/downtime availability of the guest VM in the datacenter. I’m speaking to the uptime performance metric tracked in the VirtualCenter database. It’s a bug that was introduced in VirtualCenter 2.0 and has remained in the code to this day in VirtualCenter 2.5.0 Update 3. Here’s how it works:

We’ve got a VM named Exchange1. VirtualCenter displays its uptime as 28 days is indicated by the screenshots below:

1-11-2009 12-04-06 AM

1-11-2009 12-10-14 AM

Now the VM has been VMotioned from host SOLO to host LANDO. Notice what has happened to uptime. It has disappeared from the console:

1-11-2009 12-07-37 AM

The real proof in what has ultimately happened is we see from the performance chart the latest uptime metric has been reset from 27.99 days as shown above to 0.0026620 days:

1-11-2009 12-10-54 AM

Sometimes the VIC console will show the uptime counter start over at 0 days, then on to 1 days, etc. Other times the uptime counter will remain blank for days or weeks as you can see from my three other VMs in the first screenshot which show no uptime.

This brings us to an interesting discussion. What would you like uptime in the VIC to mean exactly? Following are my observations and thoughts on VMware’s implementation of the uptime metric in VirtualCenter.

In previous versions of VirtualCenter, a soft reboot of the VM inside of the OS would reset the uptime statistic in VirtualCenter. I believe this was a function of VMware Tools that triggered this.

Today in VirtualCenter 2.5.0 Update 3, a soft reboot inside the guest VM does not reset the uptime statistic back to zero.

A VM which has no VMware Tools installed that is soft rebooted inside of the OS (ie. we’re not talking about any VMware console power operation here) does not reset the uptime statistic.

I could see the community take a few different sides on this as there are two variations of the definition of uptime we’re dealing with here. Uptime of the guest VM OS and uptime of the VM’s virtual hardware.

  1. Should uptime translate into how long the VM virtual hardware has been powered on from a virtual infrastructure standpoint?
  2. Or should uptime translate into how long the OS inside the VM has been up, tracked by VMware Tools?

The VMware administrator cares about the length of time a VM has been powered on. It is the powered on VM that consumes resources from the four resource food groups and impacts capacity.

The guest VM OS administrator, whether it be Windows or Linux, cares about uptime of the guest OS. The owner of the OS is held to SLAs by the business lines.

My personal opinion is that the intended use of the Virtual Infrastructure Client is for the VMware administrator and thus should reflect virtual infrastructure information. My preference is that the uptime statistic in VirtualCenter tracks power operations of the VM irregardless of any reboot sequences of the OS inside the VM. In other words, uptime is not impacted by VMware Tools heartbeats or reboots inside the guest VM. The uptime statistic should only be reset when the VM is powered off or power reset (including instances where HA has recovered a VM).

At any rate, due to the bug that uptime has in VirtualCenter 2.0 and above, it’s a fairly unreliable performance metric for any virtual infrastructure using VMotion and DRS. Furthermore, the term itself can be misleading depending on the administrators interpretation of uptime versus what’s written in the VirtualCenter code.

I submitted a post in VMware’s Product and Feature Suggestions forum in January of 2007 recording the uptime reset on VMotion issue. As this problem periodically bugs me, I followed up a few times. Once in a follow up post in the thread above, and at least one time externally requesting someone from VMware take a look at it. Admittedly I do not have an SR open.

VMware, can we get this bug fixed? After all, if the hypervisor has become an every day commodity item leaving the management tools as the real meat and potatoes, you should make sure our management tools work properly.

Thank you,

Jas

Advertisement

No comments

  1. Steffen says:

    I totally agree with you, Jason. The “VM-internal” uptime shouldnt be measured from the “outside”, the Virtual Infrastructure. This should be part of the operating system inside the VM (e.g. uptime in Linux). The uptime metric should reflect the uptime from the virtual hardware. Hopefully this bug gets fixed someday.

    BR
    Steffen

  2. Hany Michael says:

    Yes, I’ve noticed this as well sometime back when I was tracking the uptime of certain VMs. It’s very annoying since this information is sometimes important to show the availability of the VMs to the management for example.

    Regards

  3. Habeeba says:

    As you said, you are using wp-super-cache.

    I cud see the comment “

    on the bottom of the code.

    But when i refreshed ur page, the time in the comment became 06:49:06. in next refresh it went to 06:50:01 etc.

    Doesn’t it mean that the cache file is creating all the time, one access the page? and it will give a reverse effect.

    I am also facing the same problem on my site eplmatches.com

    but if i switch it to wp-cache mode, it works. Do you have any idea. If you have please let me know b email.

    Many many thanks

  4. daVikes says:

    Perhaps the answer is a metric for both types. Vmotion shouldn’t count as downtime though.

  5. Susana says:

    Actually I had installed Kasperskry 2013 beta edtioin, but the license expired in just 10 days and not 90. Also the KTR trial reset tool does not work on this version.