Archive for the ‘Virtualization’ category

First vSphere Patches Released by VMware

July 10th, 2009

Approximately six weeks after the vSphere launch, the first batch of ESX/ESXi 4.0 patches have downloaded by vSphere Update Manager. I was notified this morning at 3am via an Email from vSphere Update Manager. Here is the patch list:

The number of patch definitions downloaded:

10 critical

16 total

ESX:

ID: ESX400-200906401-BG Impact: Critical Release date: 2009-07-09 Products: esx 4.0.0
Updates VMX

ID: ESX400-200906402-BG Impact: Critical Release date: 2009-07-09 Products: esx 4.0.0
Updates ESX Scripts

ID: ESX400-200906403-BG Impact: HostGeneral Release date: 2009-07-09 Products: esx 4.0.0
Updates VMware Tools

ID: ESX400-200906404-BG Impact: Critical Release date: 2009-07-09 Products: esx 4.0.0
Updates CIM

ID: ESX400-200906405-SG Impact: HostSecurity Release date: 2009-07-09 Products: esx 4.0.0
Updates krb5 and pam_krb5

ID: ESX400-200906406-SG Impact: HostSecurity Release date: 2009-07-09 Products: esx 4.0.0
Updates sudo

ID: ESX400-200906407-SG Impact: HostSecurity Release date: 2009-07-09 Products: esx 4.0.0
Updates curl

ID: ESX400-200906408-BG Impact: Critical Release date: 2009-07-09 Products: esx 4.0.0
Updates SCSI Driver for QLogic FC

ID: ESX400-200906409-BG Impact: Critical Release date: 2009-07-09 Products: esx 4.0.0
Updates LSI storelib Library

ID: ESX400-200906410-BG Impact: Critical Release date: 2009-07-09 Products: esx 4.0.0
Updates hostd

ID: ESX400-200906411-SG Impact: HostSecurity Release date: 2009-07-09 Products: esx 4.0.0
Updates udev

ID: ESX400-200906412-BG Impact: Critical Release date: 2009-07-09 Products: esx 4.0.0
Updates esxupdate

ID: ESX400-200906413-BG Impact: Critical Release date: 2009-07-09 Products: esx 4.0.0
Updates vmkernel iSCSI Driver

ESXi:

ID: ESXi400-200906401-BG Impact: Critical Release date: 2009-07-09 Products: embeddedEsx 4.0.0
Updates Firmware

ID: ESXi400-200906402-BG Impact: Critical Release date: 2009-07-09 Products: embeddedEsx 4.0.0
Updates Tools

ID: VEM400-200906002-BG Impact: HostGeneral Release date: 2009-07-09 Products: embeddedEsx 4.0.0
Cisco Nexus 1000V VEM

vSphere Virtual Machine Performance Counters Integration into Perfmon

July 8th, 2009

VMware introduced the VMware Descheduled Time Accounting Service as a new VMware Tools component in ESX 3.0. The goal was to account for inconsistent CPU cycles allocated to the guest VM by the VMkernel to provide accurate performance statistics using standard performance monitoring tools within the guest VM. Although the service was not installed and enabled with VMware Tools by default, nor did it ever escape the bonds of experimental support status, I found the service to be both stable and reliable and it was a standard installation component in one of my production datacenters. One caveat was that the service only supported uniprocessor guest VMs having a single vCPU.

The VMware Descheduled Time Accounting Service was deprecated in VMware vSphere. More accurately, it was sort of replaced by a new vSphere feature called Virtual Machine Performance Counters (Integrated into Perfmon). To quote VMware:

“Virtual Machine Performance Counters Integration into Perfmon — vSphere 4.0 introduces the integration of virtual machine performance counters such as CPU and memory into Perfmon for Microsoft Windows guest operating systems when VMware Tools is installed. With this feature, virtual machine owners can do accurate performance analysis within the guest operating system. See the vSphere Client Online Help.”

The vSphere Client Online Help has this to say about Virtual Machine Performance:

“In a virtualized environment, physical resources are shared among multiple virtual machines. Some virtualization processes dynamically allocate available resources depending on the status, or utilization rates, of virtual machines in the environment. This can make obtaining accurate information about the resource utilization (CPU utilization, in particular) of individual virtual machines, or applications running within virtual machines, difficult. VMware now provides virtual machine-specific performance counter libraries for the Windows Performance utility. Application administrators can view accurate virtual machine resource utilization statistics from within the guest operating system’s Windows Performance utility.”

Did you notice the explicit statement about Perfmon? Perfmon is Microsoft Windows Performance Monitor or perfmon.exe for short. Whereas the legacy VMware Descheduled Time Accounting Service supported both Windows and Linux guest VMs, its successor currently supports Perfmon ala Windows guest VMs only. It seems we’ve gone backwards in functionality from a Linux guest VM perspective. Another pie in the face for shops with Linux guest VMs.

Rant…

I understand that Windows guest VMs are the low hanging fruit for software development and features, but VMware needs to make sure some love is spread through the land of Linux as well. Folks with Linux shops are still struggling with basic concepts such as Linux guest customization as well as flexibility and automation of VMware Tools installation in the Linux guest OS. If VMware is going to tout their support for Linux guest VMs, I’d like to see more of a commitment than what is currently being offered. There’s more to owning a virtualized infrastructure than powering on instances on top of a hypervisor. Building it is the easy part. Managing it can be much more difficult without the right tools. Flexibility and ease with in the management tools is critical, especially as virtual infrastructures grow.

/Rant…

So, taking a look at a VMware vSphere Windows VM with current VMware Tools, I launched Perfmon. The installation of VMware Tools installs two new Performance Objects along with various associated counters:

  • VM Memory
    • Memory Active in MB
    • Memory Ballooned in MB
    • Memory Limit in MB
    • Memory Mapped in MB
    • Memory Overhead in MB
    • Memory Reservation in MB
    • Memory Shared in MB
    • Memory Shared Saved in MB
    • Memory Shares
    • Memory Swapped in MB
    • Memory Used in MB
  • VM Processor
    • % Processor Time
    • Effective VM Speed in MHz
    • Host processor speed in MHz
    • Limit in MHz
    • Reservation in MHz
    • Shares

Observing some of the counter names, it’s interesting to see that VMware has given us direct insight into the hypervisor resource configuration settings via Performance Monitor from inside the guest VM. While this may be useful for VI Administrators who manage both the VI as well as the guest operating systems, it may be disservice to VI Administrators in environments where guest OS administration is delegated to another support group. The reason why I say this is that some of these new counters disclose an “over commit” or “thin provisioning” of virtual hardware resources which I’d rather not reveal to other supports groups. What they don’t know won’t hurt them. Revealing some of the tools in our bag of virtualization tricks may bring about difficult discussions we don’t really want to get into or perhaps provoke the finger of blame to be perpetually pointed in our direction whenever a guest OS problem is encountered.

I’ve grabbed a few screen shots from my lab which show the disparity between native Perfmon metrics and the new vSphere Virtual Machine Performance Counters. In this example, I compare %Processor Time from the Perfmon native Processor object against the %Processor Time from the VM Processor object which was injected into the VM during the vSphere VMware Tools installation. It’s interesting to note, and you should be able to clearly see it in the graph, that the VM Processor %Processor time is consistently double that of the Perfmon native Processor % Processor Time counter. Consider this when you are providing performance information for a guest VM or one of its applications. If you choose the native Perfmon counter, you could be reporting performance data with 100% margin of error as shown in the case below. This is significant and if used for capacity planning purposes could lead to all sorts of problems.

7-8-2009 9-15-20 PM

7-8-2009 10-17-02 PM

One other important item to note is that you may recall I said towards the beginning that the legacy VMware Descheduled Time Accounting Service only supported uniprocessor VMs. The same appears to be true for the new vSphere Virtual Machine Performance Counters. In the lab I took a single CPU VM which had the vSphere Virtual Machine Performance Counters, and I adjusted the vCPU count to 4. After powering on with the new vCPU count, the vSphere Virtual Machine Performance Counters disappeared from the pulldown list. VMware needs to address this shortcoming. Performance statistics on vSMP VMs are just as important, if not more important, than performance statistics on uniprocessor VMs. vSMP VM resource utilization needs to be watched more closely for vSMP justification purposes.

So VMware, in summary, here is what needs work with vSphere Virtual Machine Performance Counters:

  1. Must support vSMP VMs
  2. Must support Linux VMs
  3. Support for Solaris VMs would also be nice
  4. More objects: VM Disk and VM Networking

Update: On Friday July 11th, 2009, I received the following email response from Praveen Kannan, a VMware Product Manager. Praveen has given me permission to reprint the response here. It is an encouraging read:

Hi Jason,

I read your recent blog post on the Perfmon integration in vSphere 4.0. I’m the product manager for the feature and wanted to reach out to on your findings and feedback regarding the feature.

First off, thanks for the detailed post on the intricacies of the feature and the screenshots. I think this post would be very helpful to the community! Much appreciated…

1) note on vmdesched

We’ve deprecated vmdesched in vSphere 4.0 because it was primarily an experimental feature that we didn’t recommend putting in production. More importantly, vmdesched adds overhead to the guest and is not compatible with some of the newer kernels out there and so the Perfmon integration is our answer to improve on the current state and provide accurate CPU accounting to VM owners that can be deployed in production and is integrated well with VMware Tools for out-of-box functionality.

2) Linux support for accurate counters

The Perfmon integration in vSphere 4.0 leverages the guest SDK API to get to the accurate counters from the hypervisor and that is available on Linux GOS as well. All you need is to have the VMware Tools installed to get access to the guest SDK interface. We couldn’t provide something like Perfmon on Linux since there aren’t many broadly used tools/APIs that we can standardize on.

There are some discussions internally to solve the accounting issue on Linux guests in a much simplified manner but I can’t go into the specific details at this time. Rest assured, I can tell you that we are looking into the problem for Linux workloads.

On a side note, the Perfmon implementation exposes the two new counter groups through WMI (you can almost think of the Perfmon integration as a WMI provider that sits on top of the guest SDK interface and provide access to the counters). What this means is any in-guest agent, benchmarking, reporting tool etc. can quickly adapt to use these “accurate” counters using WMI

So for Linux guests, you can refer to the guest SDK documentation on how someone can modify their Linux agents, tools etc. to talk to the “accurate” counters. The programming guide for vSphere guest SDK 4.0 is available at http://www.vmware.com/support/developer/guest-sdk/. The list of available perf counters is in Page 11 of the PDF (Accessor functions for VM data).

You can in fact use the older 3.5 version of the guest SDK API as well if you want to implement something that works with existing VI3 environments (yes, this SDK has been around for a while!). The only difference is that the vSphere version of the API has a few extra counters but you will get access to the important counters such as CPU utilization in the older API itself.

3) over commit, thin provisioning counters

Interesting feedback that I’ll take back to engineering 🙂 This is something that we need to think about for sure

4) uni-processor Perfmon?

I’m really surprised with your observations after moving to a 4 vCPUs. Not sure what’s going on but AFAIK, we report the _Total (aggregate) of all CPU utilization in one metric in the “VM Processor” counter group in Perfmon. What that means is regardless of how many CPUs in-guest, we do provide the _Total of CPU Utilization. Maybe you may have run into a bug. I’ll check with engineering on this anyways to confirm my understanding.

Just so you know we have a “standalone” version of the Perfmon tool that works with existing VI 3.5 environments. We’ve posted details about this experimental tool and the binaries on our performance blog here:

http://communities.vmware.com/blogs/drummonds/2009/06/18/using-perfmon-for-accurate-esx-performance-counters

The reason I mentioned the standalone version is because on my test box running 3.5 with the standalone version of Perfmon, I was able to see the _Total on a 2 vCPU VM. I haven’t yet tested your findings on a vSphere test box yet but I look into it…

So to help us investigate this, could you please do the following?

a. re-install VMware tools on a test Windows VM after switching to 4 vCPUs and check if the problem is reproducible

b. if you have the 3.5 version of VMware tools running on a VI3 setup, download the standalone version of the Perfmon tool and install it on a Windows VM and check if the 4-vCPU problem is observed. I haven’t tested the same standalone version of Perfmon on a vSphere 4.0 setup (with 4.0 version of the tools) but I wouldn’t be surprised if the standalone version does work. You may want to snapshot the VM before you attempt this though so you can rollback.

5) more counters such as disk and networking

Some background…our main focus in 4.0 was to solve the immediate customer pain-point, namely the CPU accounting issue inside the guest for VM owners. Also, what we heard is that VI admins didn’t want to give out VI client access to VM owners whenever they wanted to look at “accurate” counters for CPU utilization. In fact, the memory counters in Perfmon were sort of a bonus since it was already available in the guest SDK interface 🙂

Importantly, other counters when measured inside the guest such as Memory, Disk and Network don’t really suffer from accounting problems (i.e. they are accurate) as compared to CPU utilization numbers captured over a period of time (which may be accounted different due to the scheduling and de-scheduling the hypervisor does). So the numbers for Disk, Memory and Network when captured inside the Windows guest will be the same as the VI client.

However, I do recognize that as more and more customers start using this integration, there will soon be a need for providing disk and network counters as well. This is definitely on my radar to address in a future release.

Hope the information I provided helps in better understanding the Perfmon integration in vSphere 4.0 and also answer some of your questions in the blog post.

Looking forward to your findings with the 4 vCPU VMs. LMK if you have any questions in the interim.

P.S: Do feel free to use the information discussed here for your blog where you deem useful…

Have a good weekend…


Praveen Kannan
Product Manager
VMware, Inc.


After some more investigation in another test VM, I replied to Praveen with the following information:

Praveen,

In my previous test, I had a 1 vCPU Windows Server 2003 VM. The VM Memory
and VM Processor objects were listed in the pulldown lits in perfmon. After
upgrading the VM to 4 vCPUs, the VM Memory and VM Processor objects were no
longer listed in the pulldown list in perfmon. So you see, the objects were
not available thus the counters (including _Total) were not available.

Today, I deployed a 1 vCPU Windows Server 2003 VM from a 1 vCPU template.
When I ran perfrmon, the VM Memory and VM CPU objects were missing (VMware
Tools was up to date). I closed perfmon and reopened it. Then the 2 VM
objects were there.

Then I upgraded the VM to a 4 vCPU VM. I ran perfmon and both the VM
objects were there.

Following that, I encountered more problems. I was able to choose the VM Processor object, but the counters for the object were all missing. Definitely a bug somewhere with these. Please advise.


Virtualizing the grid

July 8th, 2009

I picked up this interesting map off Christopher Crowhurst’s blog. It’s a visualization of the United States power grid. The source comes from NPR’s article “Visualizing The Grid“. Follow the link to NPR and click on the various tabs at the top to see power plant, solar power, and wind power sources across the United States.

How much power are you saving due to virtualization?  Don’t forget virtualization cuts power consumption in more ways than just one.  The most obvious would be the reduction in server hardware count in the datacenter.  There are other indirect power savings vectors such as reduction in cooling, reduction in network and SAN switches due to server consolidation, less UPS utilization, and maybe even a reduction in datacenter size which in and of itself presents more indirect savings:  security, plumbing, utility lighting, cleaning, maintenance, real estate, etc.

7-8-2009 9-46-46 AM

VMware ESX Configuration Maximums Comparison Matrix

July 7th, 2009

Some of the this blog’s readers may know that one of my favorite VMware documents is the Configuration Maximums.  I’ve mentioned it a few times.  Sid Smith over at Daily Hypervisor put together a very nice consolidation of configuration maximums for ESX3, ESX3.5, and vSphere (ESX4.0).

The .PDF is assembled in a matrix format making comparison of the various VMware hypervisors easy to differentiate, compare, and contrast.  Sid, you read my mind.  Well done.  Thank you for the early birthday/Christmas present!

Update Manager does not download host updates

July 1st, 2009

Scenario: You build a brand new vCenter and Update Manager server. After the installation is complete, you decide to get a jump on things by starting the download of all the ESX/ESXi host updates. You force Update Manager to download updates and the task completes surprisingly fast for the amount of ESX/ESXi content expected to be downloaded:

7-1-2009 8-54-08 PM

A problem is discovered in that Update Manager has downloaded metadata for guest OS updates (Windows, Linux, applications, etc.), but no ESX/ESXi update information is downloaded. The baselines are verified as OK, internet connectivity and proxy configuration checks out OK. What is the problem?

Cause: There are no ESX/ESXi hosts in vCenter Server. Per VMware KB 1008308, ESX/ESXi hosts must be present in vCenter Server before Update Manager will download the update metadata and the updates themselves.

7-1-2009 9-00-41 PM

This is one of those embarrassing forehead slapper type problems, however, Windows administrators who are used to working with and relying on the predicable behavior of WSUS are likely to encounter this at some point in time and are exempt from chastising. Swallow your pride and don’t tell anyone. 🙂

VMware is entitled to their opinion on how their software should function but to me this is a UI/usability issue that doesn’t make a lot of sense. What adds to the confusion is the inconsistent behavior in that in the absence of both hosts and guests in vCenter Server, guest OS update information appears in Update Manager but host update information does not. Yes I’m aware that host updates come from VMware and guest updates come from Shavlik.  No that’s not an acceptable excuse.

While we’re on the subject, there are a handful of other reasons why Update Manager may malfunction. Take a look at VMware’s KB index and use your browser search to find all instances of “Update Manager”. There you’ll find all known solutions to Update Manager issues as well as some best practices and port requirements.

vSphere upgrade experience, day 1

June 24th, 2009

A few nights ago, I began the VI3 to vSphere upgrade in my home lab and I thought I would share a few experiences. This day 1 post will cover vSphere management tools (vCenter, Update Manager, etc.) and not the hypervisor itself (ESX or ESXi).

My VI3 environment has been through some wear and tear over the years, including a few unexpected power outages which could have caused corruption on the vCenter server or the back end databases. Although the part of me which desires peace of mind wanted to start “clean” with an empty database, I knew that I must go the upgrade route, maintaining my existing data because frankly this is the method I will likely be using to deploy most vSphere installations.

I run a lot of what I would consider “production workloads” on my home lab, including domain controllers, messaging servers for registered domains, web servers, Citrix servers, this blog, etc. Failure is an option as well as a good learning experience (after all, this is a lab), however, long term outage of my production workloads is not an option. My vCenter server is virtualized on VMware Server 2.x so I started out by shutting down its OS and taking a VMware snapshot. After the vCenter shutdown, I also captured a full backup of my SQL server databases (both the vCenter database as well as the Update Manager database). I now have a solid backout plan which does not incorporate crash consistent data.

I powered the vCenter VM back up. I then copied over the vCenter 4.0 .zip package and extracted it into a temp directory on the vCenter server. This was the first mistake I made. Not thinking clearly about my snapshotted VM, I had just inflated the VM’s delta file by a few GB. What I should have done is to perform the vCenter copy and extraction before the snapshot. This is not the end of the world. After the installation of vCenter 4 and Update Manager, the snapshot would have grown by several hundred MBs if not a few GBs anyway. The .zip file and extracted contents were just a lot of extra non-contiguous I/O baggage.

I’m going to perform an upgrade of the databases, but I don’t care to actually “upgrade” vCenter and all of its components from 2.5 Update 4 to version 4.0. I’ve never ever had good luck with vCenter upgrades. My method, therefore, is complete uninstall of vCenter and all components, then a new installation of vCenter while attaching to the existing database which will in turn be upgraded. During the uninstall of vCenter, I typically find that the vCenter uninstall routine leaves bits and pieces behind in folder structures as well as the registry. I manually deleted the remaining pieces and rebooted the vCenter server for good measure and a clean start for the vCenter 4.0 installation. In retrospect, the manual deletion of left over files and uninstall of the vCenter license server will turn out to be my second and third mistakes which I’ll talk about shortly.

After the reboot, I began the vCenter 4.0 installation. I also made sure my vCenter SQL account had DBO rights to the MSDB database, the vCenter database, and the Update Manager database. This is a new requirement during the installation of vSphere. I wasn’t too far into the installation when I ran into failure and the installation routine rolled back. The error message was rather cryptic and I’m sorry I don’t have a screenshot but it had to do with the installer’s inability to properly install and configure the local ADAM instance which I believe is used for vCenter federation (linked vCenter servers). I quickly found a long thread on the VMTN forums which pointed me to the solution in VMware’s knowledgebase. KB1010938 talks about NETWORK SERVICE NTFS directory permissions (READ) that are required on the root of the drive where vCenter is being installed. A quick check showed I lacked the necessary permissions. I resolved this quickly and re-ran the installation.

During the re-installation, I ran into my second problem (self inflicted). Way back when, I had set up SSL certificates for VI3. The certificate files are required to be present during the database upgrade because the certs are tattooed to the database as well. During my “cleanup” process I spoke about above, I had deleted the SSL folder containing the certificate files VMware had left behind. Turns out this was by design. Thankfully when I performed the cleanup, all files and folders went to the recycle bin and I was able to quickly retrieve them. Without the certificate files, I would have been looking at a new database installation which would have deleted all vCenter data including performance history.

After restoring the certificate files, I reran the installation a third time. The installation of vCenter Server and all of its components was successful. I was able to open the vSphere client and connect to the vCenter server. My hosts, VMs, and all data was present. All looked to be successful until I tried a VMotion. The ESX hosts which were still on VI3 were no longer licensed. Refer to my comment further up about uninstalling the license server being a mistake. vCenter 4.0 license keys do not license VI3 legacy hosts. A VI3 license server or host based license keys must be plugged into vCenter 4.0 in order to properly license VI3 legacy hosts. I resolved the issue by re-installing the VI3 license server on some junk VM in the interim and then plugged the license server name into vCenter 4.0’s licensing configuration. Viola! The ESX3 and ESXi3 hosts are now licensed and everything is working properly. After feeling confident in the installation, I removed the vCenter snapshot.

This was enough change for one night. The ESX host upgrades (rebuilds) will come a few days later. If I uncover any gotchas during host installations, I’ll be sure to share but I expect those to be fairly uneventful. I’ve installed a lot of ESX4/ESXi4 hosts during the vSphere beta and it’s straight forward, not unlike ESX3 /ESXi3 installations. Most of the ~150 changes in vSphere will be evident in vCenter and the various components. There’s a few enhancements in the hypervisor installation but nothing that hasn’t already been pointed out in various other blogs and installation videos.

In search of an application migration solution

June 22nd, 2009

I’m reaching out to software vendors and/or readers who might be aware of an end to end application migration solution or an application migration story outlining solutions, challenges, successes, etc. The solution should be software driven. The solution will seamlessly migrate applications, services, and daemons from one platform (Windows or Linux) to another platform in the same family. As an example, the solution would migrate applications and/or services from Windows 2000 Server to Windows Server 2003. On the Linux side, another example would be migrating applications and/or daemons from SLES 9.x to SLES 10.x.

As I stated, the solution would need to be as seamless and end to end as possible. Application and platform dependencies would need to be taken into consideration, and addressed or mitigated. For example, service packs, .DLLs, .NET framework, PERL, etc. on the Windows side. Kernel versioning, compiling, PERL, etc. on the Linux side.

I’m not exactly looking for professional services, however, I would be interested in hearing about your process and the tools you use. The solution need not be virtualization specific. The right tool would work with IBM, HP, Dell, whitebox, or virtual hardware.

If you are a vendor with a solution, or a reader with some application migration background, please drop me a line at jason@boche.net or feel free to reply to this blog entry with a comment. Feedback both small or large is welcomed as usual.

Thank you in advance.