Posts Tagged ‘vCenter Server’

Performance charts fail after Daylight Savings changes are applied

November 5th, 2010

Daylight savings changes this weekend allow many folks to get an extra hour of sleep.  However, a VMware vSphere 4.1 bug has surfaced which may spoil the fun. 

VMware has published KB 1030305 (Performance charts fail after Daylight Savings changes are applied) which serves as a reminder that the pitfalls and treachery of mixing daylight savings changes and million dollar datacenters are not behind us yet.  Those who are on vSphere 4.1 and observe the weekend time change will run into problems come Sunday morning:

After Daylight Savings settings are applied:

  • Performance charts do not display data
  • Past week, month, and year performance overview charts are not displayed
  • Datastore performance/space data charts are not displayed
  • You receive the error: The chart could not be loaded
  • This occurs when clocks are set back 1 hour from Daylight Savings Time to Standard Time

VMware offers the following workaround:

Use Advanced Chart Options:

  1. Click Performance
  2. Click Advanced
  3. Click Chart Options and then choose the chart you want to review

Use a custom time range when viewing performance charts after clocks are set back:

  1. Click Performance
  2. Click the Time Range dropdown
  3. Choose Custom
  4. Specify From and To options that exclude the hours for when the time change occurred

For example:

If Standard Time settings were applied on November 7, at 01:00 AM, you could use these ranges:
Before the time change:
From 1/11/2010 12:00 AM To 7/11/2010 12:00 AM
After the time change:
From 7/11/2010 03:00 AM To 8/11/2010 03:00 PM

Have a great weekend!

vCenter MAC Address allocation and conflicts

November 3rd, 2010

Paul Davey of Xtravirt published a VMware networking article a few weeks ago called vCenter MAC Address allocation and conflicts.  The article describes the mechanism behind MAC address assignments in vCenter, and more specifically how conflicts are avoided:

When a vCenter server is installed, a unique ID is generated. This ID is randomly generated and is in the range of 0 to 64. The ID gets used when generating MAC addresses and the UUIDs, or unique identifiers, for virtual machines. You can see that if two vCenter servers had the same unique ID, a possibility exists that duplicate MAC addresses might get generated; cue packet loss, connectivity issues, and your desk phone ringing a lot…
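To make the conflict scenario concrete, here’s a minimal sketch (Python, purely illustrative and not vCenter’s actual code) of how the unique ID folds into the MAC prefix. vCenter-assigned addresses are commonly documented as 00:50:56:XX:YY:ZZ, where XX is 0x80 plus the unique ID (commonly cited as a 0-63 range):

  # Illustrative only, not vCenter's actual implementation.
  # vCenter-assigned MACs are commonly documented as 00:50:56:XX:YY:ZZ,
  # where XX = 0x80 + the vCenter unique (instance) ID.
  VMWARE_OUI = (0x00, 0x50, 0x56)

  def vcenter_mac_prefix(unique_id):
      """Return the 4-byte MAC prefix produced by a given vCenter unique ID."""
      if not 0 <= unique_id <= 63:
          raise ValueError("vCenter unique ID is expected to be in the 0-63 range")
      return ":".join("%02x" % b for b in VMWARE_OUI + (0x80 + unique_id,))

  # Two vCenter Servers sharing the same unique ID draw from the same prefix,
  # which is exactly how duplicate MAC addresses become possible.
  print(vcenter_mac_prefix(5))   # 00:50:56:85
  print(vcenter_mac_prefix(63))  # 00:50:56:bf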


I receive email updates from Xtravirt at regular intervals – about one every week or two.  Each update contains new virtualization content as well as links to their most popular content.  I find the content very interesting and always look forward to opening new email from them.  I think this speaks volumes considering how much of a chore email can be at times.  If this sounds appealing to you, head to their site and look at the bottom of the page to sign up for the Xtravirt Newsletter.   No purchase or invitation necessary.

Updated 3/14/11: I thought the following might also be helpful for this article.  VMware explains the automatic MAC address generation process as follows:

The VMware Universally Unique Identifier (UUID) generates MAC addresses that are checked for conflicts. The generated MAC addresses are created by using three parts: the VMware OUI, the SMBIOS UUID for the physical ESXi machine, and a hash based on the name of the entity that the MAC address is being generated for.
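As a very rough conceptual illustration of those three ingredients (and emphatically not VMware’s actual algorithm), the construction looks something like the sketch below; the OUI default, the UUID, and the entity name are placeholders of my own choosing:

  # Conceptual sketch only. This is NOT VMware's actual MAC generation
  # algorithm; it simply combines the three ingredients described above:
  # a VMware OUI, the host's SMBIOS UUID, and a hash of the entity name.
  import hashlib

  def illustrative_generated_mac(smbios_uuid, entity_name, oui="00:0c:29"):
      digest = hashlib.sha1((smbios_uuid + entity_name).encode("utf-8")).digest()
      return oui + ":" + ":".join("%02x" % b for b in digest[:3])

  # Placeholder inputs for illustration.
  print(illustrative_generated_mac("420f7a2c-0000-0000-0000-placeholder",
                                   "myvm/ethernet0"))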

VMware DPM Issue

October 24th, 2010

I’ve been running into a DPM issue in the lab recently.  Allow me to briefly describe the environment:

  • 3 vCenter Servers 4.1 in Linked Mode
  • 1 cluster with 2 hosts
    • ESX 4.1, 32GB RAM, ~15% CPU utilization, ~65% memory utilization, host DPM set to Disabled, meaning the host should never be placed in standby by DPM.
    • ESXi 4.1, 24GB RAM, ~15% CPU utilization, ~65% memory utilization, host DPM set to Automatic, meaning the host is always a candidate to be placed in standby by DPM.
  • Shared storage
  • DRS and DPM enabled for full automation (both configured at Priority 4, almost the most aggressive setting)

Up until recently, the ESX and ESXi hosts weren’t as loaded and DPM was working reliably.  Each host had 16GB RAM installed.  When aggregate load was light enough, all VMs were moved to the ESX host and the ESXi host was placed in standby mode by DPM.  Life was good.

There has been much activity in the lab recently.  The ESX and ESXi host memory was upgraded to 32GB and 24GB, respectively.  Many VMs were added to the cluster and powered on for various projects.  The DPM configuration remained as is.  Now what I’m noticing is that with a fairly heavy memory load on both hosts in the cluster, DPM moves all VMs to the ESX host and places the ESXi host in standby mode.  This places a tremendous amount of memory pressure and overcommitment on the solitary ESX host.  This extreme condition is observed by the cluster and, nearly as quickly, the ESXi host is taken back out of standby mode to balance the load.  Then, maybe an hour later, the process repeats itself.

I then configured DPM for manual mode so that I could examine the recommendations being made by the calculator.  The VMs were being evacuated for DPM purposes via a Priority 3 recommendation, which is halfway between the Conservative and Aggressive settings.


What is my conclusion?  I’m surprised at the perceived increase in aggressiveness of DPM.  In order to avoid the extreme memory overcommitment, I’ll need to configure the DPM slider for Priority 2.  In addition, I’d like to get a better understanding of the calculation.  I have a difficult time believing the amount of memory overcommitment being deemed acceptable in a neutral configuration (Priority 3), which falls halfway between conservative and aggressive.  On top of that, I’m not a fan of a host continuously entering and exiting standby mode, along with the flurry of vMotion activity which results.  This tells me that the calculation isn’t accounting for the amount of memory pressure which actually occurs once a host goes into standby mode, or, coincidentally, there are significant shifts in the workload patterns shortly after each DPM operation.
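For anyone who would rather pull the current DRS/DPM automation settings out of vCenter programmatically than eyeball the sliders, here’s a rough sketch using the vSphere API via pyVmomi. The hostname and credentials are placeholders, and the numeric rates map to the slider positions (check the API reference for the exact mapping):

  # Rough sketch: read each cluster's DRS/DPM automation settings through the
  # vSphere API with pyVmomi. Hostname and credentials are placeholders, and
  # certificate checking is disabled for lab use only.
  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  si = SmartConnect(host="vcenter.example.com", user="administrator",
                    pwd="password", sslContext=ssl._create_unverified_context())
  try:
      view = si.content.viewManager.CreateContainerView(
          si.content.rootFolder, [vim.ClusterComputeResource], True)
      for cluster in view.view:
          cfg = cluster.configurationEx
          drs = cfg.drsConfig
          dpm = cfg.dpmConfigInfo  # may be None if DPM has never been configured
          print(cluster.name, "DRS:", drs.enabled, drs.defaultVmBehavior,
                "rate", drs.vmotionRate)
          if dpm:
              print("  DPM:", dpm.enabled, dpm.defaultDpmBehavior,
                    "rate", dpm.hostPowerActionRate)
  finally:
      Disconnect(si)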

If you are a VMware DPM product manager, please see my next post Request for UI Consistency.

vCenter Server Linked Mode Configuration Error

October 23rd, 2010

As of vCenter Server 4.1, VMware supports Windows Server 2008 R2 as a vCenter platform (remember 2008 R2 is 64-bit only).  With this, I expect many environments will be configured with vCenter Server on Microsoft’s newest Server operating system.

While working in the lab with vCenter Server 4.1 on Windows Server 2008 R2, I ran into an issue configuring Linked Mode via the vCenter Server Linked Mode Configuration shortcut: Error 28035. Setup failed to copy LDIFDE.EXE from System folder to ‘%windir%\ADAM’ folder.


After no success in relaxing Windows NTFS permissions, I remembered that it’s a Windows Server 2008 R2 permissions issue.  The resolution is quite simple and is often the solution when running into similar errors on Windows 7 and Windows Server 2008 R2.  In addition, I found the workaround documented in VMware KB 1025637.  Rather than launching the vCenter Server Linked Mode Configuration as you normally would by clicking on the icon, right-click the shortcut and choose Run as administrator.


You should find that launching the shortcut in the administrator context grants the installer the permissions necessary to complete Linked Mode configuration.

CPU Ready to %RDY Conversion

October 21st, 2010

Most customers expect a certain amount of performance out of their virtual machines, which is going to be dependent on established Service Level Agreements (SLAs).  Capacity planning, tuning, and monitoring performance all play a role in meeting customer SLAs.  When questioning the performance of a physical machine, one of the first ubiquitous metrics that comes to mind is CPU utilization.  Server administrators are naturally inclined to look at this metric on virtual machines as well.  However, when looking at VM performance, Ready time is an additional metric to be examined from a CPU standpoint.  This metric tells us how much time the guest VM is waiting for its share of CPU execution from the host.

I began learning ESX in 2005 on version 2.0.  At that time, the VMware ICM class focused a lot on leveraging the Service Console.  vCenter Server 1.x was brand new back then and, as such, ESXTOP was king for performance monitoring.  In particular, the %RDY metric in ESXTOP was used to reveal CPU bottlenecks as described above.  %RDY provides statistics in a % format.  I learned what acceptable tolerances were, I learned when to be a little nervous, and I could pretty well predict when the $hit was hitting the fan inside a VM from a CPU standpoint.  Duncan Epping at Yellow Bricks dedicates a page to ESXTOP statistics on his blog, and at the very beginning you’ll see a threshold he has published which you should keep in the back of your mind.

Fortunately, ESXTOP still exists today (it’s one of my favorite old-school go-to tools).  The Service Console is all but gone; however, you’ll still find resxtop in VMware’s vMA appliance, which is used to remotely manage ESXi (and ESX as well).  But what about the vSphere Client and vCenter Server?  With the introduction of vCenter Server, the disappearance of the Service Console, and the inclination of a Windows-based administrator to lean on GUI-based tools as a preference, notable focus has moved away from the CLI approach in favor of the vSphere Client (in conjunction with the vCenter Server).

Focusing on a VM in the vSphere Client, you’ll find a performance metric called CPU Ready.  This is the vSphere Client metric which tells us how much time the guest VM is waiting for its share of CPU execution from the host, just as %RDY does in ESXTOP.  But when you look at the statistics, you’ll notice a difference.  %RDY in ESXTOP provides us with metrics in a % format.  CPU Ready in the vSphere Client provides metrics in a millisecond summation format.  I learned way back from the ICM class and through trench experience that ~10% RDY (per vCPU) is a threshold to watch out for.  How does a % value from ESXTOP translate to a millisecond value in the vSphere Client?  It doesn’t seem to be widely known or published, but I’ve found it explained in a few places: a VMware Communities document here and a Josh Townsend blog post here.

There’s a little math involved.  To convert the vSphere Client CPU Ready metric to the ESXTOP %RDY metric, you divide the CPU Ready value by the length of the rollup interval (both values are in milliseconds).  What does this mean?  Say for instance you’re looking at the overall CPU Ready value for a VM in Real-time.  Real-time is refreshed every 20 seconds and represents a rollup of values over a 20 second period (that’s 20,000 milliseconds).  Therefore…

  • If the CPU Ready value for the VM is, say, 500 milliseconds, we divide 500 milliseconds by 20,000 milliseconds and arrive at 2.5% RDY.  Hardly anything to be concerned about. 
  • If the CPU Ready time were 7,500, we divide 7,500 milliseconds by 20,000 milliseconds and arrive at 37.5% RDY or $hit hitting the fan assuming a 1 vCPU VM. 

What do I mean above by 1 vCPU VM?  The overall VM CPU Ready metric is the aggregate total of CPU Ready for each vCPU.  This should sound familiar – if you know how %RDY works in ESXTOP, then you’re armed with the knowledge needed to understand what I’m explaining.  The %RDY value in ESXTOP is the aggregate total of CPU Ready for each vCPU.  In other words, if you saw a 20% RDY value in ESXTOP for a 4 vCPU VM, the actual %RDY for each vCPU is 5% which is well under the 10% threshold we generally watch for.  In the vSphere Client, not only can you look at the overall aggregate CPU Ready for a particular VM (which should be divided by the number of assigned vCPUs for the VM), but you can also look at the CPU Ready values for the individual vCPUs themselves.  It is the per CPU Ready value which should be compared with published and commonly known thresholds.  When looking at Ready values, it’s important to interpret the data correctly in order to compare the right data to thresholds.
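If you’d rather not do the arithmetic by hand, here’s a small sketch of the conversion described above, using the 20,000 millisecond real-time rollup interval and the per-vCPU normalization:

  # Convert a vSphere Client CPU Ready summation (milliseconds) into the
  # ESXTOP-style %RDY value, both as the VM aggregate and per vCPU.
  def cpu_ready_to_pct(ready_ms, interval_ms=20000, vcpus=1):
      """Real-time charts roll up samples over 20 seconds (20,000 ms) by default."""
      aggregate_pct = ready_ms / float(interval_ms) * 100
      return aggregate_pct, aggregate_pct / vcpus

  # Examples from this post: 500 ms is 2.5% RDY (nothing to worry about),
  # while 7,500 ms on a 1 vCPU VM is 37.5% RDY (trouble).
  print(cpu_ready_to_pct(500))             # (2.5, 2.5)
  print(cpu_ready_to_pct(7500))            # (37.5, 37.5)
  print(cpu_ready_to_pct(7500, vcpus=4))   # 37.5% aggregate, ~9.4% per vCPU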

I’ve often heard the conversation of “how do I convert millisecond values in the vSphere Client to % values in ESXTOP?”  I’ve provided a working example using CPU Ready data.  Understand it can be applied to other metrics as well.  Hopefully this helps.

Hardware Status and Maintenance Mode

October 20th, 2010

I’m unable to view hardware health status data while a host is in maintenance mode in my vSphere 4.0 Update 1 environment.


A failed memory module was replaced on a host but I’m skeptical about taking it out of maintenance mode until I am sure it is healthy.  There is enough load on this cluster such that removing the host from maintenance mode will result in DRS moving VM workloads onto it within five minutes.  For obvious reasons, I don’t want VMs running on an unhealthy host.

So… I need to disable DRS at the cluster level, take the host out of maintenance mode, verify the hardware health on the Hardware Status tab, then re-enable DRS.  It’s a roundabout process, particularly if it’s a production environment which requires a Change Request (CR) with associated approvals and lead time to toggle the DRS configuration. 
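For what it’s worth, that roundabout sequence can at least be scripted.  Below is a rough sketch using the vSphere API via pyVmomi; the inventory path, host name, and credentials are placeholders, and the DRS re-enable step is intentionally left for after the manual hardware check:

  # Rough sketch of the workaround via the vSphere API (pyVmomi): disable DRS,
  # take the host out of maintenance mode, verify hardware health by hand,
  # then re-enable DRS. Names, paths, and credentials are placeholders.
  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVim.task import WaitForTask
  from pyVmomi import vim

  si = SmartConnect(host="vcenter.example.com", user="administrator",
                    pwd="password", sslContext=ssl._create_unverified_context())

  def set_drs(cluster, enabled):
      spec = vim.cluster.ConfigSpecEx(
          drsConfig=vim.cluster.DrsConfigInfo(enabled=enabled))
      WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))

  cluster = si.content.searchIndex.FindByInventoryPath("MyDatacenter/host/MyCluster")
  host = next(h for h in cluster.host if h.name == "esx01.example.com")

  set_drs(cluster, False)                                # 1. disable DRS on the cluster
  WaitForTask(host.ExitMaintenanceMode_Task(timeout=0))  # 2. exit maintenance mode
  # 3. check the Hardware Status tab by hand, then: set_drs(cluster, True)

  Disconnect(si)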

Taking a look at KB 1011284, VMware acknowledges the steps above and considers the following a resolution to the problem:

Resolution

By design, the host monitoring agents (IPMI) are not supported while the ESX host is in maintenance mode. You must exit maintenance mode to view the information on the Hardware Status tab. To take the ESX host out of maintenance mode:

  1. Right-click the ESX host within the vSphere Client.
  2. Click Exit Maintenance Mode.

Fortunately, this design specification has been improved by VMware in vSphere 4.1 where I have the ability to view hardware health while a host is in maintenance mode.

vCenter Storage Monitoring Plug-in Disabled

October 18th, 2010

Those who have upgraded to vSphere (hopefully most of you by now) may have become accustomed to the new tab in vCenter labeled Storage Views. From time to time, you may notice that this tab mysteriously disappears from a view where it should normally be displayed.  If you’re a subscriber to my vCalendar, you’ll find a tip on July 18th which speaks to this:

Is your vSphere Storage Views tab or host Hardware Status tab not functioning or missing? Make sure the VMware VirtualCenter Management Webservices service is running on the vCenter Server.

The solution above is an easy enough resolution, but what if that doesn’t fix the problem?  I ran into another instance of the Storage Views tab disappearing and it was not due to a stopped VMware VirtualCenter Management Webservices service.  After a short investigation, I found a failed or disabled vCenter Storage Monitoring (Storage Monitoring and Reporting) plug-in:

Screenshot: the vCenter Storage Monitoring plug-in shows as disabled with a plug-in load error.

For those who cannot read the screen shot detail above, and for the purposes of Google search, I’ll paste the error code below:

The plug-in failed to load on server(s) <your vCenter Server> due to the following error: Could not load file or assembly ‘VpxClientCommon, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7c80a434483c7c50’ or one of its dependencies. The system cannot find the file specified.

I performed some testing in the lab and here’s what I found.  Long story short, installation of the vSphere 4.1 Client on a system which already has the vSphere 4.0 Update 1 Client installed causes the issue.  The 4.1 Client installs a file called SMS.dll (dated 5/13/2010) into the directory C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Plugins\SMS\, overwriting the previous version (dated 11/7/2009).  While the newer version of the SMS.dll file causes no issues and works fine when connecting to vCenter 4.1 Servers, it’s not backward compatible with vCenter 4.0 Update 1.  The result is what you see in the image above: the plug-in is disabled and cannot be enabled.

Furthermore, if you investigate your vSphere Client log files at C:\Users\%username%\AppData\Local\VMware\vpx\ you’ll find another similar entry:

System.IO.FileNotFoundException: Could not load file or assembly ‘VpxClientCommon, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7c80a434483c7c50’ or one of its dependencies. The system cannot find the file specified.

Copying the old version of the SMS.dll file back into its proper location resolves the plug-in issue when connecting to a vSphere 4.0 Update 1 vCenter Server (this much I tested); however, I’m sure it immediately breaks the plug-in when connecting to a vCenter 4.1 Server (I didn’t go so far as to test this).
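Since both versions of the .DLL end up living on the same machine, the swap itself is easy to script.  Here’s a trivial sketch assuming you’ve stashed each version with a suffix of your own choosing (the .40u1 and .41 names below are my convention, not anything VMware ships):

  # Sketch of the manual workaround: keep both SMS.dll versions side by side
  # and copy the one matching the vCenter Server you're about to connect to.
  # The .40u1 / .41 backup suffixes are my own convention.
  import shutil

  PLUGIN_DIR = (r"C:\Program Files (x86)\VMware\Infrastructure"
                r"\Virtual Infrastructure Client\Plugins\SMS")

  def use_sms_dll(suffix):
      shutil.copy2(PLUGIN_DIR + r"\SMS.dll." + suffix, PLUGIN_DIR + r"\SMS.dll")

  use_sms_dll("40u1")   # before connecting to a vCenter 4.0 Update 1 server
  # use_sms_dll("41")   # before connecting to a vCenter 4.1 server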

Essentially, what this boils down to is a VMware vSphere Client bug which is going to bite people who have both vCenter Server 4.0 and 4.1 in their environment with the respective clients installed on the same endpoint machine.  I expect to hear about this more as people start their upgrades from vSphere 4.0 to vSphere 4.1.  Some may not even realize they have the issue; after all, I didn’t notice it until I was looking for the Storage Views tab and it wasn’t there.  After lab testing, I did some looking around on the net to see if anyone had discovered or documented this issue, and the only hit I came across was a recently started VMware Communities thread; however, there was no posted solution.  The thread does contain a few hints which would have pointed me in the right direction much quicker had I read it ahead of time.  Nonetheless, time spent in the lab is time well spent as far as I’m concerned.  Unfortunately, there’s no fix here I can offer.  This one is on VMware to fix with a new release of the vSphere 4.1 Client.

Update 12/1/10:  VMware has released KB 1024493 to identify this problem and temporarily address the issue with a workaround:

Installing each Client version in different folders does not work. When you install the first Client, you are asked where you want to install it. When you install the second Client, you are not asked for a location. Instead, the installer sees that you have already installed a Client and automatically tries to install the second Client in the same directory.

To install vSphere Client 4.0 and 4.1 in separate directories:

  1. Install vSphere Client 4.0 in C:\Client4.0.
  2. Copy C:\Client4.0 to an external drive (such as a share or USB).
  3. Uninstall vSphere Client 4.0. Do not skip this step.
  4. Install vSphere Client 4.1 in C:\Client4.1.
  5. Copy the 4.0 Client folder from the external drive to the machine.
  6. Run vpxClient.exe from the 4.0 or 4.1 folder.

I’m expecting a more permanent fix in the future which addresses the .DLL incompatibility in the 4.1 vSphere Client.

Update 2/15/11:  Through some lab testing, it looks as if VMware has resolved this issue with the release of vSphere 4.1 Update 1, although KB 1024493 has not yet been updated to reflect this.  I uninstalled all vSphere Clients, then installed vSphere Client 4.0 Update 1, then installed vSphere Client 4.1 Update 1.  The result is that the vCenter Storage Monitoring plug-in is no longer malfunctioning.  The Storage Views tab is also available.  Both of those items are a positive reflection of a resolution.  The Search function is failing in a different way, but I’m not convinced it has anything to do with two installed vSphere Clients because it is also failing on a different machine which has only one vSphere Client installed.