Posts Tagged ‘Rant’

Request for UI Consistency

October 25th, 2010

Sometimes it’s the little things that can make life easier.  This post is actually a fork of one I wrote previously on a DPM issue.  Shortly after that post, it was pointed out that I was reading DPM Priority Recommendations incorrectly.  Indeed that was the case.

Where did I go wrong?  Look at the priority descriptions between DRS and DPM, where both slide bars are configured with the same aggressiveness:

DRS: “Apply priority 1, priority 2, priority 3, and priority 4 recommendations”

DPM: “Apply priority 4 or higher recommendations”

My quick interpretation was that a higher recommendation meant a higher number (i.e., priority 4 is a higher recommendation than priority 3).  The opposite is actually true: a higher recommendation is a lower number.
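
To make the inversion concrete, here is a toy Python sketch (purely illustrative, not the vSphere API) of what the DPM wording actually filters on:

# Toy illustration only -- not the vSphere API.  Priority 1 is the most
# important recommendation; priority 5 is the least important.
recommendations = [
    {"name": "power off host3", "priority": 1},
    {"name": "power off host7", "priority": 3},
    {"name": "power off host9", "priority": 5},
]

threshold = 4  # "priority 4 or higher" on the DPM slider

# "Higher" means a LOWER number, so the filter is <=, not >=.
to_apply = [r for r in recommendations if r["priority"] <= threshold]

for r in to_apply:
    print(r["name"])  # host3 and host7 qualify; host9 does not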

I believe too much is left open to interpretation on the DPM screen as to what is higher and what is lower.  The DRS configuration makes sense because it’s clear what is going to be applied; there is no definition of high or low to be (mis-)interpreted.  The fix?  Make the DPM configuration screen mirror the DRS configuration screen.  Development consistency goes a long way.  As a frequent user of the tools, I expect it.  I view UI inconsistency as sloppy.

If you are a VMware DPM product manager, please see my previous post VMware DPM Issue.

Hardware Status and Maintenance Mode

October 20th, 2010

I’m unable to view hardware health status data while a host is in maintenance mode in my vSphere 4.0 Update 1 environment.


A failed memory module was replaced on a host but I’m skeptical about taking it out of maintenance mode until I am sure it is healthy.  There is enough load on this cluster such that removing the host from maintenance mode will result in DRS moving VM workloads onto it within five minutes.  For obvious reasons, I don’t want VMs running on an unhealthy host.

So… I need to disable DRS at the cluster level, take the host out of maintenance mode, verify the hardware health on the Hardware Status tab, then re-enable DRS.  It’s a roundabout process, particularly in a production environment which requires a Change Request (CR) with associated approvals and lead time to toggle the DRS configuration.
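
For what it’s worth, the same dance can be scripted.  Below is a rough pyVmomi sketch of those steps; the vCenter address, credentials, and object names are hypothetical, and this is a sketch of the approach rather than a hardened tool (certificate handling is omitted):

from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

def find_obj(content, vimtype, name):
    # Return the first managed object of the given type with the given name.
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(o for o in view.view if o.name == name)
    finally:
        view.Destroy()

# Connection details and object names are hypothetical.
si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="password")
content = si.RetrieveContent()
cluster = find_obj(content, vim.ClusterComputeResource, "Cluster01")
esx_host = find_obj(content, vim.HostSystem, "esx01.example.com")

# 1. Disable DRS at the cluster level so nothing migrates onto the host.
drs_off = vim.cluster.ConfigSpecEx(drsConfig=vim.cluster.DrsConfigInfo(enabled=False))
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=drs_off, modify=True))

# 2. Take the host out of maintenance mode.
WaitForTask(esx_host.ExitMaintenanceMode_Task(timeout=0))

# 3. Check hardware health -- roughly what the Hardware Status tab displays.
for sensor in esx_host.runtime.healthSystemRuntime.systemHealthInfo.numericSensorInfo:
    print(sensor.name, sensor.healthState.key)

# 4. Re-enable DRS once the host checks out healthy.
drs_on = vim.cluster.ConfigSpecEx(drsConfig=vim.cluster.DrsConfigInfo(enabled=True))
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=drs_on, modify=True))

Disconnect(si)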

Taking a look at KB 1011284, VMware acknowledges the steps above and considers the following a resolution to the problem:

Resolution

By design, the host monitoring agents (IPMI) are not supported while the ESX host is in maintenance mode. You must exit maintenance mode to view the information on the Hardware Status tab. To take the ESX host out of maintenance mode:

1. Right click ESX host within vSphere Client.

2. Click on Exit Maintenance Mode.

Fortunately, VMware improved this design in vSphere 4.1, where I have the ability to view hardware health while a host is in maintenance mode.

VMware Workstation Upgrade to 7.1

May 26th, 2010

Microsoft Windows 7 Professional 64-bit
VMware Workstation 7.0.1 build-227600

I had heard VMware Workstation 7.1 was released.  Unfortunately, the VMware Workstation “check for updates” feature doesn’t seem to be serving its intended purpose as it told me no updates were available.

I downloaded the installation package manually and performed the upgrade.  Two reboots were required:

  1. After the uninstall of my previous version of Workstation
  2. After the install of Workstation 7.1

I hope the usability experience is better than my upgrade experience.  I realize some of the reboot business is on the Microsoft Windows 7 operating system, but come on, would someone please figure this out?  Is there no way to perform an in-place upgrade of Workstation to minimize the reboots to one?

What’s New in VMware Workstation 7.1

•Support for 8 virtual processors (or 8 virtual cores) and 2 TB virtual disks.

•Support for OpenGL 2.1 for Windows Vista and Windows 7 guests.

•Greatly improved DirectX 9.0 graphics performance for Windows Vista and Windows 7 guests. Up to 2x faster than Workstation 7.

•Launch virtualized applications directly from the Windows 7 taskbar to create a seamless experience between applications in your virtual machines and the desktop.

•Optimized performance for Intel’s Core i3, i5, i7 processor family for faster virtual machine encryption and decryption.

•Support for more Host and Guest Operating Systems, including Hosts: Windows 2008 R2, Ubuntu 10.04, RHEL 5.4, and more; and Guests: Fedora 12, Ubuntu 10.04, RHEL 5.4, SEL 11 SP1, and more.

•Now includes a built-in Automatic Updates feature to check for, download, and install VMware Workstation updates.

•Ability to import and export Open Virtualization Format (OVF 1.0) packaged virtual machines and upload directly to VMware vSphere, the industry’s best platform for building cloud infrastructures.

VMware Update Manager Becomes Self-Aware

March 4th, 2010

@Mikemohr on Twitter tonight said it best:

“Haven’t we learned from Hollywood what happens when the machines become self-aware?”

I got a good chuckle.  He took my comment about VMware becoming “self-aware” exactly where I wanted it to go: a reference to The Terminator series of films, in which a sophisticated computer defense system called Skynet becomes self-aware and things go downhill for mankind from there.

Metaphorically speaking in today’s case, Skynet is VMware vSphere and mankind is represented by VMware vSphere Administrators.

During an attempt to patch my ESX(i) 4 hosts, I received an error message:

At that point, the remediation task fails and the host is not patched.  The VUM log file reflects the same error in a little more detail:

[2010-03-04 14:58:04:690 ‘JobDispatcher’ 3020 INFO] [JobDispatcher, 1616] Scheduling task VciHostRemediateTask{675}
[2010-03-04 14:58:04:690 ‘JobDispatcher’ 3020 INFO] [JobDispatcher, 354] Starting task VciHostRemediateTask{675}
[2010-03-04 14:58:04:690 ‘VciHostRemediateTask.VciHostRemediateTask{675}’ 2676 INFO] [vciTaskBase, 534] Task started…
[2010-03-04 14:58:04:908 ‘VciHostRemediateTask.VciHostRemediateTask{675}’ 2676 INFO] [vciHostRemediateTask, 680] Host host-112 scheduled for patching.
[2010-03-04 14:58:05:127 ‘VciHostRemediateTask.VciHostRemediateTask{675}’ 2676 INFO] [vciHostRemediateTask, 691] Add remediate host: vim.HostSystem:host-112
[2010-03-04 14:58:13:987 ‘InventoryMonitor’ 2180 INFO] [InventoryMonitor, 427] ProcessUpdate, Enter, Update version := 15936
[2010-03-04 14:58:13:987 ‘InventoryMonitor’ 2180 INFO] [InventoryMonitor, 460] ProcessUpdate: object = vm-2642; type: vim.VirtualMachine; kind: 0
[2010-03-04 14:58:17:533 ‘VciHostRemediateTask.VciHostRemediateTask{675}’ 2676 WARN] [vciHostRemediateTask, 717] Skipping host solo.boche.mcse as it contains VM that is running VUM or VC inside it.
[2010-03-04 14:58:17:533 ‘VciHostRemediateTask.VciHostRemediateTask{675}’ 2676 INFO] [vciHostRemediateTask, 786] Skipping host 0BC5A140, none of upgrade and patching is supported.
[2010-03-04 14:58:17:533 ‘VciHostRemediateTask.VciHostRemediateTask{675}’ 2676 ERROR] [vciHostRemediateTask, 230] No supported Hosts found for Remediate.
[2010-03-04 14:58:17:737 ‘VciRemediateTask.RemediateTask{674}’ 2676 INFO] [vciTaskBase, 583] A subTask finished: VciHostRemediateTask{675}

Further testing in the lab revealed that this condition is triggered whenever the host being remediated runs a vCenter VM and/or a VMware Update Manager (VUM) VM.  I understand from colleagues on the Twitterverse that they’ve seen the same symptoms occur with patch staging.

The workaround is to manually place the host in maintenance mode, at which time it has no problem whatsoever evacuating all VMs, including the infrastructure VMs.  At that point, the host in maintenance mode can be remediated.
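
Scripting the first half of the workaround takes a little of the sting out of it.  Here is a rough pyVmomi sketch (connection details and the host name are hypothetical) that requests maintenance mode and waits for the evacuation, after which remediation can be kicked off from VUM:

from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="password")
content = si.RetrieveContent()

# Look up the host to be patched (name is hypothetical).
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
esx_host = next(h for h in view.view if h.name == "esx01.example.com")
view.Destroy()

# DRS has no problem evacuating the infrastructure VMs when maintenance mode
# is requested directly, so do that first and wait for it to finish.
WaitForTask(esx_host.EnterMaintenanceMode_Task(timeout=0))

# ...remediate the host from VUM while it sits in maintenance mode...

# When patching is done, bring the host back.
WaitForTask(esx_host.ExitMaintenanceMode_Task(timeout=0))

Disconnect(si)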

VMware Update Manager has apparently become self-aware in that it detects when its infrastructure VMs are running on the same host hardware which is to be remediated.  Self-awareness in and of itself isn’t bad; its feature integration, however, is.  Unfortunately for the humans, this is a step backwards in functionality and a reduction in efficiency for a task which was once automated.  Previously, a remediation task had no problem evacuating all VMs from a host, infrastructure or not.  What we have now is… well… consider the following pre- and post-“self-awareness” remediation steps:

Pre “self-awareness” remediation for a 6 host cluster containing infrastructure VMs:

  1. Right click the cluster object and choose Remediate
  2. Hosts are automatically and sequentially placed in maintenance mode, evacuated, patched, rebooted, and brought out of maintenance mode

Post “self-awareness” remediation for a 6 host cluster containing infrastructure VMs:

  1. Right click Host1 object and choose Enter Maintenance Mode
  2. Wait for evacuation to complete
  3. Right click Host1 object and choose Remediate
  4. Wait for remediation to complete
  5. Right click Host1 object and choose Exit Maintenance Mode
  6. Right click Host2 object and choose Enter Maintenance Mode
  7. Wait for evacuation to complete
  8. Right click Host2 object and choose Remediate
  9. Wait for remediation to complete
  10. Right click Host2 object and choose Exit Maintenance Mode
  11. Right click Host3 object and choose Enter Maintenance Mode
  12. Wait for evacuation to complete
  13. Right click Host3 object and choose Remediate
  14. Wait for remediation to complete
  15. Right click Host3 object and choose Exit Maintenance Mode
  16. Right click Host4 object and choose Enter Maintenance Mode
  17. Wait for evacuation to complete
  18. Right click Host4 object and choose Remediate
  19. Wait for remediation to complete
  20. Right click Host4 object and choose Exit Maintenance Mode
  21. Right click Host5 object and choose Enter Maintenance Mode
  22. Wait for evacuation to complete
  23. Right click Host5 object and choose Remediate
  24. Wait for remediation to complete
  25. Right click Host5 object and choose Exit Maintenance Mode
  26. Right click Host6 object and choose Enter Maintenance Mode
  27. Wait for evacuation to complete
  28. Right click Host6 object and choose Remediate
  29. Wait for remediation to complete
  30. Right click Host6 object and choose Exit Maintenance Mode

It’s Saturday and your kids want to go to the park. Do the math.

Update 5/5/10: I received this response from VMware on 3/5/10 but failed to follow up on whether it was ok to share with the public.  I’ve since received the blessing, so here it is:

[It] seems pretty tactical to me. We’re still trying to determine if this was documented publicly, and if not, correct the documentation and our processes.

We introduced this behavior in vSphere 4.0 U1 as a partial fix for a particular class of problem. The original problem is in the behavior of the remediation wizard if the user has chosen to power off or suspend virtual machines in the Failure response option.

If a stand-alone host is running a VM with VC or VUM in it and the user has selected those options, the consequences can be drastic – you usually don’t want to shut down your VC or VUM server when the remediation is in progress. The same applies to a DRS disabled cluster.

In DRS enabled cluster, it is also possible that VMs could not be migrated to other hosts for configuration or other reasons, such as a VM with Fault Tolerance enabled. In all these scenarios, it was possible that we could power off or suspend running VMs based on the user selected option in the remediation wizard.

To avoid this scenario, we decided to skip those hosts totally in first place in U1 time frame. In a future version of VUM, it will try to evacuate the VMs first, and only in cases where it can’t migrate them will the host enter a failed remediation state.

One work around would be to remove such a host from its cluster, patch the cluster, move the host back into the cluster, manually migrate the VMs to an already patched host, and then patch the original host.

It would appear VMware intends to grant us back some flexibility in future versions of vCenter/VUM.  Let’s hope so. This implementation leaves much to be desired.

Update 5/6/10: LucD created a blog post titled Counter the self-aware VUM.  In it you’ll find a script which finds the ESX host(s) running the VUM guest and/or the vCenter guest and vMotions the guest(s) to another ESX host when needed.
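
LucD’s script is PowerCLI; the same idea can be sketched in pyVmomi, i.e. find the infrastructure guests sitting on the host to be patched and vMotion them to another host first.  All names and connection details below are hypothetical:

from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="password")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
vms = list(view.view)
view.Destroy()

infra_vm_names = {"vcenter01", "vum01"}   # hypothetical VM names
host_to_patch = "esx01.example.com"       # hypothetical host name

for vm in vms:
    if vm.name in infra_vm_names and vm.runtime.host.name == host_to_patch:
        # Pick any other connected host in the same cluster as the target.
        cluster = vm.runtime.host.parent
        target = next(h for h in cluster.host
                      if h.name != host_to_patch
                      and h.runtime.connectionState == "connected")
        print("Moving %s to %s" % (vm.name, target.name))
        WaitForTask(vm.MigrateVM_Task(
            host=target,
            priority=vim.VirtualMachine.MovePriority.defaultPriority))

Disconnect(si)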

VMkernel Networks, Jumbo Frames, and ESXi 4

February 12th, 2010

Question:  Can I implement jumbo frames on ESXi 4 Update 1 VMkernel networks?

Answer:  Who in the hell knows?

You see, the ESXi 4.0 Update 1 Configuration Guide states on page 54:

“Jumbo frames are not supported for VMkernel networking interfaces in ESXi.”

Duncan Epping of Yellow Bricks also reports:

“Jumbo frames are not supported for VMkernel networking interfaces in ESXi. (page 54)”

One month after the release of ESXi 4 Update 1, Charu Chaubal of VMware posted on the ESXi Chronicles blog:

“I am happy to say that this is merely an error in the documentation. In fact, ESXi 4.0 DOES support Jumbo Frames on VMkernel networking interfaces. The correction will hopefully appear in a new release of the documentation, but in the meantime, go ahead and configure Jumbo frames for your ESXi 4.0 hosts.”

Shortly after, Duncan Epping of Yellow Bricks confirmed Charu Chaubal’s report that jumbo frames are supported on ESXi VMkernel networks.
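
For reference, “configure Jumbo frames” boils down to raising the MTU to 9000 on both the vSwitch and the VMkernel NIC.  A rough pyVmomi sketch follows; the connection details, vSwitch, and vmknic names are hypothetical, and it assumes an API version that exposes the mtu field on the vmknic spec:

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="password")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
esx_host = next(h for h in view.view if h.name == "esx01.example.com")  # hypothetical
view.Destroy()

net_sys = esx_host.configManager.networkSystem
MTU = 9000

# Raise the MTU on the vSwitch carrying the VMkernel port (name is an example).
for vsw in net_sys.networkInfo.vswitch:
    if vsw.name == "vSwitch1":
        spec = vsw.spec
        spec.mtu = MTU
        net_sys.UpdateVirtualSwitch(vswitchName=vsw.name, spec=spec)

# Raise the MTU on the VMkernel NIC itself (device name is an example).
for vnic in net_sys.networkInfo.vnic:
    if vnic.device == "vmk1":
        nic_spec = vnic.spec
        nic_spec.mtu = MTU
        net_sys.UpdateVirtualNic(device=vnic.device, nic=nic_spec)

Disconnect(si)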

Now, nearly two months after Charu’s clarification and three months after the release of ESXi 4 Update 1, the documentation still states on page 54 that jumbo frames are not supported on ESXi 4 VMkernel networks, a direct contradiction of the VMware ESXi blog.

I opened a Business Critical Support SR with VMware on the question.  I was told by VMware BCS that jumbo frames are NOT supported on ESXi 4 Update 1 VMkernel networks, with a reference made to the documentation on page 54.

Our dedicated VMware onsite Engineer escalated and I was then told ESXi 4 Update 1 DOES support jumbo frames on VMkernel networks, making reference to Charu’s article.

Hey VMware, which is it?  If this is a documentation mistake, why are you dragging your feet in getting the documentation updated two months after a VMware employee discovered the error and blogged it?  Waiting for the next release of ESXi?  Unacceptable!  You update the public documentation as soon as you discover the error and make damned sure your BCS support Engineers know the right answer!  Do you know how much companies pay for BCS?  You owe your customers the correct answer.  If misinformation comes as a result of a known documentation error, SHAME ON YOU!  Architecture and design decisions are being made daily on this information or misinformation, whichever it may be.

Update 2/23/10:  Toby Kraft (@vmwarewriter on Twitter) will be updating the documentation by next week.  Thank you Toby!

Update 3/1/10:  VMware has updated their documentation to reflect currently supported configurations.  Thank you VMware (and Toby)!

VMware Releases ESX(i) 3.5 Update 5; Critical Updates

December 5th, 2009

VMware apparently released ESX(i) 3.5 Update 5 dated 12/3/09; however, it became available in Update Manager late this afternoon.  VMware is extremely poor at communicating anything but major releases, so to get the fastest notification possible about security patches and updates, I configure my VMware Update Manager servers to check for updates every 6 hours and provide me with email notification of anything they find.  VMware doesn’t listen to me much when it comes to feature requests so I’ll shelve the ranting.

So what’s new in ESX 3.5 Update 5?  The major highlights are guest VM support for Windows 7 and Windows Server 2008 R2 (reminder, 64-bit only), as well as Ubuntu 9.04, and added hardware support for processors and NICs.  Before you get too excited about Windows 7, remember that it is not a supported guest operating system in VMware View.  Even in the new View 4 release, Windows 7 has “Technology Preview” support status only.

If you track the updates from VMware Update Manager, the 12/3 releases amount to 20 updates including Update 5, 16 updates of which are rated critical.  If you’re still a ways out on vSphere deployment, you’ll probably want to take a look at the critical updates for your 3.x environment.

Enablement of Intel Xeon Processor 3400 Series – Support for the Intel Xeon processor 3400 series has been added. Support includes Enhanced VMotion capabilities. For additional information on previous processor families supported by Enhanced VMotion, see Enhanced VMotion Compatibility (EVC) processor support (KB 1003212).

Driver Update for Broadcom bnx2 Network Controller – The driver for bnx2 controllers has been upgraded to version 1.6.9. This driver supports bootcode upgrade on bnx2 chipsets and requires bmapilnx and lnxfwnx2 tools upgrade from Broadcom. This driver also adds support for Network Controller – Sideband Interface (NC-SI) for SOL (serial over LAN) applicable to Broadcom NetXtreme 5709 and 5716 chipsets.

Driver Update for LSI SCSI and SAS Controllers – The driver for LSI SCSI and SAS controllers is updated to version 2.06.74. This version of the driver is required to provide a better support for shared SAS environments.

Newly Supported Guest Operating Systems – Support for the following guest operating systems has been added specifically for this release:

•Windows 7 Enterprise (32-bit and 64-bit)
•Windows 7 Ultimate (32-bit and 64-bit)
•Windows 7 Professional (32-bit and 64-bit)
•Windows 7 Home Premium (32-bit and 64-bit)
•Windows 2008 R2 Standard Edition (64-bit)
•Windows 2008 R2 Enterprise Edition (64-bit)
•Windows 2008 R2 Datacenter Edition (64-bit)
•Windows 2008 R2 Web Server (64-bit)
•Ubuntu Desktop 9.04 (32-bit and 64-bit)
•Ubuntu Server 9.04 (32-bit and 64-bit)

For more complete information about supported guests included in this release, see the VMware Compatibility Guide: http://www.vmware.com/resources/compatibility/search.php?deviceCategory=software.

Newly Supported Management Agents – See VMware ESX Server Supported Hardware Lifecycle Management Agents for current information on supported management agents.

Newly Supported Network Cards – This release of ESX Server supports HP NC375T (NetXen) PCI Express Quad Port Gigabit Server Adapter.

Newly Supported SATA Controllers – This release of ESX Server supports the Intel Ibex Peak SATA AHCI controller.

Note:

•Some limitations apply in terms of support for SATA controllers. For more information, see SATA Controller Support in ESX 3.5. (KB 1008673)

•Storing VMFS datastores on native SATA drives is not supported.

Lab Manager 4 and vDS

September 19th, 2009

VMware Lab Manager 4 enables new functionality in that fenced configurations can now span ESX(i) hosts by leveraging vNetwork Distributed Switch (vDS) technology, a new feature in VMware vSphere.  Before getting overly excited, remember that vDS is available only in vSphere and only with VMware’s top-tier Enterprise Plus license.  Without both, vDS cannot be implemented, and thus fenced Lab Manager 4 configurations cannot span hosts.

Host Spanning is enabled by default when a Lab Manager 4 host is prepared, as indicated by green check marks in the Lab Manager interface.

When Host Spanning is enabled, an unmanageable Lab Manager service VM is pinned to each participating host.  This service VM cannot be powered down, suspended, VMotioned, etc.

One ill side effect of this new Host Spanning technology is that an ESX(i) host will not enter maintenance mode while Host Spanning is enabled.  For those new to Lab Manager 4, the cause may not be obvious, and it can lead to much frustration.  The unmanageable Lab Manager service VM pinned to each Host Spanning enabled host is a running VM, and a running VM will prevent a host from entering maintenance mode.  The maintenance mode task hangs at the infamous 2% complete status.
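
When maintenance mode stalls like this, a quick way to spot the culprit is to list whatever is still powered on on the host.  A rough pyVmomi sketch (connection details and host name are hypothetical):

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="password")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
esx_host = next(h for h in view.view if h.name == "esx01.example.com")  # hypothetical
view.Destroy()

# Any VM still powered on here is what maintenance mode is waiting for --
# on a Lab Manager 4 host with Host Spanning enabled, the pinned service VM
# will show up in this list.
for vm in esx_host.vm:
    if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn:
        print(vm.name)

Disconnect(si)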

The resolution is to first cancel the maintenance mode request.  Then, manually disable Host Spanning in the Lab Manager host configuration property sheet by unchecking the box; a highlighted message on that screen notes that Host Spanning must be disabled in order for the host to enter standby or maintenance mode.  Unpreparing the host will also remove the service VM, but this is much more drastic and should only be done if no other Lab Manager VMs are running on the host.

After reconfiguring the Lab Manager 4 host as described above, the vSphere Client Recent Tasks pane shows the service VM being powered off and then removed by the Lab Manager service account.

At this point, invoke the maintenance mode request again; the host will migrate all VMs off and successfully enter maintenance mode.

While Lab Manager 4 Host Spanning is a step in the right direction for more flexible load distribution across hosts in a Lab Manager 4 cluster, I find the process for entering maintenance mode counterintuitive, cumbersome, and, at the beginning when I didn’t know what was going on, frustrating.  Unsuccessful maintenance mode attempts have always been somewhat mysterious because vCenter Server doesn’t give us much information to pinpoint what is preventing maintenance mode.  This situation adds another element to the complexity.  VMware should have enough intelligence to disable Host Spanning for us in the event of a maintenance mode request, or at the very least tell us to shut it off, since it is conveniently and secretly enabled by default during host preparation.  Of course, all of this information is available in the Lab Manager documentation, but who reads that, right? :)