Windows 2008 R2 and Windows 7 on vSphere

March 28th, 2010 by jason

If you run Windows Server 2008 R2 or Windows 7 as a guest VM on vSphere, you may be aware that it was advised in VMware KB Article 1011709 that the SVGA driver should not be installed during VMware Tools installation.  If I recall correctly, this was due to a stability issue which was seen in specific, but not all, scenarios:

If you plan to use Windows 7 or Windows 2008 R2 as a guest operating system on ESX 4.0, do not use the SVGA drivers included with VMware Tools. Use the standard SVGA driver instead.

Since the SVGA driver is installed by default in a typical installation, it was necessary to perform a custom installation (or perhaps a scripted one) to exclude the SVGA driver for these guest OS types.  Alternatively, you could perform a typical VMware Tools installation and remove the SVGA driver from Device Manager afterwards.  What you ended up with, of course, was a VM using the Microsoft Windows supplied SVGA driver and not the VMware Tools version shown in the first screenshot.  The Microsoft Windows supplied SVGA driver worked and provided stability as well; however, one side effect was that mouse movement via the VMware Remote Console felt a bit sluggish.
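For reference, a scripted installation along those lines might look like the following.  This is a hedged sketch run inside the guest with the VMware Tools ISO mounted as D:; the SVGA feature name comes from VMware’s Tools installer documentation, so verify it against your Tools build before relying on it:

# Silent VMware Tools installation, excluding the SVGA driver feature
cmd /c 'D:\setup.exe /S /v"/qn ADDLOCAL=ALL REMOVE=SVGA"'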

Beginning with ESX(i) 4.0 Update 1 (released 11/19/09), VMware changed this behavior and revised the above KB article in February, letting us know that VMware Tools now packages a new version of the SVGA driver whose bits are laid down during a typical installation but not actually enabled:

The most effective solution is to update to ESX 4.0 Update 1, which provides a new WDDM driver that is installed with VMware Tools and is fully supported. After VMware Tools upgrade you can find it in C:\Program Files\Common Files\VMware\Drivers\wddm_video.

After a typical VMware Tools installation, you’ll still see a standard SVGA driver installed.  Following the KB article, head to Windows Device Manager and update the driver to the bits located in C:\Program Files\Common Files\VMware\Drivers\wddm_video:
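If you have more than a handful of guests to touch, the Device Manager steps can be approximated from an elevated PowerShell prompt inside the guest.  Here’s a rough sketch, assuming the default VMware Tools path and the in-box pnputil.exe on Windows 7/2008 R2 – verify the result in Device Manager afterwards:

# Stage and install the VMware wddm driver package dropped by VMware Tools
pnputil.exe -i -a "C:\Program Files\Common Files\VMware\Drivers\wddm_video\*.inf"
# A reboot is still required for the new display driver to take effect
Restart-Computer -Confirm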


The result: the new wddm driver, which ships with the newer version of VMware Tools, is installed:

After a reboot, the crisp and precise mouse movement I’ve become accustomed to over the years with VMware has returned.  The bummer here is that while the appropriate VMware SVGA drivers get installed in previous versions of Windows guest operating systems, Windows Server 2008 R2 and Windows 7 require manual installation steps, much like VMware Tools installation on Linux guest VMs.  Add to this the fact that the automated installation/upgrade of VMware Tools via VMware Update Manager (VUM) does not enable the wddm driver.  In short, getting the appropriate wddm driver installed for many VMs will require manual intervention or scripting.  One thing you can do is to get the wddm driver installed in your Windows Server 2008 R2 and Windows 7 VM templates.  This will ensure VMs deployed from the templates have the wddm driver installed and enabled.

The wddm driver install method from VMware is helpful for the short term; however, it’s not a scalable and robust long-term solution.  We need an automated solution from VMware to get the wddm driver installed, and it needs to be integrated with VUM.  I’m interested in finding out what happens with the next VMware Tools upgrade – will the wddm driver persist, or will the VMware Tools upgrade replace the wddm version with the standard version?  Stay tuned.

Update 11/6/10:  While working in the lab tonight, I noticed that with vSphere 4.1, the correct wddm video driver is installed as part of a standard VMware Tools installation on Windows 7 Ultimate x64 – no need to manually replace the Microsoft video driver with VMware’s wddm version as this is done automatically now.

Update 12/10/10: As a follow-up to these tests, I wanted to see what happens when the wddm driver is installed under ESX(i) 4.0 Update 1 and its corresponding VMware Tools, and then the VM is moved to an ESX(i) 4.1 cluster and the VMware Tools are upgraded.  Does the wddm driver remain intact, or will the 4.1 tools upgrade somehow change the driver?  During this test, I opted to use Windows 7 Ultimate 32-bit as the guest VM guinea pig.  A few discoveries were made, one of which was a surprise:

1.  Performing a standard installation of VMware Tools from ESXi 4.0 Update 1 on Windows 7 32-bit will automatically install the wddm driver, version 7.14.1.31.  No manual steps (and no second reboot) were required to install this driver.  I wasn’t counting on this; I expected the Standard VGA Graphics Adapter driver to be installed as seen previously.  This is good.


2.  After moving the VM to a 4.1 cluster and performing the VMware Tools upgrade, the wddm driver was left intact; however, its version was upgraded to 7.14.1.40.  This is also good in that the tools upgrade doesn’t negatively impact the desired results of leveraging the wddm driver for best graphics performance.


More conclusive testing should be done with Windows 7 and Windows Server 2008 R2 64-bit to see if the results are the same.  I’ll save this for a future lab maybe.

Configuring disks to use VMware Paravirtual SCSI (PVSCSI) adapters

March 25th, 2010 by jason

This is one of those “I’m documenting it for my own purposes” articles.  Yes, I read my own blog once in a while to find information on past topics.  Here I’m basically copying a VMware KB article, but I’ll provide a brief introduction.

So you’re wondering if you should use VMware Paravirtual SCSI?  I’ve gotten this question a few times.  PVSCSI is one of those technologies where “should I implement it” could be best answered with the infamous consulting reply “it depends”.  One person asked if it would be good to use as a default configuration for all VMs.  By and large, I feel that support complexity increases when using PVSCSI, and that it should only be used as needed for VMs which need an additional bit of performance squeezed from the disk subsystem.  This is not a technology I would implement by default on all VMs.  The practical benefits and ROI of implementing PVSCSI should be dissected, but before that, your valuable time may be better spent finding out if your environment will support it to begin with.  Have a look at VMware KB Article 1010398, which is where the following information comes from, verbatim.

It’s important to identify the guest operating systems which support PVSCSI:

Paravirtual SCSI adapters are supported on the following guest operating systems:

  • Windows Server 2008
  • Windows Server 2003
  • Red Hat Enterprise Linux (RHEL) 5

It’s also important to identify the more ambiguous situations where PVSCSI may or may not fit:

Paravirtual SCSI adapters also have the following limitations:

  • Hot add or hot remove requires a bus rescan from within the guest.
  • Disks with snapshots might not experience performance gains when used on Paravirtual SCSI adapters or if memory on the ESX host is overcommitted.
  • If you upgrade from RHEL 5 to an unsupported kernel, you might not be able to access data on the virtual machine’s PVSCSI disks. You can run vmware-config-tools.pl with the kernel-version parameter to regain access.
  • Because the default type of newly hot-added SCSI adapter depends on the type of primary (boot) SCSI controller, hot-adding a PVSCSI adapter is not supported.
  • Booting a Linux guest from a disk attached to a PVSCSI adapter is not supported. A disk attached using PVSCSI can be used as a data drive, not a system or boot drive. Booting a Microsoft Windows guest from a disk attached to a PVSCSI adapter is not supported in versions of ESX prior to ESX 4.0 Update 1.

For more information on PVSCSI, including installation steps, see VMware KB Article 1010398.  One more important thing to note is that for some operating system types, to install PVSCSI, you need to create a virtual machine with the LSI controller, install VMware Tools, and then change the disks to paravirtual mode.
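Where PVSCSI is supported, that controller swap can also be scripted.  Below is a hedged PowerCLI sketch, assuming a PowerCLI build which includes the Get-ScsiController/Set-ScsiController cmdlets; the VM name is hypothetical, and the VM should be powered off (with a good backup) first:

# Change an existing, powered-off VM's SCSI controller type to PVSCSI
Connect-VIServer vcenter.example.com
Get-VM -Name "DB01" | Get-ScsiController | Set-ScsiController -Type ParaVirtual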

Meet the VMware Certified Design Experts

March 22nd, 2010 by jason

VMware Education Services has unveiled a new site called Meet the VMware Certified Design Experts.  The site serves as a directory of individuals who have achieved VMware VCDX certification, currently VMware’s premier level achievement.  Here you’ll find a short profile and a photo of each of the 39 existing VMware Certified Design Experts around the world.

VMware Education describes the site as follows: 

VMware Certified Design Experts have achieved the VMware Certified Design Expert (VCDX) certification by building on their VMware Certified Professional (VCP) certification. Each expert has taken the extra steps of passing both the Enterprise Administration Exam and the Design Exam, as well as successfully defending a VMware infrastructure design and implementation plan.

VMware Certified Design Experts are part of an elite group of architects leading virtualization implementations around the world. Join this community of experts by earning your VCDX certification.

VCDXs can use the site to meet others around the world who have been through the process.  Pursuant to VCDX Tip #38, customers, employers, or recruiters can use the site to validate a candidate’s credentials for a project or employment.  For verification purposes, keep in mind that the site is not automated or updated the moment a candidate is minted.  There could be a delay of a few weeks before a certified individual appears on the site.

VMware is working hard to expand the pool of VCDX certified Architects and Engineers.  Many candidates have already satisfied the three written exam requirements and are waiting for their shot at a design submission and defense panel.  VMware typically holds defense panels at major VMware events in the US and Europe.  Just last week, VMware held VCDX defense panels in Munich, Germany, breaking the mold of holding the panels around major VMware events.  Design submissions and defense panels are proctored by existing VCDXs in the pool.  As the pool grows, VMware should be able to handle larger volumes of candidates.  At that point the system will be pretty well primed and hopefully efficient from a candidate’s perspective.

The VMware Certified Professional Program is designed for individuals who want to demonstrate their expertise in virtual infrastructure and increase the potential for career advancement.  For additional information, you can learn more about the process here.

vpxd.cfg Advanced Configuration

March 13th, 2010 by jason

vpxd.cfg is an XML formatted file which can be modified to alter the native behavior of the VMware vCenter Server.  References documenting these changes are sparse on the internet.  Inspired by Ulli Hankeln, I’m using this blog post to collect and document all known, unknown, supported, and unsupported vpxd.cfg modifications in a centralized location.

If you have any to add, please provide feedback in the form of a blog comment along with a link pointing to a reference and I’ll update the post.

**Disclaimer**
As with anything found on this site and much of the internet in general, information is provided “as is” without warranty.  Modify settings at your own risk.  I suggest thoroughly researching the changes first and also checking with VMware Support.

The vpxd.cfg file is located on the VMware vCenter Server by default at %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\vpxd.cfg

  • On Windows Server 2008, this would generally be C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg
  • On Windows Server 2003, this would generally be C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg
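Before making changes, take a copy of the file.  A quick sketch using the Windows Server 2008 path above:

# Back up vpxd.cfg before editing
Copy-Item "C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg" "C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg.bak"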

This collection of vpxd.cfg settings has been sourced from various places.  The parameters will generally apply to a version of vCenter Server ranging from 2.0 through 4.x.  A given parameter can apply to several or even all versions.  However, one thing I didn’t do was specify which version of vCenter Server the parameter applies to – too much work – sorry – you’ll have to experiment in your lab or DEV environment.  I do think it’s safe to say that most of these parameters focus on the latest releases of vCenter Server – 2.5 and 4.0.

Remember to restart the VMware VirtualCenter Server service in the Server Manager for changes to vpxd.cfg to take effect.
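From PowerShell on the vCenter Server, that restart is a one-liner, assuming the default service short name vpxd (check with Get-Service if unsure):

# Restart the VMware VirtualCenter Server service so vpxd.cfg is re-read
Restart-Service -Name vpxd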

Tag:  blockingTimeoutSeconds
Nested In:  vmomi, soapStubAdapter
What It Does:  Defines the timeout value in seconds for SOAP layer blocking.  Use cases for increasing: slow connections, low bandwidth, or high latency between virtual infrastructure components.  Read more here and here.
Example:

<vmomi>
  <soapStubAdapter>
    <blockingTimeoutSeconds>10800</blockingTimeoutSeconds>
  </soapStubAdapter>
</vmomi>

Tag:  calls
Nested In:  trace, vmomi
What It Does:  Unknown.  Read more here.
Example:

<trace>
  <vmomi>
    <calls>true</calls>
  </vmomi>
</trace>

Tag:  cipherList
Nested In:  vmacore, ssl
What It Does:  Reverts to cipher suites used in previous versions of vCenter Server (2.5u3 and earlier) for browser/SSL compatibility issues.  Read more here.
Example:

<vmacore>
  <ssl>
    <cipherList>DEFAULT</cipherList>
  </ssl>
</vmacore>

Tag:  compressOnRoll
Nested In:  log
What It Does:  Defines whether or not vCenter Server vpxd log files are rolled up and compressed into .gz files.  Read more here.
Example:

<log>
  <compressOnRoll>false</compressOnRoll>
</log>

Tag:  cpuFeatureMask
Nested In:  guestOSDescriptor, esx-2-x-x, all-versions, all-guests
What It Does:  Masks CPU features to force VMotion compatibility between hosts. VMware neither supports nor recommends modifying the VMotion constraints for CPU features.  Read more here.
Example:

<guestOSDescriptor>
  <esx-2-x-x>
    <all-versions>
      <all-guests>
        <cpuFeatureMask>Elements and mask definition go in here</cpuFeatureMask>
      </all-guests>
    </all-versions>
  </esx-2-x-x>
</guestOSDescriptor>

Tag:  directory
Nested In:  log
What It Does:  Defines the location for the vCenter logs.  Read more here.
Example:

<log>
  <directory>D:\VC_Logs</directory>
</log>

Tag:  dontstartconsolidation
Nested In:  vcp2v
What It Does:  May resolve an issue where the Consolidation button is missing in the Virtual Infrastructure Client.  Read more here.
Example:

<vcp2v>
  <dontstartconsolidation>true</dontstartconsolidation>
</vcp2v>

Tag:  filterOverheadLimitIssues
Nested In:  vpxd
What It Does:  Unknown.
Example:

<vpxd>
  <filterOverheadLimitIssues>true</filterOverheadLimitIssues>
</vpxd>

Tag:  hostRescanFilter
Nested In:  unknown
What It Does:  Defines the behavior of mass ESX(i) host rescans of vmHBAs.  Read more here.
Example:

<hostRescanFilter>true</hostRescanFilter>

Tag:  IoMax
Nested In:  vmacore, threadpool
What It Does:  Unknown but my guess is it defines the maximum I/O for the vpxd.exe process (vCenter Server service). Influenced by TaskMax.
Example:

<vmacore>
  <threadpool>
    <IoMax>200</IoMax>
  </threadpool>
</vmacore>

Tag:  level
Nested In:  log
What It Does:  Defines the logging level for vCenter logs.  Read more here.
Example:

<log>
  <level>trivia</level>
</log>

Tag:  logLevel
Nested In:  trace, vmomi
What It Does:  Enables debug logging level for vmomi?  Read more here.
Example:

<trace>
  <vmomi>
    <logLevel>verbose</logLevel>
  </vmomi>
</trace>

Tag:  loglevel
Nested In:  nfc
What It Does:  Enables debug logging level for the NFC process.  Read more here.
Example:

<nfc>
  <loglevel>debug</loglevel>
</nfc>

Tag:  managedIP
Nested In:  unknown
What It Does:  Defines the managed IP address used in vCenter Server Heartbeat.  Read more here.
Example:

<managedIP>10.10.0.1</managedIP>

Tag:  maxCostPerHost
Nested In:  ResourceManager
What It Does:  Defines the number of simultaneous VM migrations (both hot and cold) per ESX(i) host.  Read more here.
Example:

<ResourceManager>
  <maxCostPerHost>8</maxCostPerHost>
</ResourceManager>

Tag:  maxFileNum
Nested In:  log
What It Does:  Defines the maximum number of log files for vCenter logs.  Read more here.
Example:

<log>
  <maxFileNum>50</maxFileNum>
</log>

Tag:  maxFileSize
Nested In:  log
What It Does:  Defines the maximum log file size in bytes, and thus the rollover interval, for vCenter logs.  Read more here.
Example:

<log>
  <maxFileSize>10485760</maxFileSize>
</log>

Tag:  name
Nested In:  log
What It Does:  Defines the log file prefix name for vCenter logs.  Read more here.
Example:

<log>
  <name>vpxd</name>
</log>

Tag:  notRespondingTimeout
Nested In:  heartbeat
What It Does:  Defines the heartbeat timeout value in seconds between ESX(i) hosts and vCenter Server.  Use case would be to increase the value if remote ESX(i) hosts frequently go into a not responding state in vCenter Server due to WAN bandwidth or latency issues.  Read more here.
Example:

<heartbeat>
  <notRespondingTimeout>60</notRespondingTimeout>
</heartbeat>

Tag:  portReserveTimeout
Nested In:  dvs
What It Does:  Defines the timeout value in minutes for unused dvPort reservations.  Lowering the value temporarily is helpful for unlocking dvPorts to remove a vDS or dvPort group.  Read more here.
Example:

<dvs>
  <portReserveTimeout>10</portReserveTimeout>
</dvs>

Tag:  serializeadds
Nested In:  vpxd, das
What It Does:  Unknown but if I had to guess I’d say it defines the behavior of how the HA agent is installed on cluster hosts.
Example:

<vpxd>
  <das>
    <serializeadds>true</serializeadds>
  </das>
</vpxd>

Tag:  slotCpuMinMHz
Nested In:  vpxd, das
What It Does:  Defines the minimum CPU calculation of a HA cluster slot size when there are no CPU reservations. Read more here.
Example:

<vpxd>
  <das>
    <slotCpuMinMHz>256</slotCpuMinMHz>
  </das>
</vpxd>

Tag:  slotMemMinMB
Nested In:  vpxd, das
What It Does:  Defines the minimum memory calculation of a HA cluster slot size when there are no memory reservations. Read more here.
Example:

<vpxd>
  <das>
    <slotMemMinMB>0</slotMemMinMB>
  </das>
</vpxd>

Tag:  sspiProtocol
Nested In:  unknown
What It Does:  Defines the authentication mechanism used with passthrough authentication between the Virtual Infrastructure Client and vCenter Server.  Read more here.
Example:

<sspiProtocol>Kerberos</sspiProtocol>

Tag:  TaskMax
Nested In:  vmacore, threadpool
What It Does:  Defines the number of worker threads for the vpxd.exe process (vCenter Server service). Influences IoMax.
Example:

<vmacore>
  <threadpool>
    <TaskMax>30</TaskMax>
  </threadpool>
</vmacore>

Tag:  timeout
Nested In:  task
What It Does:  Defines the timeout value in seconds for long tasks.  Read more here.
Example:

<task>
  <timeout>10800</timeout>
</task>

Tag:  verbose
Nested In:  trace, db
What It Does:  Enables database tracing.  Enables database logging in the vpxd log.  Read more here and here.
Example:

<trace>
  <db>
    <verbose>true</verbose>
  </db>
</trace>

Tag:  verbosity
Nested In:  trace, vmomi
What It Does:  Unknown.  Read more here.
Example:

<trace>
  <vmomi>
    <verbosity>verbose</verbosity>
  </vmomi>
</trace>

Tag:  verboseObjectSize
Nested In:  trace, vmomi
What It Does:  Unknown.  Read more here.
Example:

<trace>
  <vmomi>
    <verboseObjectSize>40</verboseObjectSize>
  </vmomi>
</trace>

Tag:  VMOnVirtualIntranet
Nested In:  migrate, test, CompatibleNetworks
What It Does:  Setting this to false turns off the internal vSwitch restriction on VMotion, enabling VMotion for VMs connected to an internal vSwitch. Useful for servers behind a firewall virtual appliance deployed in bridged networking mode.  Read more here.
Example:

<migrate>
  <test>
    <CompatibleNetworks>
      <VMOnVirtualIntranet>false</VMOnVirtualIntranet>
    </CompatibleNetworks>
  </test>
</migrate>

Tag:  VMOverheadGrowthLimit
Nested In:  cluster
What It Does:  Defines the growth rate cap, in MB per minute, for VM memory overhead at the cluster level. Can be adjusted to resolve an issue introduced in ESX(i) 3.5 and vCenter 2.5 where guest VMs exhibit high CPU utilization.  Read more here.
Example:

<cluster>
  <VMOverheadGrowthLimit>5</VMOverheadGrowthLimit>
</cluster>


Slightly related, the vCenter Server process (vpxd.exe) can be launched at a command prompt on the vCenter Server (instead of starting as a service) for troubleshooting purposes.  The executable is located at:

<Install Directory>\VMware\Infrastructure\VirtualCenter Server\vpxd.exe

Usage: vpxd.exe [FLAGS]
Flags:
-r Register VMware VirtualCenter Server
-u Unregister VMware VirtualCenter Server
-s Run as a standalone server rather than a Service
-c Print vmdb schema to stdout
-b Recreate database repository
-f cfg Use the specified file instead of the default vpxd.cfg
-l licenseKey Store license key in ldap and assign it to VirtualCenter
-e feature Set the feature to be in use for VirtualCenter. This option takes only one feature at a time.
-p Reset the database password
-v Print the version number to stdout
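For example, to run vCenter interactively with the -s flag above, stop the service first.  A sketch assuming the typical default install path:

# Stop the service, then run vpxd in the foreground to watch it start up
Stop-Service -Name vpxd
& "C:\Program Files\VMware\Infrastructure\VirtualCenter Server\vpxd.exe" -s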

Perf Charts Service Experienced An Internal Error

March 12th, 2010 by jason

Happy Friday evening y’all.  Tonight’s blog post comes from a former colleague of mine whom I will call “Paul Berg”.  Paul came across an error in VMware vSphere which he was able to resolve and he would like to share the solution with the VMware community. 

Paul uses an Oracle database as the back end for vCenter. When viewing the performance charts on the Performance tab (Overview button), he received the following error:

Perf Charts service experienced an internal error.

Message:  Report application initialization is not completed successfully.  Retry in 60 seconds.

You can probably guess what followed… missing data in the charts.  No joy whatsoever.

Following is the resolution:

1. Get the fully qualified domain name or the global name of the TNS service from the Oracle database. This can be found in the file named tnsnames.ora on the Oracle database server.

2. Add this FQDN to the registry key HKLM\Software\ODBC\ODBC.INI\VirtualCenter\ServerName on the VC server.

3. Restart the VMware VirtualCenter Server service.

For us, the database was listed as VMDB in the registry. We have moved to an Oracle RAC configuration so I needed to change the entry to VMDB.GLOBAL to match what was in the tnsnames.ora listing. I wasn’t aware that VMDB.GLOBAL was considered the FQDN for an Oracle DB.
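For reference, steps 2 and 3 can be scripted on the vCenter Server.  A sketch using the registry value from the steps above, with VMDB.GLOBAL standing in for whatever your tnsnames.ora lists, and assuming the default vpxd service name:

# Point the vCenter ODBC DSN at the fully qualified TNS service name
Set-ItemProperty -Path "HKLM:\SOFTWARE\ODBC\ODBC.INI\VirtualCenter" -Name ServerName -Value "VMDB.GLOBAL"
# Restart the VMware VirtualCenter Server service
Restart-Service -Name vpxd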

The following VMware KB Article 1012812 documents the issue as well as a few different approaches to a resolution depending on root cause.  Again, this issue is specific to Oracle database environments.

Performance Overview charts fail with the error: STATs Report Service internal error

Thank you for sharing, Paul.  I’ve got one more in the queue from you – I’ll try to get it out in the next couple of weeks.  Here’s a teaser: Poor vSphere performance on Nehalem processors.  Ouch!

VMware Update Manager Becomes Self-Aware

March 4th, 2010 by jason

@Mikemohr on Twitter tonight said it best:

“Haven’t we learned from Hollywood what happens when the machines become self-aware?”

I got a good chuckle.  He took my comment of VMware becoming “self-aware” exactly where I wanted it to go.  A reference to The Terminator series of films in which a sophisticated computer defense system called Skynet becomes self-aware and things go downhill for mankind from there.

Metaphorically speaking in today’s case, Skynet is VMware vSphere and mankind is represented by VMware vSphere Administrators.

During an attempt to patch my ESX(i) 4 hosts, I received an error message:

At that point, the remediation task fails and the host is not patched.  The VUM log file reflects the same error in a little more detail:

[2010-03-04 14:58:04:690 'JobDispatcher' 3020 INFO] [JobDispatcher, 1616] Scheduling task VciHostRemediateTask{675}
[2010-03-04 14:58:04:690 'JobDispatcher' 3020 INFO] [JobDispatcher, 354] Starting task VciHostRemediateTask{675}
[2010-03-04 14:58:04:690 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciTaskBase, 534] Task started...
[2010-03-04 14:58:04:908 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciHostRemediateTask, 680] Host host-112 scheduled for patching.
[2010-03-04 14:58:05:127 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciHostRemediateTask, 691] Add remediate host: vim.HostSystem:host-112
[2010-03-04 14:58:13:987 'InventoryMonitor' 2180 INFO] [InventoryMonitor, 427] ProcessUpdate, Enter, Update version := 15936
[2010-03-04 14:58:13:987 'InventoryMonitor' 2180 INFO] [InventoryMonitor, 460] ProcessUpdate: object = vm-2642; type: vim.VirtualMachine; kind: 0
[2010-03-04 14:58:17:533 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 WARN] [vciHostRemediateTask, 717] Skipping host solo.boche.mcse as it contains VM that is running VUM or VC inside it.
[2010-03-04 14:58:17:533 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciHostRemediateTask, 786] Skipping host 0BC5A140, none of upgrade and patching is supported.
[2010-03-04 14:58:17:533 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 ERROR] [vciHostRemediateTask, 230] No supported Hosts found for Remediate.
[2010-03-04 14:58:17:737 'VciRemediateTask.RemediateTask{674}' 2676 INFO] [vciTaskBase, 583] A subTask finished: VciHostRemediateTask{675}

Further testing in the lab revealed that this condition is triggered when a vCenter VM and/or a VMware Update Manager (VUM) VM is running on the host being remediated. I understand from other colleagues on the Twitterverse that they’ve seen the same symptoms occur with patch staging.

The workaround is to manually place the host in maintenance mode, at which time it has no problem whatsoever evacuating all VMs, including infrastructure VMs.  At that point, the host in maintenance mode can be remediated.
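If you’d rather not click through all of that, the workaround can be approximated in PowerCLI.  A hedged sketch with hypothetical host and VM names – it assumes vMotion prerequisites are met and that your vCenter/VUM VM names match the filter:

# Move the infrastructure VMs off the target host, then enter maintenance
# mode so VUM no longer skips the host during remediation
$esx = Get-VMHost -Name "esx01.lab.local"
$refuge = Get-VMHost -Name "esx02.lab.local"
Get-VM -Location $esx | Where-Object { $_.Name -match "vcenter|vum" } | Move-VM -Destination $refuge
Set-VMHost -VMHost $esx -State Maintenance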

VMware Update Manager has apparently become self-aware in that it detects when its infrastructure VMs are running on the same host hardware which is to be remediated.  Self-awareness in and of itself isn’t bad; its feature integration, however, is.  Unfortunately for the humans, this is a step backwards in functionality and a reduction in efficiency for a task which was once automated.  Previously, a remediation task had no problem evacuating all VMs from a host, infrastructure or not. What we have now is… well… consider the following pre and post “self-awareness” remediation steps:

Pre “self-awareness” remediation for a 6 host cluster containing infrastructure VMs:

  1. Right click the cluster object and choose Remediate
  2. Hosts are automatically and sequentially placed in maintenance mode, evacuated, patched, rebooted, and brought out of maintenance mode

Post “self-awareness” remediation for a 6 host cluster containing infrastructure VMs:

  1. Right click Host1 object and choose Enter Maintenance Mode
  2. Wait for evacuation to complete
  3. Right click Host1 object and choose Remediate
  4. Wait for remediation to complete
  5. Right click Host1 object and choose Exit Maintenance Mode
  6. Right click Host2 object and choose Enter Maintenance Mode
  7. Wait for evacuation to complete
  8. Right click Host2 object and choose Remediate
  9. Wait for remediation to complete
  10. Right click Host2 object and choose Exit Maintenance Mode
  11. Right click Host3 object and choose Enter Maintenance Mode
  12. Wait for evacuation to complete
  13. Right click Host3 object and choose Remediate
  14. Wait for remediation to complete
  15. Right click Host3 object and choose Exit Maintenance Mode
  16. Right click Host4 object and choose Enter Maintenance Mode
  17. Wait for evacuation to complete
  18. Right click Host4 object and choose Remediate
  19. Wait for remediation to complete
  20. Right click Host4 object and choose Exit Maintenance Mode
  21. Right click Host5 object and choose Enter Maintenance Mode
  22. Wait for evacuation to complete
  23. Right click Host5 object and choose Remediate
  24. Wait for remediation to complete
  25. Right click Host5 object and choose Exit Maintenance Mode
  26. Right click Host6 object and choose Enter Maintenance Mode
  27. Wait for evacuation to complete
  28. Right click Host6 object and choose Remediate
  29. Wait for remediation to complete
  30. Right click Host6 object and choose Exit Maintenance Mode

It’s Saturday and your kids want to go to the park. Do the math.

Update 5/5/10: I received this response back on 3/5/10 from VMware but failed to follow up with finding out if it was ok to share with the public.  I’ve received the blessing now so here it is:

[It] seems pretty tactical to me. We’re still trying to determine if this was documented publicly, and if not, correct the documentation and our processes.

We introduced this behavior in vSphere 4.0 U1 as a partial fix for a particular class of problem. The original problem is in the behavior of the remediation wizard if the user has chosen to power off or suspend virtual machines in the Failure response option.

If a stand-alone host is running a VM with VC or VUM in it and the user has selected those options, the consequences can be drastic – you usually don’t want to shut down your VC or VUM server when the remediation is in progress. The same applies to a DRS disabled cluster.

In DRS enabled cluster, it is also possible that VMs could not be migrated to other hosts for configuration or other reasons, such as a VM with Fault Tolerance enabled. In all these scenarios, it was possible that we could power off or suspend running VMs based on the user selected option in the remediation wizard.

To avoid this scenario, we decided to skip those hosts totally in first place in U1 time frame. In a future version of VUM, it will try to evacuate the VMs first, and only in cases where it can’t migrate them will the host enter a failed remediation state.

One work around would be to remove such a host from its cluster, patch the cluster, move the host back into the cluster, manually migrate the VMs to an already patched host, and then patch the original host.

It would appear VMware intends to grant us back some flexibility in future versions of vCenter/VUM.  Let’s hope so. This implementation leaves much to be desired.

Update 5/6/10: LucD created a blog post titled Counter the self-aware VUM. In this blog post you’ll find a script which finds the ESX host(s) that is/are running the VUM guest and/or the vCenter guest and will vMotion the guest(s) to another ESX host when needed.

11 New ESX(i) 4.0 Patch Definitions Released; 6 Critical

March 3rd, 2010 by jason

Eleven new patch definitions have been released for ESX(i) 4.0 (7 for ESX, 2 for ESXi, 2 for the Cisco Nexus 1000V).  Previous versions of ESX(i) are not impacted.

6 of the 11 patch definitions are rated critical and should be evaluated quickly for application in your virtual infrastructure.
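To see which hosts actually need these patches, a scan can be kicked off with the Update Manager PowerCLI plugin.  A hedged sketch from memory – the cmdlet names come from the VUM PowerCLI snapin and should be verified against your version:

# Scan all hosts in a cluster against their attached host patch baselines
Get-Cluster -Name "Prod01" | Get-VMHost | Scan-Inventory -UpdateType HostPatch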

ID: ESX400-201002401-BG | Impact: Critical | Release date: 2010-03-03 | Products: esx 4.0.0 | Updates: vmkernel64, vmx, hostd, etc.

This patch provides support and fixes the following issues:

  • On some systems under heavy networking and processor load (large number of virtual machines), some NIC drivers might randomly attempt to reset the device and fail.
    The VMkernel logs generate the following messages every second:
    Oct 13 05:19:19 vmkernel: 0:09:22:33.216 cpu2:4390)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
    Oct 13 05:19:20 vmkernel: 0:09:22:34.218 cpu8:4395)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
  • ESX hosts do not display the proper status of the NFS datastore after recovering from a connectivity loss.
    Symptom: In vCenter Server, the NFS datastore is displayed as inactive.
  • When using NPIV, if the LUN on the physical HBA path is not the same as the LUN on the virtual port (VPORT) path, though the LUNID:TARGETID pairs are the same, then I/O might be directed to the wrong LUN, causing possible data corruption. Refer to KB 1015290 for more information.
    Symptom: If NPIV is not configured properly, I/O might be directed to the wrong disk.
  • On Fujitsu systems, the OEM-IPMI-Command-Handler that lists the available OEM IPMI commands does not work as intended. No custom OEM IPMI commands are listed, though they were initialized correctly by the OEM. After applying this fix, running the VMware_IPMIOEMExtensionService and VMware_IPMIOEMExtensionServiceImpl objects displays the supported commands as listed in the command files.
  • Provides prebuilt kernel module drivers for Ubuntu 9.10 guest operating systems.
  • Adds support for upstreamed kernel PVSCSI and vmxnet3 modules.
  • Provides a change to the maintenance mode requirement during Cisco Nexus 1000V software upgrade. After installing this patch, if you perform a Cisco Nexus 1000V software upgrade, the ESX host goes into maintenance mode during the VEM upgrade.
  • In certain race conditions, freeing journal blocks from VMFS filesystems might fail. The WARNING: J3: 1625: Error freeing journal block (returned 0) <FB 428659> for 497dd872-042e6e6b-942e-00215a4f87bb: Lock was not free error is written to the VMware logs.
  • Changing the resolution of the guest operating system over a PCoIP connection (desktops managed by View 4.0) might cause the virtual machine to stop responding.
    Symptoms: The following symptoms might be visible:

    • When you try to connect to the virtual machine through a vCenter Server console, a black screen appears with the Unable to connect to MKS: vmx connection handshake failed for vmfs {VM Path} message.
    • Performance graphs for CPU and memory usage in vCenter Server drop to 0.
    • Virtual machines cannot be powered off or restarted.

ID: ESX400-201002402-BG | Impact: Critical | Release date: 2010-03-03 | Products: esx 4.0.0 | Updates: initscripts

This patch fixes an issue where pressing Ctrl+Alt+Delete on the service console causes ESX 4.0 hosts to reboot.

ID: ESX400-201002404-SG | Impact: HostSecurity | Release date: 2010-03-03 | Products: esx 4.0.0 | Updates: glib2

The service console package for GLib2 is updated to version glib2-2.12.3-4.el5_3.1. This GLib update fixes an issue where the functions inside GLib incorrectly allow multiple integer overflows, leading to heap-based buffer overflows in GLib’s Base64 encoding and decoding functions. This might allow an attacker to possibly execute arbitrary code while a user is running the application. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2008-4316 to this issue.

ID: ESX400-201002405-BG | Impact: Critical | Release date: 2010-03-03 | Products: esx 4.0.0 | Updates: megaraid-sas

This patch fixes an issue where some applications do not receive events even after registering for Asynchronous Event Notifications (AEN). This issue occurs when multiple applications register for AENs.

ID: ESX400-201002406-SG | Impact: HostSecurity | Release date: 2010-03-03 | Products: esx 4.0.0 | Updates: newt

The service console package for Newt library is updated to version newt-0.52.2-12.el5_4.1. This security update of Newt library fixes an issue where an attacker might cause a denial of service or possibly execute arbitrary code with the privileges of a user who is running applications using the Newt library. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2009-2905 to this issue.

ID: ESX400-201002407-SG | Impact: HostSecurity | Release date: 2010-03-03 | Products: esx 4.0.0 | Updates: nfs-utils

The service console package for nfs-utils is updated to version nfs-utils-1.0.9-42.el5. This security update of nfs-utils fixes an issue that might permit a remote attacker to bypass an intended access restriction. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2008-4552 to this issue.

ID: ESX400-201002408-BG | Impact: Critical | Release date: 2010-03-03 | Products: esx 4.0.0 | Updates: Enic driver

In scenarios where Pass Thru Switching (PTS) is in effect, if virtual machines are powered on, the network interface might not come up. In PTS mode, when the network interface is brought up, PTS figures the MTU from the network. There is a race in this scenario, where the enic driver might incorrectly indicate that the driver fails. This issue might occur frequently on a CISCO UCS system. This patch fixes the issue.

ID: ESXi400-201002401-BG | Impact: Critical | Release date: 2010-03-03 | Products: embeddedEsx 4.0.0 | Updates: Firmware

This patch provides support and fixes the same issues described above for ESX400-201002401-BG.

ID: ESXi400-201002402-BG | Impact: Critical | Release date: 2010-03-03 | Products: embeddedEsx 4.0.0 | Updates: VMware Tools

This patch fixes an issue where pressing Ctrl+Alt+Delete on the service console causes ESX 4.0 hosts to reboot.

ID: VEM400-201002001-BG | Impact: HostGeneral | Release date: 2010-03-03 | Products: embeddedEsx 4.0.0, esx 4.0.0 | Updates: Cisco Nexus 1000V VEM

ID: VEM400-201002011-BG | Impact: HostGeneral | Release date: 2010-03-03 | Products: embeddedEsx 4.0.0, esx 4.0.0 | Updates: Cisco Nexus 1000V VEM