Posts Tagged ‘ESX’

11 New ESX(i) 4.0 Patch Definitions Released; 6 Critical

March 3rd, 2010

Eleven new patch definitions have been released for ESX(i) 4.0 (7 for ESX, 2 for ESXi, 2 for the Cisco Nexus 1000V).  Previous versions of ESX(i) are not impacted.

6 of the 11 patch definitions are rated critical and should be evaluated quickly for application in your virtual infrastructure.

ID: ESX400-201002401-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates vmkernel64,vmx,hostd etc

This patch provides support and fixes the following issues:

  • On some systems under heavy networking and processor load (large number of virtual machines), some NIC drivers might randomly attempt to reset the device and fail.
    The VMkernel logs generate the following messages every second:
    Oct 13 05:19:19 vmkernel: 0:09:22:33.216 cpu2:4390)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
    Oct 13 05:19:20 vmkernel: 0:09:22:34.218 cpu8:4395)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
  • ESX hosts do not display the proper status of the NFS datastore after recovering from a connectivity loss.
    Symptom: In vCenter Server, the NFS datastore is displayed as inactive.
  • When using NPIV, if the LUN on the physical HBA path is not the same as the LUN on the virtual port (VPORT) path, though the LUNID:TARGETID pairs are the same, then I/O might be directed to the wrong LUN, possibly causing data corruption. Refer to KB 1015290 for more information.
    Symptom: If NPIV is not configured properly, I/O might be directed to the wrong disk.
  • On Fujitsu systems, the OEM-IPMI-Command-Handler that lists the available OEM IPMI commands does not work as intended. No custom OEM IPMI commands are listed, though they were initialized correctly by the OEM. After applying this fix, running the VMware_IPMIOEMExtensionService and VMware_IPMIOEMExtensionServiceImpl objects displays the supported commands as listed in the command files.
  • Provides prebuilt kernel module drivers for Ubuntu 9.10 guest operating systems.
  • Adds support for upstreamed kernel PVSCSI and vmxnet3 modules.
  • Provides a change to the maintenance mode requirement during Cisco Nexus 1000V software upgrade. After installing this patch, if you perform a Cisco Nexus 1000V software upgrade, the ESX host goes into maintenance mode during the VEM upgrade.
  • In certain race conditions, freeing journal blocks from VMFS filesystems might fail. The WARNING: J3: 1625: Error freeing journal block (returned 0) <FB 428659> for 497dd872-042e6e6b-942e-00215a4f87bb: Lock was not free error is written to the VMware logs.
  • Changing the resolution of the guest operating system over a PCoIP connection (desktops managed by View 4.0) might cause the virtual machine to stop responding.
    Symptoms: The following symptoms might be visible:

    • When you try to connect to the virtual machine through a vCenter Server console, a black screen appears with the Unable to connect to MKS: vmx connection handshake failed for vmfs {VM Path} message.
    • Performance graphs for CPU and memory usage in vCenter Server drop to 0.
    • Virtual machines cannot be powered off or restarted.
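
A quick way to see whether a host is showing the NIC watchdog symptom described above is to count the timeout messages per NIC. This is only a rough sketch: it assumes the log lines look like the two samples quoted in the patch notes, the function name count_watchdog_timeouts is made up for illustration, and the log location in the usage comment is an assumption to verify on your own host.

```shell
# Count "NETDEV WATCHDOG ... transmit timed out" messages per vmnic.
# Reads log text on stdin; prints "<vmnic> <count>", sorted by NIC name.
count_watchdog_timeouts() {
    awk '/NETDEV WATCHDOG: vmnic[0-9]+: transmit timed out/ {
             # The NIC name is the field just before "transmit", e.g. "vmnic1:"
             for (i = 2; i <= NF; i++) {
                 if ($i == "transmit") {
                     nic = $(i - 1)
                     sub(/:$/, "", nic)
                     count[nic]++
                 }
             }
         }
         END { for (nic in count) print nic, count[nic] }' | sort
}

# Example (the log path is an assumption, adjust for your host):
# count_watchdog_timeouts < /var/log/vmkernel
```

A steadily climbing count for a single vmnic would match the "every second" pattern described in the patch notes.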

ID: ESX400-201002402-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates initscripts

This patch fixes an issue where pressing Ctrl+Alt+Delete on service console causes ESX 4.0 hosts to reboot.

ID: ESX400-201002404-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates glib2

The service console package for GLib2 is updated to version glib2-2.12.3-4.el5_3.1. This GLib update fixes an issue where functions inside GLib incorrectly allow multiple integer overflows, leading to heap-based buffer overflows in GLib’s Base64 encoding and decoding functions. This might allow an attacker to execute arbitrary code while a user is running the application. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2008-4316 to this issue.

ID: ESX400-201002405-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates megaraid-sas

This patch fixes an issue where some applications do not receive events even after registering for Asynchronous Event Notifications (AEN). This issue occurs when multiple applications register for AENs.

ID: ESX400-201002406-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates newt

The service console package for Newt library is updated to version newt-0.52.2-12.el5_4.1. This security update of Newt library fixes an issue where an attacker might cause a denial of service or possibly execute arbitrary code with the privileges of a user who is running applications using the Newt library. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2009-2905 to this issue.

ID: ESX400-201002407-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates nfs-utils

The service console package for nfs-utils is updated to version nfs-utils-1.0.9-42.el5. This security update of nfs-utils fixes an issue that might permit a remote attacker to bypass an intended access restriction. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2008-4552 to this issue.

ID: ESX400-201002408-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates Enic driver

In scenarios where Pass Thru Switching (PTS) is in effect, if virtual machines are powered on, the network interface might not come up. In PTS mode, when the network interface is brought up, PTS figures the MTU from the network. There is a race in this scenario, where the enic driver might incorrectly indicate that the driver fails. This issue might occur frequently on a CISCO UCS system. This patch fixes the issue.

ID: ESXi400-201002401-BG Impact: Critical Release date: 2010-03-03 Products: embeddedEsx 4.0.0 Updates Firmware

This patch provides support and fixes the following issues:

  • On some systems under heavy networking and processor load (large number of virtual machines), some NIC drivers might randomly attempt to reset the device and fail.
    The VMkernel logs generate the following messages every second:
    Oct 13 05:19:19 vmkernel: 0:09:22:33.216 cpu2:4390)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
    Oct 13 05:19:20 vmkernel: 0:09:22:34.218 cpu8:4395)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
  • ESX hosts do not display the proper status of the NFS datastore after recovering from a connectivity loss.
    Symptom: In vCenter Server, the NFS datastore is displayed as inactive.
  • When using NPIV, if the LUN on the physical HBA path is not the same as the LUN on the virtual port (VPORT) path, though the LUNID:TARGETID pairs are the same, then I/O might be directed to the wrong LUN, possibly causing data corruption. Refer to KB 1015290 for more information.
    Symptom: If NPIV is not configured properly, I/O might be directed to the wrong disk.
  • On Fujitsu systems, the OEM-IPMI-Command-Handler that lists the available OEM IPMI commands does not work as intended. No custom OEM IPMI commands are listed, though they were initialized correctly by the OEM. After applying this fix, running the VMware_IPMIOEMExtensionService and VMware_IPMIOEMExtensionServiceImpl objects displays the supported commands as listed in the command files.
  • Provides prebuilt kernel module drivers for Ubuntu 9.10 guest operating systems.
  • Adds support for upstreamed kernel PVSCSI and vmxnet3 modules.
  • Provides a change to the maintenance mode requirement during Cisco Nexus 1000V software upgrade. After installing this patch, if you perform a Cisco Nexus 1000V software upgrade, the ESX host goes into maintenance mode during the VEM upgrade.
  • In certain race conditions, freeing journal blocks from VMFS filesystems might fail. The WARNING: J3: 1625: Error freeing journal block (returned 0) <FB 428659> for 497dd872-042e6e6b-942e-00215a4f87bb: Lock was not free error is written to the VMware logs.
  • Changing the resolution of the guest operating system over a PCoIP connection (desktops managed by View 4.0) might cause the virtual machine to stop responding.
    Symptoms: The following symptoms might be visible:

    • When you try to connect to the virtual machine through a vCenter Server console, a black screen appears with the Unable to connect to MKS: vmx connection handshake failed for vmfs {VM Path} message.
    • Performance graphs for CPU and memory usage in vCenter Server drop to 0.
    • Virtual machines cannot be powered off or restarted.

ID: ESXi400-201002402-BG Impact: Critical Release date: 2010-03-03 Products: embeddedEsx 4.0.0 Updates VMware Tools

This patch fixes an issue where pressing Ctrl+Alt+Delete on service console causes ESX 4.0 hosts to reboot.

ID: VEM400-201002001-BG Impact: HostGeneral Release date: 2010-03-03 Products: embeddedEsx 4.0.0, esx 4.0.0 Cisco Nexus 1000V VEM

ID: VEM400-201002011-BG Impact: HostGeneral Release date: 2010-03-03 Products: embeddedEsx 4.0.0, esx 4.0.0 Cisco Nexus 1000V VEM

RVTools 2.8.1 Released

February 21st, 2010

Rob de Veij has released version 2.8.1 of his stellar virtualization utility RVTools.  I love this free tool as it provides valuable information about my infrastructure in a fast and easy format.

New in this version:
- On vHost tab new field: number of running vCPUs
- On vSphere, VMs in a vApp were not displayed.
- Filter not working correctly when annotations or custom fields contain a null value.
- When NTP server(s) = null the time info fields are not displayed on the vHost tabpage.
- When a datastore name or virtual machine name contains spaces, the inconsistent foldername check was not working correctly.
- Tools health check now only executed for running VMs.

Go download this tool today and be sure to tell Rob how much you appreciate his development efforts!

VMware, much of this information is vital as it pertains to configuration maximums and should be available in the VMware vSphere Client for capacity planning purposes.

Preferential Treatment for DPM Hosts

February 7th, 2010

Here’s a tip that’s so simple and probably well known that it could be categorized as a stupid pet trick.

As I’ve mentioned in the past, I leverage VMware DPM (an Enterprise licensing feature) in the lab so that during periods of lesser activity (while I’m at work or sleeping, or both), ESX hosts in the lab can be placed in standby mode to cut electricity consumption and save on the energy bill.  I haven’t taken the time to research how hosts in the cluster are arbitrarily chosen for standby mode.  Over the course of time, the pattern I have witnessed tells me it’s more of a round robin type selection.  For instance, today host A will be chosen for standby mode, tomorrow host B will be chosen, and the next day, again host A will be chosen.  Perhaps load is taken into the calculation.  I don’t honestly know.  It’s not important right now.

I’ve also mentioned in the past that I run both ESX and ESXi in the same vSphere cluster.  This is a VMware supported configuration. I do this so that I can get a daily dose of both host platform experiences.  I’m not shy in saying my platform preference is still ESX because of its Service Console. What can I say… old habits are hard to break, but I’m trying, I really am.  More often than not, I need ESX Service Console access for whatever reason.  When I pop in the lab and find out that the ESX host is in standby mode, it takes a good 5 minutes to wake it up and then work on the things I need to get done.

Enter DPM Host Options.  This feature lets me apply some rules in the host selection process for DPM.  In this case, I want DPM to do its thing and save me money, but I don’t want it to shut down the ESX host.  Rather, shut down the ESXi host instead.  To do this is simple.  Modify the cluster settings and disable DPM for the ESX host as shown below.

With this rule in place, DPM will always choose solo.boche.mcse for standby mode, which is the ESXi host.  The ESX host, lando.boche.mcse, has been disabled for DPM and should always remain powered on and ready for action.

Configure VMware ESX(i) Round Robin on EMC Storage

February 4th, 2010

I recently set out to enable VMware ESX(i) 4 Round Robin load balancing with EMC Celerra (CLARiiON) fibre channel storage.  Before I get to the details of how I did it, let me preface this discussion with a bit about how I interpret Celerra storage architecture. 

The Celerra is built on CLARiiON fibre channel storage and as such, it leverages the benefits and successes CLARiiON has built over the years.  I believe most CLARiiONs are, by default, active/passive arrays from VMware’s perspective.  Maybe more accurately stated, all controllers are active; however, each controller has sole ownership of a LUN or set of LUNs.  If a host wants access to a LUN, it is preferable to go through the owning controller (the preferred path).  Attempts to access a LUN through any controller other than the owning controller will result in a “Trespass” in EMC speak.  A Trespass is a shift in LUN ownership from one controller to another in order to service an I/O request from a fabric host.  When I first saw Trespasses in Navisphere, I was alarmed.  I soon learned that they aren’t all that bad in moderation.  EMC reports that a Trespass occurs EXTREMELY quickly and in almost all cases will not cause problems.  However, as with any array which adopts the LUN ownership model, stacking up enough I/O requests to force a race condition between controllers for LUN access will cause a condition known as thrashing.  Thrashing causes storage latency and queuing as controllers play tug of war for LUN access.  This is why it is important for ESX hosts which share LUN access to consistently access LUNs via the same controller path.

As I said, the LUN ownership model above is the “out-of-box” configuration for the Celerra, also known as Failover Mode 1 in EMC Navisphere.  The LUN path going through the owning controller will be the Active path from a VMware perspective.  Other paths will be Standby.  This is true for both MRU and Fixed path selection policies.  What I needed to know was how to enable Round Robin path selection in VMware.  Choosing Round Robin in the vSphere Client is easy enough, however, there’s more to it than that because the Celerra is still operating in Failover Mode 1 where I/O can only go through the owning controller. 

So the first step in this process is to read the CLARiiON/VMware Applied Technology Guide which says I need to change the Failover Mode of the Celerra from 1 to 4 using Navisphere (FLARE release 28 version 04.28.000.5.704 or later may be required).  A value of 4 tells the CLARiiON to switch to the ALUA (Asymmetric Logical Unit Access or Active/Active) mode.  In this mode, the controller/LUN ownership model still exists, however, instead of transferring ownership of the LUN to the other controller with a Trespass, LUN access is allowed through the non-owning controller.  The I/O is passed by the non-owning controller to the owning controller via the backplane and then to the LUN.  In this configuration, both controllers are Active and can be used to access a LUN without causing ownership contention or thrashing.  It’s worth mentioning right now that although both controllers are active, the Celerra will report to ESX the owning controller as the optimal path, and the non-owning controller as the non-optimal path.  This information will be key a little later on.  Each ESX host needs to be configured for Failover Mode 4 in Navisphere.  The easiest way to do this is to run the Failover Setup Wizard.  Repeat the process for each ESX host.  One problem I ran into here is that after making the configuration change, each host and HBA still showed a Failover Mode of 1 in the Navisphere GUI.  It was as if the Failover Setup Wizard steps were not persisting.  I failed to accept this so I installed the Navisphere CLI and verified each host with the following command: 

naviseccli -h <SPA_IP_ADDRESS> port -list -all

Output showed that Failover Mode 4 was configured:

Information about each HBA:
HBA UID:                 20:00:00:00:C9:8F:C8:C4:10:00:00:00:C9:8F:C8:C4
Server Name:             lando.boche.mcse
Server IP Address:       192.168.110.5
HBA Model Description:
HBA Vendor Description:  VMware ESX 4.0.0
HBA Device Driver Name:
Information about each port of this HBA:
    SP Name:               SP A
    SP Port ID:            2
    HBA Devicename:        naa.50060160c4602f4a50060160c4602f4a
    Trusted:               NO
    Logged In:             YES
    Source ID:             66560
    Defined:               YES
    Initiator Type:           3
    StorageGroup Name:     DL385_G2
    ArrayCommPath:         1
    Failover mode:         4
    Unit serial number:    Array
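
With more than a couple of hosts, scanning that output by eye gets tedious. Here is a small sketch that boils it down to one line per HBA port. It assumes the output is laid out like the sample above (a "Server Name:" line followed by indented "SP Name:", "SP Port ID:" and "Failover mode:" lines); the function name summarize_failover_mode is my own invention, not part of the Navisphere CLI.

```shell
# Summarize the failover mode reported for each HBA port.
# Reads `naviseccli ... port -list -all` output on stdin and prints
# "<server>: <SP> port <id> -> failover mode <n>" for every port found.
summarize_failover_mode() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }           # drop trailing whitespace
        /^Server Name:/     { server = $2 }
        /^ *SP Name:/       { sp = $2 }
        /^ *SP Port ID:/    { port = $2 }
        /^ *Failover mode:/ { print server ": " sp " port " port " -> failover mode " $2 }
    '
}

# Example, run from a machine with the Navisphere CLI installed:
# naviseccli -h <SPA_IP_ADDRESS> port -list -all | summarize_failover_mode
```

For the sample above this would report lando.boche.mcse on SP A port 2 with failover mode 4.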

Unfortunately, the CLARiiON/VMware Applied Technology Guide didn’t give me the remaining information I needed to actually get ALUA and Round Robin working.  So I turned to social networking and my circle of VMware and EMC storage experts on Twitter.  They put me on to the fact that I needed to configure SATP for VMW_SATP_ALUA_CX, something I wasn’t familiar with yet. 

So the next step is a multistep procedure to configure the Pluggable Storage Architecture on the ESX hosts.  More specifically, SATP (Storage Array Type Plugin) and the PSP (Path Selection Plugin), in that order. Duncan Epping provides a good foundation for PSA which can be learned here.

Configuring the SATP tells the PSA what type of array we’re using, and more accurately, what failover mode the array is running.  In this case, I needed to configure the SATP for each LUN to VMW_SATP_ALUA_CX which is the EMC CLARiiON (CX series) running in ALUA mode (active/active failover mode 4).  The command to do this must be issued on each ESX host in the cluster for each active/active LUN and is as follows: 

#set SATP
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.50060160c4602f4a50060160c4602f4a
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.60060160ec242700be1a7ec7a208df11
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.60060160ec242700bf1a7ec7a208df11
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.60060160ec2427001cac9740a308df11
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.60060160ec2427001dac9740a308df11
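
Since the same command has to be repeated for every LUN on every host, a small loop saves some typing and typos. This is a sketch only: the helper name for_each_device and the DRY_RUN switch are hypothetical, and the esxcli arguments are simply the ones used in the commands above. By default it only prints what it would run.

```shell
# Run (or just print) one esxcli command per device ID.
# Usage: for_each_device "<esxcli arguments>" <device> [<device> ...]
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

for_each_device() {
    cmd_prefix="$1"
    shift
    for dev in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "esxcli $cmd_prefix --device $dev"
        else
            # Word splitting of $cmd_prefix is intentional here.
            esxcli $cmd_prefix --device "$dev"
        fi
    done
}

# Example using two of the device IDs from above:
# for_each_device "nmp satp setconfig --config VMW_SATP_ALUA_CX" \
#     naa.60060160ec242700be1a7ec7a208df11 \
#     naa.60060160ec242700bf1a7ec7a208df11
```

Review the printed commands first, then set DRY_RUN=0 to actually run them. Changing the first argument would let the same helper drive the other per-device esxcli steps in this post.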

The devices you see above can be found in the vSphere Client when looking at the HBA devices discovered.  You can also find devices with the following command on the ESX Service Console: 

esxcli nmp device list 

I found that changing the SATP requires a host reboot for the change to take effect (thank you Scott Lowe).  After the host is rebooted, the same command used above should reflect that the SATP has been set correctly: 

esxcli nmp device list 

Results in: 

naa.60060160ec2427001dac9740a308df11
    Device Display Name: DGC Fibre Channel Disk (naa.60060160ec2427001dac9740a308df11)
    Storage Array Type: VMW_SATP_ALUA_CX
    Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_ow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
    Path Selection Policy: VMW_PSP_FIXED
    Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPat=0,numBytesPending=0}
    Working Paths: vmhba1:C0:T0:L61 

Once the SATP is set, it is time to configure the PSP for each LUN to Round Robin.  You can do this via the vSphere Client, or you can issue the commands at the Service Console: 

#set PSP per device
esxcli nmp psp setconfig --config VMW_PSP_RR --device naa.60060160ec242700be1a7ec7a208df11
esxcli nmp psp setconfig --config VMW_PSP_RR --device naa.60060160ec242700bf1a7ec7a208df11
esxcli nmp psp setconfig --config VMW_PSP_RR --device naa.60060160ec2427001cac9740a308df11
esxcli nmp psp setconfig --config VMW_PSP_RR --device naa.60060160ec2427001dac9740a308df11

#set PSP for device
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.50060160c4602f4a50060160c4602f4a
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.60060160ec242700be1a7ec7a208df11
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.60060160ec242700bf1a7ec7a208df11
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.60060160ec2427001cac9740a308df11
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.60060160ec2427001dac9740a308df11

Once again, running the command: 

esxcli nmp device list 

Now results in: 

naa.60060160ec2427001dac9740a308df11
    Device Display Name: DGC Fibre Channel Disk (naa.60060160ec2427001dac9740a308df11)
    Storage Array Type: VMW_SATP_ALUA_CX
    Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_ow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
    Path Selection Policy: VMW_PSP_RR
    Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPat=0,numBytesPending=0}
    Working Paths: vmhba1:C0:T0:L61 

Notice the Path Selection Policy has now changed to Round Robin. 

I’m good to go, right?  Wrong.  I struggled with this last bit for a while.  Using ESXTOP and IOMETER, I could see that I/O was still only going down one path instead of two.  Then I remembered something Duncan Epping had said to me in an earlier conversation a few days ago.  He mentioned something about the array reporting optimal and non-optimal paths to the PSA.  I printed out a copy of the Storage Path and Storage Plugin Management with esxcli document from VMware and took it to lunch with me.  The answer was buried on page 88.  The nmp roundrobin setting useANO is configured by default to 0 which means unoptimized paths reported by the array will not be included in Round Robin path selection unless optimized paths become unavailable.  Remember I said early on that unoptimized and optimized paths reported by the array would be a key piece of information.  We can see this in action by looking at the device list above.  The very last line shows working paths, and only one path is listed for Round Robin use – the optimized path reported by the array.  The fix here is to issue the following command, again on each host for all LUNs in the configuration: 

#use non-optimal paths for Round Robin
esxcli nmp roundrobin setconfig --useANO 1 --device naa.50060160c4602f4a50060160c4602f4a
esxcli nmp roundrobin setconfig --useANO 1 --device naa.60060160ec242700be1a7ec7a208df11
esxcli nmp roundrobin setconfig --useANO 1 --device naa.60060160ec242700bf1a7ec7a208df11
esxcli nmp roundrobin setconfig --useANO 1 --device naa.60060160ec2427001cac9740a308df11
esxcli nmp roundrobin setconfig --useANO 1 --device naa.60060160ec2427001dac9740a308df11

Once again, running the command: 

esxcli nmp device list 

Now results in: 

naa.60060160ec2427001dac9740a308df11
    Device Display Name: DGC Fibre Channel Disk (naa.60060160ec2427001dac9740a308df11)
    Storage Array Type: VMW_SATP_ALUA_CX
    Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_support=on;explicit_allow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
    Path Selection Policy: VMW_PSP_RR
    Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=1;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
    Working Paths: vmhba0:C0:T0:L61, vmhba1:C0:T0:L61 

Notice the change in useANO which now reflects a value of 1.  In addition, I now have two Working Paths – an optimized path and an unoptimized path. 
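
Eyeballing that listing for every LUN on every host gets old fast, so here is a rough way to condense it to one line per device. It assumes the `esxcli nmp device list` output is laid out like the samples in this post; the function name summarize_nmp_devices is hypothetical.

```shell
# Summarize `esxcli nmp device list` output read from stdin.
# Prints one line per device: "<device> psp=<policy> useANO=<0|1|?> paths=<n>"
summarize_nmp_devices() {
    awk '
        function emit() {
            if (dev != "") print dev, "psp=" psp, "useANO=" ano, "paths=" paths
        }
        { sub(/[ \t\r]+$/, "") }                      # drop trailing whitespace
        /^naa\./ { emit(); dev = $1; psp = "?"; ano = "?"; paths = 0; next }
        /Path Selection Policy:/ { psp = $NF }
        /Path Selection Policy Device Config:/ {
            # "useANO=" is 7 characters, so the value sits at RSTART + 7
            if (match($0, /useANO=[01]/)) ano = substr($0, RSTART + 7, 1)
        }
        /Working Paths:/ {
            sub(/.*Working Paths: */, "")
            paths = split($0, parts, /, */)           # comma-separated path list
        }
        END { emit() }
    '
}

# Example on the ESX Service Console:
# esxcli nmp device list | summarize_nmp_devices
```

Any device that still shows useANO=0 or paths=1 is one that is only using the optimized path.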

I fired up ESXTOP and IOMETER which now showed a flurry of I/O traversing both paths.  I kid you not, it was a Clark Griswold moment when all the Christmas lights on the house finally worked.

So it took a while to figure this out but with some reading and the help of experts, I finally got it, and I was extremely jazzed.  What would have helped was if VMware’s PSA were more plug and play with various array types.  For instance, why can’t PSA recognize ALUA on the CLARiiON and automatically configure SATP for VMW_SATP_ALUA_CX?  Why is a reboot required for an SATP change?  PSA configuration in the vSphere Client might also have been convenient, but I recognize it has diminishing returns with a large number of hosts and/or LUNs to configure.  Scripting and the CLI are the way to go for consistency and automation.  Or how about PSA configuration via Host Profiles?

I felt a little betrayed and confused by the Navisphere GUI reflecting Failover Mode 1 after several attempts to change it to 4.  I was looking at host connectivity status. Was I looking in the wrong place? 

Lastly, end to end documentation on how to configure Round Robin would have helped a lot.  EMC got me part of the way there with the CLARiiON/VMware Applied Technology Guide document, but left me hanging, making no mention of the PSA configuration needed.  I’m getting that the end game for EMC multipathing today is PowerPath, which is fine – I’ll get to that, but I really wanted to do some testing with native Round Robin first, if for no other reason than to establish a baseline to compare PowerPath to once I get there.

Thanks again to the people I leaned on to help me through this.  It was the usual crew who can always be counted on.

Service Console Directory Listing Text Color in PuTTY

January 25th, 2010

Curious about the default colors you see in a remote PuTTY session connected to the ESX Service Console?  Some are obvious such as the directory listings which show up as blue text on a black background.  Another obvious one is the compressed .tar.gz file which will show up in a nicely contrasting red text on black background.  Or how about this one which I’m sure you’ve seen, executable scripts are shown as green text on a black background.  You might be asking yourself “What about the oddball ones I see from time to time which don’t have an explanation?”  I’ve provided an example in the screenshot – a folder named isos shows up with a green background and blue text.  What does that mean? 

There’s a way to find out.  While in the remote PuTTY session connected to the ESX Service Console, run the command dircolors -p from any directory.  Here’s the default legend:

# Below are the color init strings for the basic file types. A color init
# string consists of one or more of the following numeric codes:
# Attribute codes:
# 00=none 01=bold 04=underscore 05=blink 07=reverse 08=concealed
# Text color codes:
# 30=black 31=red 32=green 33=yellow 34=blue 35=magenta 36=cyan 37=white
# Background color codes:
# 40=black 41=red 42=green 43=yellow 44=blue 45=magenta 46=cyan 47=white
NORMAL 00 # global default, although everything should be something.
FILE 00 # normal file
DIR 01;34 # directory
LINK 01;36 # symbolic link. (If you set this to 'target' instead of a
 # numerical value, the color is as for the file pointed to.)
FIFO 40;33 # pipe
SOCK 01;35 # socket
DOOR 01;35 # door
BLK 40;33;01 # block device driver
CHR 40;33;01 # character device driver
ORPHAN 40;31;01 # symlink to nonexistent file
SETUID 37;41 # file that is setuid (u+s)
SETGID 30;43 # file that is setgid (g+s)
STICKY_OTHER_WRITABLE 30;42 # dir that is sticky and other-writable (+t,o+w)
OTHER_WRITABLE 34;42 # dir that is other-writable (o+w) and not sticky
STICKY 37;44 # dir with the sticky bit set (+t) and not other-writable
# This is for files with execute permission:
EXEC 01;32
# List any file extensions like '.gz' or '.tar' that you would like ls
# to colorize below. Put the extension, a space, and the color init string.
# (and any comments you want to add after a '#')
# If you use DOS-style suffixes, you may want to uncomment the following:
#.cmd 01;32 # executables (bright green)
#.exe 01;32
#.com 01;32
#.btm 01;32
#.bat 01;32
.tar 01;31 # archives or compressed (bright red)
.tgz 01;31
.arj 01;31
.taz 01;31
.lzh 01;31
.zip 01;31
.z 01;31
.Z 01;31
.gz 01;31
.bz2 01;31
.deb 01;31
.rpm 01;31
.jar 01;31
# image formats
.jpg 01;35
.jpeg 01;35
.gif 01;35
.bmp 01;35
.pbm 01;35
.pgm 01;35
.ppm 01;35
.tga 01;35
.xbm 01;35
.xpm 01;35
.tif 01;35
.tiff 01;35
.png 01;35
.mov 01;35
.mpg 01;35
.mpeg 01;35
.avi 01;35
.fli 01;35
.gl 01;35
.dl 01;35
.xcf 01;35
.xwd 01;35
# audio formats
.flac 01;35
.mp3 01;35
.mpc 01;35
.ogg 01;35
.wav 01;35

 

Applied to the screenshot example above, the legend tells us that the isos directory is: OTHER_WRITABLE 34;42 # dir that is other-writable (o+w) and not sticky.
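
If you don't feel like decoding the numbers by hand, the lookup is easy to script. The sketch below translates an init string such as 34;42 into words using the attribute, text and background tables from the legend; the function name decode_ls_color is just something I made up for illustration.

```shell
# Translate a dircolors init string such as "34;42" into words,
# using the code tables printed at the top of the legend.
decode_ls_color() {
    colors="black red green yellow blue magenta cyan white"
    out=""
    for code in $(echo "$1" | tr ';' ' '); do
        case "$code" in
            00) desc="none" ;;
            01) desc="bold" ;;
            04) desc="underscore" ;;
            05) desc="blink" ;;
            07) desc="reverse" ;;
            08) desc="concealed" ;;
            3[0-7]|4[0-7])
                # Last digit picks the color; first digit picks text vs background.
                idx=$(( ${code#?} + 1 ))
                name=$(echo "$colors" | cut -d' ' -f"$idx")
                case "$code" in
                    3?) desc="text=$name" ;;
                    4?) desc="background=$name" ;;
                esac ;;
            *) desc="unknown($code)" ;;
        esac
        out="${out:+$out }$desc"
    done
    echo "$out"
}

# decode_ls_color '34;42'   # the OTHER_WRITABLE entry
# decode_ls_color '01;34'   # the DIR entry
```

For the isos folder, 34;42 decodes to blue text on a green background, which matches what shows up in the PuTTY session.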

Another color you may commonly see which I haven’t yet mentioned is cyan, which identifies symbolic links.  These can be found in several directories.  Most often you will see symbolic links in /vmfs/volumes/ connecting a friendly datastore name with its not so friendly volume name, which is better known by the VMkernel.
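
To see which friendly name points at which volume name without relying on colors, something along these lines would do. It is a minimal sketch; the function name list_symlinks is hypothetical, and it simply defaults to /vmfs/volumes when no directory is given.

```shell
# Print "<link name> -> <target>" for every symbolic link directly inside a directory.
list_symlinks() {
    dir="${1:-/vmfs/volumes}"
    for entry in "$dir"/*; do
        if [ -L "$entry" ]; then
            echo "$(basename "$entry") -> $(readlink "$entry")"
        fi
    done
}

# Example on the ESX Service Console:
# list_symlinks /vmfs/volumes
```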

That’s it. Not what I would consider earth-shattering material here, but maybe you’ve seen these colors before and haven’t connected the dots on their meaning.  For people with a Linux background, this is probably old hat.

VMware VI3 Implementation and Administration

January 11th, 2010

I recently finished reading the book VMware VI3 Implementation and Administration by Eric Siebert (ISBN-13: 978-0-13-700703-5).  VMware VI3 Implementation and Administration was a very enjoyable read. I don’t mean to sound cliché but for me it was one of those books that is hard to put down. Released in May of 2009, along with the next generation of VMware VI (vSphere), the timing of its arrival to market probably could have been better, but better late than never. Datacenters will be running on VI3 for quite some time. With that in mind, this book provides a tremendous amount of value and insight. I can tell that Eric put a lot of time and research into this book; the quality of the content shows. Much of the book was review for me, but I was still able to pick up bits and pieces here and there I wasn’t aware of, as well as some fresh perspective and new approaches to design, administration, and support.

To be honest and objective, I felt that Chapter 9, “Backing Up Your Virtual Environment”, lacked the completeness which all other chapters were given. A single page was dedicated to VMware Consolidated Backup, with none of the detailed examples or demonstrations of how to use it that are found throughout other chapters. In addition, there were only a few sentences covering Replication, which is a required component in many environments. Eric likes to discuss 3rd party solutions and this would have been a great opportunity to go into more detail or at least mention some products affordable to businesses of any size which could leverage replication solutions.

Overall, this is a great book. Eric has a no-nonsense writing style backed by decades of in-the-trenches experience. Along with the print copy, you get a free electronic online edition as well, allowing you to access the book anywhere there is internet connectivity.  Pick up your copy today!  I thank you Eric and look forward to your upcoming vSphere book!

ESX 3.5.0 Update 5 Change in Service Console Memory

December 30th, 2009

You may know that the Red Hat Enterprise Linux 3 Update 6 based ESX 3.x Service Console default memory allocation has been 272MB since its first release.  In their book, VMware Infrastructure 3 Advanced Technical Design Guide authors Ron Oglesby and Scott Herold discuss how Service Console memory requirements have become less of a factor in 3.x compared to 2.x, since the Service Console has been stripped of some of its responsibilities, including VMM and hardware management.  They go so far as to say the default value of 272MB should be enough memory for most environments. I generally accept this theory, but for the record I have been on plenty of support calls where VMware recommends increasing Service Console memory to its maximum value of 800MB.  Many subscribe to maxing out Service Console memory as a best practice to avoid problems down the road and if nothing else, avoid a reboot for the memory change or rebuild to resize Service Console swap.  Service Console memory utilization will vary between environments and will be influenced by 3rd party software installed in the Service Console such as anti-virus, hardware agents, backup agents, etc.  The number of vSwitch ports will also impact Service Console memory use.

Left to their own discretion, many have established their own build standards with respect to Service Console memory allocation.  Some will increase it.  Some will leave it at the factory default of 272MB.  I haven’t heard of anyone reducing Service Console memory usage but it can be lowered slightly down to 256MB.  Whatever you decide, be sure you adjust your Service Console swap accordingly.  While we’re on the subject, the assignable range of Service Console memory in ESX 4.0 is the same as 3.x (256MB – 800MB), however, the default Service Console memory assignment in ESX 4.0 is 400MB whereas it is 272MB in ESX 3.x.

While working in the lab on my VCDX design, I discovered that VMware has increased the default Service Console memory assignment to 512MB as of ESX 3.5.0 Update 5.  For those who configure and tune their ESX hosts manually, this is a non-issue.  Continue to manually configure your ESX hosts.  Those with automated post build scripts using sed to change Service Console memory allocation, you’ve got a few changes to make to your scripts.  Basically, whereas sed used to look for 272MB values to replace, it must now search for 512MB values.  For example:

An ESX 3.5.0u4 post build script which increases COS memory from 272MB to 800MB:

cp /etc/vmware/esx.conf /etc/vmware/esx.conf.old
cp /boot/grub/grub.conf /boot/grub/grub.conf.old
/bin/sed -i -e 's/272/800/' /etc/vmware/esx.conf
/bin/sed -i -e 's/272M/800M/' /boot/grub/grub.conf
/bin/sed -i -e 's/277504/818176/' /boot/grub/grub.conf

Will become an ESX 3.5.0u5 post build script which increases COS memory from 512MB to 800MB:

cp /etc/vmware/esx.conf /etc/vmware/esx.conf.old
cp /boot/grub/grub.conf /boot/grub/grub.conf.old
/bin/sed -i -e 's/512/800/' /etc/vmware/esx.conf
/bin/sed -i -e 's/512M/800M/' /boot/grub/grub.conf
/bin/sed -i -e 's/523264/818176/' /boot/grub/grub.conf

If you’ve got a mix of 3.5u4 and 3.5u5 hosts and you wish to use the same centralized post configuration script on each, the following script should cover both:

cp /etc/vmware/esx.conf /etc/vmware/esx.conf.old
cp /boot/grub/grub.conf /boot/grub/grub.conf.old
/bin/sed -i -e 's/272/800/' /etc/vmware/esx.conf
/bin/sed -i -e 's/512/800/' /etc/vmware/esx.conf
/bin/sed -i -e 's/272M/800M/' /boot/grub/grub.conf
/bin/sed -i -e 's/512M/800M/' /boot/grub/grub.conf
/bin/sed -i -e 's/277504/818176/' /boot/grub/grub.conf
/bin/sed -i -e 's/523264/818176/' /boot/grub/grub.conf
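
The three grub.conf figures used above appear to follow a pattern: 277504, 523264, and 818176 each equal (MB - 1) x 1024. Assuming that relationship holds for other sizes (it is inferred from these three values only, not from VMware documentation), the combined script could be parameterized roughly as follows. The function names and the optional file path arguments are made up for illustration. Like the scripts above, the sed patterns are unanchored, so they would also rewrite any other line that happens to contain 272 or 512.

```shell
#!/bin/sh
# Sketch only: generalizes the sed scripts above.
# The (MB - 1) * 1024 rule is inferred from the values in this post
# (272 -> 277504, 512 -> 523264, 800 -> 818176).

# Print the grub.conf KB figure matching a Service Console size in MB.
cos_kb() {
    echo $(( ($1 - 1) * 1024 ))
}

# set_cos_mem NEW_MB [ESX_CONF] [GRUB_CONF]
# Replaces either known default (272 or 512) with NEW_MB in both files.
set_cos_mem() {
    new_mb=$1
    esx_conf=${2:-/etc/vmware/esx.conf}
    grub_conf=${3:-/boot/grub/grub.conf}
    new_kb=$(cos_kb "$new_mb")

    # Keep backups, as the original scripts do.
    cp "$esx_conf" "$esx_conf.old"
    cp "$grub_conf" "$grub_conf.old"

    for old_mb in 272 512; do
        old_kb=$(cos_kb "$old_mb")
        sed -i -e "s/$old_mb/$new_mb/" "$esx_conf"
        sed -i -e "s/${old_mb}M/${new_mb}M/" "$grub_conf"
        sed -i -e "s/$old_kb/$new_kb/" "$grub_conf"
    done
}
```

Calling set_cos_mem 800 with no file arguments should be equivalent to the combined script above. It is worth running it against copies of the two files first and diffing the result.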

I looked in the ESX 3.5.0 Update 5 release notes and did not see anything about a Service Console memory allocation increase.  My hunch is VMware realized they can reduce their volume of support calls and ultimately increase support revenue margin by granting the Service Console more memory out of the box.

The lab has once again fulfilled its purpose.  You test releases in your lab or development environment, right?  Right?  :-)

Happy Holidays

VMware Releases ESX(i) 3.5 Update 5; Critical Updates

December 5th, 2009

VMware apparently released ESX(i) 3.5 Update 5 dated 12/3/09, however it became available on Update Manager late this afternoon.  VMware is extremely poor at communicating anything but major releases, so to get the fastest notification possible about security patches and updates, I configure my VMware Update Manager servers to check for updates every 6 hours and provide me with email notification of anything it finds.  VMware doesn’t listen to me much when it comes to feature requests so I’ll shelve the ranting.

So what’s new in ESX 3.5 Update 5?  The major highlights are guest VM support for Windows 7 and Windows Server 2008 R2 (reminder, R2 is 64-bit only), as well as Ubuntu 9.04, and added hardware support for processors and NICs.  Before you get too excited about Windows 7, remember that it is not a supported guest operating system in VMware View.  Even in the new View 4 release, Windows 7 has “Technology Preview” support status only.

If you track the updates from VMware Update Manager, the 12/3 releases amount to 20 updates including Update 5, 16 of which are rated critical.  If you’re still a ways out on vSphere deployment, you’ll probably want to take a look at the critical updates for your 3.x environment.

Enablement of Intel Xeon Processor 3400 Series – Support for the Intel Xeon processor 3400 series has been added. Support includes Enhanced VMotion capabilities. For additional information on previous processor families supported by Enhanced VMotion, see Enhanced VMotion Compatibility (EVC) processor support (KB 1003212).

Driver Update for Broadcom bnx2 Network Controller – The driver for bnx2 controllers has been upgraded to version 1.6.9. This driver supports bootcode upgrade on bnx2 chipsets and requires bmapilnx and lnxfwnx2 tools upgrade from Broadcom. This driver also adds support for Network Controller – Sideband Interface (NC-SI) for SOL (serial over LAN) applicable to Broadcom NetXtreme 5709 and 5716 chipsets.

Driver Update for LSI SCSI and SAS Controllers – The driver for LSI SCSI and SAS controllers is updated to version 2.06.74. This version of the driver is required to provide a better support for shared SAS environments.

Newly Supported Guest Operating Systems – Support for the following guest operating systems has been added specifically for this release:

  • Windows 7 Enterprise (32-bit and 64-bit)
  • Windows 7 Ultimate (32-bit and 64-bit)
  • Windows 7 Professional (32-bit and 64-bit)
  • Windows 7 Home Premium (32-bit and 64-bit)
  • Windows 2008 R2 Standard Edition (64-bit)
  • Windows 2008 R2 Enterprise Edition (64-bit)
  • Windows 2008 R2 Datacenter Edition (64-bit)
  • Windows 2008 R2 Web Server (64-bit)
  • Ubuntu Desktop 9.04 (32-bit and 64-bit)
  • Ubuntu Server 9.04 (32-bit and 64-bit)

For more complete information about supported guests included in this release, see the VMware Compatibility Guide: http://www.vmware.com/resources/compatibility/search.php?deviceCategory=software.

Newly Supported Management Agents – See VMware ESX Server Supported Hardware Lifecycle Management Agents for current information on supported management agents.

Newly Supported Network Cards – This release of ESX Server supports HP NC375T (NetXen) PCI Express Quad Port Gigabit Server Adapter.

Newly Supported SATA Controllers – This release of ESX Server supports the Intel Ibex Peak SATA AHCI controller.

Note:

  • Some limitations apply in terms of support for SATA controllers. For more information, see SATA Controller Support in ESX 3.5. (KB 1008673)
  • Storing VMFS datastores on native SATA drives is not supported.

vSphere 4 Update 1 Released

November 19th, 2009

While the Dutch and UK bloggers sleep, VMware has released Update 1 for ESXi 4, ESX 4, and vCenter 4.

Notable improvements include:

  • View 4 support
  • Windows 7 and Windows Server 2008 R2 support (Support for Windows 7 was timed nicely for View 4)
  • DB2 database back end support (Who requested this? Sound off, I’d like to hear your comments)
  • HA cluster hosts can now support 160 VMs each in a cluster of 8 hosts or less. 9 or more hosts in an HA cluster are still limited to 40 VMs per host (This still bugs me a little. Anyone loading up a host with near 160 or more VMs?)
  • Paravirtualized SCSI support has been extended to Windows 2003 and 2008 boot drives
  • vDS performance improvement (Important if creating VMkernel portgroups on vDS for NAS storage)
  • vCPUs per core limit increased from 20 to 25 (“Lorraine – You are my density…”)
  • Intel Xeon 3400 series CPU support
  • Improved support for Microsoft Clustering (Yes, more changes to the wonderful MS clustering document)
  • Rumors of comma separators in report/graph numbers (This could be my favorite new feature)
  • Many resolved issues

Last but not least, we can still reboot our ESX 4 hosts with CTRL + ALT + DEL at the console :)

Tame Electrical and Heating Costs with CPU Power Management

November 11th, 2009

A casual Twitter tweet about my power savings through the use of VMware Distributed Power Management (DPM) found its way to VMware Senior Product Manager for DPM, Ulana Legedza, and Andrei Dorofeev. Ulana was interested in learning more about my situation. I explained how VMware DPM had evaluated workloads between two clustered vSphere hosts in my home lab, and proceeded to shut down one of the hosts for most of the month of October, saving me more than $50 on my energy bill.

Ulana and Andrei took the conversation to the next level and asked me if I was using vSphere’s Advanced CPU Power Management feature (See vSphere Resource Management Guide page 22). I was not; in fact, I was unaware of its existence. Power Management is a new feature in ESX(i)4 available to processors supporting Enhanced Intel SpeedStep or Enhanced AMD PowerNow! power management technologies. To quote the .PDF article:

“To improve CPU power efficiency, you can configure your ESX/ESXi hosts to dynamically switch CPU frequencies based on workload demands. This type of power management is called Dynamic Voltage and Frequency Scaling (DVFS). It uses processor performance states (P-states) made available to the VMkernel through an ACPI interface.”

A quick look at the Quad Core AMD Opteron 2356 processors in my HP DL385 G2 showed they support Enhanced AMD PowerNow! Power Management Technology:

There are two steps to enabling this power management feature. The first step is to ensure it is enabled in the server BIOS. On an HP DL385 G2, CPU power management is enabled by default. In this particular server model, it is configured via the BIOS by hitting <F9> at the end of the POST (which obviously requires a reboot).

A slightly easier method might be to verify and/or configure the policy through HP’s out of band (OOB) iLO 2, however, a reboot will be requested by the iLO 2 for a policy change to take effect. On an HP server, configure for OS Control mode, but again, this appears to be the default for the HP DL385 G2 so hopefully no reboot is required for you to implement this power saving measure in your environment:

After enabling power management in the BIOS, the second step is to modify the Power Management Policy on each ESX(i) host from the default of static to dynamic. The definitions of these two settings can be found in the .PDF linked above and are as follows:

static – The default. The VMkernel can detect power management features available on the host but does not actively use them unless requested by the BIOS for power capping or thermal events.

dynamic – The VMkernel optimizes each CPU’s frequency to match demand in order to improve power efficiency but not affect performance. When CPU demand increases, this policy setting ensures that CPU frequencies also increase.

You might be asking yourself by this point “Ok, this is nice, but what’s the trade off?” Note the wording in the dynamic definition above: “improve power efficiency but not affect performance”. This is a win/win configuration change!

This step can be performed one of a few ways on each host (again, no reboot required for this change):

  1. Using the vSphere Client, change the Advanced host setting Power.CpuPolicy from static to dynamic
  2. Scriptable: Via the ESX service console, PuTTY, or script, issue the command esxcfg-advcfg -s dynamic /Power/CpuPolicy
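
For a post build script, option 2 can be wrapped so that it only touches hosts which are not already set to dynamic. This is a rough sketch rather than a tested procedure: it assumes esxcfg-advcfg -g prints the current policy somewhere in its output, and the ADVCFG variable is a hypothetical hook that lets the function be tried off-host with a stand-in command.

```shell
#!/bin/sh
# Sketch: switch a host to the dynamic CPU policy using the command from option 2.
# ADVCFG is a hypothetical override for testing away from an ESX host.
ADVCFG=${ADVCFG:-esxcfg-advcfg}

set_dynamic_cpu_policy() {
    # -g reads the current value of the advanced option.
    current=$("$ADVCFG" -g /Power/CpuPolicy)
    case "$current" in
        *dynamic*)
            echo "CpuPolicy already dynamic; nothing to do"
            ;;
        *)
            # -s sets the option; no reboot is needed for this change.
            "$ADVCFG" -s dynamic /Power/CpuPolicy
            ;;
    esac
}
```

On a host, calling set_dynamic_cpu_policy from the service console would run the same esxcfg-advcfg command shown in option 2, but only when the policy is still static.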

The impact on my home lab was quite visible. After 12 hours, the blue area in the following 24 hour graph shows that average electrical consumption dropped from 337 Watts to 292 Watts. All things being equal and CPU loads balanced by DRS, that’s a reduction in energy consumption of over 13% per host:

An alternate graph shows Btu output dropped from 1,135 Btu to about 1,000 Btu. All things being equal, a reduction of about 135 Btu per host:
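
The percentages behind those two graphs are easy to check. The sketch below reruns the arithmetic on the figures quoted above (337 to 292 Watts, 1,135 to 1,000 Btu) and projects the 45 Watt difference over a year. The yearly figure assumes the saving holds around the clock, which a lab with varying load will not match exactly, and the function names are made up for illustration.

```shell
#!/bin/sh
# Worked arithmetic for the power and heat figures quoted in this post.

# pct_reduction BEFORE AFTER: percentage drop from BEFORE to AFTER.
pct_reduction() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

# kwh_per_year WATTS: energy used by a constant load of WATTS over 365 days.
kwh_per_year() {
    awk -v w="$1" 'BEGIN { printf "%.1f\n", w * 24 * 365 / 1000 }'
}

echo "Electrical draw: $(pct_reduction 337 292)% lower"
echo "Heat output:     $(pct_reduction 1135 1000)% lower"
echo "If sustained:    $(kwh_per_year 45) kWh saved per host per year"
```

Multiplying the kWh figure by your own utility rate gives a rough yearly dollar amount per host.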

A Btu is heat – explained more at wiseGEEK’s What is a Btu? Heat is a byproduct of technology in the datacenter and in most cases is viewed as overhead expense because it requires cooling (additional costs) to maintain optimal operating conditions for the equipment running in the environment. If we can eliminate heat, we eliminate the associated cost of removing the heat. This is known as cost avoidance.

Eliminating heat is as much of an interest to me as reducing my energy bill. The excessive heat generated in the basement eventually finds its way upstairs causing the rest of the house to be a little uncomfortable. The air conditioner in my home wasn’t manufactured to handle the excessive heat. Now, I live in the Midwest where we have some frigid winters. Heat in the home is welcomed during the winter months. I could turn off CPU Power Management, raising the Btu output as well as my energy bill, in favor of reducing my natural gas heating bill. I don’t know which is more expensive. This could be a great experiment for the January/February time frame.

In summary, we can attack operating costs from two sides by using VMware CPU Power Management:

  1. Reduction in excess electricity used by idle CPU cycles
  2. Reduction in cooling costs by reducing Btu output

I’m excited to see what next month’s energy bill looks like.

Update 11-17-09:  I was just made aware that Simon Seagrave wrote an earlier article on CPU power management here.  Sorry Simon, I was unaware of your article and I did not intentionally copy your topic.  Your article covered the topic well.  I hope we’re still friends :)

VMware ESX Guest OS I/O Timeout Settings (for NetApp Storage Systems)

October 29th, 2009

You may already be aware that installing VMware Tools in a Windows VM configures a registry value which controls the I/O timeout for all Windows disks in the event of a short storage outage. This is to help the guest operating system survive high latency or temporary outage conditions such as SAN path failover or maybe a network failure in Ethernet based storage.  VMware Tools changes the Windows default value of 10 seconds for non-cluster nodes, 20 seconds for cluster nodes, to 60 seconds (0x3c hex).

Did you know that disk I/O timeout is a configurable parameter in other guest operating systems as well? And why not, it makes sense that we would want every guest OS to be able to outlast a storage deficiency.

NetApp offers a document titled VMware ESX Guest OS I/O Timeout Settings for NetApp Storage Systems. It’s published as kb41511 and you’ll need a free NetApp NOW account to access the document. This white paper serves a few useful purposes:

  • Defines recommended disk I/O timeout settings for various guest operating systems on NetApp storage systems
  • Defines benchmark disk I/O timeout settings for various guest operating systems which could be used on any storage system, including local SCSI
  • In some cases provides scripts to make the necessary changes
  • Explains the methods to make the disk I/O timeout changes on the following guest operating systems:
    • RHEL4
    • RHEL5
    • SLES9
    • SLES10
    • Solaris 10
    • Windows
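
To give a feel for what the change looks like on the Linux side: guests with a 2.6 kernel generally expose the per-disk SCSI timeout in sysfs at /sys/block/sdX/device/timeout. The sketch below sets it at runtime for every sd* disk. It is not taken from the NetApp document. The 60 second value in the usage note simply mirrors what VMware Tools sets on Windows, the function name and the optional sysfs root argument are made up for illustration, and the change does not persist across a reboot.

```shell
#!/bin/sh
# Sketch: set the SCSI disk I/O timeout on a Linux guest through sysfs.
# set_disk_timeout SECONDS [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys and exists only so the function can be tried
# against a scratch directory.
set_disk_timeout() {
    secs=$1
    root=${2:-/sys}
    for f in "$root"/block/sd*/device/timeout; do
        # Skip when no sd* disks matched the glob or the file is not writable.
        [ -w "$f" ] || continue
        echo "$secs" > "$f"
        echo "$f -> $secs"
    done
}
```

Run as root, set_disk_timeout 60 would apply the value to every SCSI disk the guest currently sees. Making it stick across reboots is distribution specific, which is where the per-OS methods in the NetApp document come in.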

Now, on the subject of disk I/O timeouts, understand that the above only improves the chance of keeping a VM up during adverse storage conditions. As in life, there are no guarantees. A guest OS with high disk I/O activity may not be able to tolerate stalled read and/or write requests for the duration of the timeout value. Windows guests may freeze or BSOD. Linux guests may go read-only on their root volumes which requires a reboot. Which brings me to the next point…

A larger timeout value isn’t necessarily better. In extending disk I/O timeout values, we’re applying virtual duct tape to an underlying storage issue which needs further looking into. Given the complex and wide variety of shared storage systems available to the datacenter today, storage issues can be caused by many variables including but not limited to disks (spindles), target controllers, fabric components such as fibre cables, SFP/GBICs, HBAs, fabric switches, zoning, network components such as copper cabling, NICs, network switches, routers, and firewalls. Also keep in mind that while the OS may survive the disk I/O interruption, application(s) running on the OS platform may not.  Applications themselves implement response timeout values which are likely going to be hard coded and non-configurable by a platform or virtualization administrator in the application itself.

Lastly, try to remember that if you go through the effort of increasing your disk I/O timeout values on Windows guests beyond 60 seconds, future installation of VMware Tools will reset the disk I/O timeout back to 60 seconds.  What this means is that in medium to large environments, you’re going to need an automated method to deploy custom disk I/O timeout values at least for Windows guests.  For those with NetApp storage, NetApp pushes these standards firmly, along with other VMware best practices which I’ll save for a future blog article.

TrainSignal vSphere Training DVD 1 Completed

October 23rd, 2009

This evening I finished viewing the first of three TrainSignal vSphere Training DVDs authored by VCP and CCIE David Davis. Having viewed TrainSignal’s last VMware Virtual Infrastructure training on VI3, I knew I was in for some good stuff.

DVD 1 starts off with introductions to the video’s instructor as well as a hypothetical company which is used as a focus and discussion point throughout the video series. Practical application of technologies to a role played scenario, the Wired Brain Coffee Company in this case, serves as positive reinforcement to the lessons being taught and is an effective method for knowledge retention, especially if the student is following along and working hands on in their own lab through the examples.

The video then sets a beginner’s pace as it covers VMware certification and virtualization basics. Moving on, it compares and contrasts VMware, Microsoft, and Citrix hypervisors. Beyond this comparison, the focus from here on out is on VMware products where a closer look is taken at the different components and tiers of vSphere.

Halfway through the DVD, we’re finally to the point where we’re installing and configuring the vSphere products. One valuable offering from the video is a lesson describing the steps needed to install ESX and ESXi in VMware Workstation. This is what is called a nested hypervisor – an ESX(i) type 1 (bare metal) hypervisor running on top of a VMware Workstation type 2 (hosted) hypervisor. Nested hypervisors are not supported in production environments but they are quite helpful in lab, test, and portable environments.

Towards the end, lesson 17 provides a nice demonstration of a VMware Tools installation in a Linux guest operating system, which isn’t nearly as straightforward as a VMware Tools installation on Windows. The last two lessons begin touching on some of the new advanced features that vSphere offers: Hot Add/Hot Plug virtual hardware and Host Profiles.

Thus far my feeling is this training is geared towards the beginner to intermediate level. I’m looking forward to DVD 2 where the instructor dives into more of the advanced design, configuration, and operational topics of VMware vSphere. I’ve attended VMware’s vSphere What’s New (2 day) and VMware’s vSphere Quick Start (5 day) classes. With approximately 150 new features making their debut in vSphere, I’ve yet to see anyone cover them all – that would be a tall order.

DVD 1 Lessons:

  1. Meet Your Instructor
  2. Our Scenario with the Wired Brain Coffee Company
  3. VMware Certification – Preparing for the VCP and VCDX
  4. Introduction to Virtualization
  5. Virtualization Products Compared
  6. VMware ESXi4 Free Edition for the SMB
  7. VMware vSphere 4 and ESX Essentials
  8. vSphere Management Options
  9. Installing the VMware vSphere Client
  10. Navigating vSphere Using the vSphere Client
  11. Running VMware ESX 4 in Workstation
  12. Installing VMware ESX 4
  13. Installing VMware ESXi Version 4
  14. Installing VMware vCenter 4
  15. vCenter4 – Configuring Your New Virtual Infrastructure
  16. Creating & Modifying Virtual Guest Machines
  17. Installing and Configuring VMware Tools
  18. Adding Virtual Machine Hardware with vSphere Hot Plug
  19. Using vSphere Host Profiles

8 New ESX 3.5.0 Patches Released; 3 Critical

October 16th, 2009

Eight new patches have been released for ESX 3.5.0. Other versions of ESX, including vSphere and ESXi, are not impacted.

3 of the 8 patches are rated critical and should be evaluated quickly for application in your virtual infrastructure.

ID: ESX350-200910401-SG Impact: HostSecurity Release date: 2009-10-16 Products: esx 3.5.0 Updates VMkernel, Tools, hostd

This patch contains the following fixes and enhancements:

This patch updates the service console kernel version to kernel-2.4.21-58.EL. The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the names CVE-2008-4210, CVE-2008-3275, CVE-2008-0598, CVE-2008-2136, CVE-2008-2812, CVE-2007-6063, and CVE-2008-3525 to the security issues fixed in kernel-2.4.21-58.EL.

This patch reduces the boot time of ESX hosts and should be applied when multiple ESX hosts detect LUNs used for Microsoft Cluster Service (MSCS).

Symptom: Error messages similar to the following might be logged in the /var/log/vmkernel log file of the service console:

Jul 24 14:34:24 VMEX3EQCH1100003 vmkernel: 165:15:48:57.500 cpu0:1033)WARNING: SCSI: 5519: Failing I/O due to too many reservation conflicts

Jul 24 14:34:24 VMEX3EQCH1100003 vmkernel: 165:15:48:57.500 cpu0:1033)WARNING: SCSI: 5615: status SCSI reservation conflict, rstatus 0xc0de01 for vmhba1:0:9. residual R 919, CR 0, ER 3

Jul 24 14:34:24 VMEX3EQCH1100003 vmkernel: 165:15:48:57.500 cpu0:1033)SCSI: 6608: Partition table read from device vmhba1:0:9 failed: SCSI reservation conflict (0xbad0022)

Any additional lines or customizations added by a user in the /etc/fstab file are deleted when VMware Tools is reinstalled or reconfigured. This issue occurs because when uninstalling, VMware Tools restores the files which were backed up during installation.

After applying this patch, any request for connection with ESX 3.5 using cipher suite of 56-bit encryption will be dropped. As a result, browsers that exclusively use cipher suites with 40-bit and 56-bit encryption cannot connect to ESX 3.5. Microsoft has made the Internet Explorer High Encryption Pack available for Internet Explorer 5.01 and earlier. Internet Explorer 5.5 and higher versions already use 128-bit encryption.

This patch contains a fix for a security vulnerability in the ISC third-party DHCP client. This vulnerability allows for code execution in the client by a remote DHCP server through a specially crafted subnet-mask option. The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the name CVE-2009-0692 to this issue.

ID: ESX350-200910402-BG Impact: Critical Release date: 2009-10-16 Products: esx 3.5.0 Updates ESX Scripts

This patch is required to be installed with ESX350-200910401-SG (KB 1013124) to resolve a boot-time-related issue. The patch reduces the boot time of ESX hosts and should be applied when multiple ESX hosts detect LUNs used for Microsoft Cluster Service (MSCS).

ID: ESX350-200910403-SG Impact: HostSecurity Release date: 2009-10-16 Products: esx 3.5.0 Updates Web Access

This patch updates the following:

WebAccess component Tomcat server to 5.5.27. This update addresses multiple security issues that exist in the earlier releases of the Tomcat server.

The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the names CVE-2008-1232, CVE-2008-1947, and CVE-2008-2370 to the issues addressed by Tomcat 5.5.27. For more information on these security vulnerabilities, refer to the Apache Tomcat 5.x Vulnerabilities page at http://tomcat.apache.org/security-5.html.

WebAccess component JRE to 1.5.0_18. This update addresses multiple security issues that existed in the previous versions of JRE.

The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the following names to the security issues fixed in JRE 1.5.0_17:

CVE-2008-2086, CVE-2008-5347, CVE-2008-5348, CVE-2008-5349, CVE-2008-5350, CVE-2008-5351, CVE-2008-5352, CVE-2008-5353, CVE-2008-5354, CVE-2008-5356, CVE-2008-5357, CVE-2008-5358, CVE-2008-5359, CVE-2008-5360, CVE-2008-5339, CVE-2008-5342, CVE-2008-5344, CVE-2008-5345, CVE-2008-5346, CVE-2008-5340, CVE-2008-5341, CVE-2008-5343, and CVE-2008-5355.

The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the following names to the security issues fixed in JRE 1.5.0_18:

CVE-2009-1093, CVE-2009-1094, CVE-2009-1095, CVE-2009-1096, CVE-2009-1097, CVE-2009-1098, CVE-2009-1099, CVE-2009-1100, CVE-2009-1101, CVE-2009-1102, CVE-2009-1103, CVE-2009-1104, CVE-2009-1105, CVE-2009-1106, and CVE-2009-1107.

ID: ESX350-200910404-SG Impact: HostSecurity Release date: 2009-10-16 Products: esx 3.5.0 Updates cim

After applying this patch, any request for connection to CIM port 5989 on ESX 3.5 using cipher suite of 56-bit encryption will be dropped.

ID: ESX350-200910405-SG Impact: HostSecurity Release date: 2009-10-16 Products: esx 3.5.0 Updates mptscsi drivers

This patch updates the mptscsi driver to a version that is compatible with the service console version kernel-2.4.21-58.EL.

ID: ESX350-200910406-SG Impact: HostSecurity Release date: 2009-10-16 Products: esx 3.5.0 Updates Service Console DHCP Client

The service console package dhclient has been updated to version dhclient-3.0.1-10.2. This fixes a stack buffer overflow flaw in the ISC DHCP client and a flaw in the way the DHCP daemon init script handles temporary files. The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the names CVE-2009-0692 and CVE-2009-1893 to these issues.

ID: ESX350-200910408-BG Impact: Critical Release date: 2009-10-16 Products: esx 3.5.0 Updates VMkernel iSCSI driver

When ESX 3.5 hosts are connected to Adaptec Snap Server series or Dell NX series of NAS appliances through the ESX software iSCSI initiator, sometimes the iSCSI LUNs are not detected by the ESX 3.5 hosts. The issue is caused due to the way the software iSCSI driver detects an overflow condition. This patch fixes the issue.

ID: ESX350-200910409-BG Impact: Critical Release date: 2009-10-16 Products: esx 3.5.0 Updates Emulex FC driver

ESX 3.5 Update 4 hosts with Emulex HBAs might stop responding when accessed through vCenter Server. This Emulex driver patch fixes the issue.

Symptom: On ESX hosts, any application making an ioctl call in to the Emulex driver might fail.

Virtualizing vCenter With vDS Catch-22

October 9th, 2009

I’ve typically been a fan of virtualizing the vCenter management server in most situations. VMware vCenter and Update Manager both make fine virtualization candidates as long as the underlying infrastructure for vCenter stays up. Loss of vCenter in a blackout situation can make things a bit of a hassle, but one can work through it with the right combination of patience and knowledge.

A few nights ago I had decided to migrate my vCenter VM to my vSphere virtual infrastructure. Because my vCenter VM was on a standalone VMware Server 2.0 box, I had to shut down the vCenter VM in order to cold migrate it to one of the ESX4 hosts directly, transfer the files to the SAN, upgrade virtual hardware, etc. Once the files were migrated to the vSphere infrastructure, it was time to configure the VM for the correct network and power it up. This is where I ran into the problem.

vCenter was shut down and unavailable, therefore, I had connected my vSphere client directly to the ESX4 host to which I had transferred the VM. When trying to configure the vCenter VM to use the vNetwork Distributed Switch (vDS) port group I had set up for all VM traffic, it was unavailable in the dropdown list of networks. The vCenter server was powered down and thus the vDS Control Plane was unavailable, eliminating my view of vDS networks.

This is a dilemma. Without a network connection, the vCenter server will not be able to communicate with the back end SQL database on a different box running SQL. This will cause the vCenter server services to not start and thus I’ll never have visibility to the vDS. Fortunately I have a fairly flat network in the lab with just a few subnets. I was able to create a temporary vSwitch and port group locally on the ESX4 host which would grant the vCenter VM the network connectivity it needed so I could then modify the network, changing from a local to a vDS port group on the fly.
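
For anyone who would rather build that temporary vSwitch from the service console than from the vSphere client, it might look something like the sketch below. This is one possible approach and not a record of the steps described above. vSwitchTemp, TempVM, and vmnic1 are placeholder names, the uplink must be a physical NIC that is free to use, and by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: temporary standard vSwitch and port group on an ESX host.
# RUN=echo (the default here) prints the commands instead of running them.
# Set RUN to an empty string to execute them on a host.
RUN=${RUN-echo}

# temp_vswitch VSWITCH PORTGROUP UPLINK (all names are placeholders)
temp_vswitch() {
    vswitch=$1
    portgroup=$2
    uplink=$3
    $RUN esxcfg-vswitch -a "$vswitch"               # create the vSwitch
    $RUN esxcfg-vswitch -L "$uplink" "$vswitch"     # link a physical uplink to it
    $RUN esxcfg-vswitch -A "$portgroup" "$vswitch"  # add the port group
}

temp_vswitch vSwitchTemp TempVM vmnic1
```

Once vCenter is back up and the VM has been moved to its vDS port group, the temporary vSwitch can be removed again.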

Once the vCenter server was back up, I further realized that vDS port groups still cannot be seen when the vSphere client is connected directly to an ESX4 host. The ability to configure a VM to utilize vDS networking requires both that the vCenter server be functional and that the vSphere client be connected to said vCenter server, not to a managed host.

The situation I explained above is the catch-22 – the temporary inability to configure VMs for vDS networking while the vCenter server is unavailable. One might call my situation a convergence of circumstances, but with an existing virtualized vCenter server that you’re looking to migrate to a vDS integrated vSphere infrastructure, the scenario is very real. I’d like to note all VMs that had been running on a vDS port continued to run without a network outage as the vDS Data Plane is maintained on each host and remained intact.

8 New ESX(i) 4.0 Patches Released; 7 Critical

September 25th, 2009

Eight new patches have been released for ESX(i) 4.0 (6 for ESX, 2 for ESXi).  Previous versions of ESX(i) are not impacted.

7 of the 8 patches are rated critical and should be evaluated quickly for application in your virtual infrastructure.

ID: ESX400-200909401-BG Impact: Critical Release date: 2009-09-24 Products: esx 4.0.0
Updates vmx and vmkernel64
This patch fixes some key issues such as:
* Guest operating system shows high memory usage on Nehalem based systems, which might trigger memory alarms in vCenter.
* monitor or vmkernel fails when running certain guest operating systems with a 32-bit monitor running in binary translation mode.

See http://kb.vmware.com/kb/1014019 for details

NOTE: Cisco Nexus 1000v customers using VMware Update Manager to patch ESX 4.0 should add an additional patch download URL as described in KB 1013134

ID: ESX400-200909402-BG Impact: Critical Release date: 2009-09-24 Products: esx 4.0.0 Updates VMware Tools
This patch includes the following fixes
* Updated VMware SVGA and mouse device drivers for supported Linux guest operating systems that use Xorg 7.5.
* PBMs for Debian 5.0.1.
* PBMs for SUSE Linux Enterprise 11 VMI kernel.

See http://kb.vmware.com/kb/1014020 for details

NOTE: Cisco Nexus 1000v customers using VMware Update Manager to patch ESX 4.0 should add an additional patch download URL as described in KB 1013134

ID: ESX400-200909403-BG Impact: Critical Release date: 2009-09-24 Products: esx 4.0.0 Updates bnx2x
This patch fixes the following issues:
* Virtual machines experience a network outage when they run with older versions of VMware Tools (ESX 3.0.x)
* A network outage is experienced if the MTU value is changed on a Broadcom Netxtreme II 10gig NIC.
* unloading the driver causes a host reboot.

See http://kb.vmware.com/kb/1014021 for details

NOTE: Cisco Nexus 1000v customers using VMware Update Manager to patch ESX 4.0 should add an additional patch download URL as described in KB 1013134

ID: ESX400-200909404-BG Impact: Critical Release date: 2009-09-24 Products: esx 4.0.0 Updates ixgbe
This patch fixes the following issue:
* A vSphere ESX Host that has NIC teaming configured with the ixgbe driver for the physical NICs might fail if one of the physical NICs goes down.

See http://kb.vmware.com/kb/1014022 for more details

NOTE: Cisco Nexus 1000v customers using VMware Update Manager to patch ESX 4.0 should add an additional patch download URL as described in KB 1013134

ID: ESX400-200909405-BG Impact: HostGeneral Release date: 2009-09-24 Products: esx 4.0.0 Updates perftools
This patch fixes the following issue:
* The esxtop utility might quit with the error message “VMEsxtop_GrpStatsInit() failed” when attempting to monitor network status on ESX.

See http://kb.vmware.com/kb/1014023 for more details

NOTE: Cisco Nexus 1000v customers using VMware Update Manager to patch ESX 4.0 should add an additional patch download URL as described in KB 1013134

ID: ESX400-200909406-BG Impact: Critical Release date: 2009-09-24 Products: esx 4.0.0 Updates hpsa
This patch fixes the following issues:
* A virtual machine might fail after the Storage Port controller is reset on ESX hosts that have the HPSA driver connected to an SAS array.
* Hosts cannot detect more than 2 HPSA controllers due to the limited driver heap size.

See http://kb.vmware.com/kb/1014024 for more details

NOTE: Cisco Nexus 1000v customers using VMware Update Manager to patch ESX 4.0 should add an additional patch download URL as described in KB 1013134

ID: ESXi400-200909401-BG Impact: Critical Release date: 2009-09-24 Products: embeddedEsx 4.0.0 Updates Firmware
This patch fixes some key issues such as:
* Guest operating system shows high memory usage on Nehalem based systems, which might trigger memory alarms in vCenter.
* The monitor or VMkernel fails when running certain guest operating systems with a 32-bit monitor running in binary translation mode.

See http://kb.vmware.com/kb/1014026 for details

NOTE: Cisco Nexus 1000v customers using VMware Update Manager to patch ESXi 4.0 should add an additional patch download URL as described in KB 1013134

ID: ESXi400-200909402-BG Impact: Critical Release date: 2009-09-24 Products: embeddedEsx 4.0.0 Updates Tools
This patch includes the following fixes:
* Updated VMware SVGA and mouse device drivers for supported Linux guest operating systems that use Xorg 7.5.
* Prebuilt kernel modules (PBMs) for Debian 5.0.1.
* Prebuilt kernel modules (PBMs) for the SUSE Linux Enterprise 11 VMI kernel.

See http://kb.vmware.com/kb/1014027 for details

NOTE: Cisco Nexus 1000v customers using VMware Update Manager to patch ESXi 4.0 should add an additional patch download URL as described in KB 1013134
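
Every patch definition in this list opens with the same "ID / Impact / Release date / Products / Updates" header line, so the list can be triaged with a short script. The following is only a rough sketch in Python, not a VMware tool: the names `PATCH_HEADER`, `parse_patch_header`, and `critical_patches` are hypothetical, and the regular expression assumes the header layout exactly as it appears in this post.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed layout, copied from the headers in this post:
# "ID: <id> Impact: <impact> Release date: <YYYY-MM-DD> Products: <product> Updates <component>"
PATCH_HEADER = re.compile(
    r"ID:\s*(?P<id>\S+)\s+"
    r"Impact:\s*(?P<impact>\S+)\s+"
    r"Release date:\s*(?P<release_date>\d{4}-\d{2}-\d{2})\s+"
    r"Products:\s*(?P<products>.+?)\s+"
    r"Updates\s+(?P<updates>.+)$"
)


def parse_patch_header(line: str) -> Optional[Dict[str, str]]:
    """Return the header fields as a dict, or None if the line is not a header."""
    match = PATCH_HEADER.search(line.strip())
    return match.groupdict() if match else None


def critical_patches(lines: Iterable[str]) -> List[str]:
    """Return the IDs of all patch headers whose impact is 'Critical'."""
    ids = []
    for line in lines:
        fields = parse_patch_header(line)
        if fields and fields["impact"] == "Critical":
            ids.append(fields["id"])
    return ids


if __name__ == "__main__":
    sample = [
        "ID: ESX400-200909404-BG Impact: Critical Release date: 2009-09-24 "
        "Products: esx 4.0.0 Updates ixgbe",
        "ID: ESX400-200909405-BG Impact: HostGeneral Release date: 2009-09-24 "
        "Products: esx 4.0.0 Updates perftools",
        "This patch fixes the following issue:",
    ]
    for patch_id in critical_patches(sample):
        print(patch_id)
```

Lines that are not header lines (bullets, KB links, the Nexus 1000v notes) simply do not match and are skipped, so the whole post can be fed in as-is.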