Posts Tagged ‘VMware’

vpxd.cfg Advanced Configuration

March 13th, 2010

vpxd.cfg is an XML-formatted file which can be modified to alter the native behavior of the VMware vCenter Server.  Only sparse references on the internet document the changes that can be made in this file.  Inspired by Ulli Hankeln, the purpose of this blog post is to collect and document all known, unknown, supported, and unsupported vpxd.cfg modifications in a centralized location.

If you have any to add, please provide feedback in the form of a blog comment along with a link pointing to a reference and I’ll update the post.

**Disclaimer**
As with anything found on this site and much of the internet in general, information is provided “as is” without warranty.  Modify settings at your own risk.  I suggest thoroughly researching the changes first and also checking with VMware Support.

The vpxd.cfg file is located by default at %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\vpxd.cfg

  • On Windows Server 2008, this would generally be C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg
  • On Windows Server 2003, this would generally be C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg
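As a rough illustration of the two default locations above, here is a minimal Python sketch that builds both candidate paths from the all-users profile directory.  The function name `vpxd_cfg_candidates` is hypothetical, and the profile directories passed in are simply the Windows defaults mentioned above; a customized install may differ.

```python
import ntpath  # Windows path rules, usable from any OS


def vpxd_cfg_candidates(all_users_profile):
    """Return the likely vpxd.cfg locations under an all-users profile dir.

    Hypothetical helper: the two layouts mirror the default paths listed
    in this post, nothing more.
    """
    tail = ntpath.join("VMware", "VMware VirtualCenter", "vpxd.cfg")
    return [
        # Layout with the "Application Data" folder (Windows Server 2003 style)
        ntpath.join(all_users_profile, "Application Data", tail),
        # Layout directly under the profile dir (Windows Server 2008 style)
        ntpath.join(all_users_profile, tail),
    ]


paths_2003 = vpxd_cfg_candidates(r"C:\Documents and Settings\All Users")
paths_2008 = vpxd_cfg_candidates(r"C:\ProgramData")
print(paths_2003[0])
print(paths_2008[1])
```

On a real vCenter Server you would check which of the candidates actually exists before editing anything.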

This collection of vpxd.cfg settings has been sourced from various places.  The parameters will generally apply to a version of vCenter Server ranging from 2.0 through 4.x.  A given parameter can apply to several or even all versions.  However, one thing I didn’t do was specify which version of vCenter Server the parameter applies to – too much work – sorry – you’ll have to experiment in your lab or DEV environment.  I do think it’s safe to say that most of these parameters focus on the latest releases of vCenter Server – 2.5 and 4.0.

Remember to restart the VMware VirtualCenter Server service in the Server Manager for changes to vpxd.cfg to take effect.

Tag:  blockingTimeoutSeconds
Nested In:  vmomi, soapStubAdapter
What It Does:  Defines the timeout value in seconds for SOAP layer blocking.  Use cases for increasing: slow connections, low bandwidth, or high latency between virtual infrastructure components.  Read more here and here.
Example:

<vmomi>
<soapStubAdapter>
<blockingTimeoutSeconds>10800</blockingTimeoutSeconds>
</soapStubAdapter>
</vmomi>

Tag:  calls
Nested In:  trace, vmomi
What It Does:  Unknown.  Read more here.
Example:

<trace>
<vmomi>
<calls>true</calls>
</vmomi>
</trace>

Tag:  cipherList
Nested In:  vmacore, ssl
What It Does:  Reverts to cipher suites used in previous versions of vCenter Server (2.5u3 and earlier) for browser/SSL compatibility issues.  Read more here.
Example:

<vmacore>
<ssl>
<cipherList>DEFAULT</cipherList>
</ssl>
</vmacore>

Tag:  compressOnRoll
Nested In:  log
What It Does:  Defines whether or not vCenter Server vpxd log files are rolled up and compressed into .gz files.  Read more here.
Example:

<log>
<compressOnRoll>false</compressOnRoll>
</log>

Tag:  cpuFeatureMask
Nested In:  guestOSDescriptor, esx-2-x-x, all-versions, all-guests
What It Does:  Masks CPU features to force VMotion compatibility between hosts. VMware neither supports nor recommends modifying the VMotion constraints for CPU features.  Read more here.
Example:

<guestOSDescriptor>
<esx-2-x-x>
<all-versions>
<all-guests>
<cpuFeatureMask>Elements and mask definition go in here</cpuFeatureMask>
</all-guests>
</all-versions>
</esx-2-x-x>
</guestOSDescriptor>

Tag:  directory
Nested In:  log
What It Does:  Defines the location for the vCenter logs.  Read more here.
Example:

<log>
<directory>D:\VC_Logs</directory>
</log>

Tag:  dontstartconsolidation
Nested In:  vcp2v
What It Does:  May resolve an issue where the Consolidation button is missing in the Virtual Infrastructure Client.  Read more here.
Example:

<vcp2v>
<dontstartconsolidation>true</dontstartconsolidation>
</vcp2v>

Tag:  filterOverheadLimitIssues
Nested In:  vpxd
What It Does:  Unknown.
Example:

<vpxd>
<filterOverheadLimitIssues>true</filterOverheadLimitIssues>
</vpxd>

Tag:  hostRescanFilter
Nested In:  unknown
What It Does:  Defines the behavior of mass ESX(i) host rescans of vmHBAs.  Read more here.
Example:

<hostRescanFilter>true</hostRescanFilter>

Tag:  IoMax
Nested In:  vmacore, threadpool
What It Does:  Unknown but my guess is it defines the maximum I/O for the vpxd.exe process (vCenter Server service). Influenced by TaskMax.
Example:

<vmacore>
<threadpool>
<IoMax>200</IoMax>
</threadpool>
</vmacore>

Tag:  level
Nested In:  log
What It Does:  Defines the logging level for vCenter logs.  Read more here.
Example:

<log>
<level>trivia</level>
</log>

Tag:  logLevel
Nested In:  trace, vmomi
What It Does:  Enables debug logging level for vmomi?  Read more here.
Example:

<trace>
<vmomi>
<logLevel>verbose</logLevel>
</vmomi>
</trace>

Tag:  loglevel
Nested In:  nfc
What It Does:  Enables debug logging level for the NFC process.  Read more here.
Example:

<nfc>
<loglevel>debug</loglevel>
</nfc>

Tag:  managedIP
Nested In:  unknown
What It Does:  Defines the managed IP address used in vCenter Server Heartbeat.  Read more here.
Example:

<managedIP>10.10.0.1</managedIP>

Tag:  maxCostPerHost
Nested In:  ResourceManager
What It Does:  Defines the number of simultaneous VM migrations (both hot and cold) per ESX(i) host.  Read more here.
Example:

<ResourceManager>
<maxCostPerHost>8</maxCostPerHost>
</ResourceManager>

Tag:  maxFileNum
Nested In:  log
What It Does:  Defines the maximum number of log files for vCenter logs.  Read more here.
Example:

<log>
<maxFileNum>50</maxFileNum>
</log>

Tag:  maxFileSize
Nested In:  log
What It Does:  Defines the maximum log file size in Bytes and thus rollover interval for vCenter logs.  Read more here.
Example:

<log>
<maxFileSize>10485760</maxFileSize>
</log>
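Taking the maxFileNum and maxFileSize example values together gives a rough ceiling on how much disk the vpxd logs can occupy.  This is only back-of-the-envelope arithmetic in Python; it assumes the two settings combine as a simple product and that the rolled logs are not compressed (compressOnRoll set to false).

```python
# Example values copied from the maxFileNum and maxFileSize entries
max_file_num = 50           # number of log files kept
max_file_size = 10485760    # bytes per log file before rollover

bytes_per_mib = 1024 * 1024
per_file_mib = max_file_size / bytes_per_mib
total_mib = max_file_num * max_file_size / bytes_per_mib

print(f"each log file: {per_file_mib:.0f} MiB")
print(f"upper bound for all logs: {total_mib:.0f} MiB")
```

With these values each file rolls at 10 MiB, so the log directory would top out at roughly 500 MiB.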

Tag:  name
Nested In:  log
What It Does:  Defines the log file prefix name for vCenter logs.  Read more here.
Example:

<log>
<name>vpxd</name>
</log>

Tag:  notRespondingTimeout
Nested In:  heartbeat
What It Does:  Defines the heartbeat timeout value in seconds between ESX(i) hosts and vCenter Server.  Use case would be to increase the value if remote ESX(i) hosts frequently go into a not responding state in vCenter Server due to WAN bandwidth or latency issues.  Read more here.
Example:

<heartbeat>
<notRespondingTimeout>60</notRespondingTimeout>
</heartbeat>

Tag:  portReserveTimeout
Nested In:  dvs
What It Does:  Defines the timeout value in minutes for unused dvPort reservations.  Lowering the value temporarily is helpful for unlocking dvPorts to remove a vDS or dvPort group.  Read more here.
Example:

<dvs>
<portReserveTimeout>10</portReserveTimeout>
</dvs>

Tag:  serializeadds
Nested In:  vpxd, das
What It Does:  Unknown but if I had to guess I’d say it defines the behavior of how the HA agent is installed on cluster hosts.
Example:

<vpxd>
<das>
<serializeadds>true</serializeadds>
</das>
</vpxd>

Tag:  slotCpuMinMHz
Nested In:  vpxd, das
What It Does:  Defines the minimum CPU calculation of a HA cluster slot size when there are no CPU reservations. Read more here.
Example:

<vpxd>
<das>
<slotCpuMinMHz>256</slotCpuMinMHz>
</das>
</vpxd>

Tag:  slotMemMinMB
Nested In:  vpxd, das
What It Does:  Defines the minimum memory calculation of a HA cluster slot size when there are no memory reservations. Read more here.
Example:

<vpxd>
<das>
<slotMemMinMB>0</slotMemMinMB>
</das>
</vpxd>

Tag:  sspiProtocol
Nested In:  unknown
What It Does:  Defines the authentication mechanism used with passthrough authentication between the Virtual Infrastructure Client and vCenter Server.  Read more here.
Example:

<sspiProtocol>Kerberos</sspiProtocol>

Tag:  TaskMax
Nested In:  vmacore, threadpool
What It Does:  Defines the number of worker threads for the vpxd.exe process (vCenter Server service). Influences IoMax.
Example:

<vmacore>
<threadpool>
<TaskMax>30</TaskMax>
</threadpool>
</vmacore>

Tag:  timeout
Nested In:  task
What It Does:  Defines the timeout value in seconds for long tasks.  Read more here.
Example:

<task>
<timeout>10800</timeout>
</task>

Tag:  verbose
Nested In:  trace, db
What It Does:  Enables database tracing.  Enables database logging in the vpxd log.  Read more here and here.
Example:

<trace>
<db>
<verbose>true</verbose>
</db>
</trace>

Tag:  verbosity
Nested In:  trace, vmomi
What It Does:  Unknown.  Read more here.
Example:

<trace>
<vmomi>
<verbosity>verbose</verbosity>
</vmomi>
</trace>

Tag:  verboseObjectSize
Nested In:  trace, vmomi
What It Does:  Unknown.  Read more here.
Example:

<trace>
<vmomi>
<verboseObjectSize>40</verboseObjectSize>
</vmomi>
</trace>

Tag:  VMOnVirtualIntranet
Nested In:  migrate, test, CompatibleNetworks
What It Does:  Setting to false turns off the internal vSwitch restriction on VMotion, which allows VMs connected to an internal vSwitch to be migrated.  Useful for servers behind a firewall virtual appliance deployed in bridged networking mode.  Read more here.
Example:

<migrate>
<test>
<CompatibleNetworks>
<VMOnVirtualIntranet>false</VMOnVirtualIntranet>
</CompatibleNetworks>
</test>
</migrate>

Tag:  VMOverheadGrowthLimit
Nested In:  cluster
What It Does:  Defines the growth rate cap in terms of MB per minute for VM memory overhead at the cluster level. Can be adjusted to resolve high CPU utilization in guest VMs introduced in ESX(i) 3.5 and vCenter 2.5.  Read more here.
Example:

<cluster>
<VMOverheadGrowthLimit>5</VMOverheadGrowthLimit>
</cluster>
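The "Nested In" paths listed for each tag map directly onto XML nesting, so they can also be applied with a script rather than by hand.  Below is a minimal Python sketch using the standard library's ElementTree.  The helper name `set_nested` is hypothetical, and the sample document assumes the file's root element is `<config>`; check your own vpxd.cfg before relying on that.  It works on an in-memory string here, so nothing on disk is touched.

```python
import xml.etree.ElementTree as ET


def set_nested(root, path, value):
    """Walk (and create as needed) the nested tags in `path`, then set the text.

    Hypothetical helper for illustration; `path` is the "Nested In" list
    followed by the tag itself.
    """
    node = root
    for tag in path:
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    node.text = str(value)
    return node


# Assumed, heavily trimmed stand-in for a real vpxd.cfg
SAMPLE = "<config><log><level>info</level></log></config>"

root = ET.fromstring(SAMPLE)
set_nested(root, ["log", "level"], "trivia")                   # existing tag
set_nested(root, ["vpxd", "das", "slotCpuMinMHz"], 256)        # new nesting
set_nested(root, ["heartbeat", "notRespondingTimeout"], 60)    # new nesting

print(ET.tostring(root, encoding="unicode"))
```

If you adapt this to a real file, work on a backup copy first and remember that the VirtualCenter Server service still has to be restarted for the changes to take effect.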

 

Slightly related, the vCenter Server process (vpxd.exe) can be launched at a command prompt on the vCenter Server (instead of starting as a service) for troubleshooting purposes.  The executable is located at:

<Install Directory>\VMware\Infrastructure\VirtualCenter Server>vpxd.exe

Usage: vpxd.exe [FLAGS]
Flags:
-r Register VMware VirtualCenter Server
-u Unregister VMware VirtualCenter Server
-s Run as a standalone server rather than a Service
-c Print vmdb schema to stdout
-b Recreate database repository
-f cfg Use the specified file instead of the default vpxd.cfg
-l licenseKey Store license key in ldap and assign it to VirtualCenter
-e feature Set the feature to be in use for VirtualCenter. This option takes only one feature at a time.
-p Reset the database password
-v Print the version number to stdout

Perf Charts Service Experienced An Internal Error

March 12th, 2010

Happy Friday evening y’all.  Tonight’s blog post comes from a former colleague of mine whom I will call “Paul Berg”.  Paul came across an error in VMware vSphere which he was able to resolve and he would like to share the solution with the VMware community. 

Paul uses an Oracle database to back end vCenter. When viewing the performance charts in Performance tab | Overview button, he received the following error:

Perf Charts service experienced an internal error.

Message:  Report application initialization is not completed successfully.  Retry in 60 seconds.

You can probably guess what followed… missing data in the charts.  No joy whatsoever.

Following is the resolution:

1. Get the fully qualified domain name or the global name of the TNS service from the Oracle database. This can be found in the file named tnsnames.ora on the Oracle database server.

2. Add this FQDN to the registry key HKLM\Software\ODBC\ODBC.INI\VirtualCenter\ServerName on the VC server.

3. Restart the VMware VirtualCenter Server service.

For us, the database was listed as VMDB in the registry. We have moved to an Oracle RAC configuration so I needed to change the entry to VMDB.GLOBAL to match what was in the tnsnames.ora listing. I wasn’t aware that VMDB.GLOBAL was considered the FQDN for an Oracle DB.
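To make the comparison in step 1 a little less manual, here is a minimal Python sketch that pulls the alias names out of tnsnames.ora text and checks whether the registry value matches one of them.  The sample entry, host name, and the `tns_aliases` helper are all hypothetical, and the regular expression only handles the simple one-alias-per-entry layout.

```python
import re

# An alias starts at column 0 and is followed by "="; nested lines are indented
ALIAS_RE = re.compile(r"^([A-Za-z0-9_.-]+)\s*=", re.MULTILINE)


def tns_aliases(text):
    """Return the top-level alias names found in tnsnames.ora content."""
    return ALIAS_RE.findall(text)


# Hypothetical tnsnames.ora content for illustration only
SAMPLE = """\
# hypothetical entry
VMDB.GLOBAL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = vmdb))
  )
"""

aliases = tns_aliases(SAMPLE)
registry_value = "VMDB"  # what the ODBC ServerName value held before the fix

print(aliases)
print(registry_value in aliases)
```

In this sample the registry value VMDB is not among the aliases, which mirrors the mismatch Paul describes; changing it to VMDB.GLOBAL would make the two agree.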

The following VMware KB Article 1012812 documents the issue as well as a few different approaches to a resolution depending on root cause.  Again, this issue is specific to Oracle database environments.

Performance Overview charts fail with the error: STATs Report Service internal error

Thank you for sharing Paul.  I’ve got one more in the queue from you – I’ll try to get it out in the next couple of weeks.  Here’s a teaser: Poor vSphere performance on Nehalem processors.  Ouch!

VMware Update Manager Becomes Self-Aware

March 4th, 2010

@Mikemohr on Twitter tonight said it best:

“Haven’t we learned from Hollywood what happens when the machines become self-aware?”

I got a good chuckle.  He took my comment of VMware becoming “self-aware” exactly where I wanted it to go.  A reference to The Terminator series of films in which a sophisticated computer defense system called Skynet becomes self-aware and things go downhill for mankind from there.

Metaphorically speaking in today’s case, Skynet is VMware vSphere and mankind is represented by VMware vSphere Administrators.

During an attempt to patch my ESX(i) 4 hosts, I received an error message.

At that point, the remediation task fails and the host is not patched.  The VUM log file reflects the same error in a little more detail:

[2010-03-04 14:58:04:690 'JobDispatcher' 3020 INFO] [JobDispatcher, 1616] Scheduling task VciHostRemediateTask{675}
[2010-03-04 14:58:04:690 'JobDispatcher' 3020 INFO] [JobDispatcher, 354] Starting task VciHostRemediateTask{675}
[2010-03-04 14:58:04:690 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciTaskBase, 534] Task started…
[2010-03-04 14:58:04:908 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciHostRemediateTask, 680] Host host-112 scheduled for patching.
[2010-03-04 14:58:05:127 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciHostRemediateTask, 691] Add remediate host: vim.HostSystem:host-112
[2010-03-04 14:58:13:987 'InventoryMonitor' 2180 INFO] [InventoryMonitor, 427] ProcessUpdate, Enter, Update version := 15936
[2010-03-04 14:58:13:987 'InventoryMonitor' 2180 INFO] [InventoryMonitor, 460] ProcessUpdate: object = vm-2642; type: vim.VirtualMachine; kind: 0
[2010-03-04 14:58:17:533 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 WARN] [vciHostRemediateTask, 717] Skipping host solo.boche.mcse as it contains VM that is running VUM or VC inside it.
[2010-03-04 14:58:17:533 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciHostRemediateTask, 786] Skipping host 0BC5A140, none of upgrade and patching is supported.
[2010-03-04 14:58:17:533 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 ERROR] [vciHostRemediateTask, 230] No supported Hosts found for Remediate.
[2010-03-04 14:58:17:737 'VciRemediateTask.RemediateTask{674}' 2676 INFO] [vciTaskBase, 583] A subTask finished: VciHostRemediateTask{675}
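When the VUM log is long, it helps to filter it down to just the warnings and errors.  The following is a minimal Python sketch that does so for lines shaped like the excerpt above.  The regular expression is inferred from these few lines only, so treat the field layout as an assumption; the `problems` helper is hypothetical.

```python
import re

# Field layout inferred from the log excerpt in this post (an assumption)
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3}) "
    r"'(?P<thread>[^']+)' (?P<tid>\d+) (?P<level>[A-Z]+)\] "
    r"\[(?P<source>[^,\]]+), (?P<line>\d+)\] (?P<msg>.*)$"
)


def problems(lines):
    """Return (level, message) pairs for WARN and ERROR entries."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in ("WARN", "ERROR"):
            found.append((match.group("level"), match.group("msg")))
    return found


SAMPLE = [
    "[2010-03-04 14:58:05:127 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 INFO] [vciHostRemediateTask, 691] Add remediate host: vim.HostSystem:host-112",
    "[2010-03-04 14:58:17:533 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 WARN] [vciHostRemediateTask, 717] Skipping host solo.boche.mcse as it contains VM that is running VUM or VC inside it.",
    "[2010-03-04 14:58:17:533 'VciHostRemediateTask.VciHostRemediateTask{675}' 2676 ERROR] [vciHostRemediateTask, 230] No supported Hosts found for Remediate.",
]

results = problems(SAMPLE)
for level, msg in results:
    print(level, "-", msg)
```

Run against the three sample lines, this keeps the WARN about the skipped host and the ERROR about no supported hosts, and drops the INFO line.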

Further testing in the lab revealed that this condition is triggered when the host is running a vCenter VM and/or a VMware Update Manager (VUM) VM. I understand from other colleagues on the Twitterverse that they’ve seen the same symptoms occur with patch staging.

The workaround is to manually place the host in maintenance mode, at which time it has no problem whatsoever evacuating all VMs, including infrastructure VMs.  At that point, the host in maintenance mode can be remediated.

VMware Update Manager has apparently become self-aware in that it detects when its infrastructure VMs are running on the same host hardware which is to be remediated.  Self-awareness in and of itself isn’t bad, however, its feature integration is.  Unfortunately for the humans, this is a step backwards in functionality and a reduction in efficiency for a task which was once automated.  Previously, a remediation task had no problem evacuating all VMs from a host, infrastructure or not. What we have now is… well… consider the following pre and post “self-awareness” remediation steps:

Pre “self-awareness” remediation for a 6 host cluster containing infrastructure VMs:

  1. Right click the cluster object and choose Remediate
  2. Hosts are automatically and sequentially placed in maintenance mode, evacuated, patched, rebooted, and brought out of maintenance mode

Post “self-awareness” remediation for a 6 host cluster containing infrastructure VMs:

  1. Right click Host1 object and choose Enter Maintenance Mode
  2. Wait for evacuation to complete
  3. Right click Host1 object and choose Remediate
  4. Wait for remediation to complete
  5. Right click Host1 object and choose Exit Maintenance Mode
  6. Right click Host2 object and choose Enter Maintenance Mode
  7. Wait for evacuation to complete
  8. Right click Host2 object and choose Remediate
  9. Wait for remediation to complete
  10. Right click Host2 object and choose Exit Maintenance Mode
  11. Right click Host3 object and choose Enter Maintenance Mode
  12. Wait for evacuation to complete
  13. Right click Host3 object and choose Remediate
  14. Wait for remediation to complete
  15. Right click Host3 object and choose Exit Maintenance Mode
  16. Right click Host4 object and choose Enter Maintenance Mode
  17. Wait for evacuation to complete
  18. Right click Host4 object and choose Remediate
  19. Wait for remediation to complete
  20. Right click Host4 object and choose Exit Maintenance Mode
  21. Right click Host5 object and choose Enter Maintenance Mode
  22. Wait for evacuation to complete
  23. Right click Host5 object and choose Remediate
  24. Wait for remediation to complete
  25. Right click Host5 object and choose Exit Maintenance Mode
  26. Right click Host6 object and choose Enter Maintenance Mode
  27. Wait for evacuation to complete
  28. Right click Host6 object and choose Remediate
  29. Wait for remediation to complete
  30. Right click Host6 object and choose Exit Maintenance Mode
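To put a number on the list above for any cluster size, here is a tiny Python sketch that generates the manual checklist.  The function name and host names are hypothetical; the five per-host steps are taken straight from the list.

```python
def manual_remediation_steps(hosts):
    """Build the manual per-host checklist described above."""
    steps = []
    for host in hosts:
        steps.append(f"Right click {host} object and choose Enter Maintenance Mode")
        steps.append("Wait for evacuation to complete")
        steps.append(f"Right click {host} object and choose Remediate")
        steps.append("Wait for remediation to complete")
        steps.append(f"Right click {host} object and choose Exit Maintenance Mode")
    return steps


hosts = [f"Host{i}" for i in range(1, 7)]  # the 6 host cluster from the example
steps = manual_remediation_steps(hosts)

print(len(steps), "manual steps versus 2 for a cluster-level remediation")
```

Five steps per host means the manual effort grows linearly with cluster size, while the old cluster-level remediation stayed at two steps no matter how many hosts were involved.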

It’s Saturday and your kids want to go to the park. Do the math.

11 New ESX(i) 4.0 Patch Definitions Released; 6 Critical

March 3rd, 2010

Eleven new patch definitions have been released for ESX(i) 4.0 (7 for ESX, 2 for ESXi, 2 for the Cisco Nexus 1000V).  Previous versions of ESX(i) are not impacted.

6 of the 11 patch definitions are rated critical and should be evaluated quickly for application in your virtual infrastructure.

ID: ESX400-201002401-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates vmkernel64,vmx,hostd etc

This patch provides support and fixes the following issues:

  • On some systems under heavy networking and processor load (large number of virtual machines), some NIC drivers might randomly attempt to reset the device and fail.
    The VMkernel logs generate the following messages every second:
    Oct 13 05:19:19 vmkernel: 0:09:22:33.216 cpu2:4390)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
    Oct 13 05:19:20 vmkernel: 0:09:22:34.218 cpu8:4395)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
  • ESX hosts do not display the proper status of the NFS datastore after recovering from a connectivity loss.
    Symptom: In vCenter Server, the NFS datastore is displayed as inactive.
  • When using NPIV, if the LUN on the physical HBA path is not same as the LUN on the virtual port (VPORT) path, though the LUNID:TARGETID pairs are same, then I/O might be directed to the wrong LUN causing a possible data corruption. Refer KB 1015290 for more information.
    Symptom: If NPIV is not configured properly, I/O might be directed to the wrong disk.
  • On Fujitsu systems, the OEM-IPMI-Command-Handler that lists the available OEM IPMI commands do not work as intended. No custom OEM IPMI commands are listed, though they were initialized correctly by the OEM. After applying this fix, running the VMware_IPMIOEMExtensionService and VMware_IPMIOEMExtensionServiceImpl objects displays the supported commands as listed in the command files.
  • Provides prebuilt kernel module drivers for Ubuntu 9.10 guest operating systems.
  • Adds support for upstreamed kernel PVSCSI and vmxnet3 modules.
  • Provides a change to the maintenance mode requirement during Cisco Nexus 1000V software upgrade. After installing this patch if you perform Cisco Nexus 1000V software upgrade, the ESX host goes into maintenance mode during the VEM upgrade.
  • In certain race conditions, freeing journal blocks from VMFS filesystems might fail. The WARNING: J3: 1625: Error freeing journal block (returned 0) <FB 428659> for 497dd872-042e6e6b-942e-00215a4f87bb: Lock was not free error is written to the VMware logs.
  • Changing the resolution of the guest operating system over a PCoIP connection (desktops managed by View 4.0) might cause the virtual machine to stop responding.
    Symptoms: The following symptoms might be visible:

    • When you try to connect to the virtual machine through a vCenter Server console, a black screen appears with the Unable to connect to MKS: vmx connection handshake failed for vmfs {VM Path} message.
    • Performance graphs for CPU and memory usage in vCenter Server drop to 0.
    • Virtual machines cannot be powered off or restarted.

ID: ESX400-201002402-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates initscripts

This patch fixes an issue where pressing Ctrl+Alt+Delete on service console causes ESX 4.0 hosts to reboot.

ID: ESX400-201002404-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates glib2

The service console package for GLib2 is updated to version glib2-2.12.3-4.el5_3.1. This GLib update fixes an issue where the functions inside GLib incorrectly allows multiple integer overflows leading to heap-based buffer overflows in GLib’s Base64 encoding and decoding functions. This might allow an attacker to possibly execute arbitrary code while a user is running the application. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2008-4316 to this issue.

ID: ESX400-201002405-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates megaraid-sas

This patch fixes an issue where some applications do not receive events even after registering for Asynchronous Event Notifications (AEN). This issue occurs when multiple applications register for AENs.

ID: ESX400-201002406-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates newt

The service console package for Newt library is updated to version newt-0.52.2-12.el5_4.1. This security update of Newt library fixes an issue where an attacker might cause a denial of service or possibly execute arbitrary code with the privileges of a user who is running applications using the Newt library. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2009-2905 to this issue.

ID: ESX400-201002407-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates nfs-utils

The service console package for nfs-utils is updated to version nfs-utils-1.0.9-42.el5. This security update of nfs-utils fixes an issue that might permit a remote attacker to bypass an intended access restriction. The Common Vulnerabilities and Exposures Project (cve.mitre.org) has assigned the name CVE-2008-4552 to this issue.

ID: ESX400-201002408-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates Enic driver

In scenarios where Pass Thru Switching (PTS) is in effect, if virtual machines are powered on, the network interface might not come up. In PTS mode, when the network interface is brought up, PTS figures the MTU from the network. There is a race in this scenario, where the enic driver might incorrectly indicate that the driver fails. This issue might occur frequently on a CISCO UCS system. This patch fixes the issue.

ID: ESXi400-201002401-BG Impact: Critical Release date: 2010-03-03 Products: embeddedEsx 4.0.0 Updates Firmware

This patch provides support and fixes the following issues:

  • On some systems under heavy networking and processor load (large number of virtual machines), some NIC drivers might randomly attempt to reset the device and fail.
    The VMkernel logs generate the following messages every second:
    Oct 13 05:19:19 vmkernel: 0:09:22:33.216 cpu2:4390)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
    Oct 13 05:19:20 vmkernel: 0:09:22:34.218 cpu8:4395)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
  • ESX hosts do not display the proper status of the NFS datastore after recovering from a connectivity loss.
    Symptom: In vCenter Server, the NFS datastore is displayed as inactive.
  • When using NPIV, if the LUN on the physical HBA path is not same as the LUN on the virtual port (VPORT) path, though the LUNID:TARGETID pairs are same, then I/O might be directed to the wrong LUN causing a possible data corruption. Refer KB 1015290 for more information.
    Symptom: If NPIV is not configured properly, I/O might be directed to the wrong disk.
  • On Fujitsu systems, the OEM-IPMI-Command-Handler that lists the available OEM IPMI commands do not work as intended. No custom OEM IPMI commands are listed, though they were initialized correctly by the OEM. After applying this fix, running the VMware_IPMIOEMExtensionService and VMware_IPMIOEMExtensionServiceImpl objects displays the supported commands as listed in the command files.
  • Provides prebuilt kernel module drivers for Ubuntu 9.10 guest operating systems.
  • Adds support for upstreamed kernel PVSCSI and vmxnet3 modules.
  • Provides a change to the maintenance mode requirement during Cisco Nexus 1000V software upgrade. After installing this patch if you perform Cisco Nexus 1000V software upgrade, the ESX host goes into maintenance mode during the VEM upgrade.
  • In certain race conditions, freeing journal blocks from VMFS filesystems might fail. The WARNING: J3: 1625: Error freeing journal block (returned 0) <FB 428659> for 497dd872-042e6e6b-942e-00215a4f87bb: Lock was not free error is written to the VMware logs.
  • Changing the resolution of the guest operating system over a PCoIP connection (desktops managed by View 4.0) might cause the virtual machine to stop responding.
    Symptoms: The following symptoms might be visible:

    • When you try to connect to the virtual machine through a vCenter Server console, a black screen appears with the Unable to connect to MKS: vmx connection handshake failed for vmfs {VM Path} message.
    • Performance graphs for CPU and memory usage in vCenter Server drop to 0.
    • Virtual machines cannot be powered off or restarted.

ID: ESXi400-201002402-BG Impact: Critical Release date: 2010-03-03 Products: embeddedEsx 4.0.0 Updates VMware Tools

This patch fixes an issue where pressing Ctrl+Alt+Delete on service console causes ESX 4.0 hosts to reboot.

ID: VEM400-201002001-BG Impact: HostGeneral Release date: 2010-03-03 Products: embeddedEsx 4.0.0, esx 4.0.0 Cisco Nexus 1000V VEM

ID: VEM400-201002011-BG Impact: HostGeneral Release date: 2010-03-03 Products: embeddedEsx 4.0.0, esx 4.0.0 Cisco Nexus 1000V VEM
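The patch header lines above follow a consistent pattern, which makes them easy to tally.  The following minimal Python sketch parses the ID, impact, and release date out of each header and counts the impact ratings.  The regular expression is inferred from the headers in this post only, so treat it as an assumption rather than an official format.

```python
import re
from collections import Counter

# Pattern inferred from the header lines in this post (an assumption)
HEADER_RE = re.compile(
    r"ID: (?P<id>\S+) Impact: (?P<impact>\S+) "
    r"Release date: (?P<date>\d{4}-\d{2}-\d{2})"
)

HEADERS = """\
ID: ESX400-201002401-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates vmkernel64,vmx,hostd etc
ID: ESX400-201002402-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates initscripts
ID: ESX400-201002404-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates glib2
ID: ESX400-201002405-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates megaraid-sas
ID: ESX400-201002406-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates newt
ID: ESX400-201002407-SG Impact: HostSecurity Release date: 2010-03-03 Products: esx 4.0.0 Updates nfs-utils
ID: ESX400-201002408-BG Impact: Critical Release date: 2010-03-03 Products: esx 4.0.0 Updates Enic driver
ID: ESXi400-201002401-BG Impact: Critical Release date: 2010-03-03 Products: embeddedEsx 4.0.0 Updates Firmware
ID: ESXi400-201002402-BG Impact: Critical Release date: 2010-03-03 Products: embeddedEsx 4.0.0 Updates VMware Tools
ID: VEM400-201002001-BG Impact: HostGeneral Release date: 2010-03-03 Products: embeddedEsx 4.0.0, esx 4.0.0 Cisco Nexus 1000V VEM
ID: VEM400-201002011-BG Impact: HostGeneral Release date: 2010-03-03 Products: embeddedEsx 4.0.0, esx 4.0.0 Cisco Nexus 1000V VEM
"""

patches = [m.groupdict() for m in map(HEADER_RE.match, HEADERS.splitlines()) if m]
impact_counts = Counter(p["impact"] for p in patches)
critical_ids = [p["id"] for p in patches if p["impact"] == "Critical"]

print(len(patches), "patch definitions")
print(dict(impact_counts))
print("Critical:", ", ".join(critical_ids))
```

The tally agrees with the summary at the top of the post: eleven definitions, six of them rated critical.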

Thank You

February 27th, 2010

Once in a while, I’m a witness to acts of extraordinary kindness from a person or group of persons.  It may not occur on a regular basis, but when it does, it is something special to behold.  It happened this afternoon at the Minneapolis Area VMware Users Group (VMUG) meeting.

It started out as a fairly typical event.  I called the meeting to order, briefly went through some general business and current events in the VMware virtualization community, and then turned the meeting over to our first speakers Craig Drugge and Pavan Jhamnani of Syncsort.  I took a seat, prepared to learn about Syncsort’s data protection and rapid recovery technologies.  However that was not to be, at least not right away.  Instead, Pavan invited Michael Cardinal of ThinLaunch up on stage.  I was curious about what was transpiring since this was Syncsort’s hour and I wasn’t aware that ThinLaunch had any ties to Syncsort’s technology.

Michael took the stage with a white paper bag in hand and began speaking to the audience about a person he has known for a few years.  A person who digs virtualization.  A person whom he’d bumped into at VMware Partner Exchange early Wednesday morning at Starbucks Mandalay Bay.  I caught on pretty quickly that he was referring to me.  Michael proceeded to announce my recent VCDX certification accomplishment.  I thought that was extremely generous of him, but there was more.  Michael asked me to come up on stage where he presented me with a gift.  This was something that he, his wife, and Bill Hinkens (Territory Manager, VMware) collaborated on.  Michael turned the bag around to reveal the VMware diamond plate artwork along with my name and VCDX #34 on it.  Inside the bag was a black VMware fleece sweater, again with my name, VCDX, and #34 on it.  I was at a loss for words.  I accepted the gift, thanked Michael, and we took our seats. The meeting continued from its brief diversion.

The sweater, the bag, the presentation, the planning, the thought: these were all wonderful gifts from a group of people who went out of their way, and I will remember them for a long time.  Virtualization, for me, has built a great community of people and in many cases has yielded friendships at a professional as well as a personal level.  For that I am very thankful and each day I look forward to what the future brings.

Thank you.

RVTools 2.8.1 Released

February 21st, 2010

Rob de Veij has released version 2.8.1 of his stellar virtualization utility RVTools.  I love this free tool as it provides valuable information about my infrastructure in a fast and easy format.

New in this version:
- On vHost tab, new field: number of running vCPUs.
- On vSphere, VMs in a vApp were not displayed.
- Filter not working correctly when annotations or custom fields contain a null value.
- When NTP server(s) = null, the time info fields are not displayed on the vHost tab page.
- When the datastore name or virtual machine name contains spaces, the inconsistent folder name check was not working correctly.
- Tools health check now only executed for running VMs.

Go download this tool today and be sure to tell Rob how much you appreciate his development efforts!

VMware, much of this information is vital as it pertains to configuration maximums and should be available in the VMware vSphere Client for capacity planning purposes.

VCDX #34 – The Conclusion of a Journey

February 19th, 2010

Last Sunday I wrote about my VCDX Defense experience. This evening I am fortunate enough to share the news that I have passed the final board review and have achieved VCDX certification. I was awarded VCDX #34.  For the others who defended last week in Las Vegas, I offer my congratulations to you all on a job which I’m sure was well done.  Without a doubt, it was a journey which I’m sure will benefit me for many years to come.  I’m proud to have walked down a path paved by so much collective brilliance before me. I am inspired and driven by the knowledge shared in the virtualization community. I hope that I can continue to provide the best I have to offer in return.

It is not my intent to turn this into the Academy Awards, but I would be extremely negligent if I didn’t thank key people who devoted their time to ensure my success by reviewing my design, challenging me with questions, as well as those who provided tips and encouragement for the defense.  I had several weaknesses exposed and with your help I was able to strengthen those areas prior to my defense.

Amy (I didn’t receive your note until after the defense, but I was really touched. Your support, patience, and understanding is nothing short of amazing)
Gary Bowman (old guy… mock defense was very helpful!)
Gabrie Van Zanten (seriously, with the questions, you had too much fun…)
Roger Lund (great questions from you, thank you for taking the time)
David Davis (tremendous help from a CCIE… I’m not even worthy)
Scott Lowe (thank you for the offer and last minute design tips)
Michael Cardinal (Wednesday morning shot of confidence at Starbucks)
Rick Scherer (tips on calming nerves were great – I followed to a T)
John Arrasjid (so many great VCDX tips, invaluable!)
Duncan Epping (I got a lot more than breakfast out of you Tuesday morning, you don’t even know)
Frank Denneman (thank you for the help, confidence, & for not making faces at me)
Rich Brambley (UGG who told me Tuesday evening I can do this)
Andrew Hald (Tuesday dinner.. thank you for letting me join you)
Spencer Critchlow (your tips were invaluable!)
Doug Hazelman (Veeam played a helpful role in my design)
Dawn Theirl (thank you for the encouragement)

Tips for the Defense:
1) Know your design, I mean really know it.
2) Refer to tip #1

Good luck.

My VCDX Defense Experience

February 14th, 2010

Last Wednesday morning in Las Vegas, I participated in my VMware Certified Design Expert (VCDX) Defense.  A successful Defense is the last in a series of required steps to obtain VCDX certification.  Defense experiences have been shared by others such as Rick Scherer, Dave Convery, and Duncan Epping.  I found my own Defense experience to be similar to theirs.

Prior to the Defense, I submitted an application and a design for the panelists to review.  As Dave Convery pointed out, this may be the hardest part of the entire process as far as the volume of work goes.  The design is a complete set of documentation that must meet key requirements outlined in the application.  There is not a lot of time to complete the application and design once you are invited for that step.  My best advice would be to clear your schedule as much as possible to crank out quality documentation.  Also, be sure the application is filled out completely and the design covers all requirements.  Missing information risks outright rejection and you’ll likely miss the opportunity for the upcoming defense.  It is absolutely critical that all fields in the application are completed.  This cannot be stressed enough. The panelists will spend up to 8 hours reviewing the design.  The submitted documentation is more about quality than quantity. Be sure the documentation submitted is relevant to the design.  Any information the panelists cannot pull from the submitted design will need to be clarified during the defense which is then a pressure situation for the candidate.

Once the application and design are accepted, the defense date is scheduled around a major VMware event, typically VMworld or Partner Exchange (PEX).  My defense was scheduled at PEX in Las Vegas.  As I am not a partner nor do I work for a partner, I did not attend PEX or any of its sessions.  I flew in on a Tuesday morning and left a day later, merely for the defense. This strategy is fine with me as I would rather stay focused on the defense and my design, and not face daily distractions and new information released at a conference.

During the days leading up to my defense, I felt very confident.  I had been studying my design and going over all the Enterprise Admin and Design exam study material on a daily basis.  I had been brushing up on white papers and blog articles for areas which I felt I was weak on or had forgotten details of.  I brought a 3 ring binder filled with about 400 pages of documentation as well as every VI3 published .pdf known to mankind on my thumb drive.  While I didn’t read all the .pdf files, they were with me if I needed them for reference.  As it turned out, a few of the documents I crammed on the night before my panel would play a nice role during part of my defense.

After arriving in Las Vegas Tuesday morning, my confidence level remained as high as ever.  I had spent the entire 3 hours on the plane reading out of my 3 ring binder.  Outside of having breakfast with a friend, I spent a good portion of Tuesday studying which was my intent in booking a Tuesday morning arrival.  Early Tuesday afternoon, exhaustion hit me like a ton of bricks. I decided to try to take a nap. I lay in bed for close to an hour and couldn’t fall asleep. I decided to try a long bath in my swanky bathroom with a TV in it (my favorite part of the trip I think). I got my second wind and attended a meeting Tuesday evening for about an hour where I met up with fellow vExperts.  Asked how I felt about the following morning’s defense, again my answer was mostly confident, cool, and collected.  I just wanted to get it over with.  The anxiety of the approaching defense date was starting to mount.  I found myself calculating the hours remaining in my head. “In 15 hours I will have started my defense.  In 17 hours I will have finished the first defense section.  In 18 hours it will all be over with.”  After the meeting, some of the guys were going out on the strip for a nice dinner.  I really wanted to go but knew I had no time for this social event.  I hung back and had a quick buffet dinner with a guy who I would find out was a VCDX himself and a panelist from VMware.  I was back to my room by 8:30pm and studied until about 10:15pm.  At that point, I was getting tired and decided to take the wise advice of Rick Scherer and John Arrasjid by getting a good night’s sleep.

I was getting good sleep until… I woke up at 3:30am and couldn’t fall back asleep.  I lay in bed for a full 2.5 hours thinking about my upcoming defense, points I wanted to make, design choices, etc. It’s a long time to dwell on these items but it was quiet and peaceful and I was well rested. I shot out of bed at my 6am wake up call, got ready, packed, and headed out.  I stopped by the hotel business center to print 4 copies of a presentation slide update I had made the night before. I forgot to print current slide only and instead printed 4 copies of the entire deck. Expensive lesson printing 60 pages which couldn’t be cancelled (how convenient for the hotel). At least they were in B&W and not color. The plan was to get a good breakfast to calm any nerves that may develop (advice from Rick Scherer).  Unfortunately, there was no breakfast open at 6:30am. The restaurants didn’t open until 7am.  I headed to Starbucks to start getting caffeinated. While having coffee and going through my slides, I decided to create 3 new slides right then and there.  I felt they would be beneficial for the executive presentation but a small part of me challenged “is this really wise throwing these in at the last second?”  Why not.  SEs do it all the time prior to arriving at customer sites.  At this point I still felt pretty confident and didn’t really have any nerves.

At 7:30 I finished my coffee and headed to breakfast. Last minute cramming at the breakfast buffet table downing coffee and some food. As the clock passed 8am, I had less than an hour left to head upstairs for my defense panel.  I could start to feel the nervousness set in. I continued to study until I realized it was 8:50am and I had less than 10 minutes to get through the casino over to 2nd level of the convention center. Whoops.  I arrived at Breakers L with maybe 2 minutes to spare and Melissa greeted me.  The panelists were waiting inside and not quite ready for me yet. In the meantime, I walked across the hall and poked my head in a large auditorium to see who was speaking. It was Steve Herrod talking about a technology which I cannot repeat at this point in time. I told myself repeatedly that I am not nervous but I was only lying to myself. It’s inevitable. When the exam room doors open and you see the panel of experts in there, you feel it. I surmise it may be a bit like meeting the Father, the Son, and the Holy Ghost for the first time. People who spend a lot of time in front of customers are still nervous for these defense panels. It’s unavoidable. One candidate who finished his defense Tuesday evening likened the experience to having “a proctology exam”.

The first 75 minutes is spent “defending” my design.  I’ve got about a 15 slide deck to get through and to use as reference throughout the design defense.  I recommend putting as much reference material as you can in the slide deck so you can refer to it yourself during the defense.  It will help illustrate design choices and jog your memory for design elements which you’ve forgotten due to nervousness. The first 5-10 minutes I was pretty nervous and stuttered once or twice during my presentation. After that, I warmed up and it felt more like a good technical discussion with co-workers which I enjoyed. As the questions started coming in, I made good use of some of the slides to help explain decisions.  Good slides to have here are architecture diagrams, network, storage, etc.  I felt my performance during this section of the defense was passable based on the questioning I received, but the honest truth is it’s too hard to tell with the scoring method that is used.  It’s about accumulating points.  What’s unknown is how many points were left to accumulate and which areas we did not get to before the 75 minutes expired. Afterwards, I can’t help but think about 1 technical question I knew I jumped the gun on and answered incorrectly, failing to correct myself. I’m told by a current VCDX to not worry about it, nobody is perfect in the defense – that is to say, the scoring of the defense will allow for X number of mistakes. I’ve also spent time playing back other areas of the defense, wondering if I made my points clearly enough, and trying to remember if the panelists understood that one of the points I was making was in the context of a specific circumstance, which they would need to understand for it to be technically correct.  Did they understand the physical network topology well enough between sites or draw a harmful conclusion that I was contradicting myself during explanation?
I can’t stress enough how fast the time elapses in front of the panel.  At least it did for me.

After the 75 minute defense, we took a short break and proceeded with the 30 minute mock design.  In retrospect, the scenario which was thrown at me wasn’t too bad.  Unfortunately I didn’t get through nearly as much of it as I wanted to.  I spent a lot of time digging in areas where there were probably no more points to be had and I should have just moved on.  I wish I had another shot at it and I would have moved faster.  The idea in this section is to ask a lot of intelligent questions to frame out a design in 30 minutes.  But don’t spend too much time in one area.  This section is more about “the journey” than the final design.  Questions need to be asked of the “customers” during the design process so they can see how you think on your feet.  They may also not provide all of the needed information for the design which is, again, where asking questions comes in.  Once again, time flies.  Be quick but be as thorough as possible.  Think out loud.

After completing the 30 minute mock design section, we moved right into the last section which is a 15 minute troubleshooting scenario.  The three panelists are once again the customers in this scenario and they came to me with a VMware Infrastructure 3 problem they are experiencing.  Once again, this process is more about “the journey” than the final result.  It’s about thinking out loud, asking questions of the customer, and showing them the thought process to isolate root cause of a problem. I feel I did well in this section and will go so far as to say that I found the root cause. Before I could get acknowledgement, however, the 15 minute timer expired.  I do not know how each section is weighted, if it is, but hopefully I did do well enough on the last section to help carry me through the two previous sections.  A common occurrence through the Enterprise Admin and Design written exams was that I felt I did poorly in one section, but stellar in another, which carried me through to a passing score on each written exam.

The panelists and observers were a good group of people and I can honestly say that once I got beyond that first 5-10 minutes of nerves, the pressure wasn’t nearly as bad as I thought it would be.  I think it all depends on how prepared one is for the experience.  You may have heard other people say “Know your design inside and out”. This could not be closer to the truth. Know it up, down, sideways, back, and front. Be prepared for any question relating to your design, including upstream and downstream impacts. Know the infrastructure components well such as storage and hardware platforms. Anything you list in your design you need to be able to speak to. If you cannot speak to everything in your design, then how do you know it is appropriate for your design? “Because”, and “Best Practice” are not complete answers.  I’ve collected a ton of tips along the way (like these) and each of them contributed to getting me as far as I’ve gotten at this point.  Social networking tools have helped immensely.  I can’t imagine going this alone in a vacuum.  I would have been totally unprepared for the design defense, if I even made it that far.

So after my defense, I was told “7 days” in regard to getting my results. I was hoping to be pleasantly surprised with results late Friday after the defenses at Mandalay Bay wrapped up.  However, since I have not received them yet and tomorrow is a holiday, it looks like it will take the full week (and hopefully not longer) to get the results.  It has been difficult waiting this long.  Anxiety is building and I’ve been watching email like a man possessed.  I’ve been replaying the scenarios in my head, both good and bad.  It’s unhealthy for sure. Although no formal statistics have been released by VMware, I gather through word of mouth that about 50% of the candidates pass their defense attempt, while the other 50% do not.  With two individuals from this past week already pronounced as having passed and becoming VCDX certified, the odds are starting to stack up against those like me who still wait for their results.  I’m trying to keep my mind occupied on other things but it is difficult.  I periodically take comfort in thinking about things far more important, like smiles on my children’s faces. For those that pass, I’m sure they look back upon the efforts as well spent and the reward of passing as well deserved.  I know that I have already benefited from what I have learned through the process. It has taught me to be more of a thinker which maps directly to my Design and Engineering role at work. I would love nothing more at this point than to have the VCDX certificate to go along with it.  I look at the VCDX as a highly coveted certification with a lot of integrity built into the program and process which is sure to last a long time. There is no possibility of a “paper VCDX” as far as I’m concerned. That means value for cert holders and businesses for many years to come.

Oh I almost forgot, I brought my own whiteboard dry erase marker on the trip and used it during my defense. I had been using it for practice on my whiteboard at home and thought it may bring me good luck in the defense. Shabby dry erase markers can be a distraction.  In addition, it has a fine eraser on the opposite end which comes in handy and can save time wiping away small details instead of using the huge brick eraser.  The panel didn’t seem to have any reservations with me using it.

VMkernel Networks, Jumbo Frames, and ESXi 4

February 12th, 2010

Question:  Can I implement jumbo frames on ESXi 4 Update 1 VMkernel networks?

Answer:  Who in the hell knows?

You see, the ESXi 4.0 Update 1 Configuration Guide states on page 54:

“Jumbo frames are not supported for VMkernel networking interfaces in ESXi.”

Duncan Epping of Yellow Bricks also reports:

“Jumbo frames are not supported for VMkernel networking interfaces in ESXi. (page 54)”

One month after the release of ESXi 4 Update 1, Charu Chaubal of VMware posted on the ESXi Chronicles blog:

“I am happy to say that this is merely an error in the documentation. In fact, ESXi 4.0 DOES support Jumbo Frames on VMkernel networking interfaces. The correction will hopefully appear in a new release of the documentation, but in the meantime, go ahead and configure Jumbo frames for your ESXi 4.0 hosts.”

Shortly after, Duncan Epping of Yellow Bricks confirms Charu Chaubal’s report that jumbo frames are supported on ESXi VMkernel networks.

Now, nearly two months after Charu’s clarification and three months after the release of ESXi 4 Update 1, the documentation remains dubious on page 54 stating that jumbo frames are not supported on ESXi 4 VMkernel networks which is a direct contradiction to a VMware ESXi blog.

I opened a Business Critical Support SR with VMware on the question.  I was told by VMware BCS that jumbo frames are NOT supported on ESXi 4 Update 1 VMkernel networks and a reference was made to the documentation on page 54.

Our dedicated VMware onsite Engineer escalated and I was then told ESXi 4 Update 1 DOES support jumbo frames on VMkernel networks, making reference to Charu’s article.

Hey VMware, which is it?  If this is a documentation mistake, why are you dragging your feet in getting the documentation updated two months after a VMware employee discovers the error and blogs it?  Waiting for the next release of ESXi?  Unacceptable!  You update the public documentation as soon as you discover the error and be damned sure your BCS support Engineers know the right answer!  Do you know how much companies pay for BCS?  You owe your customers the correct answer.  If misinformation comes as a result of a known documentation error, SHAME ON YOU!  Architecture and design decisions are being made daily on this information or misinformation, which ever it may be.

Update 2/23/10:  Toby Kraft (@vmwarewriter on Twitter) will be updating the documentation by next week.  Thank you Toby!

Update 3/1/10:  VMware has updated their documentation to reflect currently supported configurations.  Thank you VMware (and Toby)!

Train Signal Releases vSphere Pro Vol 1

February 10th, 2010

Train Signal has released a new addition to its VMware library of training entitled VMware vSphere Pro Series Training Vol 1 which covers the topics of VMware View, ThinApp, Nexus 1000V, and PowerCLI. The 11 hours of new content spans 20 videos and is authored by (in no particular order) industry recognized experts David Davis, Hal Rottenberg, and Rick Scherer.

General availability was February 9th meaning you can order today at an individual cost of $297, or purchase it in a bundle with Train Signal’s other vSphere videos for $594.

Here are two hints of what you’ll be getting with this new release:

Video 1 – Sample content from each of the 3 video authors in the course – a true overview of what you will see in the course featuring VMware View, Nexus 1000V, and PowerCLI.

Video 2 – “ThinApp your App in Under 5 minutes”

This is a great group of guys who really know their stuff. Order your copy of VMware vSphere Pro Series Training Vol 1 today!

Preferential Treatment for DPM Hosts

February 7th, 2010

Here’s a tip that’s so simple and probably well known that it could be categorized as a stupid pet trick.

As I’ve mentioned in the past, I leverage VMware DPM (an Enterprise licensing feature) in the lab so that during periods of lesser activity (while I’m at work or sleeping, or both), ESX hosts in the lab can be placed in standby mode to cut electricity consumption and save on the energy bill.  I haven’t taken the time to research how hosts in the cluster are arbitrarily chosen for standby mode.  Over the course of time, the pattern I have witnessed tells me it’s more of a round robin type selection.  For instance, today host A will be chosen for standby mode, tomorrow host B will be chosen, and the next day, again host A will be chosen.  Perhaps load is taken into the calculation.  I don’t honestly know.  It’s not important right now.

I’ve also mentioned in the past that I run both ESX and ESXi in the same vSphere cluster.  This is a VMware supported configuration. I do this so that I can get a daily dose of both host platform experiences.  I’m not shy in saying my platform preference is still ESX because of its Service Console. What can I say… old habits are hard to break, but I’m trying, I really am.  More often than not, I need ESX Service Console access for whatever reason.  When I pop in the lab and find out that the ESX host is in standby mode, it takes a good 5 minutes to wake it up and then work on the things I need to get done.

Enter DPM Host Options.  This feature lets me apply some rules in the host selection process for DPM.  In this case, I want DPM to do its thing and save me money, but I don’t want it to shut down the ESX host.  Rather, shut down the ESXi host instead.  To do this is simple.  Modify the cluster settings and disable DPM for the ESX host as shown below.

With this rule in place, DPM will always choose solo.boche.mcse for standby mode, which is the ESXi host.  The ESX host, lando.boche.mcse, has been disabled for DPM and should always remain powered on and ready for action.

Configure VMware ESX(i) Round Robin on EMC Storage

February 4th, 2010

I recently set out to enable VMware ESX(i) 4 Round Robin load balancing with EMC Celerra (CLARiiON) fibre channel storage.  Before I get to the details of how I did it, let me preface this discussion with a bit about how I interpret Celerra storage architecture. 

The Celerra is built on CLARiiON fibre channel storage and as such, it leverages the benefits and successes CLARiiON has built over the years.  I believe most CLARiiONs are, by default, active/passive arrays from VMware’s perspective.  Maybe more accurately stated, all controllers are active, however, each controller has sole ownership of a LUN or set of LUNs.  If a host wants access to a LUN, it is preferable to go through the owning controller (the preferred path).  Attempts to access a LUN through any other controller than the owning controller will result in a “Trespass” in EMC speak.  A Trespass is a shift in LUN ownership from one controller to another in order to service an I/O request from a fabric host.  When I first saw Trespasses in Navisphere, I was alarmed.  I soon learned that they aren’t all that bad in moderation.  EMC reports that a Trespass occurs EXTREMELY quickly and in almost all cases will not cause problems.  However, as with any array which adopts the LUN ownership model, stacking up enough I/O requests which force a race condition between controllers for LUN access will cause a condition known as thrashing.   Thrashing causes storage latency and queuing as controllers play tug of war for LUN access.  This is why it is important for ESX hosts, which share LUN access, to consistently access LUNs via the same controller path.

As I said, the LUN ownership model above is the “out-of-box” configuration for the Celerra, also known as Failover Mode 1 in EMC Navisphere.  The LUN path going through the owning controller will be the Active path from a VMware perspective.  Other paths will be Standby.  This is true for both MRU and Fixed path selection policies.  What I needed to know was how to enable Round Robin path selection in VMware.  Choosing Round Robin in the vSphere Client is easy enough, however, there’s more to it than that because the Celerra is still operating in Failover Mode 1 where I/O can only go through the owning controller. 

So the first step in this process is to read the CLARiiON/VMware Applied Technology Guide which says I need to change the Failover Mode of the Celerra from 1 to 4 using Navisphere (FLARE release 28 version 04.28.000.5.704 or later may be required).  A value of 4 tells the CLARiiON to switch to the ALUA (Asymmetric Logical Unit Access or Active/Active) mode.  In this mode, the controller/LUN ownership model still exists, however, instead of transferring ownership of the LUN to the other controller with a Trespass, LUN access is allowed through the non-owning controller.  The I/O is passed by the non-owning controller to the owning controller via the backplane and then to the LUN.  In this configuration, both controllers are Active and can be used to access a LUN without causing ownership contention or thrashing.  It’s worth mentioning right now that although both controllers are active, the Celerra will report to ESX the owning controller as the optimal path, and the non-owning controller as the non-optimal path.  This information will be key a little later on.  Each ESX host needs to be configured for Failover Mode 4 in Navisphere.  The easiest way to do this is to run the Failover Setup Wizard.  Repeat the process for each ESX host.  One problem I ran into here is that after making the configuration change, each host and HBA still showed a Failover Mode of 1 in the Navisphere GUI.  It was as if the Failover Setup Wizard steps were not persisting.  I failed to accept this so I installed the Navisphere CLI and verified each host with the following command: 

naviseccli -h <SPA_IP_ADDRESS> port -list -all

Output showed that Failover Mode 4 was configured:

Information about each HBA:
HBA UID:                 20:00:00:00:C9:8F:C8:C4:10:00:00:00:C9:8F:C8:C4
Server Name:             lando.boche.mcse
Server IP Address:       192.168.110.5
HBA Model Description:
HBA Vendor Description:  VMware ESX 4.0.0
HBA Device Driver Name:
Information about each port of this HBA:
    SP Name:               SP A
    SP Port ID:            2
    HBA Devicename:        naa.50060160c4602f4a50060160c4602f4a
    Trusted:               NO
    Logged In:             YES
    Source ID:             66560
    Defined:               YES
    Initiator Type:           3
    StorageGroup Name:     DL385_G2
    ArrayCommPath:         1
    Failover mode:         4
    Unit serial number:    Array
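With several hosts and HBAs, scanning that output by eye gets tedious. Here is a minimal sketch that boils it down to one line per port, showing the server name and its failover mode. The function name failover_modes is made up for illustration, and the sketch assumes the output keeps the "Server Name:" and "Failover mode:" labels seen in the listing above.

```shell
#!/bin/sh
# failover_modes (hypothetical helper): read `naviseccli ... port -list` output
# on stdin and print "<server name> <failover mode>" for every port record.
# Assumes the "Server Name:" and "Failover mode:" labels seen in the listing.
failover_modes() {
    awk '
        /^Server Name:/ {
            server = $0
            sub(/^Server Name:[ \t]*/, "", server)
            sub(/[ \t]+$/, "", server)
        }
        /^[ \t]*Failover mode:/ {
            mode = $0
            sub(/^[ \t]*Failover mode:[ \t]*/, "", mode)
            sub(/[ \t]+$/, "", mode)
            print server, mode
        }
    '
}

# Trimmed sample taken from the listing; on a management station the input
# would come from the naviseccli command instead of a here-document.
failover_modes <<'EOF'
Server Name:             lando.boche.mcse
    SP Name:               SP A
    Failover mode:         4
EOF
```

Any line that does not end in 4 points at a host or HBA that still needs attention.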

Unfortunately, the CLARiiON/VMware Applied Technology Guide didn’t give me the remaining information I needed to actually get ALUA and Round Robin working.  So I turned to social networking and my circle of VMware and EMC storage experts on Twitter.  They put me on to the fact that I needed to configure SATP for VMW_SATP_ALUA_CX, something I wasn’t familiar with yet. 

So the next step is a multistep procedure to configure the Pluggable Storage Architecture on the ESX hosts.  More specifically, SATP (Storage Array Type Plugin) and the PSP (Path Selection Plugin), in that order. Duncan Epping provides a good foundation for PSA which can be learned here.

Configuring the SATP tells the PSA what type of array we’re using, and more accurately, what failover mode the array is running.  In this case, I needed to configure the SATP for each LUN to VMW_SATP_ALUA_CX which is the EMC CLARiiON (CX series) running in ALUA mode (active/active failover mode 4).  The command to do this must be issued on each ESX host in the cluster for each active/active LUN and is as follows: 

#set SATP
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.50060160c4602f4a50060160c4602f4a
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.60060160ec242700be1a7ec7a208df11
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.60060160ec242700bf1a7ec7a208df11
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.60060160ec2427001cac9740a308df11
esxcli nmp satp setconfig --config VMW_SATP_ALUA_CX --device naa.60060160ec2427001dac9740a308df11

The devices you see above can be found in the vSphere Client when looking at the HBA devices discovered.  You can also find devices with the following command on the ESX Service Console: 

esxcli nmp device list 
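Rather than copying device IDs by hand, they can be filtered out of that listing. The sketch below assumes, based on the output shown in this post, that each device record begins with an unindented line starting with "naa." and that every other line is indented. The function name list_naa_devices is hypothetical, and devices using other identifier prefixes would be skipped.

```shell
#!/bin/sh
# list_naa_devices (hypothetical helper): read `esxcli nmp device list` output
# on stdin and print one device identifier per line.
# Assumption: record header lines are unindented and start with "naa.".
list_naa_devices() {
    grep '^naa\.'
}

# On an ESX host the input would come from the real command:
#   esxcli nmp device list | list_naa_devices
# Here a trimmed sample record stands in for it.
sample='naa.60060160ec2427001dac9740a308df11
    Device Display Name: DGC Fibre Channel Disk (naa.60060160ec2427001dac9740a308df11)
    Storage Array Type: VMW_SATP_ALUA_CX'

printf '%s\n' "$sample" | list_naa_devices
```

The indented Device Display Name line also contains the identifier, which is why the pattern is anchored to the start of the line.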

I found that changing the SATP requires a host reboot for the change to take effect (thank you Scott Lowe).  After the host is rebooted, the same command used above should reflect that the SATP has been set correctly: 

esxcli nmp device list 

Results in: 

naa.60060160ec2427001dac9740a308df11
    Device Display Name: DGC Fibre Channel Disk (naa.60060160ec2427001dac9740a308df11)
    Storage Array Type: VMW_SATP_ALUA_CX
    Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_ow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
    Path Selection Policy: VMW_PSP_FIXED
    Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPat=0,numBytesPending=0}
    Working Paths: vmhba1:C0:T0:L61 

Once the SATP is set, it is time to configure the PSP for each LUN to Round Robin.  You can do this via the vSphere Client, or you can issue the commands at the Service Console: 

#set PSP per device
esxcli nmp psp setconfig --config VMW_PSP_RR --device naa.60060160ec242700be1a7ec7a208df11
esxcli nmp psp setconfig --config VMW_PSP_RR --device naa.60060160ec242700bf1a7ec7a208df11
esxcli nmp psp setconfig --config VMW_PSP_RR --device naa.60060160ec2427001cac9740a308df11
esxcli nmp psp setconfig --config VMW_PSP_RR --device naa.60060160ec2427001dac9740a308df11

#set PSP for device
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.50060160c4602f4a50060160c4602f4a
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.60060160ec242700be1a7ec7a208df11
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.60060160ec242700bf1a7ec7a208df11
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.60060160ec2427001cac9740a308df11
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.60060160ec2427001dac9740a308df11

Once again, running the command: 

esxcli nmp device list 

Now results in: 

naa.60060160ec2427001dac9740a308df11
    Device Display Name: DGC Fibre Channel Disk (naa.60060160ec2427001dac9740a308df11)
    Storage Array Type: VMW_SATP_ALUA_CX
    Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_ow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
    Path Selection Policy: VMW_PSP_RR
    Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPat=0,numBytesPending=0}
    Working Paths: vmhba1:C0:T0:L61 

Notice the Path Selection Policy has now changed to Round Robin. 

I’m good to go, right?  Wrong.  I struggled with this last bit for a while.  Using ESXTOP and IOMETER, I could see that I/O was still only going down one path instead of two.  Then I remembered something Duncan Epping had said to me in an earlier conversation a few days ago.  He mentioned something about the array reporting optimal and non-optimal paths to the PSA.  I printed out a copy of the Storage Path and Storage Plugin Management with esxcli document from VMware and took it to lunch with me.  The answer was buried on page 88.  The nmp roundrobin setting useANO is configured by default to 0 which means unoptimized paths reported by the array will not be included in Round Robin path selection unless optimized paths become unavailable.  Remember I said early on that unoptimized and optimized paths reported by the array would be a key piece of information.  We can see this in action by looking at the device list above.  The very last line shows working paths, and only one path is listed for Round Robin use – the optimized path reported by the array.  The fix here is to issue the following command, again on each host for all LUNs in the configuration: 

#use non-optimal paths for Round Robin
esxcli nmp roundrobin setconfig --useANO 1 --device naa.50060160c4602f4a50060160c4602f4a
esxcli nmp roundrobin setconfig --useANO 1 --device naa.60060160ec242700be1a7ec7a208df11
esxcli nmp roundrobin setconfig --useANO 1 --device naa.60060160ec242700bf1a7ec7a208df11
esxcli nmp roundrobin setconfig --useANO 1 --device naa.60060160ec2427001cac9740a308df11
esxcli nmp roundrobin setconfig --useANO 1 --device naa.60060160ec2427001dac9740a308df11
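Typing one line per LUN on every host invites copy and paste mistakes, so a small loop can generate the lines instead. This is only a sketch: emit_for_devices is a hypothetical helper, the device list is copied from the commands in this post, and the function only prints the commands (with options written as two ASCII hyphens) so they can be reviewed before anything is run. The same helper works for the SATP and PSP steps, keeping in mind that the SATP change needed a reboot before the later steps.

```shell
#!/bin/sh
# Device IDs copied from the commands in this post; replace with your own.
DEVICES="naa.50060160c4602f4a50060160c4602f4a
naa.60060160ec242700be1a7ec7a208df11
naa.60060160ec242700bf1a7ec7a208df11
naa.60060160ec2427001cac9740a308df11
naa.60060160ec2427001dac9740a308df11"

# emit_for_devices (hypothetical helper): print the command prefix given as
# the first argument, followed by "--device <id>", once per device.
# It prints only; nothing is executed on the host.
emit_for_devices() {
    for dev in $DEVICES; do
        printf '%s --device %s\n' "$1" "$dev"
    done
}

# Dry run of the useANO step for every device.
emit_for_devices "esxcli nmp roundrobin setconfig --useANO 1"
```

Once the printed lines look right, they could be piped to sh on the Service Console to run them.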

Once again, running the command: 

esxcli nmp device list 

Now results in: 

naa.60060160ec2427001dac9740a308df11
    Device Display Name: DGC Fibre Channel Disk (naa.60060160ec2427001dac9740a308df11)
    Storage Array Type: VMW_SATP_ALUA_CX
    Storage Array Type Device Config: {navireg=on, ipfilter=on}{implicit_support=on;explicit_support=on;explicit_allow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=2,TPG_state=AO}}
    Path Selection Policy: VMW_PSP_RR
    Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=1;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
    Working Paths: vmhba0:C0:T0:L61, vmhba1:C0:T0:L61 

Notice the change in useANO which now reflects a value of 1.  In addition, I now have two Working Paths – an optimized path and an unoptimized path. 

I fired up ESXTOP and IOMETER which now showed a flurry of I/O traversing both paths.  I kid you not, it was a Clark Griswold moment when all the Christmas lights on the house finally worked.

So it took a while to figure this out, but with some reading and the help of experts I finally got it, and I was extremely jazzed.  What would have helped is if VMware’s PSA were more plug and play with various array types.  For instance, why can’t PSA recognize ALUA on the CLARiiON and automatically configure SATP for VMW_SATP_ALUA_CX?  Why is a reboot required for an SATP change?  PSA configuration in the vSphere Client might also have been convenient, though I recognize it has diminishing returns with a large number of hosts and/or LUNs to configure.  Scripting and the CLI are the way to go for consistency and automation, or how about PSA configuration via Host Profiles? 

I felt a little betrayed and confused by the Navisphere GUI reflecting Failover Mode 1 after several attempts to change it to 4.  I was looking at host connectivity status. Was I looking in the wrong place? 

Lastly, end-to-end documentation on how to configure Round Robin would have helped a lot.  EMC got me part of the way there with the CLARiiON/VMware Applied Technology Guide document, but left me hanging, making no mention of the PSA configuration needed.  I’m getting that the end game for EMC multipathing today is PowerPath, which is fine – I’ll get to that, but I really wanted to do some testing with native Round Robin first, if for no other reason than to establish a baseline to compare PowerPath against once I get there. 

Thanks again to the people I leaned on to help me through this.  It was the usual crew who can always be counted on.

Service Console Directory Listing Text Color in PuTTY

January 25th, 2010

Curious about the default colors you see in a remote PuTTY session connected to the ESX Service Console?  Some are obvious such as the directory listings which show up as blue text on a black background.  Another obvious one is the compressed .tar.gz file which will show up in a nicely contrasting red text on black background.  Or how about this one which I’m sure you’ve seen, executable scripts are shown as green text on a black background.  You might be asking yourself “What about the oddball ones I see from time to time which don’t have an explanation?”  I’ve provided an example in the screenshot – a folder named isos shows up with a green background and blue text.  What does that mean? 

There’s a way to find out.  While in the remote PuTTY session connected to the ESX Service Console, run the command dircolors -p from any directory.  Here’s the default legend:

# Below are the color init strings for the basic file types. A color init
# string consists of one or more of the following numeric codes:
# Attribute codes:
# 00=none 01=bold 04=underscore 05=blink 07=reverse 08=concealed
# Text color codes:
# 30=black 31=red 32=green 33=yellow 34=blue 35=magenta 36=cyan 37=white
# Background color codes:
# 40=black 41=red 42=green 43=yellow 44=blue 45=magenta 46=cyan 47=white
NORMAL 00 # global default, although everything should be something.
FILE 00 # normal file
DIR 01;34 # directory
LINK 01;36 # symbolic link. (If you set this to 'target' instead of a
 # numerical value, the color is as for the file pointed to.)
FIFO 40;33 # pipe
SOCK 01;35 # socket
DOOR 01;35 # door
BLK 40;33;01 # block device driver
CHR 40;33;01 # character device driver
ORPHAN 40;31;01 # symlink to nonexistent file
SETUID 37;41 # file that is setuid (u+s)
SETGID 30;43 # file that is setgid (g+s)
STICKY_OTHER_WRITABLE 30;42 # dir that is sticky and other-writable (+t,o+w)
OTHER_WRITABLE 34;42 # dir that is other-writable (o+w) and not sticky
STICKY 37;44 # dir with the sticky bit set (+t) and not other-writable
# This is for files with execute permission:
EXEC 01;32
# List any file extensions like '.gz' or '.tar' that you would like ls
# to colorize below. Put the extension, a space, and the color init string.
# (and any comments you want to add after a '#')
# If you use DOS-style suffixes, you may want to uncomment the following:
#.cmd 01;32 # executables (bright green)
#.exe 01;32
#.com 01;32
#.btm 01;32
#.bat 01;32
.tar 01;31 # archives or compressed (bright red)
.tgz 01;31
.arj 01;31
.taz 01;31
.lzh 01;31
.zip 01;31
.z 01;31
.Z 01;31
.gz 01;31
.bz2 01;31
.deb 01;31
.rpm 01;31
.jar 01;31
# image formats
.jpg 01;35
.jpeg 01;35
.gif 01;35
.bmp 01;35
.pbm 01;35
.pgm 01;35
.ppm 01;35
.tga 01;35
.xbm 01;35
.xpm 01;35
.tif 01;35
.tiff 01;35
.png 01;35
.mov 01;35
.mpg 01;35
.mpeg 01;35
.avi 01;35
.fli 01;35
.gl 01;35
.dl 01;35
.xcf 01;35
.xwd 01;35
# audio formats
.flac 01;35
.mp3 01;35
.mpc 01;35
.ogg 01;35
.wav 01;35

 

Applied to the screenshot example above, the legend tells us that the isos directory is: OTHER_WRITABLE 34;42 # dir that is other-writable (o+w) and not sticky.
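
If you don’t feel like reading the numeric codes by eye, the legend is simple enough to decode with a few lines of shell.  This is just a sketch based on the attribute, text color, and background color tables printed at the top of the legend above; decode_color and color_name are names I made up, not part of dircolors.

```shell
#!/bin/sh
# Sketch: translate a dircolors init string such as "34;42" into words,
# using the code tables from the legend above.

color_name() {
    # $1 = last digit of a 3x (text) or 4x (background) code
    case "$1" in
        0) echo black ;;
        1) echo red ;;
        2) echo green ;;
        3) echo yellow ;;
        4) echo blue ;;
        5) echo magenta ;;
        6) echo cyan ;;
        7) echo white ;;
    esac
}

decode_color() {
    # $1 = init string, codes separated by semicolons
    result=""
    old_ifs=$IFS
    IFS=';'
    for code in $1; do
        case "$code" in
            00) part="none" ;;
            01) part="bold" ;;
            04) part="underscore" ;;
            05) part="blink" ;;
            07) part="reverse" ;;
            08) part="concealed" ;;
            3[0-7]) part="$(color_name "${code#3}") text" ;;
            4[0-7]) part="$(color_name "${code#4}") background" ;;
            *) part="unknown($code)" ;;
        esac
        # Join the decoded parts with a comma and a space.
        result="${result:+$result, }$part"
    done
    IFS=$old_ifs
    echo "$result"
}

decode_color "34;42"   # OTHER_WRITABLE
decode_color "01;34"   # DIR
```

The first call prints "blue text, green background", which matches the isos folder in the screenshot, and the second prints "bold, blue text" for an ordinary directory.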

Another color you may commonly see which I haven’t yet mentioned is cyan, which identifies symbolic links.  These can be found in several directories.  Most often you will see symbolic links in /vmfs/volumes/ connecting a friendly datastore name with its not so friendly volume name, which is the name better known by the VMkernel.

That’s it.  Not what I would consider earth-shattering material here, but maybe you’ve seen these colors before and haven’t connected the dots on their meaning.  For people with a Linux background, this is probably old hat.

VMTN Storage Performance Thread and the EMC Celerra NS-120

January 23rd, 2010

The VMTN Storage Performance Thread is a collaboration of storage performance results on VMware virtual infrastructure provided by VMTN Community members around the world.  The thread starts here, was locked due to length, and continues on in a new thread here.  There’s even a Google Spreadsheet version; however, activity in that data repository appears to have diminished long ago.  The spirit of the testing is outlined by thread creator and VMTN Virtuoso christianZ:

“My idea is to create an open thread with uniform tests whereby the results will be all inofficial and w/o any warranty. If anybody shouldn’t be agreed with some results then he can make own tests and presents his/her results too. I hope this way to classify the different systems and give a “neutral” performance comparison. Additionally I will mention that the performance [and cost] is one of many aspects to choose the right system.” 

Testing standards are defined by christianZ so that results from each submission are consistent and comparable.  A pre-defined template is used in conjunction with IOMETER to generate the disk I/O and capture the performance metrics.  The test lab environment and the results are then appended to the thread discussion linked above.  The performance metrics measured are:

  1. Average Response Time (in Milliseconds, lower is better) – also known as latency of which VMware declares a potential problem threshold of 50ms in their Scalable Storage Performance whitepaper
  2. Average I/O per Second (number of I/Os, higher is better)
  3. Average MB per Second (in MB, higher is better)
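
The second and third metrics are two views of the same thing: MB per second is simply I/Os per second multiplied by the transfer size.  A quick sketch of the arithmetic is below.  The 8 KB size comes from the “Random 8K” test name; the 32 KB size for the Max Throughput tests is my assumption, inferred from the numbers in the Fibre Channel results that follow.  mb_per_sec is a made-up helper name.

```shell
#!/bin/sh
# Sketch: convert an IOPS figure into MB/s for a given transfer size.
# Usage: mb_per_sec IOPS BLOCK_SIZE_KB   (1 MB = 1024 KB)
mb_per_sec() {
    LC_ALL=C awk -v iops="$1" -v kb="$2" 'BEGIN { printf "%.2f\n", iops * kb / 1024 }'
}

mb_per_sec 2805.43 8     # Real Life test over Fibre Channel, 8 KB transfers
mb_per_sec 35261.29 32   # Max Throughput 100% Read, 32 KB transfers (assumed size)
```

These print 21.92 and 1101.92, which line up with the MB per second column in the Fibre Channel table below.  It’s a handy sanity check when you’re copying results out of IOMETER by hand.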

Following are my results with the EMC Celerra NS-120 Unified Storage array

SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / RAID 5
SAN TYPE / HBAs: Emulex dual port 4Gb Fiber Channel, HP StorageWorks 2Gb SAN switch
OTHER: Disk.SchedNumReqOutstanding and HBA queue depth set to 64 

Fibre Channel SAN Fabric Test

Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second
Max Throughput – 100% Read | 1.62 | 35,261.29 | 1,101.92
Real Life – 60% Rand / 65% Read | 16.71 | 2,805.43 | 21.92
Max Throughput – 50% Read | 5.93 | 10,028.25 | 313.38
Random 8K – 70% Read | 11.08 | 3,700.69 | 28.91
  
 
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: swISCSI
OTHER: Shared NetGear 1Gb SoHo Ethernet switch

swISCSI Test

Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second
Max Throughput – 100% Read | 17.52 | 3,426.00 | 107.06
Real Life – 60% Rand / 65% Read | 14.33 | 3,584.53 | 28.00
Max Throughput – 50% Read | 11.33 | 5,236.50 | 163.64
Random 8K – 70% Read | 15.25 | 3,335.68 | 22.06
  
 
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: NFS
OTHER: Shared NetGear 1Gb SoHo Ethernet switch

NFS Test

Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second
Max Throughput – 100% Read | 17.18 | 3,494.48 | 109.20
Real Life – 60% Rand / 65% Read | 121.85 | 480.81 | 3.76
Max Throughput – 50% Read | 12.77 | 4,718.29 | 147.45
Random 8K – 70% Read | 123.41 | 478.17 | 3.74

Please read further below for additional NFS testing results after applying EMC Celerra best practices.

Fibre Channel Summary

Not surprisingly, Celerra over SAN fabric beats the pants off of the shared storage solutions I’ve had in the lab previously, HP MSA1000 and Openfiler 2.2 swISCSI before that, in all four IOMETER categories.  I was, however, pleasantly surprised to find that Celerra over fibre channel was one of the top performing configurations among a sea of HP EVA, Hitachi, NetApp, and EMC CX series frames.

swISCSI Summary

Celerra over swISCSI was only slightly faster than the Openfiler 2.2 swISCSI on HP ProLiant ML570 G2 hardware I had in the past on the Max Throughput-100%Read test.  In the other three test categories, however, the Celerra left the Openfiler array in the dust.

NFS Summary

Moving on to Celerra over NFS, performance results were consistent with swISCSI in two test categories (Max Throughput-100%Read and Max Throughput-50%Read), but NFS performance numbers really dropped in the remaining two categories as compared to swISCSI (RealLife-60%Rand-65%Read and Random-8k-70%Read). 

What’s worth noting is that both the iSCSI and NFS datastores are backed by the same logical Disk Group and physical disks on the Celerra.  I did this purposely to compare the iSCSI and NFS protocols, with everything else being equal.  The differences in two out of the four categories are obvious.  The question came to mind:  Does the performance difference come from the Celerra, the VMkernel, or a combination of both?  Both iSCSI and NFS have evolved into viable protocols for production use in enterprise datacenters, therefore, I’m leaning AWAY from the theory that the performance degradation over NFS stems from the VMkernel. My initial conclusion here is that Celerra over NFS doesn’t perform as well with Random Read disk I/O patterns.  I welcome your comments and experience here.

Please read further below for additional NFS testing results after applying EMC Celerra best practices.

CIFS

Although I did not test CIFS, I would like to take a look at its performance.  CIFS isn’t used directly by VMware virtual infrastructure, but it can be a handy protocol to leverage with NFS storage.  File management (e.g. ISOs, templates, etc.) on ESX NFS volumes becomes easier and more mobile, and fewer tools are required, when the NFS volumes are also presented as CIFS shares on a predominantly Windows client network.  Providing adequate security through CIFS will be a must to protect the ESX datastore on NFS.

If you’re curious about storage array configuration and its impact on performance, cost, and availability, take a look at this RAID triangle which VMTN Master meistermn posted in one of the performance threads:

The Celerra storage is currently carved out in the following way:

Slot  | 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14
DAE 2 | FC  | FC  | FC  | FC  | FC  | FC  | FC  | FC  | FC  | FC  | FC  | FC  | FC  | FC  | FC
DAE 1 | NAS | NAS | NAS | NAS | NAS | Spr | Spr |     |     |     |     |     |     |     |
DAE 0 | Vlt | Vlt | Vlt | Vlt | Vlt | NAS | NAS | NAS | NAS | NAS | NAS | NAS | NAS | NAS | NAS

FC = fibre channel Disk Group

NAS = iSCSI/NFS Disk Groups

Spr = Hot Spare

Vlt = Celerra Vault drives

I’m very pleased with the Celerra NS-120.  With the first batch of tests complete, I’m starting to formulate ideas on when, where, and how to use the various storage protocols with the Celerra.  My goal is not to eliminate use of the slowest performing protocol in the lab.  I want to work with each of them on a continual basis to test future design and integration with VMware virtual infrastructure.

Update 1/30/10: New NFS performance numbers.  I’ve begun working with EMC vSpecialists to troubleshoot the performance discrepancies between the swISCSI and NFS protocols.  A few key things have been identified, and a new set of performance metrics has been posted below after making some changes:

  1. The first thing the EMC vSpecialists (and others in the blog post comments) asked about was whether the file system uncached write mechanism was enabled.  The uncached write mechanism is designed to improve performance for applications with many connections to a large file, such as a virtual disk file of a virtual machine, and it can enhance access to such large files through the NFS protocol.  Out of the box, the uncached write mechanism is disabled on the Celerra.  EMC recommends enabling this feature with ESX(i).  The beauty here is that the feature can be toggled while the NFS file system is mounted on cluster hosts with VMs running on it.  VMware ESX Using EMC Celerra Storage Systems pages 99-101 outlines this recommendation.
  2. Per VMware ESX Using EMC Celerra Storage Systems pages 73-74, NFS send and receive buffers should be divisible by 32k on the ESX(i) hosts.  Again, these advanced settings can be adjusted on the hosts while VMs are running, and the settings do not require a reboot.  EMC recommended a value of 64 (presumably for both).
  3. Use the maximum amount of write cache possible for Storage Processors (SPs).  Factory defaults here:  598MB total read cache size, 32MB read cache size, 598MB total write cache size, 566MB write cache size.
  4. Specific to this test – verify that the ramp up time is 120 seconds.  Without the ramp up the results can be skewed.  The tests I originally performed used a 0 second ramp up time.
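
The “divisible by 32k” rule in item 2 is easy to check before touching a host.  The sketch below assumes the buffer settings are expressed in KB, so the rule is read as “the value must be a multiple of 32”.  The three values tested are the send buffer default of 264, the 256 I changed it to, and the receive buffer default of 128, all of which appear later in this post.  is_multiple_of_32 is a made-up helper name.

```shell
#!/bin/sh
# Sketch: check whether an NFS buffer size (assumed to be in KB) is a multiple of 32.
is_multiple_of_32() {
    [ $(( $1 % 32 )) -eq 0 ]
}

for size in 264 256 128; do
    if is_multiple_of_32 "$size"; then
        echo "$size: ok"
    else
        echo "$size: not a multiple of 32 (remainder $(( size % 32 )))"
    fi
done
```

Only 264 fails the check, with a remainder of 8, which is why the send buffer was the one that needed changing.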

The new NFS performance tests are below, using some of the recommendations above: 

SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: NFS
OTHER: Shared NetGear 1Gb SoHo Ethernet switch

New NFS Test After Enabling the NFS file system Uncached Write Mechanism

VMware ESX Using EMC Celerra Storage Systems pages 99-101

Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second
Max Throughput – 100% Read | 17.39 | 3,452.30 | 107.88
Real Life – 60% Rand / 65% Read | 20.28 | 2,816.13 | 22.00
Max Throughput – 50% Read | 19.43 | 3,051.72 | 95.37
Random 8K – 70% Read | 19.21 | 2,878.05 | 22.48
Significant improvement here!  
 
 
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: NFS
OTHER: Shared NetGear 1Gb SoHo Ethernet switch

New NFS Test After Configuring
NFS.SendBufferSize = 256 (this was set at the default of 264 which is not divisible by 32k)
NFS.ReceiveBufferSize = 128 (this was already at the default of 128)

VMware ESX Using EMC Celerra Storage Systems pages 73-74

Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second
Max Throughput – 100% Read | 17.41 | 3,449.05 | 107.78
Real Life – 60% Rand / 65% Read | 20.41 | 2,807.66 | 21.93
Max Throughput – 50% Read | 18.25 | 3,247.21 | 101.48
Random 8K – 70% Read | 18.55 | 2,996.54 | 23.41
Slight change  
 
 
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: NFS
OTHER: Shared NetGear 1Gb SoHo Ethernet switch

New NFS Test After Configuring IOMETER for 120 second Ramp Up Time

Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second
Max Throughput – 100% Read | 17.28 | 3,472.43 | 108.51
Real Life – 60% Rand / 65% Read | 21.05 | 2,726.38 | 21.30
Max Throughput – 50% Read | 17.73 | 3,338.72 | 104.34
Random 8K – 70% Read | 17.70 | 3,091.17 | 24.15

Slight change

Due to the commentary received on the 120 second ramp up, I re-ran the swISCSI test to see if that changed things much.  To fairly compare protocol performance, the same parameters must be used across the board in the tests.

SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: swISCSI
OTHER: Shared NetGear 1Gb SoHo Ethernet switch

New swISCSI Test After Configuring IOMETER for 120 second Ramp Up Time

Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second
Max Throughput – 100% Read | 17.79 | 3,351.07 | 104.72
Real Life – 60% Rand / 65% Read | 14.74 | 3,481.25 | 27.20
Max Throughput – 50% Read | 12.17 | 4,707.39 | 147.11
Random 8K – 70% Read | 15.02 | 3,403.39 | 26.59

swISCSI is still performing slightly better than NFS on the random read tests; however, the margin is much closer.
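
To put a number on that margin, here is a rough sketch that compares the I/O per second figures from the two 120 second ramp up runs (swISCSI versus NFS).  pct_faster is a made-up helper name, and this is just the plain percentage difference, nothing more scientific than that.

```shell
#!/bin/sh
# Sketch: how much faster is A than B, as a percentage of B?
# Usage: pct_faster A_IOPS B_IOPS
pct_faster() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

pct_faster 3403.39 3091.17   # Random 8K – 70% Read: swISCSI vs NFS
pct_faster 3481.25 2726.38   # Real Life – 60% Rand / 65% Read: swISCSI vs NFS
```

With the ramp up in place, swISCSI comes out roughly 10.1% ahead on Random 8K and 27.7% ahead on Real Life, a far cry from the first round of tests where NFS managed fewer than 500 I/Os per second in both categories.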

At this point I am content, stroke, happy, (borrowing UK terminology there) with NFS performance.  I am now moving on to ALUA, Round Robin, and PowerPath/VE testing.  I set up NPIV over the weekend with the Celerra as well – look for a blog post coming up on that.

Thank you to EMC and to the folks who replied in the comments below for your help tackling best practices and NFS optimization/tuning!

Lab Update

January 19th, 2010

I thought I’d post a lab update since John Troyer nudged me, letting me know this week’s podcast was focusing on home labs for VCP and VCDX studies.

Read more here.  Scroll down to the Lab Update section.