Posts Tagged ‘VMware’

Father’s Day Fun With ESXi Annotations

June 19th, 2011

Tired of the same old DCUI look of your ESXi host?

SnagIt Capture

Change it with your own custom Annotation (found under Host|Configuration|Software|Advanced Settings):

SnagIt Capture

Voila!

SnagIt Capture

Clearly this is more fun than any one person should be allowed to have.

Clear the Annotation field to restore the original ESXi look.
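
For the command line inclined, the same field is exposed as the Annotations.WelcomeMessage advanced setting.  A minimal sketch from Tech Support Mode (or via the vCLI equivalent, vicfg-advcfg); the option path below is my best recollection, so confirm it with a -g query first:

esxcfg-advcfg -g /Annotations/WelcomeMessage
esxcfg-advcfg -s "Happy Father's Day!" /Annotations/WelcomeMessage

Setting the value back to an empty string should restore the default look, the same as clearing the field in the vSphere Client.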

Disk.SchedNumReqOutstanding and Queue Depth

June 16th, 2011

There is a VMware storage whitepaper available titled Scalable Storage Performance.  It is an oldie but goodie.  In fact, next to VMware’s Configuration Maximums document, it is one of my favorites and I’ve referenced it often.  I like it because it concisely and specifically covers block storage LUN queue depth and SCSI reservations.  It was written pre-VAAI, but I feel the concepts are still quite relevant in the block storage world.

One of the interrelated components of queue depth on the VMware side is the advanced VMkernel parameter Disk.SchedNumReqOutstanding.  This setting determines the maximum number of active storage commands (IO) allowed at any given time at the VMkernel.  In essence, this is queue depth at the hypervisor layer.  Queue depth can be configured at various points in the path of an IO: at the VMkernel which I already mentioned, at the HBA hardware layer, at the kernel module (driver) layer, and at the guest OS layer.
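
As a point of reference, the value can be inspected or changed from the console as well as from the vSphere Client (Advanced Settings | Disk).  A minimal sketch:

esxcfg-advcfg -g /Disk/SchedNumReqOutstanding
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding

The first command returns the current value (32 by default); the second raises it to 64, which in this era of ESX/ESXi applies to every LUN on that host.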

Getting back to Disk.SchedNumReqOutstanding, I’ve always lived by the definition I felt was most clear in the Scalable Storage Performance whitepaper.  Disk.SchedNumReqOutstanding is the maximum number of active commands (IO) per LUN.  Clustered hosts don’t collaborate on this value which implies this queue depth is per host.  In other words, each host has its own independent queue depth, again, per LUN.  How does Disk.SchedNumReqOutstanding impact multiple VMs living on the same LUN (again, same host)?  The whitepaper states each VM will evenly share the queue depth (assuming each VM has identical shares from a storage standpoint).

When virtual machines share a LUN, the total number of outstanding commands permitted from all virtual machines to that LUN is governed by the Disk.SchedNumReqOutstanding configuration parameter that can be set using VirtualCenter. If the total number of outstanding commands from all virtual machines exceeds this parameter, the excess commands are queued in the ESX kernel.

I was recently challenged by a statement agreeing to all of the above but with one critical exception:  Disk.SchedNumReqOutstanding provides an independent queue depth for each VM on the LUN.  In other words, if Disk.SchedNumReqOutstanding is left at its default value of 32, then VM1 has a queue depth of 32, VM2 has a queue depth of 32, and VM3 has its own independent queue depth of 32.  Stack those three VMs and we arrive at a sum total of 96 outstanding IOs on the LUN.  A few sources were provided to me to support this:

Fibre Channel SAN Configuration Guide:

You can adjust the maximum number of outstanding disk requests with the Disk.SchedNumReqOutstanding parameter in the vSphere Client. When two or more virtual machines are accessing the same LUN, this parameter controls the number of outstanding requests that each virtual machine can issue to the LUN.

VMware KB Article 1268 (Setting the Maximum Outstanding Disk Requests per Virtual Machine):

You can adjust the maximum number of outstanding disk requests with the Disk.SchedNumReqOutstanding parameter. When two or more virtual machines are accessing the same LUN (logical unit number), this parameter controls the number of outstanding requests each virtual machine can issue to the LUN.

The problem with the two statements above is that I feel they are poorly worded and, as a result, misinterpreted.  I understand what each statement is trying to say, but it implies something quite different depending on how a person reads it.  Each statement is correct in that Disk.SchedNumReqOutstanding will gate the amount of active IO possible per LUN and ultimately per VM.  However, the wording implies that the value assigned to Disk.SchedNumReqOutstanding applies individually to each VM, which is not the case.  The reason I’m pointing this out is the number of misinterpretations I’ve subsequently discovered via Google, which I gather are the result of reading one of the sources above.

The scenario can be quickly proven in the lab.  Disk.SchedNumReqOutstanding is configured for the default value of 32 active IOs.  Using resxtop, I see my three VMs cranking out IO with IOMETER.  Each VM is configured in IOMETER to generate 32 outstanding IOs.  If the claim above is true, I should see 96 active IOs being driven to the LUN from the combined activity of the three VMs.

Snagit Capture

But that’s not what’s happening.  Instead, what I see is approximately 32 ACTV (active) IOs on the LUN, with another 67 IOs waiting in queue (by the way, ESXTOP statistic definitions can be found here).  In my opinion, the Scalable Storage Performance whitepaper most accurately defines the behavior of the Disk.SchedNumReqOutstanding value.

Snagit Capture

Now going back to the possibility of Disk.SchedNumReqOutstanding stacking, LUN utilization could get out of hand rapidly with 10, 15, 20, or 25 VMs per LUN.  We’d quickly exceed the maximum supported value of Disk.SchedNumReqOutstanding (and of every HBA I’m aware of), which is 256.  HBA ports themselves typically support a queue depth of a few thousand outstanding commands.  Stacking the queue depths for each VM could quickly saturate an HBA, meaning we’d get a lot less mileage out of those ports as well.

While we’re on the subject of queue depth, it’s also worth noting that the %USD value is at 100% and LOAD is approximately 3.  The LOAD statistic corroborates the 3:1 ratio of total IO to queue depth, and both figures paint the picture of a LUN that is oversubscribed from an IO standpoint.
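
For anyone who wants to reproduce the view in the screenshots, this is roughly the path I take (the exact field layout may vary slightly by version):

resxtop --server <esxi-host>

Press u to switch to the disk device (LUN) view, then watch the DQLEN, ACTV, QUED, %USD, and LOAD columns for the LUN under test.  DQLEN reflects the device queue depth, ACTV the commands actively being serviced, and QUED the commands waiting in the VMkernel queue.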

In conclusion, I’d like to see VMware modify the wording in its documentation to provide a better understanding and leave nothing open to interpretation.

Update 6/23/11:  Duncan Epping at Yellow Bricks responded with a great followup, Disk.SchedNumReqOutstanding the story.

vmware-configcheck

June 11th, 2011

Detect corruption/integrity problems with your .vmx files with the following Service Console command (ESX only):

/usr/bin/vmware-configcheck | grep -v PASS

The command walks through the inventory and checks each virtual machine against the set of configuration rules defined on the host in /etc/vmware/configrules.

Why would you do this?  As stated above, the utility will detect corruption in the construct of a .vmx file.  Integrity issues in the .vmx file can cause a vMotion to fail and the VM to crash.  I’ve seen it happen twice (out of tens of thousands of vMotion operations).

The command appears to have been deprecated and is not included in ESXi.

IBM x3850 NICs Lose Network Connectivity With ESXi 4.0 Update 1

June 11th, 2011

This is a heads up on an issue I ran into some time ago while upgrading to VMware ESXi 4.0 Update 1 on an IBM System x3850.  Granted, it’s an aging hardware platform and this is fast becoming a dated issue; nonetheless, this information may help someone out of a late night or weekend fiasco.

Shortly after the upgrade, VMs began experiencing intermittent losses in network connectivity.  Tied to the problem, the following error was revealed in the ESXi log files:

WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic7: transmit timed out

The root cause turned out to be a known issue with the e1000e driver on ESXi 4.0u1 and the IBM x3850.  The issue is documented well in VMware KB Article 1010313 (Intel 82571 NICs intermittently lose connectivity with ESX 4.x).  The KB article was updated last April and appears to still be giving VMware fits as it has spread to vSphere 4.1.  According to the KB article:

This issue may occur if the Message Signaled Interrupt (MSI) mode is enabled for the e1000e driver and this mode is not supported in a server platform. This driver supports these three interrupt modes:

  • 0 - Legacy
  • 1 - MSI
  • 2 - MSI-X

ESX 4.0 added support for Message Signaled Interrupts in network and storage drivers. The default interrupt mode for the e1000e driver under ESX 4.x is MSI (1).

The workaround according to the KB article is to configure the e1000e driver to use Legacy (0) Interrupt mode (thus disabling MSI mode) by performing the following:

  1. Open a console to the ESX or ESXi host.
  2. To configure the e1000e module option IntMode and use Legacy (0) interrupts for a 4-port NIC, run the command:
    • esxcfg-module -s IntMode=0,0,0,0 e1000e
    • Note: A mode number must be specified for each NIC port. In the case of two quad-port NICs, specify mode 0 for all 8 ports with the command: esxcfg-module -s IntMode=0,0,0,0,0,0,0,0 e1000e
  3. On an ESX host, run this command to rebuild the initrd:
    • esxcfg-boot -b
    • Note: This step is not applicable to ESXi hosts.
  4. Reboot the ESX/ESXi host for the changes to take effect.
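
To confirm the option stuck after the reboot, the module settings can be queried as a quick sanity check (the output format varies by build):

esxcfg-module -g e1000e

The output should list IntMode=0,... among the configured options for the driver.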

Scripted Removal Of Non-present Hardware After A P2V

June 11th, 2011

After converting a physical machine to a virtual machine, it is considered a best practice to remove unneeded applications, software, services, and device drivers which were tied to the physical machine but are no longer applicable to the present day virtual machine.  Performing this task manually from time to time isn’t too bad, but at large scale a manual process becomes inefficient.  There are tools available which will automate the process of removing unneeded device drivers (sometimes referred to as ghost hardware).  A former colleague put together a scripted solution for Windows VMs which I’m sharing here.

Copy the .zip file to the virtual machine’s local hard drive, extract it, and follow the instructions in the readme.txt file.  I have not thoroughly tested the tool.  No warranties – use at your own risk.  I would suggest using it on a test machine first to become comfortable with the process before using it on production machines or on a large scale.

Download: remnonpresent.zip (719KB)
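
If you’d rather handle a one-off VM by hand, the classic manual approach (a sketch of the long-documented Windows technique, separate from the script above) is to launch Device Manager with non-present devices exposed:

set devmgr_show_nonpresent_devices=1
start devmgmt.msc

With View | Show hidden devices enabled, the ghost hardware shows up grayed out and can be uninstalled device by device.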

VMunderground BPaaS (Beantown Party as a Service)

June 3rd, 2011

Event:  VMunderground BPaaS (Beantown Party as a Service): Tech Field Day 6 Edition

Calling all New England VMUG members, guests, and potential sponsors: as of Friday evening, there are still plenty of tickets available for the upcoming VMunderground BPaaS (Beantown Party as a Service).  I’ll be there and I hope to see you there as well.  Following are the announcement details from VMunderground:

VMunderground is happy to announce that we’ll be helping throw a virtualization community bash with the Tech Field Day 6 crew!!  If you are in the Boston area or local enough to Fenway, mark your calendars for Thursday night, June 9.  It will be the mother of all FCoTR user group meetings!!  We’re planning a dance fight pitting FCoTR supporters against the FCoAC (Avian Carrier) supporters.  It will be spectacular!

Seriously, it’s on. It’s happening. And there will be a big giant green monster involved.  Tech Field Day has the EMC Club at historic Fenway Park in Boston and TFD is opening up event attendance to VMunderground! We have around 100 seats we’re looking to fill.  If you’re in the Boston area that night, clear your calendar, sign up for a ticket, and get ready to bring your “A” game to the VMunderground BPaaS (Beantown Party as a Service).

If you’ve had the fortune of attending a VMunderground WuPaaS (Warm-up Party as a Service) at the last two VMworlds, you know that VMunderground is a blast.  Some of the smartest virtualization minds in the world, best sponsors, and always some good food, brews and conversations.  We’re currently rounding up some great sponsors to help support this community focused event. Stay tuned to learn more about this awesome opportunity to rub elbows with Boston’s finest vPeople and the Tech Field Day crew.  

If you’re interested in sponsoring the first installment of BPaaS, please contact Theron Conrey and Sean Clark at beantown@vkaboom.com (parent company of VMunderground).  We have a few slots open for sponsorship, but speak up soon to reserve your spot.

ATTENDEES:  Please note, this is NOT the VMworld party.  This is a first-time joint collaboration with Tech Field Day and you should probably live within the Boston area before committing to this event.  VMunderground rocks, but maybe not enough to fly into Boston for.  😉

TICKETS: Tickets go on sale in two 50-ticket batches: the first on Friday, May 27 at 4 PM Eastern time, and the second on the evening of Monday, June 6.  There will be 100 tickets total.  When they’re gone, they’re gone. Watch twitter or this site for more info.  @vSeanClark and @TheronConrey are a good place to start.

Co-scheduling Visualized

May 21st, 2011

I stumbled onto this time lapse video of 51 airplanes taking off (and others taxiing) at Boston’s Logan International Airport.  One thought immediately popped into my mind: co-scheduling, which is a function of The VMware vSphere CPU Scheduler.  The accelerated speed of the video really emphasizes the precision required of the scheduler, which in this case is the air traffic controller (or controllers).

http://www.youtube.com/watch?v=3k-xG8XX1EM

How does this video relate to co-scheduling?

  • Imagine the planes represent CPU execution (or more accurately CPU execution requests).
  • Imagine the various runways & taxiways represent the number of vCPUs in a VM.

The scheduler is responsible for managing the traffic, making sure there’s a clear path for each plane to move forward and to be on time. 

  • With fewer runways and taxiways (vCPUs in a VM), scheduling complexity is reduced.
  • Adding runways and taxiways (vCPUs in a VM) increases scheduling complexity but with a limited number of planes (guest OS CPU execution requests), scheduling will still be manageable and planes will arrive on time.
  • Now add a significant number of planes (4 vCPU, 8 vCPU VMs) to our multitude of crisscrossing runways and taxiways.  The precision required to avoid accidents and maintain fairness becomes extremely complex.  The result is high %RDY time for VMs on the host (more on spotting this below).
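
On a real host, this congestion is easiest to spot in esxtop’s CPU view.  A rough sketch (run locally, or via resxtop against the host):

esxtop

The default CPU screen shows %RDY (time a world spent ready to run but waiting for a physical CPU) and %CSTP (co-stop, time a vCPU spent stopped waiting for its sibling vCPUs to be co-scheduled); pressing V limits the display to virtual machine worlds.  Sustained high values in either column are the airplanes stacked up on the taxiway.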

How do we deal with scheduling complexity?

  1. Right-size VMs, whether they are new builds or P2V conversions.  A minimalist approach to resource guarantees is the best place to start when we’re working with consolidated infrastructure and shared resources.
  2. If you’ve already right sized VMs and you’re running into high %RDY times:
    • Balance workloads by mixing VMs with lower and higher vCPU counts on the same host/cluster
    • Add cores to the host/cluster by:
      • Scaling up (increasing the core count in the host)
      • Scaling out (increasing the number of hosts in the cluster)

(Video source: @GuyKawasaki’s Holy Kaw!)