Archive for the ‘Virtualization’ category

ESXTOP drilldown

January 28th, 2009

Open up the service console on your ESX host and run the esxtop command.  You may already know that while in esxtop, interactively pressing the c, m, d, and n keys changes the esxtop focus to each of the four food groups:  CPU, Memory, Disk, and Network respectively, but did you know there are more advanced views for drilling down to more detailed information?

For example, we already know pressing the d key provides disk information from the adapter level which contains rolled up statistics from all current activity on the adapter:

1-28-2009 1-22-17 AM

 

Now try these interactive keys:

Press the u key to view disk information from the device level – this shows us statistics for each LUN per adapter:

1-28-2009 1-21-40 AM

Press the v key to view disk information from the VM level – the most granular level esxtop provides:

1-28-2009 1-22-39 AM

There’s also a key, when looking at CPU statistics, which will expand a VM showing the individual processes that make up that running VM.  Can you find it?  This will come in handy if you ever find yourself in the situation where you need to kill a VM from the service console.

If you would like to view the complete documentation for esxtop (known as man pages in the *nix world), use the command man esxtop in the service console.

ESXTOP is powerful tool whos capabilities extend quite a bit farther than what I’ve briefly talked about here.  I hope to see it in future versions of ESX (and ESXi).

Great iSCSI info!

January 27th, 2009

I’ve been using Openfiler 2.2 iSCSI in the lab for a few years with great success as a means for shared storage. Shared storage with VMware ESX/ESXi (along with the necessary licensing) allows us great things like VMotion, DRS, HA, etc. I’ve recently been kicking the tires of Openfiler 2.3 and have been anxious to implement partly due to the ease in its menu driven NIC bonding feature which I wanted to leverage for maximum disk I/O throughput.

Coincidentally, just yesterday a few of the big brains in the storage industry got together and published what I consider one of the best blog entries in the known universe. Chad Sakac and David Black (EMC), Andy Banta (VMware), Vaughn Stewart (NetApp), Eric Schott (Dell/EqualLogic), Adam Carter (HP/Lefthand) all conspired.

One of the iSCSI topics they cover is link aggregation over Ethernet. I read and re-read this section with great interest. My current swiSCSI configuration in the lab consists of a single 1Gb VMKernel NIC (along with a redundant failover NIC) connected to a single 1Gb NIC in the Openfiler storage box having a single iSCSI target with two LUNs. I’ve got more 1Gb NICs that I can add to the Openfiler storage box, so my million dollar question was “will this increase performance?” The short answer is NO with my current configuration. Although the additional NIC in the Openfiler box will provide a level of hardware redundancy, due to the way ESX 3.x iSCSI communicates with the iSCSI target, only a single Ethernet path will be used for by ESX to communicate to the single target backed by both LUNs.

However, what I can do to add more iSCSI bandwidth is to add the 2nd Gb NIC in the Openfiler box along with an additional IP address, and then configure an additional iSCSI target so that each LUN is mapped to a separate iSCSI target.  Adding the additional NIC in the Openfiler box for hardware redundancy is a no brainer and I probably could have done that long ago, but as far as squeezing more performance out of my modest iSCSI hardware, I’m going to perform some disk I/O testing to see if the single Gb NIC is a disk I/O bottleneck.  I may not have enough horsepower under the hood of the Openfiler box to warrant going through the steps of adding additional iSCSI targets and IP addressing.

A few of the keys I extracted from the blog post are as follows:

“The core thing to understand (and the bulk of our conversation – thank you Eric and David) is that 802.3ad/LACP surely aggregates physical links, but the mechanisms used to determine the whether a given flow of information follows one link or another are critical.

Personally, I found this doc very clarifying.: http://www.ieee802.org/3/hssg/public/apr07/frazier_01_0407.pdf

You’ll note several key things in this doc:

* All frames associated with a given “conversation” are transmitted on the same link to prevent mis-ordering of frames. So what is a “conversation”? A “conversation” is the TCP connection.
* The link selection for a conversation is usually done by doing a hash on the MAC addresses or IP address.
* There is a mechanism to “move a conversation” from one link to another (for loadbalancing), but the conversation stops on the first link before moving to the second.
* Link Aggregation achieves high utilization across multiple links when carrying multiple conversations, and is less efficient with a small number of conversations (and has no improved bandwith with just one). While Link Aggregation is good, it’s not as efficient as a single faster link.”

the ESX 3.x software initiator really only works on a single TCP connection for each target – so all traffic to a single iSCSI Target will use a single logical interface. Without extra design measures, it does limit the amount of IO available to each iSCSI target to roughly 120 – 160 MBs of read and write access.

“This design does not limit the total amount of I/O bandwidth available to an ESX host configured with multiple GbE links for iSCSI traffic (or more generally VMKernel traffic) connecting to multiple datastores across multiple iSCSI targets, but does for a single iSCSI target without taking extra steps.

Question 1: How do I configure MPIO (in this case, VMware NMP) and my iSCSI targets and LUNs to get the most optimal use of my network infrastructure? How do I scale that up?

Answer 1: Keep it simple. Use the ESX iSCSI software initiator. Use multiple iSCSI targets. Use MPIO at the ESX layer. Add Ethernet links and iSCSI targets to increase overall throughput. Ser your expectation for no more than ~160MBps for a single iSCSI target.

Remember an iSCSI session is from initiator to target. If use multiple iSCSI targets, with multiple IP addresses, you will use all the available links in aggregate, the storage traffic in total will load balance relatively well. But any individual one target will be limited to a maximum of single GbE connection’s worth of bandwidth.

Remember that this also applies to all the LUNs behind that target. So, consider that as you distribute the LUNs appropriately among those targets.

The ESX initiator uses the same core method to get a list of targets from any iSCSI array (static configuration or dynamic discovery using the iSCSI SendTargets request) and then a list of LUNs behind that target (SCSI REPORT LUNS command).”

Question 4: Do I use Link Aggregation and if so, how?

Answer 4: There are some reasons to use Link Aggregation, but increasing a throughput to a single iSCSI target isn’t one of them in ESX 3.x.

What about Link Aggregation – shouldn’t that resolve the issue of not being able to drive more than a single GbE for each iSCSI target? In a word – NO. A TCP connection will have the same IP addresses and MAC addresses for the duration of the connection, and therefore the same hash result. This means that regardless of your link aggregation setup, in ESX 3.x, the network traffic from an ESX host for a single iSCSI target will always follow a single link.

For swiSCSI users, they also mention some cool details about what’s coming in the next release of ESX/ESXi. Those looking for more iSCSI performance will want to pay attention. 10Gb Ethernet is also going to be a game changer, further threatening fibre channel SAN technologies.

I can’t stress enough how neat and informative this article is. To boot, technology experts from competing storage vendors pooled their knowledge for the greater good. That’s just awesome!

How to install Windows 7 on VMware Fusion

January 25th, 2009

The VMware Fusion team has put together a great “how to” guide for installing Microsoft Windows 7 (beta) on VMware Fusion on Mac.  Complete with screenshots and detailed explanations, this resource should have you up and running Windows 7 in no time.

I’m hearing from various people in the trenches that Windows 7 on a VM runs very well, better than Vista, and one report says with as little as 512MB RAM.  Sometimes it’s hard to tell if people are more excited about running the new Windows OS as a VM, or the fact that the Windows promise land that Vista never provided may be right around the corner.

Check it out!

Help with license keys

January 20th, 2009

Purchasing a product and not being able to install or use it due to licensing issues can be frustrating.  VMware provides at least two resource inlets to resolve licensing issues:

Help with License Keys:

Contact VI-hotline@vmware.com

or

Call 1.877.4.VMware (1-877-486-9273)

In addition to the above, you should be able to talk to your local VMware rep. who should be more than willing to help.

While I’m on the subject, here’s a link to manage your online VMware account where you can:

  • Manage orders
  • Register a product
  • Manage product licenses
  • Find a serial number
  • Manage subscriptions

Plus:  The VMware Product Licensing home page

Lastly, a link to VMware Infrastructure 3 Pricing, Packaging, and Licensing Overview (a great document I might add)

KB1008130: VMware ESX and ESXi 3.5 U3 I/O failure on SAN LUN(s) and LUN queue is blocked indefinitely

January 19th, 2009

I became aware of this issue last week by word of mouth and received the official Email blast from VMware this morning.

The vulnerability lies in a convergence of circumstances:

1. Fibre channel SAN storage with multipathing
2. A fibre channel SAN path failure or planned path transition
3. Metadata update occurring during the fibre channel SAN path failure where metadata updates include but are not limited to:

a. Power operations of a VM
b. Snapshot operations of a VM (think backups)
c. Storage VMotion (sVMotion)
d. Changing a file’s attributes
e. Creating a VMFS volume
f. Creating, modifying, deleting, growing, or locking of a file on a VMFS volume

The chance of a fibre channel path failure can be rated as slim, however, metadata updates can happen quite frequently, or more often than you might think. Therefore, if a fibre channel path failure occurs, chances are good that a metadata update could be in flight which is precisely when disaster will strike. Moreover, the safety benefit and reliance on multipathing is diminished by the vulnerability.

Please be aware of this.

Dear ESX 3.5 Customer,

Our records indicate you recently downloaded VMware® ESX Version 3.5 U3 from our product download site. This email is to alert you that an issue with that product version could adversely effect your environment. This email provides a detailed description of the issue so that you can evaluate whether it affects you, and the next steps you can take to get resolution or avoid encountering the issue.

ISSUE DETAILS:
VMware ESX and ESXi 3.5 U3 I/O failure on SAN LUN(s) and LUN queue is blocked indefinitely. This occurs when VMFS3 metadata updates are being done at the same time failover to an alternate path occurs for the LUN on which the VMFS3 volume resides. The effected releases are ESX 3.5 Update 3 and ESXi 3.5 U3 Embedded and Installable with both Active/Active or Active/Passive SAN arrays (Fibre Channel and iSCSI).

PROBLEM STATEMENT AND SYMPTONS:
ESX or ESXi Host may get disconnected from Virtual Center
All paths to the LUNs are in standby state
Esxcfg-rescan might take a long tome to complete or never complete (hung)
VMKernel logs show entries similar to the following:

Queue for device vml.02001600006006016086741d00c6a0bc934902dd115241 49442035 has been blocked for 6399 seconds.

Please refer to KB 1008130.

SOLUTION:
A reboot is required to clear this condition.

VMware is working on a patch to address this issue. The knowledge base article for this issue will be updated after the patch is available.

NEXT STEPS:
If you encounter this condition, please collect the following information and open an SR with VMware Support:

1. Collect a vsi dump before reboot using /usr/lib/vmware/bin/vsi_traverse.

2. Reboot the server and collect the vm-support dump.

3. Note the activities around the time where a first “blocked for xxxx seconds” message is shown in the VMkernel.

Please consult your local support center if you require further information or assistance. We apologize in advance for any inconvenience this issue may cause you. Your satisfaction is our number one goal.

Update:  The patch has been released that resolves this

Computerworld: VMware among the 9 hottest skills for 2009

January 19th, 2009

8. Data center
Most of the glass-house buzz is about server and storage virtualization projects that help organizations lower their energy costs and shrink their data center footprints.

But few companies are recruiting specifically for data center skills. Instead, they’re retraining existing staff in VMware and other virtualization technologies. For instance, Aspen Skiing is considering virtualizing up to 40% of its servers in 2009, says Major. To achieve that, Aspen Skiing plans to rely on VMware and EMC to provide staff with the necessary training.

Read more here.

Virtualize your game servers

January 18th, 2009

Why not?  This is a 24 player Team Fortress 2 dedicated server running on Windows Server 2003 on top of VMware ESX 3.5.0 Update 3.

  • 1 vCPU
  • 512MB vRAM
  • 1 vNIC
  • Periodic banner advertisements in game spreading the word to gamers about VMware virtualization and its practical application to the gaming community which has a large following

Click on the image below for a larger version:

tf2vmw

One month CPU utilization averages:

tf2vmwcpu