Archive for January, 2009

Critical ESX/ESXi 3.5.0 Update 3 patch released

January 31st, 2009

VMware ESX/ESXi 3.5.0 Update 3 introduced a bug whereby a planned or unplanned path change on a multipathed SAN LUN while VMFS3 metadata is being written can cause communication to the SAN LUN(s) to halt, resulting in the loss of virtual disk (.vmdk) access for VMs.  The issue is documented in full in VMware KB article 1008130.

A patch is now available in VMware KB article 1006651 which resolves the issue above as well as several others.

For users on ESX/ESXi 3.5.0 Update 3, I highly recommend applying this patch as soon as possible.

Train Signal training discount through the month of February

January 31st, 2009

Train Signal is offering an astounding 25% off any virtualization product they sell through the month of February 2009.

Here is a short sample of their VMware ESX training video where instructor David Davis talks about templates and cloning virtual machines:

To take advantage of the 25% off, use the code BOCHENET at checkout.

I know firsthand that the economy is tough.  Take advantage of this offer and get top-shelf training for your dollar.  Train Signal offers a 90-day money-back guarantee if you are not completely satisfied.

NFL’s Super Bowl IT team gets ready for game day

January 31st, 2009

I think this would be a neat gig, and probably somewhat stressful.  All infrastructure components, from the simplest to the most advanced, must be monitored thoroughly and must not be overlooked.  And hey, virtualization is involved, which is a plus.  It’s too bad they don’t specify what flavor of virtualization.  Inquiring minds would like to know.  How about it, Computerworld?

January 30, 2009 (Computerworld) The National Football League is fielding three teams for Sunday’s Super Bowl. The first two are well known: the Pittsburgh Steelers and Arizona Cardinals. The third, more anonymous one is the 17-member IT staff that the NFL has assigned to work in Tampa, Fla., the site of this year’s game.

That team was tasked with creating a complete IT operation for Super Bowl XLIII in a matter of weeks. Its coaches are Joe Manto, the NFL’s vice president of IT, and Jon Kelly, the league’s director of infrastructure computing. Their opponent is the same one that IT managers face everywhere: anything that can threaten system availability and uptime.

It doesn’t help matters that one of the four IBM BladeCenter S systems being used in Tampa is located on a wood floor in a tent that lacks any climate control capabilities. But so far, so good – and with the four BladeCenter boxes at different locations, and virtualization software ready to provide redundancy, neither Manto nor Kelly seems all that worried.

“It’s very exciting for IT guys,” Manto said of the experience of setting up a systems infrastructure for the Super Bowl. It’s unlike most IT projects, which involve creating systems that will provide ongoing support to users. Instead, the seven-day-a-week effort in Tampa has a short life span and a clear and unmovable deadline.

“That game is going to kick off on Sunday no matter what happens,” Manto said. And by Tuesday, the IT equipment will be disassembled, packed and shipped out of Tampa. “It’s really an open-and-closed operation, which is sort of unique in the IT world,” he said.

The IT staff has set up systems in a hotel to support business operations for about 200 NFL employees who are on-site in Tampa. It has also built a tech operation at the convention center in Tampa to support 3,500 media representatives who are covering the event; that setup includes wireless networking and automated access to NFL data.

Another system will manage the credentialing of up to 25,000 people – everyone from construction workers to halftime performers. In addition, about 300 PCs have been networked together.

This is the first year that the NFL has completely turned over its server processing workload for the Super Bowl to blade systems. Each BladeCenter chassis includes two blade servers, each with a pair of sockets for quad-core chips. In the past, the league would bring “tens of servers” to the game to provide IT support, Kelly said.

Manto said he will be able to watch parts of the game, primarily on TV monitors, as he moves around Raymond James Stadium in Tampa checking on system operations. But for the most part, Sunday will be a 14-hour workday for the IT staff. “Our main goal,” he said, “is to make sure that everything about this event is accomplished professionally and in a way that gives the fans the best possible experience.”

 Article above originally posted here.

New product launch: iBac VIP for VMware Virtual Center

January 29th, 2009

Another VMware virtual infrastructure backup option. Options are good! This product works with both ESX and ESXi (requires VCB).

Licensing: One license ($5,495) covers all VMs and ESX hosts. Comparatively speaking, another third-party virtualization management vendor charges approximately $500 per ESX/ESXi host CPU socket and also requires VCB for ESXi hosts. VCB licensing aside, in this comparison iBac becomes attractive for infrastructures having 5+ 2-socket hosts, or 3+ 4-socket hosts (thankfully we don’t get dinged for multi-core processors yet – who will be the first brave vendor, after Oracle, to license this way?).
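
To put rough numbers on that comparison, here is a quick back-of-the-napkin sketch. The $5,495 flat price and the ~$500-per-socket figure are the ones quoted above; the host and socket mixes are just example inputs, and VCB licensing is ignored.

```python
# Rough licensing break-even sketch: flat-priced iBac VIP for Virtual Center
# vs. a competitor licensed at roughly $500 per ESX/ESXi host CPU socket.
# Prices are the figures quoted above; host/socket mixes are example inputs.

IBAC_FLAT = 5495.0
PER_SOCKET = 500.0

breakeven_sockets = IBAC_FLAT / PER_SOCKET   # roughly 11 sockets across the cluster
print(f"Flat license breaks even at roughly {breakeven_sockets:.0f} sockets total")

# A few example infrastructures (hosts x sockets per host):
for hosts, sockets in [(5, 2), (6, 2), (3, 4)]:
    per_socket_total = hosts * sockets * PER_SOCKET
    print(f"{hosts} x {sockets}-socket hosts: per-socket ~${per_socket_total:,.0f} "
          f"vs. flat ${IBAC_FLAT:,.0f}")
```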

From Idealstor:

“Idealstor, a leading developer of disk-to-disk backup solutions, announced today the release of iBac VIP for VMware Virtual Center. iBac VIP for Virtual Center was created to simplify VMware backups by offering a single license that backs up every virtual machine regardless of how many ESX hosts have been implemented.

Nandan Arora, Chief Technology Officer at Idealstor, is quoted in the release below.


“iBac VIP was launched in 2008 and offers an enterprise backup solution for VMware virtual servers. The goal of iBac VIP was to offer an easy to use and easy to license backup solution for VMware virtual environments. The original release of iBac VIP was licensed based on the number of ESX hosts that were being run regardless of the number of VMs or processors on the host server. With the release of iBac VIP for Virtual Center, Idealstor seeks to further simplify VMware backups by offering backup administrators the option to choose between licensing the product per ESX host or per Virtual Center. Suggested retail price for iBac VIP for Virtual Center is $5495.00.

iBac VIP ties into the VCB framework provided by VMware. Rather than having to run scripts or purchase expensive backup agents to back up each virtual machine, iBac VIP offers an easy-to-use interface that allows backup administrators to efficiently manage their VMware backups. VIP backups can be managed from the proxy/backup server or from a remote machine running the VIP management console. Scheduling, advanced logging and email reports are available for all backup jobs. Recovery can be done at the file level, or entire virtual machines can be recovered on the proxy or a specific ESX host.

“When we entered the VMware backup market we realized that most backup vendors were ignoring the flexibility and cost savings that were inherent to virtualization”, said Nandan Arora, chief technical officer at Idealstor. “Virtualization offers a unique set of tools that enables companies to consolidate servers but also to quickly provision new server instances as needed without having to incur the costs of implementing a physical server. Most software companies on the market today ignored this and released VMware backup solutions that are tied to the number of virtual machines, physical processors or ESX hosts running on the network. iBac VIP for Virtual Center was designed to turn this licensing model upside down. VIP for Virtual Center lets you backup any number of virtual machines regardless of the number of processors or ESX hosts being run. The only limitation is that the backup proxy server will need to be able to handle the load, but we feel that iBac VIP is affordable enough that if another proxy server needs to be added to handle the load, we will still be far more competitive than the existing players in the VMware backup space.”

About Idealstor
Idealstor manufactures removable/ejectable disk backup systems that are designed to augment or completely replace tape as backup and offsite storage media. The Idealstor Backup Appliance has been on the market for over 5 years offering a fast, reliable and portable alternative to tape based backup systems. Each Idealstor system uses industry standard SATA disk as the target for backup data and as offsite media. Systems range from 1 removable drive up to 8 and can be used by a range of businesses from SMB to corporate data centers. Disk capacities mirror that of the major SATA manufacturers. Uncompressed capacities of 200GB, 320GB, 400GB, 500GB, 750GB, 1TB and 1.5TB are currently available.”

Idle Memory Tax

January 29th, 2009

Memory over commit is a money- and infrastructure-saving feature that fits perfectly within two of virtualization’s core themes: doing more with less hardware, and helping save the environment with greenness. While Microsoft Hyper-V offers no memory over commit or page sharing technologies, VMware understood the value of these technologies long before VI3. I’ve mentioned this before – if you haven’t read it yet, take a look at Carl Waldspurger’s 2002 white paper on Memory Resource Management in VMware ESX Server.

One of VMware’s memory over commit technologies is called Idle Memory Tax. IMT basically allows the VMKernel to reclaim unused guest VM memory by assigning a higher “cost value” to unused allocated shares. The last piece of that sentence is key – did you catch it? This mechanism is tied to shares. When do shares come into play? When there is contention for physical host RAM allocated to the VMs. Or in short, when physical RAM on the ESX host has been over committed – we’ve granted more RAM to guest VMs than we actually have on the ESX host to cover at one time. When this happens, there is contention or a battle for who actually gets the physical RAM. Share values are what determine this. I don’t want to get too far off track here as this discussion is specifically on Idle Memory Tax, but shares are the foundation so they are important to understand.
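
For readers who want to see the shares idea in action, here is a simplified, hypothetical illustration. It is not VMware’s actual algorithm (which also weighs the idle memory tax and working-set estimates); it only shows that contended physical RAM is divided among VMs in proportion to their share values.

```python
# Simplified illustration of proportional-share allocation under contention.
# NOT the exact ESX algorithm (which also factors in the idle memory tax and
# working-set sampling); it only demonstrates the basic "shares" concept.

def proportional_allocation(host_ram_mb, vm_shares):
    """Split contended host RAM among VMs in proportion to their share values."""
    total_shares = sum(vm_shares.values())
    return {vm: round(host_ram_mb * shares / total_shares)
            for vm, shares in vm_shares.items()}

# Two VMs contending for 8GB of physical RAM with equal shares:
print(proportional_allocation(8192, {"VM1": 8192, "VM2": 8192}))
# {'VM1': 4096, 'VM2': 4096}

# Double VM2's shares and it is entitled to twice as much under contention:
print(proportional_allocation(8192, {"VM1": 8192, "VM2": 16384}))
# {'VM1': 2731, 'VM2': 5461}
```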

Back to Idle Memory Tax. Quite simply, it’s a mechanism to take idle/unused memory away from guest VMs that are hogging it in order to give that memory to another VM where it’s needed more urgently. Sort of like Robin Hood for VI. By default this is performed using VMware’s balloon driver, which is the better of the two available methods. Out of the box, up to 75% of idle memory can be reclaimed, as configured by Mem.IdleTax under advanced host configuration. The VMKernel polls for idle memory in guest VMs every 60 seconds. This interval was doubled from ESX 2.x, where the polling period was every 30 seconds.

Here’s a working example of the scenario:

  • Two guest VMs live on an ESX/ESXi host with 8GB RAM
  • Each VM is assigned 8GB RAM and 8,192 shares. Discounting memory overhead, content based page sharing, and COS memory usage, we’ve effectively over committed our memory by 100%
  • VM1 is running IIS using only 1GB RAM
  • VM2 is running SQL and is requesting the use of all 8GB RAM
  • Idle Memory Tax allows the VMKernel to “borrow” 75% of the 7GB of allocated but unused RAM from VM1 and give it to VM2 (see the quick calculation below).  The remaining 25% of the unused allocated RAM is left with VM1 as a cushion for requests for additional memory before other memory over commit technologies kick in
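
Putting rough numbers on that scenario (a simplified sketch using the default 75% tax; actual reclamation also depends on the 60-second idle memory sampling and on how the balloon driver behaves in the guest):

```python
# Rough numbers for the two-VM scenario above, using the default Mem.IdleTax
# of 75%. Real reclamation also depends on the 60-second idle-memory sampling
# interval and on the behavior of the balloon driver inside the guest.

IDLE_TAX = 0.75          # default Mem.IdleTax: portion of idle memory reclaimable

vm1_allocated_mb = 8192  # VM1 is granted 8GB...
vm1_active_mb    = 1024  # ...but IIS is only actively using about 1GB

idle_mb        = vm1_allocated_mb - vm1_active_mb   # 7168 MB sitting idle
reclaimable_mb = idle_mb * IDLE_TAX                 # ~5376 MB can go to VM2
cushion_mb     = idle_mb - reclaimable_mb           # ~1792 MB left as a cushion

print(f"Idle memory in VM1:   {idle_mb} MB")
print(f"Reclaimable (75%):    {reclaimable_mb:.0f} MB available to give to VM2")
print(f"Cushion left (25%):   {cushion_mb:.0f} MB stays with VM1")
```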

Here are the values under ESX host advanced configuration that we can tweak to modify the default behavior of Idle Memory Tax:

  • Mem.IdleTax – default: 75, range: 0 to 99, specifies the percent of idle memory that may be reclaimed by the tax
  • Mem.SamplePeriod – default: 60 in ESX 3.x (30 in ESX 2.x), range: 0 to 180, specifies the polling interval in seconds at which the VMKernel will scan for idle memory
  • Mem.IdleTaxType – default: 1 (variable), range: 0 (flat – use the paging mechanism) to 1 (variable – use the balloon driver), specifies the method by which the VMKernel will reclaim idle memory. It is highly recommended to leave this at 1 to use the balloon driver, as paging is more detrimental to the performance of the VM

VMware’s guidance is that changes to Idle Memory Tax are generally neither necessary nor appropriate. If you get into a situation where Idle Memory Tax comes into play, you should question the VMs that hold large quantities of allocated but idle memory. Rather than allocating more memory to a VM than it needs, and thereby wrongly inflating its share value, consider reducing the amount of RAM allocated to those VMs.

ESXTOP drilldown

January 28th, 2009

Open up the service console on your ESX host and run the esxtop command.  You may already know that while in esxtop, interactively pressing the c, m, d, and n keys changes the esxtop focus to each of the four food groups:  CPU, Memory, Disk, and Network respectively, but did you know there are more advanced views for drilling down to more detailed information?

For example, we already know pressing the d key provides disk information from the adapter level which contains rolled up statistics from all current activity on the adapter:

[Screenshot: esxtop disk statistics at the adapter level]

Now try these interactive keys:

Press the u key to view disk information from the device level – this shows us statistics for each LUN per adapter:

[Screenshot: esxtop disk statistics at the device (LUN) level]

Press the v key to view disk information from the VM level – the most granular level esxtop provides:

[Screenshot: esxtop disk statistics at the VM level]

There’s also a key, when looking at CPU statistics, which will expand a VM showing the individual processes that make up that running VM.  Can you find it?  This will come in handy if you ever find yourself in the situation where you need to kill a VM from the service console.

If you would like to view the complete documentation for esxtop (known as man pages in the *nix world), use the command man esxtop in the service console.

ESXTOP is a powerful tool whose capabilities extend quite a bit further than what I’ve briefly talked about here.  I hope to see it in future versions of ESX (and ESXi).

Great iSCSI info!

January 27th, 2009

I’ve been using Openfiler 2.2 iSCSI in the lab for a few years with great success as a means for shared storage. Shared storage with VMware ESX/ESXi (along with the necessary licensing) allows us great things like VMotion, DRS, HA, etc. I’ve recently been kicking the tires of Openfiler 2.3 and have been anxious to implement it, partly due to its easy, menu-driven NIC bonding feature, which I wanted to leverage for maximum disk I/O throughput.

Coincidentally, just yesterday a few of the big brains in the storage industry got together and published what I consider one of the best blog entries in the known universe. Chad Sakac and David Black (EMC), Andy Banta (VMware), Vaughn Stewart (NetApp), Eric Schott (Dell/EqualLogic), Adam Carter (HP/Lefthand) all conspired.

One of the iSCSI topics they cover is link aggregation over Ethernet. I read and re-read this section with great interest. My current swiSCSI configuration in the lab consists of a single 1Gb VMKernel NIC (along with a redundant failover NIC) connected to a single 1Gb NIC in the Openfiler storage box, which hosts a single iSCSI target with two LUNs. I’ve got more 1Gb NICs that I can add to the Openfiler storage box, so my million dollar question was “will this increase performance?” The short answer is NO with my current configuration. Although the additional NIC in the Openfiler box will provide a level of hardware redundancy, due to the way ESX 3.x iSCSI communicates with the iSCSI target, only a single Ethernet path will be used by ESX to communicate with the single target backed by both LUNs.

However, what I can do to add more iSCSI bandwidth is to add the 2nd Gb NIC in the Openfiler box along with an additional IP address, and then configure an additional iSCSI target so that each LUN is mapped to a separate iSCSI target.  Adding the additional NIC in the Openfiler box for hardware redundancy is a no brainer and I probably could have done that long ago, but as far as squeezing more performance out of my modest iSCSI hardware, I’m going to perform some disk I/O testing to see if the single Gb NIC is a disk I/O bottleneck.  I may not have enough horsepower under the hood of the Openfiler box to warrant going through the steps of adding additional iSCSI targets and IP addressing.
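
To make the math concrete, here is a rough sketch (my own numbers and assumptions, not the storage vendors’) of why extra NICs alone don’t help a single iSCSI target with the ESX 3.x software initiator, while extra targets can:

```python
# Rough throughput estimate for the ESX 3.x software iSCSI initiator.
# Assumption (from the article quoted below): one TCP connection per iSCSI
# target, so a single target tops out at roughly one GbE link's worth of
# bandwidth (~120-160 MB/s).

PER_TARGET_CAP_MBS = 120   # conservative per-GbE-link figure

def estimated_aggregate_mbs(gbe_links, iscsi_targets):
    """Aggregate throughput is bounded by both the link count and the target count."""
    usable_paths = min(gbe_links, iscsi_targets)
    return usable_paths * PER_TARGET_CAP_MBS

# My lab today: two NICs but only one target -> the second NIC is redundancy only.
print(estimated_aggregate_mbs(gbe_links=2, iscsi_targets=1))   # ~120 MB/s

# After adding a second IP/target (one LUN per target) -> both links can be used.
print(estimated_aggregate_mbs(gbe_links=2, iscsi_targets=2))   # ~240 MB/s
```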

A few of the keys I extracted from the blog post are as follows:

“The core thing to understand (and the bulk of our conversation – thank you Eric and David) is that 802.3ad/LACP surely aggregates physical links, but the mechanisms used to determine whether a given flow of information follows one link or another are critical.

Personally, I found this doc very clarifying: http://www.ieee802.org/3/hssg/public/apr07/frazier_01_0407.pdf

You’ll note several key things in this doc:

* All frames associated with a given “conversation” are transmitted on the same link to prevent mis-ordering of frames. So what is a “conversation”? A “conversation” is the TCP connection.
* The link selection for a conversation is usually done by doing a hash on the MAC addresses or IP address.
* There is a mechanism to “move a conversation” from one link to another (for loadbalancing), but the conversation stops on the first link before moving to the second.
* Link Aggregation achieves high utilization across multiple links when carrying multiple conversations, and is less efficient with a small number of conversations (and has no improved bandwidth with just one). While Link Aggregation is good, it’s not as efficient as a single faster link.”

“The ESX 3.x software initiator really only works on a single TCP connection for each target – so all traffic to a single iSCSI target will use a single logical interface. Without extra design measures, it does limit the amount of I/O available to each iSCSI target to roughly 120-160 MB/s of read and write access.”

“This design does not limit the total amount of I/O bandwidth available to an ESX host configured with multiple GbE links for iSCSI traffic (or more generally VMKernel traffic) connecting to multiple datastores across multiple iSCSI targets, but does for a single iSCSI target without taking extra steps.

Question 1: How do I configure MPIO (in this case, VMware NMP) and my iSCSI targets and LUNs to get the most optimal use of my network infrastructure? How do I scale that up?

Answer 1: Keep it simple. Use the ESX iSCSI software initiator. Use multiple iSCSI targets. Use MPIO at the ESX layer. Add Ethernet links and iSCSI targets to increase overall throughput. Set your expectation of no more than ~160MBps for a single iSCSI target.

Remember, an iSCSI session is from initiator to target. If you use multiple iSCSI targets, with multiple IP addresses, you will use all the available links in aggregate, and the storage traffic in total will load balance relatively well. But any one individual target will be limited to a maximum of a single GbE connection’s worth of bandwidth.

Remember that this also applies to all the LUNs behind that target. So, consider that as you distribute the LUNs appropriately among those targets.

The ESX initiator uses the same core method to get a list of targets from any iSCSI array (static configuration or dynamic discovery using the iSCSI SendTargets request) and then a list of LUNs behind that target (SCSI REPORT LUNS command).”

Question 4: Do I use Link Aggregation and if so, how?

Answer 4: There are some reasons to use Link Aggregation, but increasing throughput to a single iSCSI target isn’t one of them in ESX 3.x.

What about Link Aggregation – shouldn’t that resolve the issue of not being able to drive more than a single GbE for each iSCSI target? In a word – NO. A TCP connection will have the same IP addresses and MAC addresses for the duration of the connection, and therefore the same hash result. This means that regardless of your link aggregation setup, in ESX 3.x, the network traffic from an ESX host for a single iSCSI target will always follow a single link.
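
To illustrate that hashing point, here is a toy sketch of my own (not the exact hash any particular switch or vSwitch uses): the link for a conversation is picked by hashing fields that never change for the life of the TCP connection, so one iSCSI session always lands on one link.

```python
# Toy illustration of hash-based link selection in link aggregation.
# This is NOT the exact algorithm any particular switch or vSwitch implements;
# it only shows that a hash over fields that stay constant for the life of a
# TCP connection always picks the same physical link.

def pick_link(src_ip, dst_ip, num_links):
    """Choose an aggregated link for a conversation by hashing its endpoints."""
    return hash((src_ip, dst_ip)) % num_links

NUM_LINKS = 4

# One ESX host talking to one iSCSI target: same IPs -> same link, every time
# (within a single run; Python randomizes hash seeds between runs).
print(pick_link("10.0.0.10", "10.0.0.50", NUM_LINKS))
print(pick_link("10.0.0.10", "10.0.0.50", NUM_LINKS))   # identical result

# A second iSCSI target at a different IP is a different conversation,
# so it may (or may not) land on another link.
print(pick_link("10.0.0.10", "10.0.0.51", NUM_LINKS))
```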

For swiSCSI users, they also mention some cool details about what’s coming in the next release of ESX/ESXi. Those looking for more iSCSI performance will want to pay attention. 10Gb Ethernet is also going to be a game changer, further threatening fibre channel SAN technologies.

I can’t stress enough how neat and informative this article is. To boot, technology experts from competing storage vendors pooled their knowledge for the greater good. That’s just awesome!