Archive for the ‘Virtualization’ category

Redefining Disk.MaxLUN

March 27th, 2013

Regardless of what the vSphere host Advanced Setting Disk.MaxLUN has stated as its definition for years, “Maximum number of LUNs per target scanned for” is technically not correct.  In fact, it’s quite misleading.

Snagit Capture

The true definition looks similar stated in English but carries quite a different meaning and it can be found in my SnagIt hack above or within VMware KB 1998 Definition of Disk.MaxLUN on ESX Server Systems and Clarification of 128 Limit.

The Disk.MaxLUN attribute specifies the maximum LUN number up to which the ESX Server system scans on each SCSI target as it is discovering LUNs. If you have a LUN 131 on a disk that you want to access, for example, then Disk.MaxLUN must be at least 132. Don’t make this value higher than you need to, though, because higher values can significantly slow VMkernel bootup.

The 128 LUN limit refers only to the total number of LUNs that the ESX Server system is able to discover. The system intentionally stops discovering LUNs after it finds 128 because of various service console and management interface limits. Depending on your setup, you can easily have a situation in which Disk.MaxLUN is high (255) but you see few LUNs, or a situation in which Disk.MaxLUN is low (16) but you reach the 128 LUN limit because you have many targets.

For more information about limiting the number of LUNs visible to the server, see http://kb.vmware.com/kb/1467.

Note the last sentence in the first paragraph above in the KB article.  Keep the value as small as possible for your environment when using block storage.  vSphere ships with this value configured for maximum compatibility out of the box which is the max value of 256.  Assuming you don’t assign LUN numbers up to 256 in your environment, this value can be immediately ratcheted down in your build documentation or automated deployment scripts.  Doing so will decrease the elapsed time spent rescanning the fabric for block devices/VMFS datastores.  This tweak may be of particular interest at DR sites when using Site Recovery Manager to carry out a Recovery Plan test, a Planned Migration, or an actual DR execution.  It will allow for a more efficient use of RTO (Recovery Time Objective) time especially where multiple recovery plans are run consecutively.

vExpert 2013 Applications Available

March 22nd, 2013

John Troyer (you know him as @jtroyer on Twitter or the guy with the disco ball jacket at social events) has made the announcement that vExpert 2013 applications are now available. Simply put, a vExpert is the formal recognition, by VMware, of being a virtualization rock star.  I haven’t read the latest charter but technically speaking I don’t think one even needs to specifically be a VMware virtualization rock star (hey, we’re all in this virtualization space together for the greater good right?) but it certainly helps.

There are three separate but interrelated tracks to being recognized as a vExpert

  • Evangelist – You’re a blogger, regular speaker, VMTN contributor, etc. who shares the passion with the rest of the community.  You might be employed, but not by VMware or a partner. Nobody really knows.
  • Customer – You’re a customer internal facing proxy evangelist if that makes any sense whatsoever.  You get it.  You make sure your internal organization gets it.
  • VPN (VMware Partner Network) – You work for VMware or a partner and you’re either a rock star by choice or by force.  Either way, you know your stuff and you’re good at sharing with your customers.

The paths are separate but they all converge on fundamental traits within the virtualization community:  Passion. Enthusiasm. Leadership. Knowledge. Outreach.

If you’ve made contributions in any of the areas listed above, consider filling out an application for yourself.  Now is not the time to be modest or bashful.  It is the time to be showered with gifts of VMware licensing and the type of real world respect that is recognized in every corner of the globe.

My application is submitted and I’ve got my fingers crossed.  If I make vExpert 2013, I’ll be in the exclusive Five Timers club (vExpert 2009-2013 inclusive).  Why I remember so long ago receiving the news of my first vExpert award… I was at VMworld Europe in Cannes…

 

Seriously, here are the important links you need from VMware:

Recommend that someone apply for vExpert 2013: http://bit.ly/vExpert2013recommend

Apply for vExpert 2013: http://bit.ly/vExpert2013application The deadline for applications is April 15, 2013 at midnight PDT.

The existing VMware vExpert 2012 directory is at http://communities.vmware.com/vexpert.jspa.

For questions about the application process or the vExpert Program, use the comments below or email vexpert@vmware.com.

VMware vSphere Design 2nd Edition Now Available

March 20th, 2013

Snagit Capture

Publication Date: March 25, 2013 | ISBN-10: 1118407911 | ISBN-13: 978-1118407912 | Edition: 2

The big splash was officially made yesterday but I’m following up with my announcement a day later to help spread the message to anyone who may have been heads down and missed it.  Forbes Guthrie (Snagit Capture Snagit Capture), Scott Lowe (Snagit Capture Snagit Capture), and Kendrick Coleman (Snagit Capture Snagit Capture) have teamed up to produce VMware vSphere Design 2nd Edition (a followup refresh of the popular 1st Edition).

As Technical Editor, I’m one of the few fortunate individuals who have already had the pleasure to have read the book.  I can tell you that it is jam-packed with the deep technical detail, design perspective, and methodology you’d expect from these seasoned and well-respected industry experts.

The book is 528 pages in length (compare to 384 pages in the 1st edition).  New in this version is coverage of vSphere 5.1, emerging infrastructure technologies and trends, as well as a section on vCloud Director design – a worthy topic which should be weighing heavily on the minds of many by now and in the future will likely spawn dedicated coverage in texts by Sybex and/or other publishers.

The publisher has made the introduction section of the book freely available.  You can take a look at that by clicking this link which is hosted at Forbes vReference blog.  As with the previous edition, this book is made available in both paperback and Kindle editions.  Support these authors and pick up your copy today.  Tell them Jason sent you and nothing special will likely take place.

Large Memory Pages and Shrinking Consolidation Ratios

March 19th, 2013

Here’s a discussion that has somewhat come full circle for me and could prove to be a handy for those with lab or production environments alike.

A little over a week ago I was having lunch with a former colleague and naturally a TPS discussion broke out.  We talked about how it worked and how effective it was with small memory pages (4KB in size) as well as large memory pages (2MB in size).  The topic was brought up with a purpose in mind.

Many moons ago, VMware virtualized datacenters consisted mainly of Windows 2000 Server and Windows Server 2003 virtual machines which natively leverage small memory pages – an attribute built into the guest operating system itself.  Later, Windows Vista as well as 2008 and its successors came onto the scene allocating large memory pages by default (again – at the guest OS layer) to boost performance for certain workload types.  To maintain flexibility and feature support, VMware ESX and ESXi hosts have supported large pages by default providing the guest operating system requested them.  For those operating systems that still used the smaller memory pages, those were supported by the hypervisor as well.  This support and configuration remains the default today in vSphere 5.1 in an advanced host-wide setting called Mem.AllocGuestLargePage (1 to enable and support both large and small pages – the default, 0 to disable and force small pages).  VMware released a small whitepaper covering this subject several years ago titled Large Page Performance which summarizes lab test results and provides the steps required to toggle large pages in the hypervisor as well as within Windows Server 2003

As legacy Windows platforms were slowly but surely replaced by their Windows Server 2008, R2, and now 2012 predecessors, something began to happen.  Consolidation ratios gated by memory (very typical mainstream constraint in most environments I’ve managed and shared stories about) started to slip.  Part of this can be attributed to the larger memory footprints assigned to the newer operating systems.  That makes sense, but this only explains a portion of the story.  The balance of memory has evaporated as a result of modern guest operating systems using large 2MB memory pages which will not be consolidated by the TPS mechanism (until a severe memory pressure threshold is crossed but that’s another story discussed here and here).

For some environments, many I imagine, this is becoming a problem which manifests itself as an infrastructure capacity growth requirement as guest operating systems are upgraded.  Those with chargeback models where the customer or business unit paid up front at the door for their VM or vApp shells are now getting pinched because compute infrastructure doesn’t spread as thin as it once did.  This will be most pronounced in the largest of environments.  A pod or block architecture that once supplied infrastructure for 500 or 1,000 VMs now fills up with significantly less.

So when I said this discussion has come full circle, I meant it.  A few years ago Duncan Epping wrote an article called KB Article 1020524 (TPS and Nehalem) and a portion of this blog post more or less took place in the comments section.  Buried in there was a comment I had made while being involved in the discussion (although I don’t remember it).  So I was a bit surprised when a Google search dug that up.  It wasn’t the first time that has happened and I’m sure it won’t be the last.

Back to reality.  After my lunch time discussion with Jim, I decided to head to my lab which, from a guest OS perspective, was all Windows Server 2008 R2 or better, plus a bit of Linux for the appliances.  Knowing that the majority of my guests were consuming large memory pages, how much more TPS savings would result if I forced small memory pages on the host?  So I evacuated a vSphere host using maintenance mode, configured Mem.AllocGuestLargePage to a value of 0, then placed all the VMs back onto the host.  Shown below are the before and after results.

 

A decrease in physical memory utilization of nearly 20% per host – TPS is alive again:

Snagit Capture Snagit Capture

 

124% increase in Shared memory in Tier1 virtual Machines:

Snagit Capture Snagit Capture

 

90% increase in Shared memory in Tier3 virtual Machines:

Snagit Capture Snagit Capture

 

Perhaps what was most interesting was the manner in which TPS consolidated pages once small pages were enabled.  The impact was not realized right away nor was it a gradual gain in memory efficiency as vSphere scanned for duplicate pages.  Rather it seemed to happen in batch almost all at once 12 hours after large pages had been disabled and VMs had been moved back onto the host:

Snagit Capture

 

So for those of you who may be scratching your heads wondering what is happening to your consolidation ratios lately, perhaps this has some or everything to do with it.  Is there an action item to be carried out here? That depends on what your top priority when comparing infrastructure performance in one hand and maximized consolidation in the other.

Those who are on a lean infrastructure budget (home lab would be an ideal fit here), consider forcing small pages to greatly enhance TPS opportunities to stretch your lab dollar which has been getting consumed by modern operating systems and and increasing number of VMware and 3rd party appliances.

Can you safely disable large pages in production clusters? It’s a performance question I can’t answer that globally.  You may or may not see performance hit to your virtual machines based on their workloads.  Remember that the use of small memory pages and AMD Rapid Virtualization Indexing (RVI) and Intel Extended Page Tables (EPT) is mutually exclusive.  Due diligence testing is required for each environment.  As it is a per host setting, testing with the use of vMotion really couldn’t be easier.  Simply disable large pages on one host in a cluster and migrate the virtual machines in question to that host and let them simmer.  Compare performance metrics before and after.  Query your users for performance feedback (phrase the question in a way that implies you added horsepower instead of asking the opposite “did the application seem slower?”)

That said, I’d be curious to hear if anyone in the community disables large pages in their environments as a regular habit or documented build procedure and what the impact has been if any on both the memory utilization as well as performance.

Last but not least, Duncan has another good blog post titled How many pages can be shared if Large Pages are broken up?  Take a look at that for some tips on using ESXTOP to monitor TPS activity.

Update 3/21/13:  I didn’t realize Gabrie had written about this topic back in January 2011.  Be sure to check out his post Large Pages, Transparent Page Sharing and how they influence the consolidation ratio.  Sorry Gabrie, hopeuflly understand I wasn’t trying to steal your hard work and originality 🙂

Update 10/20/14:  VMware announced last week that inter-VM TPS (memory page sharing between VMs, not to be confused with memory page sharing within a single VM) will no longer be enabled by default. This default ESXi configuration change will take place in December 2014.

VMware KB Article 2080735 explains Inter-Virtual Machine TPS will no longer be enabled by default starting with the following releases:

ESXi 5.5 Update release – Q1 2015
ESXi 5.1 Update release – Q4 2014
ESXi 5.0 Update release – Q1 2015
The next major version of ESXi

Administrators may revert to the previous behavior if they so wish.

and…

Prior to the above ESXi Update releases, VMware will release ESXi patches that introduce additional TPS management capabilities. These ESXi patches will not change the existing settings for inter-VM TPS. The planned ESXi patch releases are:

ESXi 5.5 Patch 3. For more information, see VMware ESXi 5.5, Patch ESXi550-201410401-BG: Updates esx-base (2087359).
ESXi 5.1 patch planned for Q4, 2014
ESXi 5.0 patch planned for Q4, 2014

The divergence is in response to new research which leveraged TPS to gain unauthorized access to data. Under certain circumstances, a data security breach may occur which effectively makes TPS across VMs a vulnerability.

Although VMware believes the risk of TPS being used to gather sensitive information is low, we strive to ensure that products ship with default settings that are as secure as possible.

Additional information, including the introduction of the Mem.ShareForceSalting host config option, is available in VMware KB Article 2091682 Additional Transparent Page Sharing management capabilities in ESXi 5.5 patch October 16, 2014 and ESXi 5.1 and 5.0 patches in Q4, 2014

As well as the VMware blog article  Transparent Page Sharing – additional management capabilities and new default settings

Baremetalcloud Special Promo Through MikeLaverick.com

March 14th, 2013

Snagit CaptureHe’s Laverick by name, Maverick by nature (and if I might add, a very cool chap and my friend) – Mike Laverick, formerly of RTFM Education of which I was a LONG time reader going back to my Windows and Citrix days, now has a blog cleverly and conveniently situated at mikelaverick.com.  Since Mike joined forces with VMware, he’s been focused on vCloud evangelism and recently visited the Sydney/Melbourne VMUG where he was inspired with a new interest in home labs by AutoLab ala Alastair Cooke of Demitasse fame.  AutoLab has garnered some much deserved attention and adoption.  One organization that has taken an interest is baremetalcloud who provide IaaS via AutoLab on top of physical hardware for its customers.

Long story short, baremetalcloud is offering a special promotion to the first 100 subscribers through Mike’s blog.  Visit the Maverick’s blog via the link in the previous sentence where you can grab the promo code and reserve your baremetalcloud IaaS while supplies last.  Mike also walks through an end to end deployment so you can get an idea of what that looks like beforehand or use it as a reference in case you get stuck.

Thank you Mike, Alastair, and baremetalcloud for lending your hand to the community.

Book Review: VMware vSphere 5 Building a Virtual Datacenter

March 4th, 2013

Snagit Capture

Publication Date: August 30, 2012 | ISBN-10: 0321832213 | ISBN-13: 978-0321832214 | Edition: 1

I’m long overdue on book reviews and I need to start off with an apology to the authors for getting this one out so late.  The title is VMware vSphere 5 Building a Virtual Datacenter by Eric Maillé and René-François Mennecier (Foreword by Chad Sakac and Technical Editor Tom Keegan).  This is a book which caught me off guard a little because I was unaware of the authors (both in virtualization and cloud gigs at EMC Corporation) but nonetheless meeting new friends in virtualization is always pleasant surprise.  It was written prior to and released at the beginning of September 2012 with vSphere coverage up to version 5.0 which launched early in September 2011.

The book starts off with the first two chapters more or less providing a history of VMware virtualization plus coverage of most of the products and where they fit.  I’ve been working with VMware products since just about the beginning and as such I’ve been fortunate to be able to absorb all of the new technology in iterations as it came over a period of many years.  Summarizing it all in 55 pages felt somewhat overwhelming (this is not by any means a negative critique of the authors’ writing).  Whereas advanced datacenter virtualization was once just a concatenation of vCenter and ESX, the portfolio has literally exploded to a point where design, implementation, and management has gotten fairly complex for IT when juggling all of the parts together.  I sympathize a bit for late adopters – it really must feel like a fire hose of details to sort through to flesh out a final bill of materials which fits their environment.

From there, the authors move on to cover key areas of the virtualized and consolidated datacenter including storage and networking as well as cluster features, backup and disaster recovery (including SRM), and installation methods.  In the eighth and final chapter, a case study is looked at in which the second phase of a datacenter consolidation project must be delivered.  Last but not least is a final section titled Common Acronyms which I’ll unofficially call Chapter 9.  It summarizes and translates acronyms used throughout the book.  I’m not sure if it’s unique but it’s certainly not a bad idea.

To summarize, the book is 286 pages in length, not including the index.  It’s not a technical deepdive which covers everything in the greatest of detail but I do view it as a good starting point which is going to answer a lot of questions for beginners and beyond as well as provide some early guidance along the path of virtualization with vSphere.  The links above will take you directly to the book on Amazon where you can purchase a paperback copy or Kindle version of the book.  Enjoy and thank you Eric and René-François.

Chapter List

  1. From Server Virtualization to Cloud Computing
  2. The Evolution of vSphere 5 and its Architectural Components
  3. Storage in vSphere 5
  4. Servers and Network
  5. High Availability and Disaster Recovery Plan
  6. Backups in vSphere 5
  7. Implementing vSphere 5
  8. Managing a Virtualization Project
  9. Common Acronyms

VAAI and the Unlimited VMs per Datastore Urban Myth

February 28th, 2013

Speaking for myself, it’s hard to believe that just a little over 2 years ago in October 2010, many were rejoicing the GA release of vSphere 4.1 and its awesome new features and added scalability.  It seems so long ago.  The following February 2011, Update 1 for vSphere 4.1 was launched and I celebrated my one year anniversary as a VCDX certificate holder.  Now two years later, 5.0 and 5.1 have both seen the light of day along with a flurry of other products and acquisitions rounding out and shaping what is now the vCloud Suite.  Today I’m as much involved with vSphere as I think I ever have been.  Not so much in the operational role I had in the past, but rather a stronger focus on storage integration and meeting with Dell Compellent/VMware customers on a regular basis.

I began this article with vSphere 4.1 for a purpose.  vSphere 4.1 shipped with a new Enterprise Plus feature named vStorage APIs for Array Integration or VAAI for short (pronounced ‘vee double-ehh eye’ to best avoid twist of the tongue).  These APIs offered three different hardware offload mechanisms for block storage enabling the vSphere hypervisor to push some of the storage related heavy lifting to a SAN which supported the APIs.  One of the primitives in particular lies at the root of this topic and a technical marketing urban myth that I have seen perpetuated off and on since the initial launch of VAAI.  I still see it pop up from time to time through present day.

One of the oldest debates in VMware lore is “How many virtual machines should I place on each datastore?”  For this discussion, the context is block storage (as opposed to NFS).  There were all sorts of opinions as well as technical constraints to be considered.  There was the tried and true rule of thumb answer of 10-15-20 which has more than stood the test of time.  The best qualified answer was usually: “Whatever fits best for your consolidated environment” which translates to “it depends” and an invoice in consulting language.

When VAAI was released, I began to notice a slight but alarming trend of credible sources citing claims that the Atomic Test and Set or Hardware Assisted Locking primitive once and for all solved the VMs per LUN conundrum to the point that the number of VMs per LUN no longer mattered because LUN based SCSI reservations were now a thing of the past.  To that point, I’ve got marketing collateral saved on my home network that literally states “unlimited number of VMs per LUN with ATS!”  Basically, VAAI is the promise land – if you can get there with compatible storage and can afford E+ licensing, you no longer need to worry about VM placement and LUN sprawl to satisfy performance needs and  generally reduce latency across the board.  I’ll get to why that doesn’t work in a moment but for the time being I think the general public, especially veterans, remained cautious and less optimistic – and this was good.

Then vSphere 5.0 was released.  By this time, VAAI was made more highly available and affordable to customers in the Enterprise tier and additional primitives had been added for both block and NFS based storage.  In addition, VMware added support for 64TB block datastores without using extents (a true cause for celebration in its own right).  This new feature aligned perfectly with the ATS urban myth because where capacity may have been a limiting constraint in the past, that issue has certainly been lifted now.  To complement that, consistently growing density drives and reduction of cost/GB in arrays and thin provisioning made larger datastores easily achievable.  Marketing decks were updating accordingly.  Everything else being equal, we should now have no problem nor hesitation with placing hundreds, if not thousands of virtual machines on a single block datastore as if it were NFS and free from the constraints associated with the SCSI protocol.

The ATS VAAI primitive was developed to address infrastructure latency as a result of LUN based SCSI reservations which were necessary for certain operations such as creating and deleting files on a LUN, growing a file in size, creating and extending datastores.  We encounter these types of operations by doing things like powering on virtual machines individually or in large groups such as in a VDI environment, creating vSphere snapshots (very popular integration point for backup technologies), provisioning virtual machines from a template.  All of these tasks have one thing in common: they result in the change of metadata on the LUN which in turn necessitates a LUN level lock by the vSphere host making the change.  This lock, albeit very brief in duration, drives noticeable storage I/O latency in large iterations for the hosts and virtual machines “locked out” of the LUN.  The ATS primitive offloads the locking mechanism to the array which only locks the data being updated instead of locking the entire LUN.  Any environment which has been historically encumbered by these types of tasks is going to benefit from the ATS primitive and a reduction of storage latency (both reads and writes, sequential and random) will be the result.

With that overview of ATS out of the way, let’s revisit the statement again and see if it makes sense: “unlimited number of VMs per LUN with ATS!”  If the VMs we’re talking about frequently exhibit the behavior patterns discussed above which cause SCSI reservations, then without a doubt, ATS is going to replace the LUN level locking mechanism as the previous bottleneck and reduce storage latency.  This in turn will allow more VMs to be placed on the LUN until the next bottleneck is introduced.  Unlimited?  Not even close to being correct.  And what about VMs which don’t fit the SCSI reservation use case?  Suppose I use array based snapshots for data protection?  Suppose I don’t use or there is a corporate policy against vSphere snapshots (trust me, they’re out there, they exist)?  Maybe I don’t have a large scale VDI environment or boot storms are not a concern.  This claim I see from time to time makes no mention of use cases and conceivably applies to me as well meaning in an environment not constrained by classic SCSI reservation problem.  I too can leverage VAAI ATS to double, triple, place an unlimited amount of VMs per block datastore.  I talk with customers on a fairly regular basis who are literally confused about VM to LUN placement because of mixed messages they receive, especially when it comes to VAAI.

Allow me to perfrom some Eric Sloof style VMware myth busting and put the uber VMs per ATS enabled LUN claim to the test.  Meet Mike – a DBA who has taken over his organization’s vSphere 5.1 environment.  Mike spends the majority of his time keeping up with four different types of database technologies deployed in his datacenter.  Unfortunately that doesn’t leave Mike much time to read vSphere Clustering Deepdives or Mastering VMware vSphere but he knows well enough to not use vSphere snapshotting because he has an array based data consistent solution which integrates with each of his databases.

Fortunately, Mike has a stable and well performing environment exhibited to the left which the previous vSphere architect left for him.  Demanding database VMs, 32 in all, are distributed across eight block datastores.  Performance characteristics for each VM in terms of IOPS and Throughput are displayed (these are real numbers generated by Iometer in my lab).  The previous vSphere architect was never able to get his organization to buy off on Enterprise licensing and thus the environment lacked VAAI even though their array supported it.

Unfortunately for Mike, he tends to trust random marketing advice without thorough validation or research on impact to his environment.  When Mike took over, he heard from someone that he could simplify infrastructure management by implementing VAAI ATS and consolidate his existing 32 VMs to just a single 64TB datastore on the same array, plus grow his environment by adding basically an unlimited amount of VMs to the datastore providing there is enough capacity.

This information was enough to convince Mike and his management that, risks aside, management and troubleshooting efficiency through a single datastore was definitely the way to go.  Mike installed his new licensing, ensured VAAI was enabled on each host of the cluster, and carved up his new 64TB datastore which is backed by the same pool of raw storage and spindles servicing the eight original datastores.  Over the weekend, Mike used Storage vMotion to migrate his 32 eager zero thick database VMs from their eight datastores to the new 64TB datastore.  He then destroyed his eight original LUNs and for the remainder of that Sunday afternoon, he put his feet up on the desk and basked in the presence of his vSphere Client exhibiting a cluster of hosts and 32 production database VMs running on a single 64TB datastore.

On Monday morning, his stores began to open up on the east coast and in the midwest.  At about 8:30AM central time, the helpdesk began receiving calls from various stores that the system seemed slow.  Par for the course for a Monday morning but with great pride and ethics, Mike began health checks on the database servers anyway.  While he was busy with that, stores on the west coast opened for business and then the calls to the helpdesk increased in frequency and urgency.  The system was crawling and in some rare cases the application was timing out producing transaction failure messages.

Finding no blocking or daytime re-indexing issues at the database layer, Mike turned to the statistical counters for storage and saw a significant decrease in IOPS and Throughput across the board – nearly 50% (again, real Iometer numbers to the right).  Conversely, latency (which is not shown) was through the roof which explained the application timeout failures.  Mike was bewildered.  He had made an additional investment in hardware assisted offload technology and was hoping for a noticeable increase in performance.  Least of all, he didn’t expect a net reduction in performance, especially this pronounced.  What happened?  How is it possible to change the VM:datastore ratio, backed by the same exact pool of storage Tier and RAID type, and come up with a dramatic shift in performance?  Especially when one resides in the kingdom of VAAI?

Queue Depth.  There’s only so much active I/O to go around, per LUN, per host, at any given moment in time.  When multiple VMs on the same host reside on the same LUN, they must share the queue depth of that LUN.  Queue depth is defined in many places along the path of an I/O and at each point, it specifies how many I/Os per LUN per host can be “active” in terms of being handled and processed (decreases latency) as opposed to being queued or buffered (increases latency).  Outside of an environment utilizing SIOC, the queue depth that each virtual machine on a given LUN per host must share is 32 as defined by the default vSphere DSNRO value.  What this effectively means is that all virtual machines on a host sharing the same datastore must share a pool of 32 active I/Os for that datastore.

Applied to Mike’s two-host cluster, whereas he used to have four VMs per datastore evenly distributed across two hosts, effectively each VM had a sole share of 16 IOPS to work with (1 datastore x queue depth of 32 x 2 hosts / 4 VMs or simplified further 1 datastore x queue depth of 32 x 1 host /2 VMs)

After Mike’s consolidation to a single datastore, 16 VMs per host had to share a single LUN with a default queue depth of 32 which reduced each virtual machine’s active IOPS from 16 to 2.

Although the array had the raw storage spindle count and IOPS capability to provide fault tolerance, performance, and capacity, at the end of the day, queue depth ultimately plays a role in performance per LUN per host per VM.  To circle back to the age old “How many virtual machines should I place on each datastore?” question, this is ultimately where the old 10-15-20 rule of thumb came in:

  • 10 high I/O VMs per datastore
  • 15 average I/O VMs per datastore
  • 20 low I/O VMs per datastore

Extrapolated across even the most modest sized cluster, each VM above is going to get a fairly sufficient share of the queue depth to work with.  Assuming even VM distribution across clustered hosts (you use DRS in automated mode right?), each host added to the cluster and attached to the shared storage brings with it, by default, an additional 32 IOPS per datastore for VMs to share in.  Note that this article is not intended to be an end to end queue depth discussion and safe assumptions are made that the DSNRO value of 32 represents the smallest queue depth in the entire path of the I/O which is generally true with most installations and default HBA card/driver values.

In summary, myth busted.  Each of the VAAI primitives was developed to address specific storage and fabric bottlenecks.  While the ATS primitive is ideal for drastically reducing SCSI reservation based latency and it can increase the VM: datastore ratio to a degree, it was never designed to imply large sums of or an unlimited number of VMs per datastore because this assumption simply does not factor in other block based storage performance inhibitors such as queue depth, RAID pools, controller/LUN ownership model, fabric balancing, risk, etc.  Every time I hear the claim, it sounds as foolish as ever.  Don’t be fooled.

Update 3/11/13: A few related links on queue depth:

QLogic Fibre Channel Adapter for VMware ESX User’s Guide

Execution Throttle and Queue Depth with VMware and Qlogic HBAs

Changing the queue depth for QLogic and Emulex HBAs (VMware KB 1267)

Setting the Maximum Outstanding Disk Requests for virtual machines (VMware KB 1268)

Controlling LUN queue depth throttling in VMware ESX/ESXi (VMware KB 1008113)

Disk.SchedNumReqOutstanding the story (covers Disk.SchedQuantum, Disk.SchedQControlSeqReqs, and Disk.SchedQControlVMSwitches)

Disk.SchedNumReqOutstanding and Queue Depth (an article I wrote back in June 2011)

Last but not least, a wonderful whitepaper from VMware I’ve held onto for years:  Scalable Storage Performance VMware ESX 3.5