Posts Tagged ‘ESXi’

Redefining Disk.MaxLUN

March 27th, 2013

Regardless of what the vSphere host Advanced Setting Disk.MaxLUN has stated as its definition for years, “Maximum number of LUNs per target scanned for” is technically not correct.  In fact, it’s quite misleading.

Snagit Capture

The true definition looks similar stated in English but carries quite a different meaning and it can be found in my SnagIt hack above or within VMware KB 1998 Definition of Disk.MaxLUN on ESX Server Systems and Clarification of 128 Limit.

The Disk.MaxLUN attribute specifies the maximum LUN number up to which the ESX Server system scans on each SCSI target as it is discovering LUNs. If you have a LUN 131 on a disk that you want to access, for example, then Disk.MaxLUN must be at least 132. Don’t make this value higher than you need to, though, because higher values can significantly slow VMkernel bootup.

The 128 LUN limit refers only to the total number of LUNs that the ESX Server system is able to discover. The system intentionally stops discovering LUNs after it finds 128 because of various service console and management interface limits. Depending on your setup, you can easily have a situation in which Disk.MaxLUN is high (255) but you see few LUNs, or a situation in which Disk.MaxLUN is low (16) but you reach the 128 LUN limit because you have many targets.

For more information about limiting the number of LUNs visible to the server, see http://kb.vmware.com/kb/1467.

Note the last sentence in the first paragraph above in the KB article.  Keep the value as small as possible for your environment when using block storage.  vSphere ships with this value configured for maximum compatibility out of the box which is the max value of 256.  Assuming you don’t assign LUN numbers up to 256 in your environment, this value can be immediately ratcheted down in your build documentation or automated deployment scripts.  Doing so will decrease the elapsed time spent rescanning the fabric for block devices/VMFS datastores.  This tweak may be of particular interest at DR sites when using Site Recovery Manager to carry out a Recovery Plan test, a Planned Migration, or an actual DR execution.  It will allow for a more efficient use of RTO (Recovery Time Objective) time especially where multiple recovery plans are run consecutively.

VMware vSphere Design 2nd Edition Now Available

March 20th, 2013

Snagit Capture

Publication Date: March 25, 2013 | ISBN-10: 1118407911 | ISBN-13: 978-1118407912 | Edition: 2

The big splash was officially made yesterday but I’m following up with my announcement a day later to help spread the message to anyone who may have been heads down and missed it.  Forbes Guthrie (Snagit Capture Snagit Capture), Scott Lowe (Snagit Capture Snagit Capture), and Kendrick Coleman (Snagit Capture Snagit Capture) have teamed up to produce VMware vSphere Design 2nd Edition (a followup refresh of the popular 1st Edition).

As Technical Editor, I’m one of the few fortunate individuals who have already had the pleasure to have read the book.  I can tell you that it is jam-packed with the deep technical detail, design perspective, and methodology you’d expect from these seasoned and well-respected industry experts.

The book is 528 pages in length (compare to 384 pages in the 1st edition).  New in this version is coverage of vSphere 5.1, emerging infrastructure technologies and trends, as well as a section on vCloud Director design – a worthy topic which should be weighing heavily on the minds of many by now and in the future will likely spawn dedicated coverage in texts by Sybex and/or other publishers.

The publisher has made the introduction section of the book freely available.  You can take a look at that by clicking this link which is hosted at Forbes vReference blog.  As with the previous edition, this book is made available in both paperback and Kindle editions.  Support these authors and pick up your copy today.  Tell them Jason sent you and nothing special will likely take place.

Large Memory Pages and Shrinking Consolidation Ratios

March 19th, 2013

Here’s a discussion that has somewhat come full circle for me and could prove to be a handy for those with lab or production environments alike.

A little over a week ago I was having lunch with a former colleague and naturally a TPS discussion broke out.  We talked about how it worked and how effective it was with small memory pages (4KB in size) as well as large memory pages (2MB in size).  The topic was brought up with a purpose in mind.

Many moons ago, VMware virtualized datacenters consisted mainly of Windows 2000 Server and Windows Server 2003 virtual machines which natively leverage small memory pages – an attribute built into the guest operating system itself.  Later, Windows Vista as well as 2008 and its successors came onto the scene allocating large memory pages by default (again – at the guest OS layer) to boost performance for certain workload types.  To maintain flexibility and feature support, VMware ESX and ESXi hosts have supported large pages by default providing the guest operating system requested them.  For those operating systems that still used the smaller memory pages, those were supported by the hypervisor as well.  This support and configuration remains the default today in vSphere 5.1 in an advanced host-wide setting called Mem.AllocGuestLargePage (1 to enable and support both large and small pages – the default, 0 to disable and force small pages).  VMware released a small whitepaper covering this subject several years ago titled Large Page Performance which summarizes lab test results and provides the steps required to toggle large pages in the hypervisor as well as within Windows Server 2003

As legacy Windows platforms were slowly but surely replaced by their Windows Server 2008, R2, and now 2012 predecessors, something began to happen.  Consolidation ratios gated by memory (very typical mainstream constraint in most environments I’ve managed and shared stories about) started to slip.  Part of this can be attributed to the larger memory footprints assigned to the newer operating systems.  That makes sense, but this only explains a portion of the story.  The balance of memory has evaporated as a result of modern guest operating systems using large 2MB memory pages which will not be consolidated by the TPS mechanism (until a severe memory pressure threshold is crossed but that’s another story discussed here and here).

For some environments, many I imagine, this is becoming a problem which manifests itself as an infrastructure capacity growth requirement as guest operating systems are upgraded.  Those with chargeback models where the customer or business unit paid up front at the door for their VM or vApp shells are now getting pinched because compute infrastructure doesn’t spread as thin as it once did.  This will be most pronounced in the largest of environments.  A pod or block architecture that once supplied infrastructure for 500 or 1,000 VMs now fills up with significantly less.

So when I said this discussion has come full circle, I meant it.  A few years ago Duncan Epping wrote an article called KB Article 1020524 (TPS and Nehalem) and a portion of this blog post more or less took place in the comments section.  Buried in there was a comment I had made while being involved in the discussion (although I don’t remember it).  So I was a bit surprised when a Google search dug that up.  It wasn’t the first time that has happened and I’m sure it won’t be the last.

Back to reality.  After my lunch time discussion with Jim, I decided to head to my lab which, from a guest OS perspective, was all Windows Server 2008 R2 or better, plus a bit of Linux for the appliances.  Knowing that the majority of my guests were consuming large memory pages, how much more TPS savings would result if I forced small memory pages on the host?  So I evacuated a vSphere host using maintenance mode, configured Mem.AllocGuestLargePage to a value of 0, then placed all the VMs back onto the host.  Shown below are the before and after results.

 

A decrease in physical memory utilization of nearly 20% per host – TPS is alive again:

Snagit Capture Snagit Capture

 

124% increase in Shared memory in Tier1 virtual Machines:

Snagit Capture Snagit Capture

 

90% increase in Shared memory in Tier3 virtual Machines:

Snagit Capture Snagit Capture

 

Perhaps what was most interesting was the manner in which TPS consolidated pages once small pages were enabled.  The impact was not realized right away nor was it a gradual gain in memory efficiency as vSphere scanned for duplicate pages.  Rather it seemed to happen in batch almost all at once 12 hours after large pages had been disabled and VMs had been moved back onto the host:

Snagit Capture

 

So for those of you who may be scratching your heads wondering what is happening to your consolidation ratios lately, perhaps this has some or everything to do with it.  Is there an action item to be carried out here? That depends on what your top priority when comparing infrastructure performance in one hand and maximized consolidation in the other.

Those who are on a lean infrastructure budget (home lab would be an ideal fit here), consider forcing small pages to greatly enhance TPS opportunities to stretch your lab dollar which has been getting consumed by modern operating systems and and increasing number of VMware and 3rd party appliances.

Can you safely disable large pages in production clusters? It’s a performance question I can’t answer that globally.  You may or may not see performance hit to your virtual machines based on their workloads.  Remember that the use of small memory pages and AMD Rapid Virtualization Indexing (RVI) and Intel Extended Page Tables (EPT) is mutually exclusive.  Due diligence testing is required for each environment.  As it is a per host setting, testing with the use of vMotion really couldn’t be easier.  Simply disable large pages on one host in a cluster and migrate the virtual machines in question to that host and let them simmer.  Compare performance metrics before and after.  Query your users for performance feedback (phrase the question in a way that implies you added horsepower instead of asking the opposite “did the application seem slower?”)

That said, I’d be curious to hear if anyone in the community disables large pages in their environments as a regular habit or documented build procedure and what the impact has been if any on both the memory utilization as well as performance.

Last but not least, Duncan has another good blog post titled How many pages can be shared if Large Pages are broken up?  Take a look at that for some tips on using ESXTOP to monitor TPS activity.

Update 3/21/13:  I didn’t realize Gabrie had written about this topic back in January 2011.  Be sure to check out his post Large Pages, Transparent Page Sharing and how they influence the consolidation ratio.  Sorry Gabrie, hopeuflly understand I wasn’t trying to steal your hard work and originality 🙂

Update 10/20/14:  VMware announced last week that inter-VM TPS (memory page sharing between VMs, not to be confused with memory page sharing within a single VM) will no longer be enabled by default. This default ESXi configuration change will take place in December 2014.

VMware KB Article 2080735 explains Inter-Virtual Machine TPS will no longer be enabled by default starting with the following releases:

ESXi 5.5 Update release – Q1 2015
ESXi 5.1 Update release – Q4 2014
ESXi 5.0 Update release – Q1 2015
The next major version of ESXi

Administrators may revert to the previous behavior if they so wish.

and…

Prior to the above ESXi Update releases, VMware will release ESXi patches that introduce additional TPS management capabilities. These ESXi patches will not change the existing settings for inter-VM TPS. The planned ESXi patch releases are:

ESXi 5.5 Patch 3. For more information, see VMware ESXi 5.5, Patch ESXi550-201410401-BG: Updates esx-base (2087359).
ESXi 5.1 patch planned for Q4, 2014
ESXi 5.0 patch planned for Q4, 2014

The divergence is in response to new research which leveraged TPS to gain unauthorized access to data. Under certain circumstances, a data security breach may occur which effectively makes TPS across VMs a vulnerability.

Although VMware believes the risk of TPS being used to gather sensitive information is low, we strive to ensure that products ship with default settings that are as secure as possible.

Additional information, including the introduction of the Mem.ShareForceSalting host config option, is available in VMware KB Article 2091682 Additional Transparent Page Sharing management capabilities in ESXi 5.5 patch October 16, 2014 and ESXi 5.1 and 5.0 patches in Q4, 2014

As well as the VMware blog article  Transparent Page Sharing – additional management capabilities and new default settings

Book Review: VMware vSphere 5 Building a Virtual Datacenter

March 4th, 2013

Snagit Capture

Publication Date: August 30, 2012 | ISBN-10: 0321832213 | ISBN-13: 978-0321832214 | Edition: 1

I’m long overdue on book reviews and I need to start off with an apology to the authors for getting this one out so late.  The title is VMware vSphere 5 Building a Virtual Datacenter by Eric Maillé and René-François Mennecier (Foreword by Chad Sakac and Technical Editor Tom Keegan).  This is a book which caught me off guard a little because I was unaware of the authors (both in virtualization and cloud gigs at EMC Corporation) but nonetheless meeting new friends in virtualization is always pleasant surprise.  It was written prior to and released at the beginning of September 2012 with vSphere coverage up to version 5.0 which launched early in September 2011.

The book starts off with the first two chapters more or less providing a history of VMware virtualization plus coverage of most of the products and where they fit.  I’ve been working with VMware products since just about the beginning and as such I’ve been fortunate to be able to absorb all of the new technology in iterations as it came over a period of many years.  Summarizing it all in 55 pages felt somewhat overwhelming (this is not by any means a negative critique of the authors’ writing).  Whereas advanced datacenter virtualization was once just a concatenation of vCenter and ESX, the portfolio has literally exploded to a point where design, implementation, and management has gotten fairly complex for IT when juggling all of the parts together.  I sympathize a bit for late adopters – it really must feel like a fire hose of details to sort through to flesh out a final bill of materials which fits their environment.

From there, the authors move on to cover key areas of the virtualized and consolidated datacenter including storage and networking as well as cluster features, backup and disaster recovery (including SRM), and installation methods.  In the eighth and final chapter, a case study is looked at in which the second phase of a datacenter consolidation project must be delivered.  Last but not least is a final section titled Common Acronyms which I’ll unofficially call Chapter 9.  It summarizes and translates acronyms used throughout the book.  I’m not sure if it’s unique but it’s certainly not a bad idea.

To summarize, the book is 286 pages in length, not including the index.  It’s not a technical deepdive which covers everything in the greatest of detail but I do view it as a good starting point which is going to answer a lot of questions for beginners and beyond as well as provide some early guidance along the path of virtualization with vSphere.  The links above will take you directly to the book on Amazon where you can purchase a paperback copy or Kindle version of the book.  Enjoy and thank you Eric and René-François.

Chapter List

  1. From Server Virtualization to Cloud Computing
  2. The Evolution of vSphere 5 and its Architectural Components
  3. Storage in vSphere 5
  4. Servers and Network
  5. High Availability and Disaster Recovery Plan
  6. Backups in vSphere 5
  7. Implementing vSphere 5
  8. Managing a Virtualization Project
  9. Common Acronyms

Thin Provisioning Storage Choices

February 8th, 2013

I talk with a lot of customers including those confined to vSphere, storage, and general datacenter management roles.  The IT footprint size varies quite a bit between discussions as does the level of experience across technologies. However, one particular topic seems to come up at regular intervals when talking vSphere and storage: Thin Provisioning – where exactly is the right place for it in the stack?  At the SAN layer? At the vSphere layer? Both?

Virtualization is penetrating datacenters from multiple angles: compute, storage, network, etc.  Layers of abstraction seem to be multiplying to provide efficiency, mobility, elasticity, high availability, etc.  The conundrum we’re faced with is that some of these virtualization efforts converge.  As with many decisions to be made, flexibility yields an array of choices.  Does the convergence introduce a conflict between technologies? Do the features “stack”?  Do they complement each other? Is one solution better than the other in terms of price or performance?

I have few opinions around thin provisioning (and to be clear, this discussion revolves around block storage.  Virtual machine disks are natively thin provisioned and written into thin on NFS datastores).

1.  Deploy and leverage with confidence.  Generally speaking, thin provisioning at either the vSphere or storage layer has proven itself as both cost effective and reliable for the widest variety of workloads including most tier 1 applications.  Corner cases around performance needs may present themselves and full provisioning may provide marginal performance benefit at the expense of raw capacity consumed up front in the tier(s) where the data lives.  However, full provisioning is just one of many ways to extract additional performance from existing storage.  Explore all available options.  For everything else, thinly provision.

2.  vSphere or storage vendor thin provisioning?  From a generic standpoint, it doesn’t matter so much, other than choose at least one to achieve the core benefits around thin provisioning.  Where to thin provision isn’t really a question of what’s right, or what’s wrong.  It’s about where the integration is the best fit with respect to other storage hosts that may be in the datacenter and what’s appropriate for the organizational roles.  Outside of RDMs, thin provisioning at the vSphere or storage layer yields about the same storage efficiency for vSphere environments.  For vSphere environments alone, the decision can be boiled down to reporting, visiblity, ease of use, and any special integration your storage vendor might have tied to thin provisioning at the storage layer.

The table below covers three scenarios of thin provisioning most commonly brought up.  It reflects reporting and storage savings component at the vSphere and SAN layers.  In each of the first three use cases, a VM with 100GB of attached .vmdk storage is provisioned of which a little over 3GB is consumed by an OS and the remainder is unused “white space”.

  • A)  A 100GB lazy zero thick VM is deployed on a 1TB thinly provisioned LUN.
    • The vSphere Client is unaware of thin provisioning at the SAN layer and reports 100GB of the datastore capacity provisioned into and consumed.
    • The SAN reports 3.37GB of raw storage consumed to SAN Administrators.  The other nearly 1TB of raw storage remains available on the SAN for any physical or virtual storage host on the fabric.  This is key for the heterogeneous datacenter where storage efficiency needs to be spread and shared across different storage hosts beyond just the vSphere clusters.
    • This is the default provisioning option for vSphere as well as some storage vendors such as Dell Compellent.  Being the default, it requires the least amount of administrative overhead and deployment time as well as providing infrastructure consistency.  As mentioned in the previous bullet, thin provisioning at the storage layer provides a benefit across the datacenter rather than exclusively for vSphere storage efficiency.  All of these benefits really make thin provisioning at the storage layer an overwhelmingly natural choice.
  • B)  A 100GB thin VM is deployed on a 1TB fully provisioned LUN.
    • The vSphere Client is aware of thin provisioning at the vSphere layer and reports 100GB of the datastore capacity provisioned into but only 3.08GB consumed.
    • Because this volume was fully provisioned instead of thin provisioned, SAN Administrators see a consumption of 1TB consumed up front from the pool of available raw storage.  Nearly 1TB of unconsumed datastore capacity remains available to the vSphere cluster only.  Thin provisioning at the vSphere layer does not leave the unconsumed raw storage available to other storage hosts on the fabric.
    • This is not the default provisioning option for vSphere nor is it the default volume provisioning default for shared storage.  Thin provisioning at the vSphere layer yields roughly the same storage savings as thin provisioning at the SAN layer.  However, only vSphere environments can expose and take advantage of the storage efficiency.  Because it is the default deployment option, it requires a slightly higher level of administrative overhead and can lead to environment inconsistency.  On the other hand, for SANs which do not support thin provisioning, vSphere thin provisioning is a fantastic option, and the only remaining option for block storage efficiency.
  • C)  A 100GB thin VM is deployed on a 1TB thinly provisioned LUN – aka thin on thin.
    • Storage efficiency is reported to both vSphere and SAN Administrator dashboards.
    • The vSphere Client is aware of thin provisioning at the vSphere layer and reports 100GB of the datastore capacity provisioned into but only 3.08GB consumed.
    • The SAN reports 3.08GB of raw storage consumed.  The other nearly 1TB of raw storage remains available on the SAN for any physical or virtual storage host on the fabric.  Once again, the efficiency benefit is spread across all hosts in the datacenter.
    • This is not the default provisioning option for vSphere and as a result the same inconsistencies mentioned above may result.  More importantly, thin provisioning at the vSphere layer on top of thin provisioning at the SAN layer doesn’t provide a significant amount additional storage efficiency.  The numbers below show slightly different but I’m going to attribute that difference to non-linear delta caused by VMFS formatting and call them a wash in the grand scheme of things.  While thin on thin doesn’t adversely impact the environment, the two approaches don’t stack.  Compared to just thin provisioning at the storage layer, the draw for this option is for reporting purposes only.

What I really want to call out is the raw storage consumed in the last column.  Each cell outlined in red reveals the net raw storage consumed before RAID overhead – and conversely paints a picture of storage savings and efficiency allowing a customer to double dip on storage or provision capacity today at next year’s cost – two popular drivers for thin provisioning.

      Vendor Integration
      vSphere Administrators SAN Administrators
      vSphere Client Virtualized Storage
      Virtual Disk Storage Datastore Capacity Page Pool Capacity
  100GB VM 1TB LUN Provisioned Consumed Provisioned Consumed Provisioned Consumed+
A Lazy Thick Thin Provision 100GB 100GB 1TB 100GB 1TB 3.37GB*
B Thin Full Provision 100GB 3.08GB 1TB 3.08GB 1TB 1TB
C Thin Thin Provision 100GB 3.08GB 1TB 3.08GB 1TB 3.08GB*
                 
  1TB RDM 1TB LUN            
D vRDM Thin Provision 1TB 1TB n/a n/a 1TB 0GB
E pRDM Thin Provision 1TB 1TB n/a n/a 1TB 0GB

+ Numbers exclude RAID overhead to provide accurate comparisons

* 200MB of pages consumed by the VMFS-5 file system was subtracted from the total to provide accurate comparisons

There are two additional but less mainstream considerations to think about: virtual and physical RDMs.  Neither can be thinly provisioned at the vSphere layer.  Storage efficiency can only come from and be reported on the SAN.

  • D and E)  Empty 1TB RDMs (both virtual and physical) are deployed on 1TB LUNs thinly provisioned at the storage layer.
    • Historically, the vSphere Client has always been poor at providing RDM visibility.  In this case, the vSphere Client is unaware of thin provisioning at the SAN layer and reports 1TB of storage provisioned (from somewhere unknown – the ultimate abstraction) and consumed.
    • The SAN reports zero raw storage consumed to SAN Administrators.  2TB of raw storage remains available on the SAN for any physical or virtual storage host on the fabric.
    • Again, thin provisioning from your storage vendor is the only way to write thinly into RDMs today.

So what is my summarized recommendation on thin provisioning in vSphere, at the SAN, or both?  I’ll go back to what I mentioned earlier, if the SAN is shared outside of the vSphere environment, then thin provisioning should be performed at the SAN level so that all datacenter hosts on the storage fabric can leverage provisioned but yet unallocated raw storage..  If the SAN is dedicated to your vSphere environment, then there really no right or wrong answer.  At that point it’s going to depend on your reporting needs, maybe the delegation of roles in your organization, and of course the type of storage features you may have that combine with thin provisioning to add additional value.  If you’re a Dell Compellent Storage Center customer, let the the vendor provided defaults guide you: Lazy zero thick virtual disks on datastores backed by thinly provisioned LUNs.  Thin provisioning at the storage layer is also going to save customers a bundle in unconsumed tier 1 storage costs.  Instead of islands of tier 1 pinned to a vSphere cluster, the storage remains freely available in the pool for any other storage host with tier 1 performance needs.  For virtual or physical RDMs, thin provisioning on the SAN is the only available option.  I don’t recommend thin on thin to compound or double space savings because it simply does not work the way some expect it to.  However, if there is a dashboard reporting need, go for it.

Depending on your storage vendor, you may have integration available to you that will provide management and reporting across platforms.  For instance, suppose we roll with option A above: thin provisioning at the storage layer.  Natively we don’t have storage efficiency visibility within the vSphere Client.  However, storage vendor integration through VASA or a vSphere Client plug-in can bring storage details into the vSphere Client (and vise versa).  One example is the vSphere Client plug-in from Dell Compellent shown below.  Aside from the various storage and virtual machine provisioning tasks it is able to perform, it brings a SAN Administrator’s dashboard into the vSphere Client.  Very handy in small to medium sized shops where roles spread across various technological boundaries.

Snagit Capture

Lastly, I thought I’d mention UNMAP – 1/2 of the 4th VAAI primitive for block storage.  I wrote an article last summer called Storage: Starting Thin and Staying Thin with VAAI UNMAP.  For those interested, the UNMAP primitive works only with thin provisioning at the SAN layer on certified storage platforms.  It was not intended to and does not integrate with thinly provisioned vSphere virtual disks alone.  Thin .vmdks in which data has been deleted from within will not dehydrate unless storage vMotioned. Raw storage pages will remain “pinned” to the datastore where the .vmdk resides until is is moved or deleted.  Only then can the pages be returned back to the pool if the datastore resides on a thin provisioned LUN.

Monster VMs & ESX(i) Heap Size: Trouble In Storage Paradise

September 12th, 2012

While running Microsoft Exchange Server Jetstress on vSphere 5 VMs in the lab, tests were failing about mid way through initializing its several TBs of databases.  This was a real head scratcher.  Symptoms were unwritable storage or lack of storage capacity.  Troubleshooting yielding errors such as “Cannot allocate memory”.  After some tail chasing, the road eventually lead to VMware KB article 1004424: An ESXi/ESX host reports VMFS heap warnings when hosting virtual machines that collectively use 4 TB or 20 TB of virtual disk storage.

As it turns out, ESX(i) versions 3 through 5 have a statically defined per-host heap size:

  • 16MB for ESX(i) 3.x through 4.0: Allows a max of 4TB open virtual disk capacity (again, per host)
  • 80MB for ESX(i) 4.1 and 5.x: Allows a max of 8TB open virtual disk capacity (per host)

This issue isn’t specific to Jetstress, Exchange, Microsoft, or a specific fabric type, storage protocol or storage vendor.  Exceeding the virtual disk capacities listed above, per host, results in the symptoms discussed earlier and memory allocation errors.  In fact, if you take a look at the KB article, there’s quite a laundry list of possible symptoms depending on what task is being attempted:

  • An ESXi/ESX 3.5/4.0 host has more that 4 terabytes (TB) of virtual disks (.vmdk files) open.
  • After virtual machines are migrated by vSphere HA from one host to another due to a host failover, the virtual machines fail to power on with the error:vSphere HA unsuccessfully failed over this virtual machine. vSphere HA will retry if the maximum number of attempts has not been exceeded. Reason: Cannot allocate memory.
  • You see warnings in /var/log/messages or /var/log/vmkernel.logsimilar to:vmkernel: cpu2:1410)WARNING: Heap: 1370: Heap_Align(vmfs3, 4096/4096 bytes, 4 align) failed. caller: 0x8fdbd0
    vmkernel: cpu2:1410)WARNING: Heap: 1266: Heap vmfs3: Maximum allowed growth (24) too small for size (8192)
    cpu15:11905)WARNING: Heap: 2525: Heap cow already at its maximum size. Cannot expand.
    cpu15:11905)WARNING: Heap: 2900: Heap_Align(cow, 6160/6160 bytes, 8 align) failed. caller: 0x41802fd54443
    cpu4:1959755)WARNING:Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand.
    cpu4:1959755)WARNING: Heap: 2900: Heap_Align(vmfs3, 2099200/2099200 bytes, 8 align) failed. caller: 0x418009533c50
    cpu7:5134)Config: 346: “SIOControlFlag2” = 0, Old Value: 1, (Status: 0x0)
  • Adding a VMDK to a virtual machine running on an ESXi/ESX host where heap VMFS-3 is maxed out fails.
  • When you try to manually power on a migrated virtual machine, you may see the error:The VM failed to resume on the destination during early power on.
    Reason: 0 (Cannot allocate memory).
    Cannot open the disk ‘<<Location of the .vmdk>>’ or one of the snapshot disks it depends on.
  • The virtual machine fails to power on and you see an error in the vSphere client:An unexpected error was received from the ESX host while powering on VM vm-xxx. Reason: (Cannot allocate memory)
  • A similar error may appear if you try to migrate or Storage vMotion a virtual machine to a destination ESXi/ESX host on which heap VMFS-3 is maxed out.
  • Cloning a virtual machine using the vmkfstools -icommand fails and you see the error:Clone: 43% done. Failed to clone disk: Cannot allocate memory (786441)
  • In the /var/log/vmfs/volumes/DatastoreName/VirtualMachineName/vmware.log file, you may see error messages similar to:2012-05-02T23:24:07.900Z| vmx| FileIOErrno2Result: Unexpected errno=12, Cannot allocate memory
    2012-05-02T23:24:07.900Z| vmx| AIOGNRC: Failed to open ‘/vmfs/volumes/xxxx-flat.vmdk’ : Cannot allocate memory (c00000002) (0x2013).
    2012-05-02T23:24:07.900Z| vmx| DISKLIB-VMFS : “/vmfs/volumes/xxxx-flat.vmdk” : failed to open (Cannot allocate memory): AIOMgr_Open failed. Type 3
    2012-05-02T23:24:07.900Z| vmx| DISKLIB-LINK : “/vmfs/volumes/xxxx.vmdk” : failed to open (Cannot allocate memory).
    2012-05-02T23:24:07.900Z| vmx| DISKLIB-CHAIN : “/vmfs/volumes/xxxx.vmdk” : failed to open (Cannot allocate memory).
    2012-05-02T23:24:07.900Z| vmx| DISKLIB-LIB : Failed to open ‘/vmfs/volumes/xxxx.vmdk’ with flags 0xa Cannot allocate memory (786441).
    2012-05-02T23:24:07.900Z| vmx| DISK: Cannot open disk “/vmfs/volumes/xxxx.vmdk”: Cannot allocate memory (786441).
    2012-05-02T23:24:07.900Z| vmx| Msg_Post: Error
    2012-05-02T23:24:07.900Z| vmx| [msg.disk.noBackEnd] Cannot open the disk ‘/vmfs/volumes/xxxx.vmdk’ or one of the snapshot disks it depends on.
    2012-05-02T23:24:07.900Z| vmx| [msg.disk.configureDiskError] Reason: Cannot allocate memory.

While VMware continues to raise the scale and performance bar for it’s vCloud Suite, this virtual disk and heap size limitation becomes a limiting constraint for monster VMs or vApps.  Fortunately, there’s a fairly painless resolution (at least up until a certain point):  Increase the Heap Size beyond its default value on each host in the cluster and reboot each host.  The advanced host setting to configure is VMFS3.MaxHeapSizeMB.

Let’s take another look at the default heap size and with the addition of its maximum allowable heap size value:

  • ESX(i) 3.x through 4.0:
    • Default value: 16MB – Allows a max of 4TB open virtual disk capacity
    • Maximum value: 128MB – Allows a max of 32TB open virtual disk capacity per host
  • ESX(i) 4.1 and 5.x:
    • Default value: 80MB – Allows a max of 8TB open virtual disk capacity
    • Maximum value: 256MB – Allows a max of 25TB open virtual disk capacity per host

After increasing the heap size and performing a reboot, the ESX(i) kernel will consume additional memory overhead equal to the amount of heap size increase in MB.  For example, on vSphere 5, the increase of heap size from 80MB to 256MB will consume an extra 176MB of base memory which cannot be shared with virtual machines or other processes running on the host.

Readers may have also noticed an overall decrease in the amount of open virtual disk capacity per host supported in newer generations of vSphere.  While I’m not overly concerned at the moment, I’d bet someone out there has a corner case requiring greater than 25TB or even 32TB of powered on virtual disk per host.  With two of VMware’s core value propositions being innovation and scalability, I would tip-toe lightly around the phrase “corner case” – it shouldn’t be used as an excuse for its gaps while VMware pushes for 100% data virtualization and vCloud adoption.  Short term, the answer may be RDMs. Longer term: vVOLS.

Updated 9/14/12: There are some questions in the comments section about what types of stoarge the heap size constraint applies to.  VMware has confirmed that heap size and max virtual disk capacity per host applies to VMFS only. The heap size constraint does not apply to RDMs nor does it apply to NFS datastores.

Updated 4/4/13: VMware has released patch ESXi500-201303401-BG to address heap issues.  This patch makes improvements to both default and maximum limits of open VMDK files per vSphere host.  After applying the above patch to each host, the default heap size for VMFS-5 datastores becomes 640MB which supports 60TB of open VMDK files per host.  These new default configurations are also the maximum values as well.  For additional reading on other fine blogs, see A Small Adjustment and a New VMware Fix will Prevent Heaps of Issues on vSphere VMFS Heap and The Case for Larger Than 2TB Virtual Disks and The Gotcha with VMFS.

Updated 4/30/13: VMware has released vSphere 5.1 Update 1 and as Cormac has pointed out here, heap issue resolution has been baked into this release as follows:

  1. VMFS heap can grow up to a maximum of 640MB compared to 256MB in earlier release. This is identical to the way that VMFS heap size can grow up to 640MB in a recent patch release (patch 5) for vSphere 5.0. See this earlier post.
  2. Maximum heap size for VMFS in vSphere 5.1U1 is set to 640MB by default for new installations. For upgrades, it may retain the values set before upgrade. In such cases, please set the values manually.
  3. There is also a new heap configuration “VMFS3.MinHeapSizeMB” which allows administrators to reserve the memory required for the VMFS heap during boot time. Note that “VMFS3.MinHeapSizeMB” cannot be set more than 255MB, but if additional heap is required it can grow up to 640MB. It alleviates the heap consumption issue seen in previous versions, allowing the ~ 60TB of open storage on VMFS-5 volumes per host to be accessed.

When reached for comment, Monster VM was quoted as saying “I’m happy about these changes and look forward to a larger population of Monster VMs like myself.”

photo

VMworld 2012 Announcements – Part I

August 27th, 2012

VMworld 2012 is underway in San Francisco.  Once again, a record number of attendees is expected to gather at the Moscone Center to see what VMware and their partners are announcing.  From a VMware perspective, there is plenty.

Given the sheer quantity of announcements, I’m actually going to break up them up into a few parts, this post being Part I.  Let’s start with the release of vSphere 5.1 and some of its notable features.

Enhanced vMotion – the ability to now perform a vMotion as well as a Storage vMotion simultaneously. In addition, this becomes an enabler to perform vMotion without the shared storage requirement.  Enhanced vMotion means we are able to migrate a virtual machine stored on local host storage, to shared storage, and then to local storage again.  Or perhaps migrate virtual machines from one host to another with each having their own locally attached storage only.  Updated 9/5/12 The phrase “Enhanced vMotion” should be correctly read as “vMotion that has been enhanced”.  “Enhanced vMotion” is not an actual feature, product, or separate license.  It is an improvement over the previous vMotion technology and included wherever vMotion is bundled.

Snagit Capture

Enhanced vMotion Requirements:

  • Hosts must be managed by same vCenter Server
  • Hosts must be part of same Datacenter
  • Hosts must be on the same layer-2 network (and same switch if VDS is used)

Operational Considerations:

  • Enhanced vMotion is a manual process
  • DRS and SDRS automation do not leverage enhanced vMotion
  • Max of two (2) concurrent Enhanced vMotions per host
  • Enhanced vMotions count against concurrent limitations for both vMotion and Storage vMotion
  • Enhanced vMotion will leverage multi-NIC when available

Next Generation vSphere Client a.k.a. vSphere Web Client – An enhanced version of the vSphere Web Client which has already been available in vSphere 5.0.  As of vSphere 5.1, the vSphere Web Client becomes the defacto standard client for managing the vSphere virtualized datacenter.  Going forward, single sign-on infrastructure management will converge into a unified interface which any administrator can appreciate.  vSphere 5.1 will be the last platform to include the legacy vSphere client. Although you may use this client day to day while gradually easing into the Web Client, understand that all future development from VMware and its partners now go into the Web Client. Plug-ins currently used today will generally still function with the legacy client (with support from their respective vendors) but they’ll need to be completely re-written vCenter Server side for the Web Client.  Aside from the unified interface, the architecture of the Web Client has scaling advantages as well.  As VMware adds bolt-on application functionality to the client, VMware partners will now have the ability to to bring their own custom objects objects into the Web Client thereby extending that single pane of glass management to other integrations in the ecosystem.

 

Here is a look at that vSphere Web Client architecture:

Snagit Capture

Requirements:

  • Internet Explorer / FireFox / Chrome
  • others (Safari, etc.) are possible, but will lack VM console access

A look at the vSphere Web Client interface and its key management areas:

Snagit Capture

Where the legacy vSphere Client fall short and now the vSphere Web Client solves these issues:

  • Single Platform Support (Windows)
    • vSphere Web Client is Platform Agnostic
  • Scalability Limits
    • Built to handle thousands of objects
  • White Screen of Death
    • Performance
  • Inconsistent look and feel across VMware solutions
    • Extensibility
  • Workflow Lock
    • Pause current task and continue later right where you left off (this one is cool!)
    • Browser Behavior
  • Upgrades
    • Upgrade a Single serverside component

 vCloud Director 5.1

In the recent past, VMware aligned common application and platform releases to ease issues that commonly occurred with compatibility.  vCloud Director, the cornerstone of the vCloud Suite, is obviously the cornerstone in how VMware will deliver infrastructure, applications, and *aaS now and into the future. So what’s new in vCloud Director 5.1?  First an overview of the vCloud Suite:

Snagit Capture

And a detailed list of new features:

  • Elastic Virtual Datacenters – Provider vDCs can span clusters leveraging VXLAN allowing the distribution and mobility of vApps across infrastructure and the growing the vCloud Virtual Datacenter
  • vCloud Networking & Security VXLAN
  • Profile-Driven Storage integration with user and storage provided capabilities
  • Storage DRS (SDRS) integration
    • Exposes storage Pod as first class storage container (just like a datastores) making it visible in all workflows where a datastore is visible
    • Creation, modification, and deletion of spods not possible in vCD
    • Member datastore operations not permissible in VCD
  • Single level Snapshot & Revert support for vApps (create/revert/remove); integration with Chargeback
  • Integrated vShield Edge Gateway
  • Integrated vShield Edge Configuration
  • vCenter Single Sign-On (SSO)
  • New Features in Networking
    • Integrated Organization vDC Creation Workflow
    • Creates compute, storage, and networking objects in a single workflow
    • The Edge Gateway are exposed at Organization vDC level
    • Organization vDC networks replace Organization networks
    • Edge Gateways now support:
      • Multiple Interfaces on a Edge Gateway
      • The ability to sub-allocate IP pools to a Edge Gatewa
      • Load balancing
      • HA (not the same as vSphere HA)
        • Two edge VMs deployed in Active-Passive mode
        • Enabled at time of gateway creation
        • Can also be changed after the gateway has been completed
        • Gets deployed with first Organizational network created that uses this gateway
      • DNS Relay
        • Provides a user selectable checkbox to enable
        • If DNS servers are defined for the selected external network, DNS requests will be sent to the specified server. If not, then DNS requests will be sent to the default gateway of the external network.
      • Rate limiting on external interface
    • Organization networks replaced by Organization vDC Networks
      • Organization vDC Networks are associated with an Organization vDC
      • The network pool associated with Organization vDC is used to create routed and isolated Organization vDC networks
      • Can be shared across Organization vDCs in an Organization
    • Edge Gateways
      • Are associated with an Organization vDC, can not be shared across Organization vDCs
      • Can be connected to multiple external networks
        • Multiple routed Organization vDC networks will be connected to the same Edge Gateway
      • External network connectivity for the Organization vDC Network can be changed after creation by changing the external networks which the edge gateway is connected.
      • Allows IP pool of external networks to be sub-allocated to the Edge Gateway
        • Needs to be specified in case of NAT and Load Balancer
    • New Features in Gateway Services
      • Load balancer service on Edge Gateways
      • Ability to add multiple subnets to VPN tunnels
      • Ability to add multiple DHCP IP pools
      • Ability to add explicit SNAT and DNAT rules providing user with full control over address translation
      • IP range support in Firewall and NAT services
      • Service Configuration Changes
        • Services are configured on Edge Gateway instead of at the network level
        • DHCP can be configured on Isolated Organization vDC networks.
  • Usability Features
    • New default branding style
      • Cannot revert back to the Charcoal color scheme
      • Custom CSS files will require modification
    • Improved “Add vApp from Catalog” wizard workflow
    • Easy access to VM Quota and Lease Expirations
    • New dropdown menu that includes details and search
    • Redesigned catalog navigation and sub-entity hierarchy
    • Enhanced help and documentation links
  • Virtual Hardware Version 9
    • Supports features presented by HW9 (like 64 CPU support)
    • Supports Hardware Virtualization Calls
    • VT-x/EPT or AMD-V/RVI
    • Memory overhead increased, vMotion limited to like hardware
    • Enable/Disable exposed to users who have rights to create a vApp Template
  • Additional Guest OS Support
    • Windows 8
    • Mac OS 10.5, 10.6 and 10.7
  • Storage Independent of VM Feature
    • Added support for Independent Disks
    • Provides REST API support for actions on Independent Disks
      • As these consume disk space, the vCD UI was updated to show user when they are used:
      • Organizations List Page
      • A new Independent Disks count column is added.
      • Organization Properties Page
      • Independent Disks tab is added to show all independent disks belonging to vDC
      • Tab is not shown if no independent disk exists in the vDC.
      • Virtual Machine Properties Page
      • Hardware tab->Hard Disks section, attached independent disks are shown by their names and all fields for the disk are disabled as they are not editable.

That’s all I have time for right now.  As I said, there is more to come later on topics such as vDS enhancements, VXLAN, SRM, vCD Load Balancing, and vSphere Replication.  Stay tuned!