Posts Tagged ‘Hardware’

Dell Enterprise Manager Client Gets Linux Makeover

April 24th, 2015

Dell storage customers who have been watching the evolution of Enterprise Manager may be interested in the latest release, which was just made available.  Aside from adding support for the brand new SCv2000 Series Storage Centers and bundling Java Platform SE 7 Update 67 with the installation of both the Data Collector on Windows and the Client on Windows or Linux (a prerequisite Java installation is no longer required), this release introduces a Linux client for the first time, with support for several Linux operating systems.  The Linux client is Java based and has the same look and feel as the Windows based client.  Some of the details about this release are below.

Enterprise Manager 2015 R1 Data Collector and Client management compatibility:

  • Dell Storage Center OS versions 5.5-6.6
  • Dell FS8600 versions 3.0-4.0
  • Dell Fluid Cache for SAN version 2.0.0
  • Microsoft System Center Virtual Machine Manager (SCVMM) versions 2012, 2012 SP1, and 2012 R2
  • VMware vSphere Site Recovery Manager versions 5.x (HCL), 6.0 (compatible)

Enterprise Manager 2015 R1 Client for Linux operating system requirements:

  • RHEL 6
  • RHEL 7
  • SUSE Linux Enterprise 12
  • Oracle Linux 6.5
  • Oracle Linux 7.0
  • 32-bit (x86) or 64-bit (x64) CPU
  • No support for RHEL 5 but I’ve tried it and it seems to work

Although the Enterprise Manager Client for Linux can be installed without a graphical environment, launching and using the client requires one.  As an example, neither RHEL 6 nor RHEL 7 installs a graphical environment by default.  Installing a graphical environment is similar for RHEL 6 and RHEL 7 in that both require a yum repository, but the procedure differs slightly between the two versions.  There are several resources available on the internet which walk through the process.  I’ll highlight a few below.

Log in with root access.

To install a graphical environment for RHEL 6, create a yum repository and install GNOME or KDE by following the procedure here.

To install a graphical environment for RHEL 7, create a yum repository by following this procedure and install GNOME by following the procedure here.
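For a rough idea of what those procedures boil down to, the sketches below assume the yum repository is already configured and that the default GNOME group names apply (verify the exact group names on your system with yum grouplist first).

On RHEL 6:

yum groupinstall "X Window System" "Desktop" "Fonts"

On RHEL 7:

yum groupinstall "Server with GUI"
systemctl set-default graphical.target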

Installing the Enterprise Manager Client is pretty straightforward.  Copy the RPM to a temporary directory on the Linux host and use rpm -U to install:

rpm -U dell-emclient-15.1.2-45.x86_64.rpm

Alternatively, download the client from the Enterprise Manager Data Collector using the following syntax as an example:

wget --no-check-certificate https://em1.boche.lab:3033/em/EnterpriseManager/web/apps/client/EmClient.rpm

rpm -U EmClient.rpm
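As a quick sanity check after either method, the RPM database can be queried for the package (the package name below is taken from the RPM file name shown earlier and may differ slightly between builds):

rpm -qi dell-emclient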

Once installed, launch the Enterprise Manager Client from the /var/lib/dell/bin/ directory:

cd /var/lib/dell/bin/

./Client

or

/var/lib/dell/bin/Client

We’re rewarded with the Enterprise Manager 2015 R1 Client splash screen.  New features are available here to immediately manage SCv2000 Series Storage Centers (the SCv2000 Series is the first Storage Center for which the web based management console has been retired).

Once logged in, it’s business as usual in a familiar UI.

Dell, and before it Compellent, has long offered a variety of options and integrations to manage Storage Center as well as popular platforms and applications.  The new Enterprise Manager Client for Linux extends that list of available management methods.

A Common NPIV Problem with a Solution

December 29th, 2014

Several years ago, one of the first blog posts I tackled covered working in the lab with N_Port ID Virtualization, often referred to as NPIV for short. The blog post was titled N_Port ID Virtualization (NPIV) and VMware Virtual Infrastructure. At the time it was one of the few blog posts available on the subject because it was a relatively new feature offered by VMware. Over the years that followed, I haven’t heard much in terms of trending adoption rates by customers. Likewise, VMware hasn’t put much effort into improving NPIV support in vSphere or promoting its use. One might contemplate which is the cause and which is the effect. I feel it’s a mutual agreement between both parties that NPIV in its current state isn’t exciting enough to deploy and the benefits fall into a very narrow band of interest (VMware: give us in-guest virtual Fibre Channel – that would be interesting).

Despite its market penetration challenges, from time to time I do receive an email from someone referring to my original NPIV blog post looking for some help in deploying or troubleshooting NPIV. The nature of the request is common and it typically falls into one of two categories:

  1. How can I set up NPIV with a fibre channel tape library?
  2. Help – I can’t get NPIV working.

I received such a request a few weeks ago from the field asking for general assistance in setting up NPIV with Dell Compellent storage. The correct steps were followed to the best of their knowledge, but the virtual WWPNs that were initialized at VM power on would not stay lit after the VM began to POST. In Dell Enterprise Manager, the path to the virtual machine’s assigned WWPN was down. Although the RDM storage presentation was functioning, it was only working through the vSphere host HBAs and not the NPIV WWPN. This effectively means that NPIV is not working.

In addition, the NPIV initialization failure is reflected in the vmkernel.log:

2014-12-15T16:32:28.694Z cpu25:33505)qlnativefc: vmhba64(41:0.0): vlan_id: 0x0
2014-12-15T16:32:28.694Z cpu25:33505)qlnativefc: vmhba64(41:0.0): vn_port_mac_address: 00:00:00:00:00:00
2014-12-15T16:32:28.793Z cpu25:33505)qlnativefc: vmhba64(41:0.0): Assigning new target ID 0 to fcport 0x410a524d89a0
2014-12-15T16:32:28.793Z cpu25:33505)qlnativefc: vmhba64(41:0.0): fcport 5000d3100002b916 (targetId = 0) ONLINE
2014-12-15T16:32:28.809Z cpu27:33505)qlnativefc: vmhba64(41:0.0): Assigning new target ID 1 to fcport 0x410a524d9260
2014-12-15T16:32:28.809Z cpu27:33505)qlnativefc: vmhba64(41:0.0): fcport 5000d3100002b90c (targetId = 1) ONLINE
2014-12-15T16:32:28.825Z cpu27:33505)qlnativefc: vmhba64(41:0.0): Assigning new target ID 2 to fcport 0x410a524d93e0
2014-12-15T16:32:28.825Z cpu27:33505)qlnativefc: vmhba64(41:0.0): fcport 5000d3100002b915 (targetId = 2) ONLINE
2014-12-15T16:32:28.841Z cpu27:33505)qlnativefc: vmhba64(41:0.0): Assigning new target ID 3 to fcport 0x410a524d9560
2014-12-15T16:32:28.841Z cpu27:33505)qlnativefc: vmhba64(41:0.0): fcport 5000d3100002b90b (targetId = 3) ONLINE
2014-12-15T16:32:30.477Z cpu22:19117991)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T16:32:32.477Z cpu22:19117991)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T16:32:34.480Z cpu22:19117991)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T16:32:36.480Z cpu22:19117991)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T16:32:38.482Z cpu22:19117991)ScsiNpiv: 1152: NPIV vport rescan complete, [5:24] (0x410943893dc0) [0x410943680ec0] status=0xbad0040
2014-12-15T16:32:38.503Z cpu22:19117991)ScsiScan: 140: Path 'vmhba2:C0:T3:L24': Peripheral qualifier 0x1 not supported
2014-12-15T16:32:38.503Z cpu22:19117991)WARNING: ScsiNpiv: 1141: Physical uid does not match VPORT uid, NPIV Disabled for this VM
2014-12-15T16:32:38.503Z cpu22:19117991)ScsiNpiv: 1152: NPIV vport rescan complete, [3:24] (0x410943856e80) [0x410943680ec0] status=0xbad0132
2014-12-15T16:32:38.503Z cpu22:19117991)WARNING: ScsiNpiv: 1788: Failed to Create vport for world 19117994, vmhba2, rescan failed, status=bad0001
2014-12-15T16:32:38.504Z cpu14:33509)ScsiAdapter: 2806: Unregistering adapter vmhba64

To review, the requirements for implementing NPIV with vSphere are documented by VMware and I outlined the key ones in my original blog post:

  • NPIV support on the fabric switches (typically found in 4Gbps or higher fabric switches but I’ve seen firmware support in 2Gbps switches also)
  • NPIV support on the vSphere host HBAs (this typically means 4Gbps or higher port speeds)
  • NPIV support from the storage vendor
  • NPIV support from a supported vSphere version
  • vSphere Raw Device Mapping
  • Correct fabric zoning configured between host HBAs, the virtual machine’s assigned WWPN(s), and the storage front end ports
  • Storage presentation to the vSphere host HBAs as well as the virtual machine’s assigned NPIV WWPN(s)

If any of the above requirements are not met (plus a handful of others and we’ll get to one of them shortly), vSphere’s NPIV feature will likely not function.
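When troubleshooting, a quick way to pull the relevant messages out of the live log on a host is a simple filter like the sketch below (the log path is the ESXi default; the match strings come from the entries shown above):

grep -iE 'scsinpiv|vport' /var/log/vmkernel.log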

In this particular case, the general NPIV requirements were met. However, it was discovered that a best practice had been missed in configuring the QLogic HBA BIOS (the QLogic BIOS is accessed at host reboot by pressing CTRL + Q or ALT + Q when prompted). The Connection Options setting remained at its factory default value of 2, or Loop preferred, otherwise point to point.

Dell Compellent storage with vSphere best practices call for this value to be hard coded to 1 or Point to point only. When the HBA has multiple ports, this configuration needs to be made across all ports that are used for Dell Compellent storage connectivity. It goes without saying this also applies across all of the fabric attached hosts in the vSphere cluster.

Once configured for Point to point connectivity on the fabric, the problem is resolved.
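As a quick sanity check after the change, the host’s view of its Fibre Channel adapters can be reviewed from the command line. A sketch using a standard esxcli namespace (the exact fields reported vary by ESXi version and driver):

esxcli storage san fc list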

Despite the various error messages returned as vSphere probes for possible combinations between the vSphere assigned virtual WWPN and the host WWPNs, NPIV success looks something like this in the vmkernel.log (you’ll notice subtle differences showing success compared to the failure log messages above):

2014-12-15T18:43:52.270Z cpu29:33505)qlnativefc: vmhba64(41:0.0): vlan_id: 0x0
2014-12-15T18:43:52.270Z cpu29:33505)qlnativefc: vmhba64(41:0.0): vn_port_mac_address: 00:00:00:00:00:00
2014-12-15T18:43:52.436Z cpu29:33505)qlnativefc: vmhba64(41:0.0): Assigning new target ID 0 to fcport 0x410a4a569960
2014-12-15T18:43:52.436Z cpu29:33505)qlnativefc: vmhba64(41:0.0): fcport 5000d3100002b916 (targetId = 0) ONLINE
2014-12-15T18:43:52.451Z cpu29:33505)qlnativefc: vmhba64(41:0.0): Assigning new target ID 1 to fcport 0x410a4a569ae0
2014-12-15T18:43:52.451Z cpu29:33505)qlnativefc: vmhba64(41:0.0): fcport 5000d3100002b90c (targetId = 1) ONLINE
2014-12-15T18:43:52.466Z cpu29:33505)qlnativefc: vmhba64(41:0.0): Assigning new target ID 2 to fcport 0x410a4a569c60
2014-12-15T18:43:52.466Z cpu29:33505)qlnativefc: vmhba64(41:0.0): fcport 5000d3100002b915 (targetId = 2) ONLINE
2014-12-15T18:43:52.481Z cpu29:33505)qlnativefc: vmhba64(41:0.0): Assigning new target ID 3 to fcport 0x410a4a569de0
2014-12-15T18:43:52.481Z cpu29:33505)qlnativefc: vmhba64(41:0.0): fcport 5000d3100002b90b (targetId = 3) ONLINE
2014-12-15T18:43:54.017Z cpu0:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:43:56.018Z cpu0:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:43:58.020Z cpu0:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:44:00.022Z cpu0:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:44:02.024Z cpu0:36379)ScsiNpiv: 1152: NPIV vport rescan complete, [4:24] (0x4109436ce9c0) [0x410943684040] status=0xbad0040
2014-12-15T18:44:02.026Z cpu2:36379)ScsiNpiv: 1152: NPIV vport rescan complete, [2:24] (0x41094369ca40) [0x410943684040] status=0x0
2014-12-15T18:44:02.026Z cpu2:36379)ScsiNpiv: 1701: Physical Path : adapter=vmhba3, channel=0, target=5, lun=24
2014-12-15T18:44:02.026Z cpu2:36379)ScsiNpiv: 1701: Physical Path : adapter=vmhba2, channel=0, target=2, lun=24
2014-12-15T18:44:02.026Z cpu2:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:44:04.028Z cpu2:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:44:06.030Z cpu2:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:44:08.033Z cpu2:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:44:10.035Z cpu2:36379)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2014-12-15T18:44:12.037Z cpu2:36379)ScsiNpiv: 1152: NPIV vport rescan complete, [4:24] (0x4109436ce9c0) [0x410943684040] status=0xbad0040
2014-12-15T18:44:12.037Z cpu2:36379)ScsiNpiv: 1160: NPIV vport rescan complete, [2:24] (0x41094369ca40) [0x410943684040] vport exists
2014-12-15T18:44:12.037Z cpu2:36379)ScsiNpiv: 1701: Physical Path : adapter=vmhba3, channel=0, target=2, lun=24
2014-12-15T18:44:12.037Z cpu2:36379)ScsiNpiv: 1848: Vport Create status for world:36380 num_wwpn=1, num_vports=1, paths=4, errors=3

One last item I’ll note here for posterity is that in this particular case, the problem did not present itself uniformly across all storage platforms. This was an element that prolonged troubleshooting to a degree because the vSphere cluster was successful in establishing NPIV fabric connectivity to two other types of storage using the same vSphere hosts, hardware, and fabric switches. Because of this, in the beginning it seemed logical to rule out any configuration issues within the vSphere hosts.

To summarize, there are many technical requirements outlined in VMware documentation to correctly configure NPIV. If you’ve followed VMware’s steps correctly but problems with NPIV remain, refer to storage, fabric, and hardware documentation and verify best practices are being met in the deployment.

Storage Center 5.6 Released

November 25th, 2013

I don’t have the latest and greatest Dell Compellent SC8000 controllers or SC220 2.5″ drive enclosures in my home lab although I dream nightly about Santa unloading some on me this Christmas.  What I do have is an older Series 20 and I am thankful for that.  But having an older storage array doesn’t mean I cannot leverage some of the latest and greatest features and operating systems available for datacenters.

Storage Center 5.6 was released just a short time ago and it ushers in some of the feature and platform support currently built into Storage Center 6.x, as well as a large number of bug fixes.  This is a big win for me and anyone else with a 32-bit system (Series 30 or below) needing these features, because SCOS 6.x is 64-bit only and limited to Series 40 and newer hardware, which today includes the SC8000.

So what are these new features in 5.6 and why am I so excited?  I’m glad you asked.  For this guy, at the top of the list is full support of all VAAI primitives.  Storage Center 5.5 and older boasted support of the block zeroing primitive.  Space Reclamation was there as well, although on its own it did not satisfy the other component of the thin provisioning primitive, which is STUN.

Shown below is a Storage Center 5.5 datastore where I lack Atomic Test and Set (aka Hardware Assisted Locking) and XCOPY.  I have block zeroing and Space Reclamation using the Free Space Recovery agent for vSphere guest VMs with physical RDMs. VAAI support status can be obtained in full using esxcli:

Snagit Capture
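For anyone following along without the screenshots, the per-device VAAI status query below is a reasonable stand-in for what is pictured (run without a device filter, it reports ATS, Clone, Zero, and Delete status for every device on the host):

esxcli storage core device vaai status get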

Or in part using the vSphere Client GUI:

Snagit Capture

After the Storage Center 5.6 upgrade, I’ve got additional VAAI primitive support where Clone in most cases is going to be the biggest one in terms of fabric and host efficiency and performance. Not shown is support for Thin Provisioning Stun but that has been added as well:

Snagit Capture

The vSphere Client GUI now reflects full VAAI support after the 5.6 upgrade:

Snagit Capture

What else? Added support for vSphere 5.5 as an operating system type:

Snagit Capture

Last but not least, added support for Windows 2012 and some of its features including Offloaded Data Transfer, Thin Provisioning, Space Reclamation, and Server Objects:

Snagit Capture

Storage Center 5.6 also adds new storage features which are storage host agnostic such as Background Media Scans (BMS) as well as improved disk and HBA management for server objects.  And the bug fixes I mentioned earlier – refer to the SCOS 5.6 Release Notes for details.

To wrap this up, if you’ve got an older Storage Center model and you want support for these new features while avoiding a forklift upgrade, Storage Center Operating System 5.6 is the way to go.

vSphere 5.5 UNMAP Deep Dive

September 13th, 2013

One of the features that has been updated in vSphere 5.5 is UNMAP which is one of two sub-components of what I’ll call the fourth block storage based thin provisioning VAAI primitive (the other sub-component is thin provisioning stun).  I’ve already written about UNMAP a few times in the past.  It was first introduced in vSphere 5.0 two years ago.  A few months later the feature was essentially recalled by VMware.  After it was re-released by VMware in 5.0 Update 1, I wrote about its use here and followed up with a short piece about the .vmfsBalloon file here.

For those unfamiliar, UNMAP is a space reclamation mechanism used to return blocks of storage back to the array after data which was once occupying those blocks has been moved or deleted.  The common use cases are deleting a VM from a datastore, Storage vMotion of a VM from a datastore, or consolidating/closing vSphere snapshots on a datastore.  All of these operations, in the end, involve deleting data from pinned blocks/pages on a volume.  Without UNMAP, these pages, albeit empty and available for future use by vSphere and its guests only, remain pinned to the volume/LUN backing the vSphere datastore.  The pages are never returned back to the array for use with another LUN or another storage host.  Notice I did not mention shrinking a virtual disk or a datastore – neither of those operations are supported by VMware.  I also did not mention the use case of deleting data from inside a virtual machine – while that is not supported, I believe there is a VMware fling for experimental use.  In summary, UNMAP extends the usefulness of thin provisioning at the array level by maintaining storage efficiency throughout the life cycle of the vSphere environment and the array which supports the UNMAP VAAI primitive.

On the Tuesday during VMworld, Cormac Hogan launched his blog post introducing new and updated storage related features in vSphere 5.5.  One of those features he summarized was UNMAP.  If you haven’t read his blog, I’d definitely recommend taking a look – particularly if you’re involved with vSphere storage.  I’m going to explore UNMAP in a little more detail.

The most obvious change to point out is the command line itself used to initiate the UNMAP process.  In previous versions of vSphere, the command issued on the vSphere host was:

vmkfstools -y x (where x represents the % of storage to unmap)

As Cormac points out, UNMAP has been moved to esxcli namespace in vSphere 5.5 (think remote scripting opportunities after XYZ process) where the basic command syntax is now:

esxcli storage vmfs unmap

In addition to the above, there are also three switches available for use; of the first two listed below, one is required, and the third is optional.

-l|--volume-label= The label of the VMFS volume to unmap the free blocks.

-u|--volume-uuid= The uuid of the VMFS volume to unmap the free blocks.

-n|--reclaim-unit= Number of VMFS blocks that should be unmapped per iteration.

Previously with vmkfstools, we’d change to the VMFS folder in which we were going to UNMAP blocks.  In vSphere 5.5, the esxcli command can be run from anywhere, so specifying the datastore name or the uuid is one of the required parameters for obvious reasons.  So using the datastore name, the new UNMAP command in vSphere 5.5 is going to look like this:

esxcli storage vmfs unmap -l 1tb_55ds

As for the optional parameter, the UNMAP command is an iterative process which continues through numerous cycles until complete.  The reclaim unit parameter specifies the quantity of blocks to unmap per iteration of the UNMAP process.  In previous versions of vSphere, VMFS-3 datastores could have block sizes of 1, 2, 4, or 8MB.  While upgrading a VMFS-3 datastore to VMFS-5 will maintain these block sizes, executing an UNMAP operation on a native net-new VMFS-5 datastore means working with a 1MB block size only.  Therefore, if a reclaim unit value of 100 is specified on a VMFS-5 datastore with a 1MB block size, then 100MB of data will be returned to the available raw storage pool per iteration until all blocks marked available for UNMAP are returned.  Using a value of 100, the UNMAP command looks like this:

esxcli storage vmfs unmap -l 1tb_55ds -n 100

If the reclaim unit value is unspecified when issuing the UNMAP command, the default reclaim unit value is 200, resulting in 200MB of data returned to the available raw storage pool per iteration assuming a 1MB block size datastore.
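As a side note, if you prefer the -u form of the command, the volume UUIDs are easy to come by with another esxcli one-liner (a quick sketch):

esxcli storage filesystem list

The Volume Name and UUID columns in the output map directly to the -l and -u parameters.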

One additional piece to note on the CLI topic is that in a release candidate build I was working with, while the old vmkfstools -y command is deprecated, it appears to still exist, but with newer vSphere 5.5 functionality published in the --help section:

vmkfstools -y --reclaimBlocks vmfsPath [--reclaimBlocksUnit #blocks]

The next change involves the hidden temporary balloon file (refer to my link at the top if you’d like more information about the balloon file, but basically it’s a mechanism used to guarantee that blocks targeted for UNMAP are not written to in the interim by an outside I/O request before the UNMAP process is complete).  It is no longer named .vmfsBalloon.  The new name is .asyncUnmapFile, as shown below.

/vmfs/volumes/5232dd00-0882a1e4-e918-0025b3abd8e0 # ls -l -h -A
total 998408
-r--------    1 root     root      200.0M Sep 13 10:48 .asyncUnmapFile
-r--------    1 root     root        5.2M Sep 13 09:38 .fbb.sf
-r--------    1 root     root      254.7M Sep 13 09:38 .fdc.sf
-r--------    1 root     root        1.1M Sep 13 09:38 .pb2.sf
-r--------    1 root     root      256.0M Sep 13 09:38 .pbc.sf
-r--------    1 root     root      250.6M Sep 13 09:38 .sbc.sf
drwx------    1 root     root         280 Sep 13 09:38 .sdd.sf
drwx------    1 root     root         420 Sep 13 09:42 .vSphere-HA
-r--------    1 root     root        4.0M Sep 13 09:38 .vh.sf
/vmfs/volumes/5232dd00-0882a1e4-e918-0025b3abd8e0 #

As discussed in the previous section, the UNMAP command now specifies the actual size of the temporary file instead of the temporary file size being determined by a percentage of space to return to the raw storage pool.  This is an improvement in part because it helps avoid the catastrophe that could occur if UNMAP tried to remove 2TB+ in a single operation (discussed here).

VMware has also enhanced the functionality of the temporary file.  A new kernel interface in ESXi 5.5 allows the user to ask for blocks beyond a specified block address in the VMFS file system.  This ensures that the blocks allocated to the temporary file were never allocated to it in a previous iteration.  The benefit realized in the end is that a temporary file of any size can be created, and with UNMAP issued to the blocks allocated to the temporary file, we can rest assured that UNMAP is eventually issued on all free blocks on the datastore.

Going a bit deeper and adding to the efficiency, VMware has also enhanced UNMAP to support multiple block descriptors.  Compared to vSphere 5.1 which issued just one block descriptor per UNMAP command, vSphere 5.5 now issues up to 100 block descriptors depending on the storage array (these identifying capabilities are specified internally in the Block Limits VPD (B0) page).

A look at the asynchronous and iterative vSphere 5.5 UNMAP logical process:

  1. User or script issues esxcli UNMAP command
  2. Does the array support VAAI UNMAP?  yes=3, no=end
  3. Create .asyncUnmapFile on root of datastore
  4. .asyncUnmapFile created and locked? yes=5, no=end
  5. Issue an IOCTL to allocate reclaim-unit blocks of storage on the volume past the previously allocated block offset
  6. Did the previous block allocation succeed? yes=7, no=remove lock file and retry step 6
  7. Issue UNMAP on all blocks allocated above in step 5
  8. Remove the lock file
  9. Did we reach the end of the datastore? yes=end, no=3

From a performance perspective, executing the UNMAP command in my vSphere 5.5 RC lab showed peak write I/O of around 1,200MB/s with an average of around 200 IOPS comprised of a 50/50 mix of read/write.  The UNMAP I/O pattern is a bit hard to gauge because with the asynchronous iterative process, it seemed to do a bunch of work, rest, do more work, rest, and so on.  Sorry, no screenshots because flickr.com is currently down.  Perhaps the most notable takeaway from the performance section is that as of vSphere 5.5, VMware is lifting the recommendation of only running UNMAP during a maintenance window.  Keep in mind this is just a recommendation.

I encourage vSphere 5.5 customers to test UNMAP in their lab first using various reclaim unit sizes.  While doing this, examine performance impacts to the storage fabric, the storage array (look at both front end and back end), as well as other applications sharing the array.  Remember that fundamentally the UNMAP command is only going to provide a benefit AFTER its associated use cases have occurred (mentioned at the top of the article).  Running UNMAP on a volume which has no pages to be returned is a waste of effort.  Once you’ve become comfortable with using UNMAP and understanding its impacts in your environment, consider running it on a recurring schedule – perhaps weekly.  It really depends on how much the use cases apply to your environment.  Many vSphere backup solutions leverage vSphere snapshots, which is one of the use cases.  Although it could be said there are large gains to be made with UNMAP in this case, keep in mind backups run regularly and space that is returned to raw storage with UNMAP will likely be consumed again in the following backup cycle when vSphere snapshots are created once again.
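If you do end up putting it on a schedule, the per-datastore syntax makes it easy to chain several datastores together from an SSH session or a remote script. A trivial sketch with made-up datastore names:

for ds in prod_ds01 prod_ds02 prod_ds03; do esxcli storage vmfs unmap -l "$ds" -n 200; done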

To wrap this up, customers who have block arrays supporting the thin provisioning VAAI primitive will be able to use UNMAP in vSphere 5.5 environments (for storage vendors, both sub-components are required to certify for the primitive as a whole on the HCL).  This includes Dell Compellent customers running a current version of Storage Center firmware.  Customers who use array based snapshots with extended retention periods should keep in mind that while UNMAP will work against active blocks, it may not work with blocks maintained in a snapshot.  This is to honor the snapshot based data protection retention.

The .vmfsBalloon File

July 1st, 2013

One year ago, I wrote a piece about thin provisioning and the role that the UNMAP VAAI primitive plays in thin provisioned storage environments.  Here’s an excerpt from that article:

When the manual UNMAP process is run, it balloons up a temporary hidden file at the root of the datastore which the UNMAP is being run against.  You won’t see this balloon file with the vSphere Client’s Datastore Browser as it is hidden.  You can catch it quickly while UNMAP is running by issuing the ls -l -a command against the datastore directory.  The file will be named .vmfsBalloon along with a generated suffix.  This file will quickly grow to the size of data being unmapped (this is actually noted when the UNMAP command is run and evident in the screenshot above).  Once the UNMAP is completed, the .vmfsBalloon file is removed.

Has your curiosity ever gotten you wondering about the technical purpose of the .vmfsBalloon file?  It boils down to data integrity and timing.  At the time the UNMAP command is run, the balloon file is immediately instantiated and grows to occupy (read: hog) all of the blocks that are about to be unmapped.  It does this so that none of those blocks can be allocated to new file creation elsewhere while the unmap process runs.  If you think about it, it makes sense – we just told vSphere to give these blocks back to the array.  If during the interim one or more of these blocks were suddenly allocated for a new file or file growth purposes, and we then purge the block, we have a data integrity issue.  More accurately, newly created data would be missing because its block or blocks were just flushed back to the storage pool on the array.
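For those who want to catch it in the act, the excerpt above already names the command; a concrete sketch against a hypothetical datastore path looks like this (run it while the reclaim operation is in flight):

ls -l -a /vmfs/volumes/my_datastore/ | grep -i balloon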

Available Lab Gear

May 29th, 2013

Heads up to any locals looking for server grade vSphere hardware infrastructure.  I’ve been doing some lab spring cleaning the past few weeks and after some consolidation efforts, I’ve got some hardware available that’s not being put to good use any longer.  All of these are 64-bit and will run vSphere.  All have Ethernet and/or Fibre Channel options.

  • 2x HP DL385 (1x AMD DC Opteron, 4GB RAM, rails, RPS)
  • 2x HP DL385 G2 (2x AMD QC Opteron (Barcelona), 34GB RAM, rails, RPS)
  • 2x HP DL585 G2 (4x AMD DC Opteron, 64GB RAM, RPS, power cables)

If you have any questions not pertaining to power or heat, please ask.

You pick up – Lakeville, MN.

The price is right – email me if interested.

Large Memory Pages and Shrinking Consolidation Ratios

March 19th, 2013

Here’s a discussion that has somewhat come full circle for me and could prove handy for those with lab or production environments alike.

A little over a week ago I was having lunch with a former colleague and naturally a TPS discussion broke out.  We talked about how it worked and how effective it was with small memory pages (4KB in size) as well as large memory pages (2MB in size).  The topic was brought up with a purpose in mind.

Many moons ago, VMware virtualized datacenters consisted mainly of Windows 2000 Server and Windows Server 2003 virtual machines which natively leverage small memory pages – an attribute built into the guest operating system itself.  Later, Windows Vista as well as 2008 and its successors came onto the scene allocating large memory pages by default (again – at the guest OS layer) to boost performance for certain workload types.  To maintain flexibility and feature support, VMware ESX and ESXi hosts have supported large pages by default provided the guest operating system requests them.  For those operating systems that still use the smaller memory pages, those are supported by the hypervisor as well.  This support and configuration remains the default today in vSphere 5.1 in an advanced host-wide setting called Mem.AllocGuestLargePage (1 to enable and support both large and small pages – the default; 0 to disable and force small pages).  VMware released a small whitepaper covering this subject several years ago titled Large Page Performance which summarizes lab test results and provides the steps required to toggle large pages in the hypervisor as well as within Windows Server 2003.
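For reference, the same advanced setting can be toggled from the command line on an ESXi host rather than through the client. A minimal sketch (0 forces small pages, 1 restores the default):

esxcli system settings advanced set -o /Mem/AllocGuestLargePage -i 0
esxcli system settings advanced list -o /Mem/AllocGuestLargePage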

As legacy Windows platforms were slowly but surely replaced by their Windows Server 2008, R2, and now 2012 successors, something began to happen.  Consolidation ratios gated by memory (a very typical mainstream constraint in most environments I’ve managed and shared stories about) started to slip.  Part of this can be attributed to the larger memory footprints assigned to the newer operating systems.  That makes sense, but it only explains a portion of the story.  The balance of memory has evaporated as a result of modern guest operating systems using large 2MB memory pages which will not be consolidated by the TPS mechanism (until a severe memory pressure threshold is crossed, but that’s another story discussed here and here).

For some environments, many I imagine, this is becoming a problem which manifests itself as an infrastructure capacity growth requirement as guest operating systems are upgraded.  Those with chargeback models where the customer or business unit paid up front at the door for their VM or vApp shells are now getting pinched because compute infrastructure doesn’t spread as thin as it once did.  This will be most pronounced in the largest of environments.  A pod or block architecture that once supplied infrastructure for 500 or 1,000 VMs now fills up with significantly less.

So when I said this discussion has come full circle, I meant it.  A few years ago Duncan Epping wrote an article called KB Article 1020524 (TPS and Nehalem) and a portion of this blog post more or less took place in the comments section.  Buried in there was a comment I had made while being involved in the discussion (although I don’t remember it).  So I was a bit surprised when a Google search dug that up.  It wasn’t the first time that has happened and I’m sure it won’t be the last.

Back to reality.  After my lunch time discussion with Jim, I decided to head to my lab which, from a guest OS perspective, was all Windows Server 2008 R2 or better, plus a bit of Linux for the appliances.  Knowing that the majority of my guests were consuming large memory pages, how much more TPS savings would result if I forced small memory pages on the host?  So I evacuated a vSphere host using maintenance mode, configured Mem.AllocGuestLargePage to a value of 0, then placed all the VMs back onto the host.  Shown below are the before and after results.

A decrease in physical memory utilization of nearly 20% per host – TPS is alive again:

Snagit Capture Snagit Capture

124% increase in Shared memory in Tier1 virtual machines:

Snagit Capture Snagit Capture

90% increase in Shared memory in Tier3 virtual machines:

Snagit Capture Snagit Capture

Perhaps what was most interesting was the manner in which TPS consolidated pages once small pages were enabled.  The impact was not realized right away, nor was it a gradual gain in memory efficiency as vSphere scanned for duplicate pages.  Rather, it seemed to happen in batch, almost all at once, 12 hours after large pages had been disabled and the VMs had been moved back onto the host:

Snagit Capture

So for those of you who may be scratching your heads wondering what has been happening to your consolidation ratios lately, perhaps this has some or everything to do with it.  Is there an action item to be carried out here? That depends on what your top priority is when weighing infrastructure performance in one hand against maximized consolidation in the other.

If you are on a lean infrastructure budget (a home lab would be an ideal fit here), consider forcing small pages to greatly enhance TPS opportunities and stretch your lab dollar, which has been getting consumed by modern operating systems and an increasing number of VMware and 3rd party appliances.

Can you safely disable large pages in production clusters? It’s a performance question I can’t answer globally.  You may or may not see a performance hit to your virtual machines based on their workloads.  Remember that the use of small memory pages and AMD Rapid Virtualization Indexing (RVI) or Intel Extended Page Tables (EPT) is mutually exclusive.  Due diligence testing is required for each environment.  As it is a per host setting, testing with the use of vMotion really couldn’t be easier.  Simply disable large pages on one host in a cluster, migrate the virtual machines in question to that host, and let them simmer.  Compare performance metrics before and after.  Query your users for performance feedback (phrase the question in a way that implies you added horsepower instead of asking the opposite, “did the application seem slower?”).

That said, I’d be curious to hear if anyone in the community disables large pages in their environments as a regular habit or documented build procedure, and what the impact has been, if any, on both memory utilization and performance.

Last but not least, Duncan has another good blog post titled How many pages can be shared if Large Pages are broken up?  Take a look at that for some tips on using ESXTOP to monitor TPS activity.

Update 3/21/13:  I didn’t realize Gabrie had written about this topic back in January 2011.  Be sure to check out his post Large Pages, Transparent Page Sharing and how they influence the consolidation ratio.  Sorry Gabrie, hopefully you understand I wasn’t trying to steal your hard work and originality :)

Update 10/20/14:  VMware announced last week that inter-VM TPS (memory page sharing between VMs, not to be confused with memory page sharing within a single VM) will no longer be enabled by default. This default ESXi configuration change will take place in December 2014.

VMware KB Article 2080735 explains Inter-Virtual Machine TPS will no longer be enabled by default starting with the following releases:

ESXi 5.5 Update release – Q1 2015
ESXi 5.1 Update release – Q4 2014
ESXi 5.0 Update release – Q1 2015
The next major version of ESXi

Administrators may revert to the previous behavior if they so wish.

and…

Prior to the above ESXi Update releases, VMware will release ESXi patches that introduce additional TPS management capabilities. These ESXi patches will not change the existing settings for inter-VM TPS. The planned ESXi patch releases are:

ESXi 5.5 Patch 3. For more information, see VMware ESXi 5.5, Patch ESXi550-201410401-BG: Updates esx-base (2087359).
ESXi 5.1 patch planned for Q4, 2014
ESXi 5.0 patch planned for Q4, 2014

The divergence is in response to new research which leveraged TPS to gain unauthorized access to data. Under certain circumstances, a data security breach may occur which effectively makes TPS across VMs a vulnerability.

Although VMware believes the risk of TPS being used to gather sensitive information is low, we strive to ensure that products ship with default settings that are as secure as possible.

Additional information, including the introduction of the Mem.ShareForceSalting host config option, is available in VMware KB Article 2091682, Additional Transparent Page Sharing management capabilities in ESXi 5.5 patch October 16, 2014 and ESXi 5.1 and 5.0 patches in Q4, 2014, as well as in the VMware blog article Transparent Page Sharing – additional management capabilities and new default settings.
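For administrators who do decide to revert to the previous inter-VM sharing behavior once the patches land, the KB describes doing so through the new Mem.ShareForceSalting advanced option. A hedged sketch only (per the KB, a value of 0 corresponds to the pre-patch behavior; confirm the appropriate value for your build against KB 2091682 before changing anything):

esxcli system settings advanced set -o /Mem/ShareForceSalting -i 0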