Posts Tagged ‘ESXi’

Configure ntpd to start with host via CLI

August 4th, 2020

Good afternoon. I hope you’re all staying safe and healthy in the midst of the COVID-19 pandemic.

I had someone reach out to me yesterday with a need to script NTP configuration on ESXi hosts. He had all of the NTP configuration working except enabling the ntpd daemon to start automatically with the host. That’s easy enough, I said. I use the following PowerCLI script block to configure many of my vSphere hosts. Row three takes care of automatically starting the daemon.

Get-VMHost | Get-VMHostFirewallException | Where-Object {$_.Name -eq "NTP client"} | Set-VMHostFirewallException -Enabled:$true
Get-VMHost | Get-VMHostService | Where-Object {$_.Key -eq "ntpd"} | Start-VMHostService
Get-VMHost | Get-VMHostService | Where-Object {$_.Key -eq "ntpd"} | Set-VMHostService -Policy "on"
Get-VMHost | Get-VMHostService | Where-Object {$_.Key -eq "ntpd"}
Get-VMHost | Get-VMHostFirewallException | Where-Object {$_.Name -eq "NTP client"}

However, he didn’t want to use PowerCLI – he needed an ESXi command line method. He had scoured the internet finding no solution and actually turned up many discussions saying it can’t be done.

I dug deep into my ESX 3.5.0 build document, last updated 1/23/10 (just prior to my VCDX defense which occurred in early February 2010). “Try this,” I said:

chkconfig ntpd on

He responded that it didn’t work. He wanted the UI to show the NTP service startup policy as Start and stop with host. It was still grayed out, showing Start and stop manually.

“Ok do this – restart hostd”:

/etc/init.d/hostd restart

That worked and was likely the missing step from the many forum threads saying this can’t be done.
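Putting the pieces together, the whole configuration can be sketched from the ESXi shell (a sketch, not the exact commands from our exchange; the NTP server name is a placeholder, and the firewall and ntp.conf steps assume ESXi 5.x or later):

```shell
# Open the outbound NTP firewall rule (the ruleset id is ntpClient)
esxcli network firewall ruleset set --ruleset-id ntpClient --enabled true

# Point ntpd at a time source (replace pool.ntp.org with your NTP server)
echo "server pool.ntp.org" >> /etc/ntp.conf

# Start ntpd now, and enable it to start with the host
/etc/init.d/ntpd restart
chkconfig ntpd on

# Restart hostd so the UI reflects the new startup policy
/etc/init.d/hostd restart
```

The hostd restart at the end is the step that was missing from the forum threads.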

vSphere with Kubernetes

May 17th, 2020

During the past couple of months, I’ve had the opportunity to participate in both the vSphere 7 and Project Pacific beta programs. While the vSphere 7 beta was fairly straightforward (no intent to downplay the incredible amount of work that went into one of VMware’s biggest and most anticipated releases in company history), Project Pacific bookends the start of my Kubernetes journey – something I’ve wanted to get moving on once the busy hockey season concluded (I just wrapped up my 4th season coaching at the Peewee level).

For myself, the learning process can be broken down into two distinct parts:

  1. Understanding the architecture and deploying Kubernetes on vSphere. Sidebar: understanding the Tanzu portfolio (and the new names for VMware modern app products). To accomplish this in vSphere 7, we need to deploy NSX-T and then enable Workload Management in the UI. That one-sentence summary easily represents several hours of work when you consider planning and, in my case, failing a few times. I’ve seen a few references made on how easy this process is. Perhaps if you already have a strong background in NSX-T. I found it challenging during the beta.
  2. Day 1 Kubernetes. The supervisor cluster is up and running (I think). Now how do I use it? YAML? Pods? What’s a persistent volume claim (PVC)? Do I now have a Tanzu Kubernetes Grid Cluster? No, not yet.

This blog post is going to focus mainly on part 1 – deployment of the Kubernetes platform on vSphere 7, learning the ropes, and some of the challenges I overcame to achieve a successful deployment.

During the Project Pacific beta, we had a wizard which deployed most of the NSX-T components. The NSX-T Manager, the Edges, the Tier-0 Gateway, the Segment, the uplinks, it was all handled by the wizard. I’m an old hand with vShield Manager and NSX Manager after that for vCloud Director, but NSX-T is a beast. If you don’t know your way around NSX-T yet, the wizard was a blessing because all we had to do was understand what was needed, and then supply the correlating information to the wizard. I think the wizard also helped drive the beta program to success within the targeted start and end dates (these are typical beta program constraints).

When vSphere 7 went GA, a few notable things had changed.

  1. Licensing. Kubernetes deployment on vSphere 7 requires VMware vSphere Enterprise Plus with Add-on for Kubernetes. Right now I believe the only path is through VMware Cloud Foundation (VCF) 4.0 licensing.
  2. Unofficially you can deploy Kubernetes on vSphere 7 without VCF. All of the bits needed already exist in vCenter, ESXi, and NSX-T 3.0. But as the Kubernetes features seem to be buried in the ESXi license key, it involves just a bit of trickery. More on that in a bit.
  3. Outside of VCF, there is no wizard based installation like we had in the Project Pacific beta. It’s a manual deployment and configuration of NSX-T. To be honest and from a learning perspective, this is a good thing. There’s no better way to learn than to crack open the books, read, and do.

So here’s VMware’s book to follow:

vSphere with Kubernetes Configuration and Management (PDF, online library).

It’s a good guide and should cover everything you need from planning to deployment of both NSX-T as well as Kubernetes. If you’re going to use it, be aware that it does tend to get updated so watch for those changes to stay current. To that point, I may make references to specific page numbers that could change over time.

I’ve made several mentions of NSX-T. If you haven’t figured it out by now, the solution is quite involved when it comes to networking. It’s important to understand the networking architecture and how that will overlay your own network as well as utilize existing infrastructure resources such as DNS, NTP, and internet access. When it comes to filling in the blanks for the various VLANs, subnets, IP addresses, and gateways, it’s important to provide the right information and configure correctly. Failure to do so will either end up in a failed deployment, or a deployment that from the surface appears successful but Kubernetes work later on fails miserably. Ask me how I know.

There are several network diagrams throughout VMware’s guide. You’ll find more browsing the internet. I borrowed this one from the UI.

They all look about the same. Don’t worry so much about the internal networking of the supervisor cluster or even the POD or Service CIDRs. For the most part these pieces are autonomous. The workload enablement wizard assigns these CIDR blocks automatically so that means if you leave them alone, you can’t possibly misconfigure them.

What is important can be boiled down to just three required VLANs. Mind you I’m talking solely about Kubernetes on vSphere in the lab here. For now, forget about production VCF deployments and the VLAN requirements it brings to the table (but do scroll down to the end for a link to a click through demo of Kubernetes with VCF).

Just three VLANs. It does sound simple, but where some of the confusion may start is terminology – depending on the source, I’ve seen these VLANs referred to in different ways using different terms. I’ll try to simplify as much as I can.

  1. ESXi host TEP VLAN – Just a private empty VLAN. Must route to Edge node TEP VLAN. Must support minimum 1600 MTU (jumbo frames) both intra VLAN as well as routing jumbo frames to the Edge node TEP VLAN. vmk10 is tied to this VLAN.
  2. Edge node TEP VLAN – Another private empty VLAN. Must route to ESXi host TEP VLAN. Must support minimum 1600 MTU (jumbo frames) both intra VLAN as well as routing jumbo frames to the ESXi host TEP VLAN. The Edge TEP is tied to this VLAN.

    A routed tunnel is established between the ESXi host tunnel endpoints on vmk10 (and vmk11 if you’re deploying with redundancy in mind) and each Edge node TEP interface. If jumbo frames aren’t making it unfragmented through this tunnel, you’re dead in the water.
  3. The third VLAN is what VMware calls the Tier 0 gateway and uplink for transport node on page 49 of their guide. I’ve seen this called the Overlay network. I’ve seen this called the Edge uplink network. The Project Pacific beta quickstart guide called it the Edge Logical Router uplink VLAN as well as the Workload Network VLAN. Later in the wizard it was simply referred to as the Uplink VLAN. Don’t ever confuse this with the TEP VLANs. In all diagrams it’s going to be the External Network or the network where the DevOps staff live. The Tier-0 gateway provides the north/south connectivity between the external network and the Kubernetes stack (which also includes a Tier-1 gateway). Another helpful correlation: the Egress and Ingress CIDRs live on this third VLAN. You’ll find out sooner or later that existing resources such as DNS, NTP, and internet access must exist on this external network.

All of the network diagrams I’ve seen, including the one above, distinguish between the external network and the management network. For the home labbers out there, these two will most often be the same network. In my initial deployment, I made the mistake of deploying Kubernetes with a management VLAN and a separate DevOps VLAN that had no route to the internet. Workload enablement was successful but I found out later that applying a simple YAML resulted in endless failed pods being created. This is because the ESXi host based image fetcher container runtime executive (CRX) had no route to the internet to access public repository images (a firewall blocking traffic can cause this as well). I was seeing errors such as the following in /var/log/spherelet.log on the vSphere host where the pod was placed:

Failed to resolve image: Http request failed. Code 400: ErrorType(2) failed to do request: Head https://registry-1.docker.io/v2/library/nginx/manifests/alpine: dial tcp 34.197.189.129:443: connect: network is unreachable
spherelet.log:time="2020-03-25T02:47:24.881025Z" level=info msg="testns1/nginx-3a1d01bf5d03a391d168f63f6a3005ff4d17ca65-v246: Start new image fetcher instance. Crx-cli cmd args [/bin/crx-cli ++group=host/vim/vmvisor/spherelet/imgfetcher run --with-opaque-network nsx.LogicalSwitch --opaque-network-id a2241f05-9229-4703-9815-363721499b59 --network-address 04:50:56:00:30:17 --external-id bfa5e9d2-8b9d-4b34-9945-5b7452ee76a0 --in-group host/vim/vmvisor/spherelet/imgfetcher imgfetcher]\n"

The NSX-T Manager and Edge nodes both have management interfaces that tie to the management network, but much like vCenter and ESXi management interfaces, these are for management only and are not in the data path, nor are they a part of the Geneve tunnel. As such, the management network does not require jumbo frames.

Early on in the beta, I took some lumps trying to deploy Kubernetes on vSphere. These attempts were unsuccessful for a few reasons and 100% of the cause was networking problems.

First networking problem: My TEP VLANs were not routed. That was purely my fault for not understanding in full the networking requirements for the two TEP VLANs. Easy fix – I contacted my lab administrator and had him add two default gateways, one for each of the TEP VLANs. Problem solved.

Second networking problem: My TEP VLANs supported jumbo frames at Layer 2 (hosts on the same VLAN can successfully send and receive unfragmented jumbo frames all day), but did not support the routing of jumbo frames. (Using vmkping with the -d switch is very important in testing for jumbo frame success; the command looks something like vmkping -I vmk10 <edge TEP IP> -S vxlan -s 1572 -d.) In other words, when trying to send a jumbo frame from an ESXi host TEP to an Edge TEP on the other VLAN, standard MTU frames made it through, but jumbo frames were dropped at the physical switch interface which was performing the inter-VLAN routing.
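As a sketch of the test procedure (the vmkernel interface and Edge TEP address below are placeholders for your environment), run both a jumbo and a standard-size probe from the ESXi host; the -s 1572 payload plus ICMP and IP headers approximates a 1600-byte frame, and -d forbids fragmentation:

```shell
# Jumbo frame from an ESXi host TEP to an Edge node TEP
# (-S vxlan selects the netstack that carries the overlay traffic)
vmkping -I vmk10 -S vxlan -d -s 1572 192.168.130.11

# Standard-size frame for comparison; if this succeeds while the jumbo
# test fails, inter-VLAN routing of jumbo frames is the likely culprit
vmkping -I vmk10 -S vxlan -d -s 1472 192.168.130.11
```

Running the pair of commands isolates whether the problem is general reachability or specifically the jumbo frame path.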

A problem with jumbo frames can manifest itself into somewhat of a misleading problem and resulting diagnosis. When a jumbo frames problem exists between the two TEP VLANs:

  • A workload enablement appears successful and healthy in the UI
  • The Control Plane Node IP Address is pingable
  • The individual supervisor cluster nodes are reachable on their respective IP addresses and accept kubectl API commands
  • The Harbor Image Registry is successfully deployed

But…

  • The Control Plane Node IP Address is not reachable over https in a web browser
  • The Harbor Image Registry is unreachable via web browser at its published IP address

These are symptoms of an underlying jumbo frames problem but they can be misidentified as a load balancer issue.

I spent some time on this because my lab administrator assured me jumbo frames were enabled on the physical switch. It took some more digging to find out that inter-VLAN routing of jumbo frames was a separate configuration on the switch. To be fair, I didn’t initially ask for this configuration (I didn’t know what I didn’t know at the time). Once that configuration was made on the switch, jumbo frames were making it to both ends of the tunnel as it traversed VLANs. Problem solved.

Just one more note on testing for inter-VLAN routing of jumbo frames. Although the switch may be properly configured and jumbo frames are making it through between VLANs, I have found that sending vmkping commands with jumbo frames to the switch interfaces themselves (this would be the default gateway for the VLAN) can be successful, and it can also fail. I think it all depends on the switch make and model. Call it a red herring and try not to pay attention to it. What’s important is that the jumbo frames ultimately make it through to the opposite tunnel endpoint.

Third networking problem: The third critical VLAN mentioned above (call it the overlay, call it the Edge uplink, call it the External Network, call it the DevOps network) is not well understood and gets implemented incorrectly. There are a few ways you can go wrong here.

  1. Use the wrong VLAN – in other words a VLAN which has no reachable network services such as DNS, NTP, or a gateway to the internet. You’ll be able to deploy the Kubernetes framework but the deployment of pods requiring access to a public image repository will fail. During deployment, failed pods will quickly stack up in the namespace.
  2. Use the correct VLAN but assign Egress and Ingress CIDR blocks from some other VLAN. This won’t work, and it is pretty well spelled out everywhere I’ve looked that the Egress and Ingress CIDR blocks need to be on the same VLAN which represents the External Network. Among other things, the Egress portion is used for outbound traffic through the Tier-0 Gateway to the external network. The image fetcher CRX is one such function which uses Egress. The Ingress portion is used for inbound traffic from the External Network through the Tier-0 Gateway. In fact, the first usable IP address in this block ends up being the Control Plane Node IP Address for the Supervisor Cluster where all the kubectl API commands come through. If you’ve just finished enabling workload management and your Control Plane Node IP Address is not on your external network, you’ve likely assigned the wrong Egress/Ingress CIDR addresses.
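A quick way to smoke-test the external network path after enablement is to log in through the Control Plane Node IP Address and deploy a pod that pulls from a public registry (a sketch; the server IP, username, and namespace below are placeholders for your environment):

```shell
# Authenticate against the supervisor cluster using the vSphere plugin for kubectl
kubectl vsphere login --server=192.168.110.101 --vsphere-username administrator@vsphere.local

# Switch to a namespace and run a pod whose image must be fetched from
# Docker Hub through the Egress CIDR on the external network
kubectl config use-context testns1
kubectl run nginx --image=nginx:alpine

# Endless failed pods here usually point back at the Egress/DNS/internet path
kubectl get pods --watch
```

If the pod sticks in an image pull failure loop, revisit the external network’s DNS, NTP, and internet access before suspecting Kubernetes itself.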

Fourth networking problem: The sole edge portgroup that I had created on the distributed switch needs to be in VLAN Trunking mode (1-4094), not VLAN with a specified tag. This one can be easy to miss and early in the beta I missed it. I followed the Project Pacific quickstart guide but don’t ever remember seeing the requirement. It is well documented now on page 55 of the VMware document I mentioned early on.

To summarize the common theme thus far: networking. Understanding the requirements, mapping them to your environment, and getting it right are the keys to a successful vSphere with Kubernetes deployment.

Once the beta had concluded and vSphere 7 was launched, I was anxious to deploy Kubernetes on GA bits. After deploying and configuring NSX-T in the lab, I ran into the licensing obstacle. During the Project Pacific beta, license keys were not an issue. The problem is when we try to enable Workload Management after NSX-T is ready and waiting. Without the proper licensing, I was greeted with This vCenter does not have the license to support Workload Management. You won’t see this in a newly stood up greenfield environment if you haven’t gotten a chance to license the infrastructure. Where you will see it is if you’ve already licensed your ESXi hosts with Enterprise Plus licenses for instance. Since Enterprise Plus licenses by themselves are not entitled to the Kubernetes feature, they will disable the Workload Management feature.

The temporary workaround I found is to simply remove the Enterprise Plus license keys and apply the Evaluation License to the hosts. Once I did this and refreshed the Workload Management page, the padlock disappeared and I was able to continue with the evaluation.

Unfortunately the ESXi host Evaluation License keys are only good for 60 days. As of this writing, vSphere 7 has not yet been GA for 60 days so anyone who stood up a vSphere 7 GA environment on day 1 still has a chance to evaluate vSphere with Kubernetes.

One other minor issue I’ve run into that I’ll mention has to do with NSX-T Compute Managers. A Compute Manager in NSX-T is a vCenter Server registration. You might be familiar with the process of registering a vCenter Server with other products such as storage array or data protection software. This really is no different.

However, a problem can present itself whereby a vCenter Server has been registered with an NSX-T Manager previously, that NSX-T Manager is decommissioned (improperly), and then an attempt is made sometime later to register the vCenter Server with a newly commissioned NSX-T Manager. The issue itself is a little deceptive because at first glance that subsequent registration with a new NSX-T Manager appears successful – no errors are thrown in the UI and we continue our work setting up the fabric in NSX-T Manager.

What lurks in the shadows is that the registration wasn’t entirely successful. The Registration Status shows Not Registered and the Connection Status shows Down. It’s a simple fix really – not something ugly that you might expect in the CLI. Simply click on the Status link and you’re offered an opportunity to Select errors to resolve. I’ve been through this motion a few times and the resolution is quick and effortless. Within a few seconds the Registration Status is Registered and the Connection Status is Up.

Deploying Kubernetes on vSphere can be challenging, but in the end it is quite satisfying. It also ushers in Day 1 Kubernetes. Becoming a kubectl Padawan. Observing the rapid deployment and tear down of applications on native VMware integrated Kubernetes pods. Digging into the persistent storage aspects. Deploying a Tanzu Kubernetes Grid Cluster.

Day 2 Kubernetes is also a thing. Maintenance, optimization, housekeeping, continuous improvement, backup and restoration. Significant dependencies now tie into vSphere and NSX-T infrastructure. Keeping these components healthy and available will be more important than ever to maintain a productive and happy DevOps organization.

I would be remiss if I ended this before calling out a few fantastic resources. David Stamen’s blog series Deploying vSphere with Kubernetes provides a no nonsense walk through highlighting all of the essential steps from configuring NSX-T to enabling workload management. He wraps up the series with demo applications and a Tanzu Kubernetes Grid Cluster deployment.

It should be no surprise that William Lam’s name comes up here as well. William has done some incredible work in the areas of automated vSphere deployments for lab environments. In his Deploying a minimal vSphere with Kubernetes environment article, he shows us how we can deploy Kubernetes on a two or even one node vSphere cluster (this is unsupported of course – a minimum of three vSphere hosts is required as of this writing). This is a great tip for those who want to get their hands on vSphere with Kubernetes but have a limited number of vSphere hosts in their cluster to work with. I did run into one caveat with the two node cluster in my own lab – I was unable to deploy a Tanzu Kubernetes Grid Cluster. After deployment and power on of the TKG control plane VM, it waits indefinitely to deploy the three worker VMs. I believe the TKG cluster is looking for three supervisor control plane VMs. Nonetheless, I was able to deploy applications on native pods and it demonstrates that William’s efforts enable the community at large to do more with less in their home or work lab environments. If you find his work useful, make a point to reach out and thank him.

How to Validate MTU in an NSX-T Environment – This is a beautifully written chapter. Round up the usual vmkping suspects (Captain Louis Renault, Casablanca). You’ll find them all here. The author also utilizes esxcli (general purpose ESXi CLI), esxcfg-vmknic (vmkernel NIC CLI), nsxdp-cli (NSX datapath), and edge node CLI for diagnostics.

NSX-T Command Line Reference Guide – I stumbled onto this guide covering nsxcli. Although I didn’t use it for getting the Kubernetes lab off the ground, it looks very interesting and useful for NSX-T Data Center, so I’m bookmarking it for later. What’s interesting to note here is that nsxcli runs on the ESXi host via an installed kernel module, as well as on the NSX-T Manager and the edge nodes.

What is Dell Technologies PowerStore – Part 13, Integrating with VMware Tanzu Kubernetes Grid – Itzik walks through VMware Tanzu Kubernetes Grid with vSphere 7 and Tanzu Kubernetes clusters, covering the creation of a cluster with Dell EMC PowerStore and vVols.

Lastly, here’s a cool click-through demo of a Kubernetes deployment on VCF 4.0 – vSphere with Kubernetes on Cloud Foundation (credit William Lam for sharing this link).

With that, I introduce a new blog tag: Kubernetes

Peace out.

VMware Tools causes virtual machine snapshot with quiesce error

July 30th, 2016

Last week I was made aware of an issue a customer in the field was having with a data protection strategy using array-based snapshots which were in turn leveraging VMware vSphere snapshots with VSS quiesce of Windows VMs. The problem began after installing VMware Tools version 10.0.0 build-3000743 (reported as version 10240 in the vSphere Web Client) which I believe is the version shipped in ESXi 6.0 Update 1b (reported as version 6.0.0, build 3380124 in the vSphere Web Client).

The issue is that creating a VMware virtual machine snapshot with VSS integration fails. The virtual machine disk configuration is simply two .vmdks on a VMFS-5 datastore but I doubt the symptoms are limited only to that configuration.

The failure message shown in the vSphere Web Client is “Cannot quiesce this virtual machine because VMware Tools is not currently available.”  The vmware.log file for the virtual machine also shows the following:

2016-07-29T19:26:47.378Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'jgb', deviceState=0, lazy=0, logging=0, quiesced=1, forceNative=0, tryNative=1, saveAllocMaps=0 cb=1DE2F730, cbData=32603710
2016-07-29T19:26:47.407Z| vmx| I120: DISKLIB-LIB_CREATE : DiskLibCreateCreateParam: vmfsSparse grain size is set to 1 for '/vmfs/volumes/51af837d-784bc8bc-0f43-e0db550a0c26/rmvm02/rmvm02-000001.
2016-07-29T19:26:47.408Z| vmx| I120: DISKLIB-LIB_CREATE : DiskLibCreateCreateParam: vmfsSparse grain size is set to 1 for '/vmfs/volumes/51af837d-784bc8bc-0f43-e0db550a0c26/rmvm02/rmvm02_1-00000
2016-07-29T19:26:47.408Z| vmx| I120: SNAPSHOT: SnapshotPrepareTakeDoneCB: Prepare phase complete (The operation completed successfully).
2016-07-29T19:26:56.292Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:07.790Z| vcpu-0| I120: Tools: Tools heartbeat timeout.
2016-07-29T19:27:11.294Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:17.417Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:17.417Z| vmx| I120: Msg_Post: Warning
2016-07-29T19:27:17.417Z| vmx| I120: [msg.snapshot.quiesce.rpc_timeout] A timeout occurred while communicating with VMware Tools in the virtual machine.
2016-07-29T19:27:17.417Z| vmx| I120: ----------------------------------------
2016-07-29T19:27:17.420Z| vmx| I120: Vigor_MessageRevoke: message 'msg.snapshot.quiesce.rpc_timeout' (seq 10949920) is revoked
2016-07-29T19:27:17.420Z| vmx| I120: ToolsBackup: changing quiesce state: IDLE -> DONE
2016-07-29T19:27:17.420Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'jgb': 0
2016-07-29T19:27:17.420Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (31).
2016-07-29T19:27:17.420Z| vmx| I120: VigorTransport_ServerSendResponse opID=ffd663ae-5b7b-49f5-9f1c-f2135ced62c0-95-ngc-ea-d6-adfa seq=12848: Completed Snapshot request.
2016-07-29T19:27:26.297Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.

After performing some digging, I found VMware had released VMware Tools version 10.0.9 on June 6, 2016. The release notes indicate the root cause had been identified and resolved.

Resolved Issues

Attempts to take a quiesced snapshot in a Windows Guest OS fails
Attempts to take a quiesced snapshot after booting a Windows Guest OS fails

After downloading and upgrading VMware Tools version 10.0.9 build-3917699 (reported as version 10249 in the vSphere Web Client), the customer’s problem was resolved. Since the faulty version of VMware Tools was embedded in the customer’s templates used to deploy virtual machines throughout the datacenter, there were a number of VMs needing their VMware Tools upgraded, as well as the templates themselves.
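To find the affected VMs at scale, a PowerCLI sketch along these lines can help (an assumption on my part, not the method the customer used; "10240" is the faulty Tools version as reported in the vSphere Web Client, and the VM name is a placeholder):

```powershell
# List VMs still running the faulty Tools build (reported as version 10240)
Get-VM | Where-Object {$_.ExtensionData.Guest.ToolsVersion -eq "10240"} | Select-Object Name

# Upgrade Tools on an affected VM without forcing an immediate guest reboot
Get-VM -Name "affected-vm" | Update-Tools -NoReboot
```

Templates would need to be converted to VMs, upgraded the same way, and converted back.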

vSphere Consulting Opportunity in Twin Cities

December 14th, 2013

If you know me well, you know the area I call home.  If you’re a local friend, an acquaintance, or a member of any of the three Minnesota VMware User Groups, then I have an opportunity that has crossed my desk which you or someone you know may be interested in.

A local business here in the Twin Cities has purchased vSphere and EMC VNXe storage infrastructure and is looking for a Consulting Engineer to deploy the infrastructure per an existing design.

Details:

  • Install and configure VMware vSphere 5.1 on two hosts
  • Install and configure VMware vCenter
  • Install and configure VMware Update Manager
  • Configure vSphere networking
  • Configure EMC VNXe storage per final design

It’s a great opportunity to help a locally owned business deploy a vSphere infrastructure and I would think this would be in the wheelhouse of 2,000+ people I’ve met while running the Minneapolis VMware User Group.  As much as I’d love to knock this out myself, I’m a Dell Storage employee and as such I’m removing myself as a candidate for the role.  The best way I can help is to get the word out into the community.

If you’re interested, email me with your contact information and I’ll get you connected to the Director.

Happy Holidays!

vSphere 5.5 UNMAP Deep Dive

September 13th, 2013

One of the features that has been updated in vSphere 5.5 is UNMAP which is one of two sub-components of what I’ll call the fourth block storage based thin provisioning VAAI primitive (the other sub-component is thin provisioning stun).  I’ve already written about UNMAP a few times in the past.  It was first introduced in vSphere 5.0 two years ago.  A few months later the feature was essentially recalled by VMware.  After it was re-released by VMware in 5.0 Update 1, I wrote about its use here and followed up with a short piece about the .vmfsBalloon file here.

For those unfamiliar, UNMAP is a space reclamation mechanism used to return blocks of storage back to the array after data which was once occupying those blocks has been moved or deleted.  The common use cases are deleting a VM from a datastore, Storage vMotion of a VM from a datastore, or consolidating/closing vSphere snapshots on a datastore.  All of these operations, in the end, involve deleting data from pinned blocks/pages on a volume.  Without UNMAP, these pages, albeit empty and available for future use by vSphere and its guests only, remain pinned to the volume/LUN backing the vSphere datastore.  The pages are never returned back to the array for use with another LUN or another storage host.  Notice I did not mention shrinking a virtual disk or a datastore – neither of those operations are supported by VMware.  I also did not mention the use case of deleting data from inside a virtual machine – while that is not supported, I believe there is a VMware fling for experimental use.  In summary, UNMAP extends the usefulness of thin provisioning at the array level by maintaining storage efficiency throughout the life cycle of the vSphere environment and the array which supports the UNMAP VAAI primitive.

On the Tuesday during VMworld, Cormac Hogan launched his blog post introducing new and updated storage related features in vSphere 5.5.  One of those features he summarized was UNMAP.  If you haven’t read his blog, I’d definitely recommend taking a look – particularly if you’re involved with vSphere storage.  I’m going to explore UNMAP in a little more detail.

The most obvious change to point out is the command line itself used to initiate the UNMAP process.  In previous versions of vSphere, the command issued on the vSphere host was:

vmkfstools -y x (where x represents the percentage of storage to unmap)

As Cormac points out, UNMAP has been moved to the esxcli namespace in vSphere 5.5 (think remote scripting opportunities after XYZ process) where the basic command syntax is now:

esxcli storage vmfs unmap

In addition to the above, there are also three switches available for use; of the first two listed below, one is required, and the third is optional.

-l|--volume-label=<str> The label of the VMFS volume to unmap the free blocks.

-u|--volume-uuid=<str> The uuid of the VMFS volume to unmap the free blocks.

-n|--reclaim-unit=<long> Number of VMFS blocks that should be unmapped per iteration.

Previously with vmkfstools, we’d change to the VMFS folder from which we were going to UNMAP blocks.  In vSphere 5.5, the esxcli command can be run from anywhere, so specifying the datastore name or the uuid is one of the required parameters for obvious reasons.  So using the datastore name, the new UNMAP command in vSphere 5.5 is going to look like this:

esxcli storage vmfs unmap -l 1tb_55ds

As for the optional parameter, the UNMAP command is an iterative process which continues through numerous cycles until complete.  The reclaim unit parameter specifies the quantity of blocks to unmap per each iteration of the UNMAP process.  In previous versions of vSphere, VMFS-3 datastores could have block sizes of 1, 2, 4, or 8MB.  While upgrading a VMFS-3 datastore to VMFS-5 will maintain these block sizes, executing an UNMAP operation on a native net-new VMFS-5 datastore results in working with a 1MB block size only.  Therefore, if a reclaim unit value of 100 is specified on a VMFS-5 datastore with a 1MB block size, then 100MB of data will be returned to the available raw storage pool per iteration until all blocks marked available for UNMAP are returned.  Using a value of 100, the UNMAP command looks like this:

esxcli storage vmfs unmap -l 1tb_55ds -n 100

If the reclaim unit value is unspecified when issuing the UNMAP command, the default reclaim unit value is 200, resulting in 200MB of data returned to the available raw storage pool per iteration assuming a 1MB block size datastore.
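Putting the pieces together, reclaiming free blocks across every VMFS-5 datastore visible to a host might be sketched as follows (a sketch for the ESXi 5.5 shell; the awk column positions assume volume names without spaces, and the reclaim unit of 200 matches the default):

```shell
# Walk the mounted VMFS-5 volumes and unmap free blocks on each,
# 200 blocks (200MB at a 1MB block size) per iteration
esxcli storage filesystem list | awk '$5 == "VMFS-5" {print $2}' | while read ds; do
  echo "Reclaiming free blocks on ${ds}..."
  esxcli storage vmfs unmap -l "$ds" -n 200
done
```

Since esxcli can also be invoked remotely (via vCLI or PowerCLI), a loop like this is a candidate for scheduled off-hours space reclamation.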

One additional piece to note on the CLI topic is that in a release candidate build I was working with, while the old vmkfstools -y command is deprecated, it appears to still exist but with newer vSphere 5.5 functionality published in the --help section:

vmkfstools -y --reclaimBlocks vmfsPath [--reclaimBlocksUnit #blocks]

The next change involves the hidden temporary balloon file (refer to my link at the top if you’d like more information about the balloon file but basically it’s a mechanism used to guarantee blocks targeted for UNMAP are not in the interim written to by an outside I/O request until the UNMAP process is complete).  It is no longer named .vmfsBalloon.  The new name is .asyncUnmapFile as shown below.

/vmfs/volumes/5232dd00-0882a1e4-e918-0025b3abd8e0 # ls -l -h -A
total 998408
-r--------    1 root     root      200.0M Sep 13 10:48 .asyncUnmapFile
-r--------    1 root     root        5.2M Sep 13 09:38 .fbb.sf
-r--------    1 root     root      254.7M Sep 13 09:38 .fdc.sf
-r--------    1 root     root        1.1M Sep 13 09:38 .pb2.sf
-r--------    1 root     root      256.0M Sep 13 09:38 .pbc.sf
-r--------    1 root     root      250.6M Sep 13 09:38 .sbc.sf
drwx------    1 root     root         280 Sep 13 09:38 .sdd.sf
drwx------    1 root     root         420 Sep 13 09:42 .vSphere-HA
-r--------    1 root     root        4.0M Sep 13 09:38 .vh.sf
/vmfs/volumes/5232dd00-0882a1e4-e918-0025b3abd8e0 #

As discussed in the previous section, the UNMAP command now specifies the actual size of the temporary file, instead of the temporary file size being determined by a percentage of space to return to the raw storage pool.  This is an improvement in part because it helps avoid the catastrophe which could occur if UNMAP tried to remove 2TB+ in a single operation (discussed here).

VMware has also enhanced the functionality of the temporary file.  A new kernel interface in ESXi 5.5 allows the user to ask for blocks beyond a specified block address in the VMFS file system.  This ensures that the blocks allocated to the temporary file were never allocated to it previously.  The benefit realized in the end is that a temporary file of any size can be created, and with UNMAP issued against the blocks allocated to the temporary file, we can rest assured that UNMAP is eventually issued on all free blocks on the datastore.

Going a bit deeper and adding to the efficiency, VMware has also enhanced UNMAP to support multiple block descriptors.  Compared to vSphere 5.1 which issued just one block descriptor per UNMAP command, vSphere 5.5 now issues up to 100 block descriptors depending on the storage array (these identifying capabilities are specified internally in the Block Limits VPD (B0) page).
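
The descriptor batching can be sketched as follows.  This is a hypothetical illustration in Python, not an ESXi or SCSI interface; the batch_descriptors name and the (start_block, num_blocks) tuples are my own, standing in for the block descriptors carried in an UNMAP command.

```python
# Hypothetical sketch (not a VMware interface): batch free-block ranges
# ("descriptors") so each UNMAP command carries at most the number of
# descriptors the array supports, e.g. up to 100 in vSphere 5.5 vs. 1 in 5.1.
def batch_descriptors(ranges, max_descriptors=100):
    """Yield lists of (start_block, num_blocks) tuples, one list per command."""
    for i in range(0, len(ranges), max_descriptors):
        yield ranges[i:i + max_descriptors]

ranges = [(n * 1000, 500) for n in range(250)]   # 250 free extents to reclaim
commands = list(batch_descriptors(ranges))
print(len(commands))          # 3 UNMAP commands instead of 250
print(len(commands[-1]))      # 50 descriptors in the final command
```

The efficiency gain is simply fewer round trips: what took 250 single-descriptor commands in vSphere 5.1 collapses to 3 commands against an array advertising a 100-descriptor limit.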

A look at the asynchronous and iterative vSphere 5.5 UNMAP logical process:

  1. User or script issues esxcli UNMAP command
  2. Does the array support VAAI UNMAP?  yes=3, no=end
  3. Create .asyncUnmapFile on root of datastore
  4. .asyncUnmapFile created and locked? yes=5, no=end
  5. Issue IOCTL to allocate reclaim-unit blocks of storage on the volume past the previously allocated block offset
  6. Did the previous block allocation succeed? yes=7, no=remove lock file and retry from step 5
  7. Issue UNMAP on all blocks allocated above in step 5
  8. Remove the lock file
  9. Did we reach the end of the datastore? yes=end, no=3
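
The loop above can be modeled with a short sketch.  Everything here (run_unmap, the block counts) is illustrative Python, not ESXi code; it simply walks the datastore in reclaim-unit-sized chunks the way the logical process describes.

```python
# Simplified simulation of the asynchronous, iterative UNMAP flow -- names
# are illustrative, not ESXi internals.  The datastore is modeled as a count
# of free blocks; each pass locks the temp file, claims up to reclaim_unit
# blocks past the previous offset, unmaps them, and repeats to end of volume.
def run_unmap(total_free_blocks, reclaim_unit=200):
    offset = 0
    passes = 0
    while offset < total_free_blocks:             # step 9: end of datastore?
        # steps 3-4: create and lock .asyncUnmapFile (simulated)
        claimed = min(reclaim_unit, total_free_blocks - offset)   # step 5
        # step 7: issue UNMAP on the claimed blocks
        offset += claimed
        passes += 1                               # step 8: remove lock file
    return passes

print(run_unmap(1000))       # 5 passes at the default reclaim unit of 200
print(run_unmap(1000, 300))  # 4 passes (300 + 300 + 300 + 100)
```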

From a performance perspective, executing the UNMAP command in my vSphere 5.5 RC lab showed peak write I/O of around 1,200MB/s with an average of around 200 IOPS comprised of a 50/50 mix of read/write.  The UNMAP I/O pattern is a bit hard to gauge because with the asynchronous iterative process, it seemed to do a bunch of work, rest, do more work, rest, and so on.  Sorry, no screenshots because flickr.com is currently down.

Perhaps the most notable takeaway from the performance section is that as of vSphere 5.5, VMware is lifting the recommendation of only running UNMAP during a maintenance window.  Keep in mind this is just a recommendation.  I encourage vSphere 5.5 customers to test UNMAP in their lab first using various reclaim unit sizes.  While doing this, examine performance impacts to the storage fabric, the storage array (look at both front end and back end), as well as other applications sharing the array.

Remember that fundamentally the UNMAP command is only going to provide a benefit AFTER its associated use cases have occurred (mentioned at the top of the article).  Running UNMAP on a volume which has no pages to be returned will be a waste of effort.  Once you’ve become comfortable with using UNMAP and understanding its impacts in your environment, consider running it on a recurring schedule – perhaps weekly.  It really depends on how much the use cases apply to your environment.  Many vSphere backup solutions leverage vSphere snapshots, which is one of the use cases.  Although it could be said there are large gains to be made with UNMAP in this case, keep in mind backups run regularly, and space that is returned to raw storage with UNMAP will likely be consumed again in the following backup cycle when vSphere snapshots are created once again.

To wrap this up, customers who have block arrays supporting the thin provisioning VAAI primitive will be able to use UNMAP in vSphere 5.5 environments (for storage vendors, both sub-components are required to certify for the primitive as a whole on the HCL).  This includes Dell Compellent customers with current versions of Storage Center firmware.  Customers who use array based snapshots with extended retention periods should keep in mind that while UNMAP will work against active blocks, it may not work with blocks maintained in a snapshot.  This is to honor the snapshot based data protection retention.

Veeam Launches Backup & Replication v7

August 22nd, 2013

Data protection, data replication, and data recovery are challenging.  Consolidation through virtualization has forced customers to retool automated protection and recovery methodologies in the datacenter and at remote DR sites.

For VMware environments, Veeam has been with customers helping them every step of the way with their flagship Backup & Replication suite.  Once just a simple backup tool, it has evolved into an end to end solution for local agentless backup and restore with application item intelligence as well as a robust architecture to fulfill the requirements of replicating data offsite and providing business continuation while meeting aggressive RPO and RTO metrics.  Recent updates have also bridged the gap for Hyper-V customers, rounding out the majority of x86 virtualized datacenters.

But don’t take their word for it.  Talk to one of their 200,000+ customers – for instance, myself.  I’ve been using Veeam in the boche.net lab for well over five years to achieve nightly backups of not only my ongoing virtualization projects, but my growing family’s photos, videos, and sensitive data as well.  I also tested, purchased, and implemented it in a previous position to facilitate the migration of virtual machines from one large datacenter to another via replication.  In December of 2009, I was also successful in submitting a VCDX design to VMware incorporating Veeam Backup & Replication, and followed up in February 2010 successfully defending that design.

Veeam is proud to announce another major milestone bolstering their new Modern Data Protection campaign – version 7 of Veeam Backup & Replication.  In this new release, extensive R&D yields 10x faster performance as well as many new features such as built-in WAN acceleration, backup from storage snapshots, long requested support for tape, and a solid data protection solution for vCloud Director.  Value was added for Hyper-V environments as well – SureBackup automated verification support, Universal Application Item Recovery, as well as the on-demand Sandbox.  Aside from the vCD support, one of the new features I’m interested in looking at is parallel processing of virtual machine backups.  It’s a fact that with globalized business, backup windows have shrunk while data footprints have grown exponentially.  Parallel VM and virtual disk backup, refined compression algorithms, and 64-bit backup repository architecture will go a long way to meet global business challenges.

v7 is available now.  Check it out!

This will likely be my last post until VMworld.  I’m looking forward to seeing everyone there!

The .vmfsBalloon File

July 1st, 2013

One year ago, I wrote a piece about thin provisioning and the role that the UNMAP VAAI primitive plays in thin provisioned storage environments.  Here’s an excerpt from that article:

When the manual UNMAP process is run, it balloons up a temporary hidden file at the root of the datastore which the UNMAP is being run against.  You won’t see this balloon file with the vSphere Client’s Datastore Browser as it is hidden.  You can catch it quickly while UNMAP is running by issuing the ls -l -a command against the datastore directory.  The file will be named .vmfsBalloon along with a generated suffix.  This file will quickly grow to the size of data being unmapped (this is actually noted when the UNMAP command is run and evident in the screenshot above).  Once the UNMAP is completed, the .vmfsBalloon file is removed.

Has your curiosity ever gotten you wondering about the technical purpose of the .vmfsBalloon file?  It boils down to data integrity and timing.  At the time the UNMAP command is run, the balloon file is immediately instantiated and grows to occupy (read: hog) all of the blocks that are about to be unmapped.  It does this so that during the unmap process, none of the blocks are allocated during the process of new file creation elsewhere.  If you think about it, it makes sense – we just told vSphere to give these blocks back to the array.  If during the interim one or more of these blocks were suddenly allocated for a new file or file growth purposes, and we then purged the block, we would have a data integrity issue.  More accurately, newly created data would be missing, as its block or blocks were just flushed back to the storage pool on the array.
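
The reservation mechanic can be illustrated with a toy model.  To be clear, this is not VMFS code; the ToyDatastore class, block numbers, and method names are all my own invention, showing only why hogging the blocks prevents the allocator from handing them to a new file mid-operation.

```python
# Toy model (not VMFS code) of why the balloon file exists: blocks queued
# for UNMAP are first moved into a "reserved" set so the allocator cannot
# grant them to a new file while the unmap is in flight.
class ToyDatastore:
    def __init__(self, free_blocks):
        self.free = set(free_blocks)
        self.reserved = set()        # blocks held by the balloon file

    def balloon(self, blocks):
        """Reserve free blocks that are about to be unmapped."""
        self.reserved |= self.free & set(blocks)
        self.free -= self.reserved

    def allocate(self, n):
        """New-file allocation can only draw from truly free blocks."""
        grant = set(list(self.free)[:n])
        self.free -= grant
        return grant

ds = ToyDatastore(range(10))
ds.balloon([0, 1, 2, 3, 4])          # balloon hogs the blocks targeted for UNMAP
grant = ds.allocate(10)              # a new file asks for 10 blocks...
print(sorted(grant))                 # ...and only gets [5, 6, 7, 8, 9]
```

Without the reservation step, the allocate call could have handed out blocks 0 through 4 as well, and the subsequent unmap would have flushed freshly written data back to the array's raw storage pool.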