Posts Tagged ‘Windows’

Site Recovery Manager Firewall Rules for Windows Server

April 29th, 2020

I have a hunch Google sent you here. Before we get to what you’re looking for, I’m going to digress a little. tl;dr folks, feel free to jump straight to the workaround further down.

Since the Industrial Revolution, VMware has supported Microsoft Windows and SQL Server platforms to back datacenter and cloud infrastructure products such as vCenter Server, Site Recovery Manager, vCloud Director (rebranded recently to VMware Cloud Director), and so on. However, if you’ve been paying attention to product documentation and compatibility guides, you will have noticed support for Microsoft platforms diminishing in favor of easy to deploy appliances based on Photon OS and VMware Postgres (vPostgres). This is a good thing – spoken by a salty IT veteran with a strong Windows background.

2019 is where we really hit a brick wall. vCenter Server 6.7 is the last version that supports installation on Windows and that ended on Windows Server 2016 – there was never support for Windows Server 2019 (reference VMware KB 2091273 – Supported host operating systems for VMware vCenter Server installation). In vSphere 7.0, vCenter Server for Windows has been removed and support is not available. For more information, see Farewell, vCenter Server for Windows. Likewise, Microsoft SQL Server 2016 was the last SQL Server version supported as a vCenter Server database (matrix reference).

Site Recovery Manager (SRM) is in the same boat. It was born and bred on Windows and SQL Server back ends. But once again we find a Photon OS-based appliance with embedded vPostgres available, along with product documentation which highlights diminishing support for Microsoft Windows and SQL.

Taking a closer look at the documentation…

Compatibility Matrices for VMware Site Recovery Manager 8.2

Compatibility Matrices for VMware Site Recovery Manager 8.3

  • “Site Recovery Manager Server 8.3 supports the same Windows host operating systems that vCenter Server 7.0 supports.” SRM 8.3 supports vCenter Server 6.7 as well, so that should have been included here too but was left out, probably an oversight.
  • Supported host operating systems for VMware vCenter Server installation (2091273)
  • Takeaway: vCenter Server 7 cannot be installed on Windows. This implies SRM 8.3 supports no version of Windows Server for installation (this implication is not at all correct as SRM 8.3 ships as a Windows executable installation for vSphere 6.x environments). Not a great spot to be in since the Photon OS-based SRM appliance employs a completely different Storage Replication Adapter (SRA) than the Windows installation and not all storage vendors support both (yet).

Ignoring the labyrinth of supported product and platform compatibility matrices above, one may choose to forge ahead and install SRM on Windows Server 2019 anyway. I’ve done it several times in the lab but there was a noted takeaway.

When I logged into the vSphere Client, the SRM plug-in was not visible. In my travels, there are a few reasons why this symptom can occur.

  • The SRM services are not started.
  • The logged-on user account is not a member of the SRM Administrators group (yes, even super users like administrator@vsphere.local will need to be added to this group for SRM management).
  • The Windows Firewall is blocking ports used to present the plug-in.

Wait, what? The Windows Firewall wasn’t typically a problem in the past. That is correct. The SRM installation does create four inbound Windows Firewall rules (none outbound) on Windows Server up through 2016. However, for whatever reason, the SRM installation does not create these needed firewall rules on Windows Server 2019. The lack of firewall rules allowing SRM related traffic will block the plug-in. Reference Network Ports for Site Recovery Manager.

One obvious workaround here would be to disable the Windows Firewall but what fun would that be? Also, this may violate IT security plans, trigger an audit, or require exception filings. Been there, done that… ish. Let’s dig a little deeper.

The four inbound Windows Firewall rules ultimately wind up in the Windows registry. A registry export of the four rules actually created by an SRM installation is shown below. Through trial and error I’ve found that importing the rules into the Windows registry with a .reg file results in broken rules so for now I would not recommend that method.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SharedAccess\Parameters\FirewallPolicy\FirewallRules]
"{B37EDE84-6AC1-4F7D-9E42-FA44B8D928D0}"="v2.26|Action=Allow|Active=TRUE|Dir=In|App=C:\\Program Files\\VMware\\VMware vCenter Site Recovery Manager\\bin\\vmware-dr.exe|Name=VMware vCenter Site Recovery Manager|Desc=This feature allows connections using VMware vCenter Site Recovery Manager.|"
"{F6EAE3B7-C58F-4582-816B-C3610411215B}"="v2.26|Action=Allow|Active=TRUE|Dir=In|Protocol=6|LPort=9086|Name=VMware vCenter Site Recovery Manager - HTTPS Listener Port|"
"{F6E7BE93-C469-4CE6-80C4-7069626636B0}"="v2.26|Action=Allow|Active=TRUE|Dir=In|App=C:\\Program Files\\VMware\\VMware vCenter Site Recovery Manager\\external\\commons-daemon\\prunsrv.exe|Name=VMware vCenter Site Recovery Manager Client|Desc=This feature allows connections using VMware vCenter Site Recovery Manager Client.|"
"{66BF278D-5EF4-4E5F-BD9E-58E88719FA8E}"="v2.26|Action=Allow|Active=TRUE|Dir=In|Protocol=6|LPort=443|Name=VMware vCenter Site Recovery Manager Client - HTTPS Listener Port|"

The four rules needed can be created by hand in the Windows Firewall UI, configured centrally via Group Policy Object (GPO), or scripted with netsh or PowerShell. I’ve chosen PowerShell and created the script below for the purpose of adding the rules. Note that two of these rules are application-path specific. Change the drive letter and path to the applications as necessary or those two rules won’t work properly.

# Run this PowerShell script directly on a Windows based VMware Site
# Recovery Manager server to add four inbound Windows firewall rules
# needed for SRM functionality.
# Jason Boche
# http://boche.net
# 4/29/20

New-NetFirewallRule -DisplayName "VMware vCenter Site Recovery Manager" -Description "This feature allows connections using VMware vCenter Site Recovery Manager." -Direction Inbound -Program "C:\Program Files\VMware\VMware vCenter Site Recovery Manager\bin\vmware-dr.exe" -Action Allow

New-NetFirewallRule -DisplayName "VMware vCenter Site Recovery Manager - HTTPS Listener Port" -Direction Inbound -LocalPort 9086 -Protocol TCP -Action Allow

New-NetFirewallRule -DisplayName "VMware vCenter Site Recovery Manager Client" -Description "This feature allows connections using VMware vCenter Site Recovery Manager Client." -Direction Inbound -Program "C:\Program Files\VMware\VMware vCenter Site Recovery Manager\external\commons-daemon\prunsrv.exe" -Action Allow

New-NetFirewallRule -DisplayName "VMware vCenter Site Recovery Manager Client - HTTPS Listener Port" -Direction Inbound -LocalPort 443 -Protocol TCP -Action Allow

Test execution was a success.

PS S:\PowerShell scripts> .\srmaddwindowsfirewallrules.ps1


Name                  : {10ba5bb3-6503-44f8-aad3-2f0253c980a6}
DisplayName           : VMware vCenter Site Recovery Manager
Description           : This feature allows connections using VMware vCenter Site Recovery Manager.
DisplayGroup          :
Group                 :
Enabled               : True
Profile               : Any
Platform              : {}
Direction             : Inbound
Action                : Allow
EdgeTraversalPolicy   : Block
LooseSourceMapping    : False
LocalOnlyMapping      : False
Owner                 :
PrimaryStatus         : OK
Status                : The rule was parsed successfully from the store. (65536)
EnforcementStatus     : NotApplicable
PolicyStoreSource     : PersistentStore
PolicyStoreSourceType : Local

Name                  : {ea88dee8-8c96-4218-a23d-8523e114d2a9}
DisplayName           : VMware vCenter Site Recovery Manager - HTTPS Listener Port
Description           :
DisplayGroup          :
Group                 :
Enabled               : True
Profile               : Any
Platform              : {}
Direction             : Inbound
Action                : Allow
EdgeTraversalPolicy   : Block
LooseSourceMapping    : False
LocalOnlyMapping      : False
Owner                 :
PrimaryStatus         : OK
Status                : The rule was parsed successfully from the store. (65536)
EnforcementStatus     : NotApplicable
PolicyStoreSource     : PersistentStore
PolicyStoreSourceType : Local

Name                  : {a707a4b8-b0fd-4138-9ffa-2117c51e8ed4}
DisplayName           : VMware vCenter Site Recovery Manager Client
Description           : This feature allows connections using VMware vCenter Site Recovery Manager Client.
DisplayGroup          :
Group                 :
Enabled               : True
Profile               : Any
Platform              : {}
Direction             : Inbound
Action                : Allow
EdgeTraversalPolicy   : Block
LooseSourceMapping    : False
LocalOnlyMapping      : False
Owner                 :
PrimaryStatus         : OK
Status                : The rule was parsed successfully from the store. (65536)
EnforcementStatus     : NotApplicable
PolicyStoreSource     : PersistentStore
PolicyStoreSourceType : Local

Name                  : {346ece5b-01a9-4a82-9598-9dfab8cbfcda}
DisplayName           : VMware vCenter Site Recovery Manager Client - HTTPS Listener Port
Description           :
DisplayGroup          :
Group                 :
Enabled               : True
Profile               : Any
Platform              : {}
Direction             : Inbound
Action                : Allow
EdgeTraversalPolicy   : Block
LooseSourceMapping    : False
LocalOnlyMapping      : False
Owner                 :
PrimaryStatus         : OK
Status                : The rule was parsed successfully from the store. (65536)
EnforcementStatus     : NotApplicable
PolicyStoreSource     : PersistentStore
PolicyStoreSourceType : Local



PS S:\PowerShell scripts>

After logging out of the vSphere Client and logging back in, the Site Recovery plug-in loads and is available.
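To spot-check the rules from PowerShell at any point afterward, something like the following works (a quick verification sketch of my own, not part of the original script):

# List the SRM rules created above; the wildcard matches all four DisplayNames
Get-NetFirewallRule -DisplayName "VMware vCenter Site Recovery Manager*" | Select-Object DisplayName, Enabled, Direction, Action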

Feel free to use this script but be advised, as with anything from this site, it comes without warranty. Practice due diligence. Test in a lab first. Etc.

With the virtual appliance being fairly mainstream at this point, this article probably won’t age well, but someone may end up here. Maybe me. It has happened before.

vSphere 6.7 Storage and action_OnRetryErrors=on

February 8th, 2019

VMware introduced a new storage feature in vSphere 6.0 which was designed as a flexible option to better handle certain storage problems. Cormac Hogan did a fine job introducing the feature here. Starting with vSphere 6.0 and continuing on in vSphere 6.5, each block storage device (VMFS or RDM) is configured with an option called action_OnRetryErrors. Note that in vSphere 6.0 and 6.5, the default value is off meaning the new feature is effectively disabled and there is no new storage error handling behavior observed.

This value can be seen with the esxcli storage nmp device list command.

vSphere 6.0/6.5:
esxcli storage nmp device list | grep -A9 naa.6000d3100002b90000000000000ec1e1
naa.6000d3100002b90000000000000ec1e1
Device Display Name: sqldemo1vmfs
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=off; {TPG_id=61459,TPG_state=AO}{TPG_id=61460,TPG_state=AO}{TPG_id=61462,TPG_state=AO}{TPG_id=61461,TPG_state=AO}}
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba1:C0:T2:L141, vmhba1:C0:T3:L141, vmhba2:C0:T3:L141, vmhba2:C0:T2:L141
Is USB: false

If vSphere loses access to a device on a given path, the host will send a Test Unit Ready (TUR) command down the given path to check path state. When action_OnRetryErrors=off, vSphere will continue to retry for an amount of time because it expects the path to recover. It is important to note here that a path is not immediately marked dead when the first Test Unit Ready command is unsuccessful and results in a retry. There are many retries, in fact, and you’ll be able to see them in /var/log/vmkernel.log. Also note that a device typically has multiple paths, and the process will be repeated for each additional path tried, assuming the first path is eventually marked as dead.

Starting with vSphere 6.7, action_OnRetryErrors is enabled by default.

vSphere 6.7:
esxcli storage nmp device list | grep -A9 naa.6000d3100002b90000000000000ec1e1
naa.6000d3100002b90000000000000ec1e1
Device Display Name: sqldemo1vmfs
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=61459,TPG_state=AO}{TPG_id=61460,TPG_state=AO}{TPG_id=61462,TPG_state=AO}{TPG_id=61461,TPG_state=AO}}
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0; lastPathIndex=2: NumIOsPending=0,numBytesPending=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba1:C0:T2:L141, vmhba1:C0:T3:L141, vmhba2:C0:T3:L141, vmhba2:C0:T2:L141
Is USB: false

If vSphere loses access to a device on a given path, the host will send a Test Unit Ready (TUR) command down the given path to check path state. When action_OnRetryErrors=on, vSphere will immediately mark the path dead when the first retry is returned. vSphere will not continue the retry TUR commands on a dead path.

This is the part where VMware thinks it’s doing the right thing by immediately fast failing a misbehaving/dodgy/flaky path. The assumption here is that other good paths to the device are available and instead of delaying an application while waiting for path failover during the intensive TUR retry process, let’s fail this one bad path right away so that the application doesn’t have to spin its wheels.

However, if all other paths to the device are impacted by the same underlying (and let’s call it transient) condition, what happens is that each additional path iteratively goes through the process of TUR, no retry, immediately mark path as dead, move on to the next path. When all available paths have been exhausted, All Paths Down (APD) for the device kicks in. If and when paths to an APD device become available again, they will be picked back up upon the next storage fabric rescan, whether that’s done manually by an administrator, or automatically by default every 300 seconds for each vSphere host (Disk.PathEvalTime). From an application/end user standpoint, I/O delay for up to 5 minutes can be a painfully long time to wait. The irony here is that VMware can potentially turn a transient condition lasting only a few seconds into more of a Permanent Device Loss (PDL)-like condition.
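If you’re curious what the rescan interval is set to on a given host, a quick PowerCLI check looks something like this (a hedged sketch of my own; esx01.lab.local is a placeholder host name and an existing Connect-VIServer session is assumed). The default value returned should be 300 (seconds).

# Placeholder host name; check the Disk.PathEvalTime advanced setting
Get-AdvancedSetting -Entity (Get-VMHost -Name esx01.lab.local) -Name 'Disk.PathEvalTime'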

All of the above leads me to a support escalation I got involved in with a customer having an Active/Passive block storage array. Active/Passive is a type of array which has multiple storage processors/controllers (usually two) and LUNs are distributed across the controllers in an ownership model whereby each controller owns both the LUNs and the available paths to those LUNs. When an active controller fails or is taken offline proactively (think storage processor reboot due to a firmware upgrade), the paths to the active controller go dark, the passive controller takes ownership of the LUNs and lights up the paths to them – a process which can be measured in seconds, typically more than 2 or 3, often much more than that (this dovetails into the discussion of virtual machine disk timeout best practices). With action_OnRetryErrors=off, vSphere tolerates the transient path outage during the controller failover. With action_OnRetryErrors=on, it doesn’t – each path that goes dark is immediately failed and we have APD for all the volumes on that controller in a fraction of a second.

The problem which was occurring in this customer escalation was a convergence of circumstances:

  • The customer was using vSphere 6.7 and its defaults
    action_OnRetryErrors=on
  • The customer was using an Active/Passive storage array
  • The customer virtualized Microsoft Windows SQL cluster servers (cluster disk resources are extremely sensitive to APD in the hypervisor and immediately fail when the cluster detects a dependent cluster disk has been removed – a symptom introduced by APD)
  • The customer was testing controller failovers

Windows failover clusters have zero tolerance for APD disk conditions.

To resolve the problem in vSphere 6.7, action_OnRetryErrors needs to be disabled for each device backed by the Active/Passive storage array. This must be performed on every host in the cluster having access to the given devices (again, these can be VMFS volumes and/or RDMs). There are a few ways to go about this.

To modify the configuration without a host reboot, take a look at the following example. A command such as this would need to be run on every host in the cluster, and for each device (i.e., in an 8-host cluster with 8 VMFS/RDMs, we need to identify the applicable naa.xxx IDs and run 64 commands. Yes, this could be scripted. Be my guest, or see the PowerCLI sketch a bit further down.):

esxcli storage nmp satp generic deviceconfig set -c disable_action_OnRetryErrors -d naa.6000d3100002b90000000000000ec1e1

I don’t prefer that method a whole lot. It’s tedious and error prone. It could result in cluster inconsistencies. But on the plus side, a host reboot isn’t required, and this setting will persist across reboots. That also means a configuration set at this device level will override any claim rules that could also apply to this device. Keep this in mind if a claim rule is configured but you’re not seeing the desired configuration on any specific device.

The above could also be scripted for a number of devices on a host. Here’s one example. Be very careful that the base naa.xxx string matches all of the devices from one array that should be configured, and does not modify devices from other array types that should not be configured. Also note that this script is a one-liner, but for blog formatting purposes I manually added a line break before esxcli:

for i in `ls /vmfs/devices/disks | grep -v ":" | grep -i naa.6000D31`; do echo $i; 
esxcli storage nmp satp generic deviceconfig set -c disable_action_OnRetryErrors -d $i; done

Now to verify:

for i in `ls /vmfs/devices/disks | grep -v ":" | grep -i naa.6000D31`; do echo $i; 
esxcli storage nmp device list | grep -A2 $i | egrep -io action_OnRetryErrors=\\w+; done
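For those who would rather handle the whole cluster from PowerCLI instead of running the loops host by host, here is a rough sketch of the same change using Get-EsxCli. This is not from the original escalation – the cluster name and naa prefix are placeholders, and the V2 argument names assume the standard mapping of the esxcli long options – so treat it as a starting point and test it in a lab first.

# Rough PowerCLI sketch: disable action_OnRetryErrors on every matching device,
# on every host in a cluster. 'Cluster01' and 'naa.6000d31' are placeholders.
$clusterName = 'Cluster01'
$naaPrefix   = 'naa.6000d31'

foreach ($vmHost in Get-Cluster -Name $clusterName | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmHost -V2
    # Enumerate NMP devices and keep only the target array's naa IDs
    $devices = $esxcli.storage.nmp.device.list.Invoke() |
        Where-Object { $_.Device -like "$naaPrefix*" }
    foreach ($device in $devices) {
        # Equivalent of: esxcli storage nmp satp generic deviceconfig set
        #                -c disable_action_OnRetryErrors -d <naa ID>
        $esxcli.storage.nmp.satp.generic.deviceconfig.set.Invoke(@{
            config = 'disable_action_OnRetryErrors'
            device = $device.Device
        }) | Out-Null
        Write-Output "$($vmHost.Name): $($device.Device) configured"
    }
}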

I like adding a SATP claim rule using a vendor device string a lot better, although changes to claim rules for existing devices generally require a reboot of the host to reclaim existing devices with the new configuration. Here’s an example:

esxcli storage nmp satp rule add -s VMW_SATP_ALUA -V COMPELNT -P VMW_PSP_RR -o disable_action_OnRetryErrors

Here’s another example using quotes which is also acceptable and necessary when setting multiple option string parameters (refer to this):

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "COMPELNT" -P "VMW_PSP_RR" -o "disable_action_OnRetryErrors"

When a new claim rule is added, claim rules can be reloaded with the following command.

esxcli storage core claimrule load

Keep in mind the new claim rule will only apply to unclaimed devices. Newly presented devices will inherit the new claim rule. Existing devices which are already claimed will not until the next vSphere host reboot. Devices can be unclaimed without a host reboot but all I/O to the device must be halted – somewhat of a conundrum if we’re dealing with production volumes, datastores being used for heartbeating, etc. Assuming we’re dealing with multiple devices, a reboot is just going to be easier and cleaner.

I like claim rules here better because of the global nature. It’s one command line per host in the cluster and it’ll take care of all devices from the Active/Passive storage array vendor. No need to worry about coming up with and testing a script. No need to worry about spending hours identifying the naa.xxx IDs and making all of the changes across hosts. No need to worry about tagging other storage vendor devices with an improper configuration. Lastly, the claim rule in effect is visible in a SATP claim rule list (sincere apologies for the formatting – it’s bad I know):

esxcli storage nmp satp rule list

Name           Device  Vendor    Model  Driver  Transport  Options                       Rule Group  Claim Options  Default PSP  PSP Options  Description
-------------  ------  --------  -----  ------  ---------  ----------------------------  ----------  -------------  -----------  -----------  -----------
VMW_SATP_ALUA          COMPELNT                             disable_action_OnRetryErrors  user                       VMW_PSP_RR

By the way… to remove the SATP claim rules above respectively:

esxcli storage nmp satp rule remove -s VMW_SATP_ALUA -V COMPELNT -P VMW_PSP_RR -o disable_action_OnRetryErrors

esxcli storage nmp satp rule remove -s "VMW_SATP_ALUA" -V "COMPELNT" -P "VMW_PSP_RR" -o "disable_action_OnRetryErrors"

The bottom line here is there may be a number of VMware customers with Active/Passive storage arrays, running vSphere 6.7. If and when planned or unplanned controller/storage processor failover occurs, APDs may unexpectedly occur, impacting virtual machines and their applications, whereas this was not the case with previous versions of vSphere.

In closing, I want to thank VMware Staff Technical Support Engineering for their work on this case and ultimately exposing “what changed in vSphere 6.7” because we had spent some time trying to reproduce this problem on vSphere 6.5 where we had an environment similar to what the customer had and we just weren’t seeing any problems.

References:

Managing SATPs

No Failover for Storage Path When TUR Command Is Unsuccessful

Storage path does not fail over when TUR command repeatedly returns retry requests (2106770)

Handling Transient APD Conditions

VSPHERE 6.0 STORAGE FEATURES PART 6: ACTION_ONRETRYERRORS

Updated 2-20-19: VMware published a KB article on this issue today:
ESXi 6.7 hosts with active/passive or ALUA based storage devices may see premature APD events during storage controller fail-over scenarios (67006)

VMware Horizon Share Folders Issue with Windows 10

June 12th, 2017

I spent some time the last few weekends making various updates and changes to the lab. Too numerous and not all that paramount to go into detail here, with the exception of one issue I did run into. I created a new VMware Horizon pool consisting of Windows 10 Enterprise, Version 1703 (Creators Update). The VM has 4GB RAM and VMware Horizon Agent 7.1.0.5170901 is installed. This is all key information contributing to my new problem: the Shared Folders feature seems to have stopped functioning.

That is to say, when launching my virtual desktop from the Horizon Client, there are no shared folders or drives being passed through from where I launched the Horizon Client. Furthermore, the Share Folders menu item is completely missing from the blue Horizon Client pulldown menu.

I threw something out on Twitter and received a quick response from a very helpful VMware Developer by the name of Adam Gross (@grossag).

Adam went on to explain that the issue stems from a registry value defining an amount of memory which is less than the amount of RAM configured in the VM.

The registry key is HKLM\SYSTEM\CurrentControlSet\Control\ and the value configured for SvcHostSplitThresholdInKB is 3670016 (380000 Hex). The 3670016 is expressed in KB which comes out to be 3.5GB. The default Windows 10 VM configuration is deployed with 4GB of RAM which is what I did this past weekend. Since 3.5GB is less than 4GB, the bug rears its head.

Adam mentioned the upcoming 7.2 agent will configure this value at 32GB on Windows 10 virtual machines (that’s 33554432, or 2000000 in hex), and perhaps even larger in some future release of the agent, because the reality is that some day 32GB won’t be large enough. Adam went on to explain the maximum amount of RAM supported by Windows 10 x64 is 2TB, which comes out to be 2147483648 expressed in KB, or 80000000 in hex. Therefore, it is guaranteed safe (at least to avoid this issue) to set the registry value to 80000001 (in hex) or higher for any vRAM configuration.

To move on, the value needs to be tweaked manually in the registry. I’ll set mine to 32GB since I’m unlikely to deploy a VDI desktop with more RAM than that between now and when the 7.2 agent ships and is installed in my lab.
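For those who prefer to script the change rather than click through regedit, here is a minimal sketch of my own (run in an elevated PowerShell session inside the Windows 10 guest; 33554432 is the 32GB-in-KB value discussed above):

# Set SvcHostSplitThresholdInKB to 32GB expressed in KB (33554432 decimal / 2000000 hex)
$key = 'HKLM:\SYSTEM\CurrentControlSet\Control'
Set-ItemProperty -Path $key -Name 'SvcHostSplitThresholdInKB' -Value 33554432 -Type DWord
# Read the value back to confirm
(Get-ItemProperty -Path $key -Name 'SvcHostSplitThresholdInKB').SvcHostSplitThresholdInKB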

And the result for posterity.

I found a reboot of the Windows 10 VM was required before the registry change made the positive impact I was looking for. After all was said and done, my shared folders came back as did the menu item from the pulldown on the blue Horizon Client pulldown menu. Easy fix for a rather obscure issue. Once again my thanks to Adam Gross for providing the solution.

VMware Tools causes virtual machine snapshot with quiesce error

July 30th, 2016

Last week I was made aware of an issue a customer in the field was having with a data protection strategy using array-based snapshots which were in turn leveraging VMware vSphere snapshots with VSS quiesce of Windows VMs. The problem began after installing VMware Tools version 10.0.0 build-3000743 (reported as version 10240 in the vSphere Web Client) which I believe is the version shipped in ESXi 6.0 Update 1b (reported as version 6.0.0, build 3380124 in the vSphere Web Client).

The issue is that creating a VMware virtual machine snapshot with VSS integration fails. The virtual machine disk configuration is simply two .vmdks on a VMFS-5 datastore but I doubt the symptoms are limited only to that configuration.

The failure message shown in the vSphere Web Client is “Cannot quiesce this virtual machine because VMware Tools is not currently available.”  The vmware.log file for the virtual machine also shows the following:

2016-07-29T19:26:47.378Z| vmx| I120: SnapshotVMX_TakeSnapshot start: ‘jgb’, deviceState=0, lazy=0, logging=0, quiesced=1, forceNative=0, tryNative=1, saveAllocMaps=0 cb=1DE2F730, cbData=32603710
2016-07-29T19:26:47.407Z| vmx| I120: DISKLIB-LIB_CREATE : DiskLibCreateCreateParam: vmfsSparse grain size is set to 1 for ‘/vmfs/volumes/51af837d-784bc8bc-0f43-e0db550a0c26/rmvm02/rmvm02-000001.
2016-07-29T19:26:47.408Z| vmx| I120: DISKLIB-LIB_CREATE : DiskLibCreateCreateParam: vmfsSparse grain size is set to 1 for ‘/vmfs/volumes/51af837d-784bc8bc-0f43-e0db550a0c26/rmvm02/rmvm02_1-00000
2016-07-29T19:26:47.408Z| vmx| I120: SNAPSHOT: SnapshotPrepareTakeDoneCB: Prepare phase complete (The operation completed successfully).
2016-07-29T19:26:56.292Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:07.790Z| vcpu-0| I120: Tools: Tools heartbeat timeout.
2016-07-29T19:27:11.294Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:17.417Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:17.417Z| vmx| I120: Msg_Post: Warning
2016-07-29T19:27:17.417Z| vmx| I120: [msg.snapshot.quiesce.rpc_timeout] A timeout occurred while communicating with VMware Tools in the virtual machine.
2016-07-29T19:27:17.417Z| vmx| I120: —————————————-
2016-07-29T19:27:17.420Z| vmx| I120: Vigor_MessageRevoke: message ‘msg.snapshot.quiesce.rpc_timeout’ (seq 10949920) is revoked
2016-07-29T19:27:17.420Z| vmx| I120: ToolsBackup: changing quiesce state: IDLE -> DONE
2016-07-29T19:27:17.420Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Done with snapshot ‘jgb’: 0
2016-07-29T19:27:17.420Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (31).
2016-07-29T19:27:17.420Z| vmx| I120: VigorTransport_ServerSendResponse opID=ffd663ae-5b7b-49f5-9f1c-f2135ced62c0-95-ngc-ea-d6-adfa seq=12848: Completed Snapshot request.
2016-07-29T19:27:26.297Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.

After performing some digging, I found VMware had released VMware Tools version 10.0.9 on June 6, 2016. The release notes indicate the root cause had been identified and resolved.

Resolved Issues

Attempts to take a quiesced snapshot in a Windows Guest OS fails
Attempts to take a quiesced snapshot after booting a Windows Guest OS fails

After downloading and upgrading VMware Tools version 10.0.9 build-3917699 (reported as version 10249 in the vSphere Web Client), the customer’s problem was resolved. Since the faulty version of VMware Tools was embedded in the customer’s templates used to deploy virtual machines throughout the datacenter, there were a number of VMs needing their VMware Tools upgraded, as well as the templates themselves.
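If you find yourself in a similar spot, a quick PowerCLI sketch like the one below (my own hedged example, not from the customer engagement; it assumes an existing Connect-VIServer session) can help identify which VMs are still reporting the affected Tools version (10240, as shown in the vSphere Web Client):

# List VMs whose guest currently reports VMware Tools version 10240
Get-VM |
    Select-Object Name, @{Name='ToolsVersion'; Expression={ $_.ExtensionData.Guest.ToolsVersion }} |
    Where-Object { $_.ToolsVersion -eq '10240' } |
    Sort-Object Name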

Dell Compellent Storage Center Command Set Shell cmdlets

January 9th, 2015

If you manage Dell Compellent storage, you may or may not be aware that Windows PowerShell cmdlets are available to ease management pain by way of automation and consistency. While I am able to recognize when scripting is the right tool for the job, I do not author PowerShell scripts on a regular basis. For that reason, I’m not as deeply familiar with all of the cmdlets available within the Dell Compellent Storage Center Command Set Shell as I would like to be.

So how do I get started – what are the cmdlets? There are a few different ways to retrieve a list of cmdlets made available by a PowerShell snapin or module.

VMware vSphere PowerCLI simplifies the process by providing a cmdlet called Get-VICommand. When executed, it returns a list of all the cmdlets provided by the VMware.VimAutomation.Core snapin used to manage a vSphere environment via PowerShell. As of this writing in the 5.5.x generation of vSphere, there are a few other vSphere specific snapins installed with PowerCLI but the cmdlets provided by those aren’t returned by Get-VICommand. Those snapins are:

  • VMware.VimAutomation.Vds – This Windows PowerShell snap-in contains cmdlets that let you manage vSphere Distributed Switches.
  • VMware.VimAutomation.License – This Windows Powershell snap-in contains cmdlets for managing License components.
  • VMware.DeployAutomation – Cmdlets for Rule-Based-Deployment
  • VMware.ImageBuilder – This Windows PowerShell snap-in contains VMware ESXi Image Builder cmdlets used to generate custom images.
  • VMware.VimAutomation.Cloud – This Windows Powershell snap-in contains cmdlets used to manage VMware vCloud Director.

However, not all PowerShell snapins ship with a native shortcut to retrieve a list of their respective cmdlets. In these cases, use Get-Command. Now Get-Command by itself returns cmdlets for all snapins. For a snapin specific list, either of the following will work:

Get-Command -Module "snapin_name"
Get-Command | Where-Object{$_.PSSnapin.Name -eq "snapin_name"}

In the case of Dell Compellent Storage Center Command Set Shell, the snapin is named Compellent.StorageCenter.PSSnapin. To retrieve a list of Dell Compellent cmdlets, use one of the following:

Get-Command -Module "Compellent.StorageCenter.PSSnapin"
Get-Command | Where-Object{$_.PSSnapin.Name -eq "Compellent.StorageCenter.PSSnapin"}

At the time of this writing, there are 105 cmdlets:

Get-Command -Module Compellent.StorageCenter.PSSnapin | Measure-Object

Count    : 105
Average  :
Sum      :
Maximum  :
Minimum  :
Property :

Those who don’t use PowerShell on a regular basis may find the above difficult to easily recall from memory. I had a discussion with Justin Braun (author of The Braun Blog – check out his Dell Compellent articles here) and Mike Matthews (a peer in my office who specializes in Microsoft SQL Server and PowerShell, and is an all-around good guy). Is there an easier and persistent method to retrieve cmdlets from a given snapin? What resulted was a function that can be added to a PowerShell profile which performs just like VMware’s Get-VICommand (I’ll be original and call this one Get-SCCommand to get the list of Storage Center cmdlets).

Edit the PowerShell profile ($profile). Its default location is:

%USERPROFILE%\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1

If the path and profile don’t already exist, they can be created in PowerShell using the following cmdlet:

new-item -itemtype file -path $profile -force

If using PowerShell ISE, the default profile location is:

%USERPROFILE%\Documents\WindowsPowerShell\Microsoft.PowerShellISE_profile.ps1

Add the following to verify the Dell Compellent snapin is loaded. If not, load it.

If ( !( Get-PSSnapin | Where-Object { $_.Name -eq "Compellent.StorageCenter.PSSnapin" } ) )
{
    Add-PSSnapin Compellent.StorageCenter.PSSnapin | Out-Null
}

Add the Get-SCCommand shortcut function:

Function Get-SCCommand { Get-Command -Module "Compellent.StorageCenter.PSSnapin" }

Save the profile.

Now open any PowerShell environment and use Get-SCCommand which shows a list of 105 Dell Compellent cmdlets (There are 49 additional cmdlets in the compellent.replaymanager.scripting snapin for Replay Manager):

It works with PowerShell ISE as well when the Microsoft.PowerShellISE_profile.ps1 profile is modified:

How about PowerGUI? Yes…

Of course, the shortcut function provided in the example above is specific to the Dell Compellent snapin, but it should work for any PowerShell snapin, including the list of VMware snapins not included in Get-VICommand discussed at the top of the article.
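For example, a matching shortcut for the Replay Manager snapin mentioned earlier might look like this (an untested sketch; it assumes the compellent.replaymanager.scripting snapin is installed and loaded in the profile the same way):

Function Get-RMCommand { Get-Command -Module "compellent.replaymanager.scripting" }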

For more on scripting Storage Center, visit the Dell Storage PowerShell Community. Rick Gouin also has a nice collection of scripts that he has authored.

Have a great weekend!

vCloud Director, RHEL 6.3, and Windows Server 2012 NFS

July 16th, 2013

One of the new features introduced in vCloud Director 5.1.2 is cell server support on the RHEL 6 Update 3 platform (you should also know that cell server support on RHEL 5 Update 7 was silently removed in the recent past – verify the version of RHEL in your environment using cat /etc/issue).  When migrating your cell server(s) to RHEL 6.3, particularly from 5.x, you may run into a few issues.

First is the lack of the libXdmcp package (required for vCD installation) which was once included by default in RHEL 5 versions.  You can verify this at the RHEL 6 CLI with the following command line:

yum search libXdmcp

or

yum list |grep libXdmcp

Not to worry, the package is easily installable by inserting/mounting the RHEL 6 DVD or .iso, copying the appropriate libXdmcp file to /tmp/ and running either of the following commands:

yum install /tmp/libXdmcp-1.0.3-1.el6.x86_64.rpm

or

rpm -i /tmp/libXdmcp-1.0.3-1.el6.x86_64.rpm

Update 6/22/18: It is really not necessary to point to a package file location or a specific version (this overly complicates the task) when a YUM repository has been created. Also, the RHEL 7 Infrastructure Server base environment excludes the following packages required by vCloud Director 9.1 for Service Providers:

  • libICE
  • libSM
  • libXdmcp
  • libXext
  • libXi
  • libXt
  • libXtst
  • redhat-lsb

If the YUM DVD repository has been created and the RHEL DVD is mounted, install the required packages with the following one liner:

yum install -y libICE libSM libXdmcp libXext libXi libXt libXtst redhat-lsb

Next up is the use of Windows Server 2012 (or Windows 8) as an NFS server for vCloud Transfer Server Storage in conjunction with the newly supported RHEL 6.3. Creating the path and directory for the Transfer Server Storage is performed during the initial deployment of vCloud Director using the command mkdir -p /opt/vmware/vcloud-director/data/transfer. When mounting the NFS export for Transfer Server Storage (either manually or via /etc/fstab: f.q.d.n:/vcdtransfer /opt/vmware/vcloud-director/data/transfer nfs rw 0 0), the mount command fails with the error message mount.nfs: mount system call failed. I ran across this in one particular environment and my search turned up Red Hat Bugzilla – Bug 796352. In the bug documentation, the problem is identified as follows:

On Red Hat Enterprise Linux 6, mounting an NFS export from a Windows 2012 server failed due to the fact that the Windows server contains support for the minor version 1 (v4.1) of the NFS version 4 protocol only, along with support for versions 2 and 3. The lack of the minor version 0 (v4.0) support caused Red Hat Enterprise Linux 6 clients to fail instead of rolling back to version 3 as expected. This update fixes this bug and mounting an NFS export works as expected.

Further down in the article, Steve Dickson outlines the workarounds:

mount -o v3 # to use v3

or

Set the 'Nfsvers=3' variable in the "[ Server "Server_Name" ]"
section of the /etc/nfsmount.conf file.
An example will be:
[ Server "nfsserver.lab.local" ]
Nfsvers=3

The first option works well at the command line but doesn’t lend itself to /etc/fstab syntax so I opted for the second option which is to establish a host name and NFS version in the /etc/nfsmount.conf file.  With this method, the mount is attempted as called for in /etc/fstab and by reading /etc/nfsmount.conf, the mount operation succeeds as desired instead of failing at negotiation.

There is a third option which would be to avoid the use of /etc/fstab and /etc/nfsmount altogether and instead establish a mount -o v3 command in /etc/rc.local which is executed at the end of each RHEL boot process.  Although this may work, it feels a little sloppy in my opinion.

Lastly, one could install the kernel update (Red Hat reports the issue as being fixed in kernel-2.6.32-280.el6). The kernel package update is located here.

Update 5/27/18: See also http://www.boche.net/blog/2012/07/03/creating-vcloud-director-transfer-server-storage-on-nfs/ for other new requirements when trying to mount NFS exports with RHEL 7.5.

QuickPrep and Sysprep

May 2nd, 2013

Those who manage VMware View currently or have used it in the past may be familiar with desktop customization, which is required to provide a unique identity on the network for each View Composer VDI session in a pool. If you’ve got a pretty good Microsoft background, you’re probably already familiar with Sysprep – Microsoft’s tool for customizing Windows server and desktop OS deployments. VMware View administrators have an alternative tool which can be used for desktop customization called QuickPrep. For all intents and purposes, QuickPrep was designed to accomplish many of the same tasks Sysprep does, but the obvious advantage QuickPrep has is that the code and development belong to VMware and, as a result, it can be tightly integrated with products in VMware’s portfolio.

I was on a call this morning with VMware Senior Technical Trainer Linus Bourque (Twitter: @LinusBourque Blog: http://communities.vmware.com/blogs/lbourque Cigars: yes) when he pulled up a table slide which was the result of VMware KB Article 2003797 Differences between QuickPrep and Sysprep.  For those who are curious about the similarities and differences between the two (like me), look no further.

From the KB Article:

QuickPrep is a VMware system tool executed by View Composer during a linked-clone desktop deployment. QuickPrep personalizes each desktop created from the Master Image. Microsoft Sysprep is a tool to deploy the configured operating system installation from a base image. The desktop can then be customized based on an answer script. Sysprep can modify a larger number of configurable parameters than QuickPrep.
During the initial startup of each new desktop, QuickPrep:
  • Creates a new computer account in Active Directory for each desktop.
  • Gives the linked-clone desktop a new name.
  • Joins the desktop to the appropriate domain.
  • Optionally, mounts a new volume that contains the user profile information.
This table lists the main differences between QuickPrep and Sysprep:
Function                                                    QuickPrep   Sysprep
Removing local accounts                                     No          Yes
Changing Security Identifiers (SID)                         No          Yes
Removing parent from domain                                 No          Yes
Changing computer name                                      Yes         Yes
Joining the new instance to the domain                      Yes         Yes
Generating new SID                                          No          Yes
Language, regional settings, date, and time customization   No          Yes
Number of reboots                                           0           1 (seal & mini-setup)
Requires configuration file and Sysprep                     No          Yes
Note: A Guest Customization script is required in vCenter Server to use Sysprep. Sysprep is bundled with Windows 7. For Windows XP, an appropriate Sysprep program needs to be installed on the vCenter Server.
For information on installing Sysprep tools, see Sysprep file locations and versions (1005593).
For more information on the use of Sysprep and the Guest Customisation wizard, see the Customizing Guest Operating Systems and Installing the Microsoft Sysprep Tools sections of the vSphere Virtual Machine Administration Guide.