Last week I was made aware of an issue a customer in the field was having with a data protection strategy using array-based snapshots which were in turn leveraging VMware vSphere snapshots with VSS quiesce of Windows VMs. The problem began after installing VMware Tools version 10.0.0 build-3000743 (reported as version 10240 in the vSphere Web Client) which I believe is the version shipped in ESXI 6.0 Update 1b (reported as version 6.0.0, build 3380124 in the vSphere Web Client).
The issue is that creating a VMware virtual machine snapshot with VSS integration fails. The virtual machine disk configuration is simply two .vmdks on a VMFS-5 datastore but I doubt the symptoms are limited only to that configuration.
The failure message shown in the vSphere Web Client is “Cannot quiesce this virtual machine because VMware Tools is not currently available.” The vmware.log file for the virtual machine also shows the following:
2016-07-29T19:26:47.378Z| vmx| I120: SnapshotVMX_TakeSnapshot start: ‘jgb’, deviceState=0, lazy=0, logging=0, quiesced=1, forceNative=0, tryNative=1, saveAllocMaps=0 cb=1DE2F730, cbData=32603710
2016-07-29T19:26:47.407Z| vmx| I120: DISKLIB-LIB_CREATE : DiskLibCreateCreateParam: vmfsSparse grain size is set to 1 for ‘/vmfs/volumes/51af837d-784bc8bc-0f43-e0db550a0c26/rmvm02/rmvm02-000001.
2016-07-29T19:26:47.408Z| vmx| I120: DISKLIB-LIB_CREATE : DiskLibCreateCreateParam: vmfsSparse grain size is set to 1 for ‘/vmfs/volumes/51af837d-784bc8bc-0f43-e0db550a0c26/rmvm02/rmvm02_1-00000
2016-07-29T19:26:47.408Z| vmx| I120: SNAPSHOT: SnapshotPrepareTakeDoneCB: Prepare phase complete (The operation completed successfully).
2016-07-29T19:26:56.292Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:07.790Z| vcpu-0| I120: Tools: Tools heartbeat timeout.
2016-07-29T19:27:11.294Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:17.417Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-07-29T19:27:17.417Z| vmx| I120: Msg_Post: Warning
2016-07-29T19:27:17.417Z| vmx| I120: [msg.snapshot.quiesce.rpc_timeout] A timeout occurred while communicating with VMware Tools in the virtual machine.
2016-07-29T19:27:17.417Z| vmx| I120: —————————————-
2016-07-29T19:27:17.420Z| vmx| I120: Vigor_MessageRevoke: message ‘msg.snapshot.quiesce.rpc_timeout’ (seq 10949920) is revoked
2016-07-29T19:27:17.420Z| vmx| I120: ToolsBackup: changing quiesce state: IDLE -> DONE
2016-07-29T19:27:17.420Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Done with snapshot ‘jgb': 0
2016-07-29T19:27:17.420Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (31).
2016-07-29T19:27:17.420Z| vmx| I120: VigorTransport_ServerSendResponse opID=ffd663ae-5b7b-49f5-9f1c-f2135ced62c0-95-ngc-ea-d6-adfa seq=12848: Completed Snapshot request.
2016-07-29T19:27:26.297Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
Attempts to take a quiesced snapshot in a Windows Guest OS fails
Attempts to take a quiesced snapshot after booting a Windows Guest OS fails
After downloading and upgrading VMware Tools version 10.0.9 build-3917699 (reported as version 10249 in the vSphere Web Client), the customer’s problem was resolved. Since the faulty version of VMware Tools was embedded in the customer’s templates used to deploy virtual machines throughout the datacenter, there were a number of VMs needing their VMware Tools upgraded, as well as the templates themselves.