VMware View 5.0 VDI vHardware 8 vMotion Error

September 20th, 2011 by jason Leave a reply »

General awareness/heads up blog post here on something I stumbled on with VMware View 5.0.  A few weeks ago while working with View 5.0 BETA in the lab, I ran into an issue where a Windows 7 virtual machine would not vMotion from one ESXi 5.0 host to another.  The resulting error in the vSphere Client was:

A general system error occurred: Failed to flush checkpoint data

I did a little searching and found similar symptoms in VMware KB 1011971 which speaks to an issue that can arise  when Video RAM (VRAM) is greater than 30MB for a virtual machine. In my case it was greater than 30MB but I could not adjust it due to the fact that it was being managed by the View Connection Server.  At the same time, a VMware source on Twitter volunteered his assistance and quickly came up with some inside information on the issue.  He had me try adding the following line to /etc/vmware/config on the ESXi 5.0 hosts (no reboot required):

migrate.baseCptCacheSize = “16777216”

The fix worked and I was able to vMotion the Windows 7 VM back and forth between hosts.  The information was taken back to Engineering for a KB to be released.  That KB is now available: VMware KB 2005741 vMotion of a virtual machine fails with the error: A general system error occurred: Failed to flush checkpoint data! The new KB article lists the following background information and several workarounds:

Cause

Due to new features with Hardware Version 8 for the WDDM driver, the vMotion display graphics memory requirement has increased. The default pre-allocated buffer may be too small for certain virtual machines with higher resolutions. The buffer size is not automatically increased to account for the requirements of those new features if mks.enable3d is set to FALSE (the default).

Resolution

To work around this issue, perform one of these options:

  • Change the resolution to a single screen of 1280×1024 or smaller before the vMotion.
  • Do not upgrade to Virtual Machine Hardware version 8.
  • Increase the base checkpoint cache size. Doubling it from its default 8MB to 16MB (16777216 byte) should be enough for every single display resolution. If you are using two displays at 1600×1200 each, increase the setting to 20MB (20971520 byte).To increase thebase checkpoint cache size:

    1. Power off the virtual machine.
    2. Click the virtual machine in the Inventory.
    3. On the Summary tab for that virtual machine, click Edit Settings.
    4. In the virtual machine Properties dialog box, click the Options tab.
    5. Under Advanced, select General and click Configuration Parameters.
    6. Click Add Row.
    7. In the new row, add migrate. baseCptCacheSize to the name column and add 16777216 to the value column.
    8. Click OK to save the change.

    Note: If you don’t want to power off your virtual machine to change the resolution, you can also add the parameter to the /etc/vmware/config file on the target host. This adds the option to every VMX process that is spawning on this host, which happens when vMotion is starting a virtual machine on the server.

  • Set mks.enable3d = TRUE for the virtual machine:
    1. Power off the virtual machine.
    2. Click the virtual machine in the Inventory.
    3. On the Summary tab for that virtual machine, click Edit Settings.
    4. In the virtual machine Properties dialog box, click the Options tab.
    5. Under Advanced, select General and click Configuration Parameters.
    6. Click Add Row.
    7. In the new row, add mks.enable3d to the name column and add True to the value column.
    8. Click OK to save the change.
Caution: This workaround increases the overhead memory reservation by 256MB. As such, it may have a negative impact on HA Clusters with strict Admission Control. However, this memory is only used if the 3d application is active. If, for example, Aero Basic and not Aero Glass is used as a window theme, most of the reservation is not used and the memory could be kept available for the ESX host. The reservation still affects HA Admission Control if large multi-monitor setups are used for the virtual machine and if the CPU is older than a Nehalem processor and does not have the SSE 4.1 instruction set. In this case, using 3d is not recommended. The maximum recommended resolution for using 3d, regardless of CPU type and SSE 4.1 support, is 1920×1200 with dual screens.

The permanent fix for this issue did not make it into the recent View 5.0 GA release but I expect it will be included in a future release or patch.

Update 12/23/11: VMware released five (5) non-critical patches last week.  One of those patches is ESXi500-201112401-SG which permanently resolves the issues described above.  Full patch details below:

Summaries and Symptoms

This patch updates the esx-base VIB to resolve the following issues:

  • Updates the glibc third party library to resolve multiple security issues.
    The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the names CVE-2010-0296, CVE-2011-0536, CVE-2011-1071, CVE-2011-1095, CVE-2011-1658 and CVE-2011-1659 to these issues.
  • When a hot spare disk that is added to a RAID group is accessed before the disk instance finishes initialization or if the disk is removed while an instance of it is being accessed, a race condition might occur causing the vSphere Client to not display information about the RAID controllers and the vSphere Client user interface might also not respond for a very long time.
  • vMotion fails with the A general system error occurred: Failed to flush checkpoint data!error message when:
    • The resolution of the virtual machines is higher than 1280×1024, or smaller if you are using a second screen
    • The guest operating system is using the WDDM driver (Windows 7, Windows 2008 R2, Windows 2008, Windows Vista)
    • The virtual machine is using Virtual Machine Hardware version 8.
  • Creating host profiles of ESX i 5.0 hosts might fail when the host profile creation process is unable to resolve the hostname and IP address of the host by relying on the DNS for hostname and IP address lookup. An error message similar to the following is displayed:
    Call"HostProfileManager.CreateProfile" for object "HostProfileManager" on vCenter Server"<Server_Name> failed.
    Error extracting indication configuation: [Errno- 2] Name or service not known.
  • In vSphere 5.0, Thin Provisioning is enabled by default on devices that adhere to T10 standards. On such thin provisioned LUNs, vSphere issues SCSI UNMAP commands to help the storage arrays reclaim unused space. Sending UNMAP commands might cause performance issues with operations such as snapshot consolidation or storage vMotion.
    This patch resolves the issue by disabling the space reclamation feature, by default.
  • If a user subscribes for an ESXi Server’s CIM indications from more that one client (for example, c1 and c2) and deletes the subscription from the first client (c1), the other clients (C2) might fail to receive any indication notification from the host.

This patch also provides you with the option of configuring the iSCSI initiator login timeout value for software iSCSI and dependent iSCSI adapters.
For example, to set the login timeout value to 10 seconds you can use commands similar to the following:

  • ~ # vmkiscsi-tool -W -a "login_timeout=10" vmhba37
  • ~ # esxcli iscsi adapter param set -A vmhba37 -k LoginTimeout -v 10

The default login timeout value is 5 seconds and the maximum value that you can set is 60 seconds.
We recommend that you change the login timeout value only if suggested by the storage vendor.

Advertisement