Posts Tagged ‘ESXi’

ESXi 4.0 Update 2 hosts may PSOD after vCenter Server is upgraded to 5.0

October 11th, 2011

This important notice just came across my radar:

VMware has become aware of some ESXi 4.0 hosts experiencing a purple screen after vCenter Server is upgraded to 5.0

The Knowledgebase Team has prepared KB article ESXi 4.0 Update 2 hosts may experience a purple screen after vCenter Server is upgraded to 5.0 (2007269), and an alert has been posted on the Support page to notify customers of this issue.

This Knowledge Base article will be updated if new information becomes available (you can subscribe to RSS feeds on individual KB articles for this purpose). If you have been affected by this issue, please read the KB.

We apologize for any inconvenience this may have caused you. Please help spread the word to your friends and colleagues.

Symptoms

You may encounter an issue where:

  • You have recently upgraded your vCenter Server to version 5.0
  • You have ESXi 4.0 Update 2 hosts in the inventory of this vCenter Server
  • After the vpxa agents are upgraded, the ESXi 4.0 Update 2 hosts experience a purple screen that includes this error: NOT_IMPLEMENTED bora/vmkernel/filesystems/visorfs/visorfsObj.c:3391

Cause

This is caused by an issue in the handling of the vpxa agent upgrade.

Resolution

This issue has been resolved in ESXi 4.0 Update 3. To avoid this issue, upgrade all ESXi 4.0 Update 2 hosts to at least ESXi 4.0 Update 3 before upgrading vCenter Server to 5.0.
ESXi 4.0 Update 3 and later versions can be downloaded from the VMware Download Center.
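
Before kicking off the vCenter upgrade, a quick way to confirm what each host is actually running is from the host's Tech Support Mode shell (my own habit, not a step from the KB):

~ # vmware -v

The output includes the ESXi version and build number, which you can compare against the ESXi 4.0 Update 3 build listed on the Download Center.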

VMTurbo Q4 Release Assures Quality of Service for Apps in the Cloud

October 4th, 2011

Press Release:

VMTurbo Q4 Release Assures Quality of Service for Apps in the Cloud

Streamlines management of large, dynamically changing, shared environments


Waltham, MA, October 4, 2011 — VMTurbo, the leading provider of intelligent workload management software, today announced new features in its fourth quarter appliance release.  The new version enables enterprises and cloud service providers to guarantee application quality of service (QoS) and to improve policy control in dynamically changing shared infrastructures.

To manage virtualized workloads in large environments, infrastructure operations teams can now group and prioritize applications in order to assure QoS for mission critical workloads.  New business dashboards provide visibility across all layers of the IT stack, and recommend resolutions of the issues impacting application performance in the shared infrastructure.

The new VMTurbo release also adds features to help hosting and cloud providers better manage and support customers in shared, multi-tenant environments:

  • More customizable workload placement policies support a broad range of service level agreements, security policies and compliance requirements;
  • Flexible management of roles and permissions allows assigning a specific scope to a user account;
  • Rebranding support enables cloud service providers to transparently private-label VMTurbo as part of their overall IT offering.

“Our new release is a major step forward for enterprises and cloud service providers looking to guarantee application performance for their business units and customers,” said N. Louis Shipley, President and CEO of VMTurbo. “As more workloads get virtualized, guaranteeing application performance and quality of service becomes critical.”

Pricing and Availability

For a free, 30-day VMTurbo trial download, go to: http://www.vmturbo.com/vmturbo-management-suite-download/

For more information on and screenshots of the Q4 2011 VMTurbo appliance release, read the VMTurbo blog at: http://www.vmturbo.com/blog/

About VMTurbo

VMTurbo delivers an Intelligent Workload Management solution for Cloud and virtualized environments.  VMTurbo uses an economic scheduling engine to dynamically adjust resource allocation to meet business goals.  Using VMTurbo our customers ensure that applications get the resources they need to operate reliably, while utilizing infrastructure and human resources in the most efficient way.

StarWind Software Inc. Announces Opening of German Office

October 4th, 2011

Press Release:

StarWind Software Inc. Announces Opening of German Office

StarWind Software Inc. Opens a New Office in Germany to Drive Local Channel Growth

Burlington, MA – October 1, 2011 – StarWind Software Inc., a global leader and pioneer in SAN software for building iSCSI storage servers, announced today that it has opened a new office in Sankt Augustin, Germany to service the growing demand for StarWind’s iSCSI SAN solutions. The German office expands StarWind’s ability to offer local sales and support services to its fast-growing base of customers and prospects in the region.

“We have seen substantial growth in our customer base and level of interest in our solutions in Europe,” said Artem Berman, Chief Executive Officer of StarWind Software. “Since the market potential for our products is significant, we have opened a new office in Germany to strengthen our presence there. We shall use our best efforts to complete the localization of resources.”

“Our local presence in Germany will help us work closely with our partners and customers, to better meet their needs and to further develop their distribution networks,” said Roman Shovkun, Chief Sales Officer of StarWind Software. “The new office permits us to deliver superior sales and support to our customers and to the growing prospect base in the region.”

The new office is located at:
Monikastr. 13
53757 Sankt Augustin
Germany
Primary contact: Veronica Schmidberger
+49-171-5109103

About StarWind Software Inc.
StarWind Software is a global leader in storage management and SAN software for small and midsize companies. StarWind’s flagship product is SAN software that turns any industry-standard Windows Server into a fault-tolerant, fail-safe iSCSI SAN. StarWind iSCSI SAN is qualified for use with VMware, Hyper-V, XenServer, and Linux and Unix environments. StarWind Software focuses on providing small and midsize companies with affordable, highly available storage technology which previously was only available in high-end storage hardware. Advanced enterprise-class features in StarWind include Automated Storage Node Failover and Failback, Replication across a WAN, CDP and Snapshots, Thin Provisioning and Virtual Tape management.

StarWind has been a pioneer in the iSCSI SAN software industry since 2003 and is the solution of choice for over 30,000 customers worldwide in more than 100 countries, from small and midsize companies to governments and Fortune 1000 companies.

SRM 5.0 Replication Bits and Bytes

October 3rd, 2011

VMware has pushed out several releases and features in the past several weeks.  It can be a lot to digest, particularly if you’ve been involved in the beta programs for these new products because there were some changes made when the bits made their GA debut. One of those new products is SRM 5.0.  I’ve been working a lot with this product lately and I thought it would be helpful to share some of the information I’ve collected along the way.

One of the new features in SRM 5.0 is vSphere Replication.  I’ve heard some people refer to it as Host Based Replication or HBR for short.  In terms of how it works, this is an accurate description, and it was the feature name during the beta phase.  However, by the time SRM 5.0 went to GA, each of the replication components went through a name change as you’ll see below. If you know me, you’re aware that I’m somewhat of a stickler on branding.  As such, I try to get it right as much as possible myself, and I’ll sometimes point out corrections to others in an effort to lessen, rather than perpetuate, confusion.

Another product feature launched around the same time is the vSphere Storage Appliance, or VSA for short.  In my brief experience with both products I’ve mentioned so far, I find it’s not uncommon for people to associate or confuse SRM replication with a dependency on the VSA.  This is not the case – they are quite independent.  In fact, one of the biggest selling points of SRM based replication is that it works with any VMware vSphere certified storage and protocol.  If you think about it for a minute, this becomes a pretty powerful case for getting a DR site set up with the storage you already have today.  It also allows you to get SRM in the door based on the same principles, with the ability to grow into scalable array based replication in an upcoming budget cycle.

With that out of the way, here’s a glimpse at the SRM 5.0 native replication components and terminology (both beta and GA).

Beta Name        GA Name                                 GA Acronym
HBR              vSphere Replication                     VR
HMS              vSphere Replication Management Server   vRMS
HBR server       vSphere Replication Server              vRS
ESXi HBR agent   vSphere Replication Agent               vR agent

Here is a look at how the SRM based replication pieces fit into the SRM 5.0 architecture.  Note the storage objects shown are VMFS, but they could be either VMFS or NFS datastores on either side:

[Diagram: SRM 5.0 architecture with vSphere Replication components]

Diagram courtesy VMware, Inc.

To review, the benefits of vSphere Replication are:

  1. No requirement for enterprise array based replication at both sites.
  2. Replication between heterogeneous storage, whatever that storage vendor or protocol might be at each site (so long as it’s supported on the HCL).
  3. Per-VM replication. I didn’t mention this earlier, but it’s another distinct advantage of VR over per-datastore replication.
  4. It’s included in the cost of SRM licensing. No extra VMware or array based replication licenses are needed.

Do note that access to the VR feature is by way of a separate installable component of SRM 5.0.  If you haven’t already installed the component during the initial SRM installation, you can do so afterwards by running the SRM 5.0 setup routine again at each site.

I’ve talked about the advantages of VR.  Again, I think they are a big enabler for small to medium sized businesses and I applaud VMware for offering this component which is critical to the best possible RPO and RTO.  But what about the disadvantages compared to array based replication?  In no particular order:

  1. Cannot replicate templates.  The ‘why’ comes next.
  2. Cannot replicate powered off virtual machines.  The ‘why’ for this follows.
  3. Cannot replicate files which don’t change (powered off VMs, ISOs, etc.).  This is because replications are handled by the vRA component – a shim in vSphere’s storage stack deployed on each ESX(i) host.  By the way, Changed Block Tracking (CBT) and VMware snapshots are not used by the vRA.  The mechanism uses a bandwidth efficient technology similar to CBT, but it’s worth pointing out it is not CBT.  Another item to note here is that VMs which are shut down won’t replicate writes during the shutdown process.  This is fundamentally because only VMs which are powered on, and stay powered on, are replicated by VR.  The current state of the VM would, however, be replicated once the VM is powered back on.
  4. Cannot replicate FT VMs. Note that array based replication can be used to protect FT VMs, but once recovered they are no longer FT enabled.
  5. Cannot replicate linked clone trees (Lab Manager, vCD, View, etc.)
  6. Array based replication will replicate a VMware based snapshot hierarchy to the destination site while leaving it intact. VR can replicate VMs with snapshots, but they will be consolidated at the destination site.  This is again based on the principle that only changes are replicated to the destination site.
  7. Cannot replicate vApp consistency groups.
  8. VR does not work with virtual disks opened in “multi-writer mode” which is how MSCS VMs are configured.
  9. VR can only be used with SRM.  It can’t be used as a data replication solution for your vSphere environment outside of SRM.
  10. Losing a vSphere host means that the vRA and the current replication state of its VMs are also lost.  In the event of an HA failover, a full sync must be performed for these VMs once they are powered on at the new host (and vRA).
  11. The number of VMs which can be replicated with VR will likely be less than array based replication depending on the storage array you’re comparing to.  In the beta, VR supported 100 VMs.  At GA, SRM 5.0 supports up to 500 VMs with vSphere Replication. (Thanks Greg)
  12. In-band VR replication requires additional open TCP ports (a quick way to check these from the host is sketched after this list):
    1. 31031 for initial replication
    2. 44046 for ongoing replication
  13. VR requires vSphere 5 hosts at both the protected and recovery sites, while array based replication follows only the general SRM 5.0 minimum requirements: vCenter Server 5.0 with hosts running ESX(i) 3.5, 4.x, and/or 5.0.
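
If you want to verify those ports from a host, the ESXi 5.0 firewall can be queried from the shell. A rough sketch follows; I’m assuming the esxcli network firewall namespace included with ESXi 5.0, and the ruleset names that carry VR traffic may vary:

~ # esxcli network firewall ruleset list
~ # esxcli network firewall ruleset rule list | grep -E '31031|44046'

Any ruleset whose rules include those ports should show as enabled before replication traffic will flow between sites.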

The list of disadvantages appears long but don’t let that stop you from taking a serious look at SRM 5.0 and vSphere Replication.  I don’t think there are many, if any, showstoppers in that list for small to medium businesses.

I hope you find this useful.  I gathered the information from various sources, much of it from an SRM Beta FAQ which, to the best of my knowledge, is still accurate today in the GA release.  If you find any errors or would like to offer corrections or additions, as always please feel free to use the Comments section below.

VMware issues recall on new vSphere 5.0 UNMAP feature

September 30th, 2011

One of the new features in vSphere 5.0 is Thin Provisioning Block Space Reclamation (UNMAP).  This was released as one part of a new VAAI primitive (the other component of the new primitive being thin provision stun).
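
Before digging in, it can help to see whether a given device even claims UNMAP (Delete) support. A quick sketch from the ESXi 5.0 shell, with a placeholder device identifier you would swap for your own:

~ # esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx

The Delete Status field in the output indicates whether the device advertises UNMAP support.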

Today, VMware released KB 2007427 Disabling VAAI Thin Provisioning Block Space Reclamation (UNMAP) in ESXi 5.0.

Due to varied response times from the storage devices, the UNMAP command can result in poor system performance and should be disabled on the ESXi 5.0 host.  This variation in response times in critical regions could potentially interfere with operations such as Storage vMotion and Virtual Machine Snapshot consolidation.

VMware intends to disable UNMAP in an upcoming patch release until full support for Space Reclamation is available.

As described in the article, the workaround to avoid the use of UNMAP commands on Thin Provisioned LUNs is as follows:

  1. Log into your host using Tech Support mode. For more information on using Tech Support mode see Tech Support Mode in ESXi 4.1 and 5.0 (1017910).
  2. From your ESXi 5.0 host, issue this esxcli command: esxcli system settings advanced set --int-value 0 --option /VMFS3/EnableBlockDelete

Note: In the command above, double hyphens are used before “int-value” and “option”; the font used may render them as a single long hyphen. This is a per-host setting and must be issued on each ESXi 5.0 host in your cluster.
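
To confirm the change took effect, the setting can be read back afterwards. This is my own sanity check rather than a step from the KB:

~ # esxcli system settings advanced list -o /VMFS3/EnableBlockDelete

The Int Value field in the output should read 0 once UNMAP is disabled.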

Update 12/16/11: VMware released five (5) non-critical patches last night.  One of those patches is ESXi500-201112401-SG which is the anticipated update that disables the UNMAP functionality in the new vSphere 5 Thin Provisioning VAAI primitive.  Full patch details below:

Summaries and Symptoms

This patch updates the esx-base VIB to resolve the following issues:

  • Updates the glibc third party library to resolve multiple security issues.
    The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the names CVE-2010-0296, CVE-2011-0536, CVE-2011-1071, CVE-2011-1095, CVE-2011-1658 and CVE-2011-1659 to these issues.
  • When a hot spare disk that is added to a RAID group is accessed before the disk instance finishes initialization or if the disk is removed while an instance of it is being accessed, a race condition might occur causing the vSphere Client to not display information about the RAID controllers and the vSphere Client user interface might also not respond for a very long time.
  • vMotion fails with the A general system error occurred: Failed to flush checkpoint data! error message when:
    • The resolution of the virtual machine is higher than 1280×1024 (or lower, if you are using a second screen)
    • The guest operating system is using the WDDM driver (Windows 7, Windows 2008 R2, Windows 2008, Windows Vista)
    • The virtual machine is using Virtual Machine Hardware version 8.
  • Creating host profiles of ESXi 5.0 hosts might fail when the host profile creation process is unable to resolve the hostname and IP address of the host through DNS lookup. An error message similar to the following is displayed:
    Call "HostProfileManager.CreateProfile" for object "HostProfileManager" on vCenter Server "<Server_Name>" failed.
    Error extracting indication configuration: [Errno -2] Name or service not known.
  • In vSphere 5.0, Thin Provisioning is enabled by default on devices that adhere to T10 standards. On such thin provisioned LUNs, vSphere issues SCSI UNMAP commands to help the storage arrays reclaim unused space. Sending UNMAP commands might cause performance issues with operations such as snapshot consolidation or storage vMotion.
    This patch resolves the issue by disabling the space reclamation feature, by default.
  • If a user subscribes for an ESXi server’s CIM indications from more than one client (for example, c1 and c2) and deletes the subscription from the first client (c1), the other clients (c2) might fail to receive any indication notification from the host.

This patch also provides you with the option of configuring the iSCSI initiator login timeout value for software iSCSI and dependent iSCSI adapters.
For example, to set the login timeout value to 10 seconds you can use commands similar to the following:

  • ~ # vmkiscsi-tool -W -a "login_timeout=10" vmhba37
  • ~ # esxcli iscsi adapter param set -A vmhba37 -k LoginTimeout -v 10

The default login timeout value is 5 seconds and the maximum value that you can set is 60 seconds.
We recommend that you change the login timeout value only if suggested by the storage vendor.
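
To double-check the value afterward, the adapter’s parameter list can be queried. A sketch reusing the example vmhba37 adapter name from above:

~ # esxcli iscsi adapter param get -A vmhba37

The LoginTimeout entry in the output should reflect the new value.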

VMware View 5.0 VDI vHardware 8 vMotion Error

September 20th, 2011

General awareness/heads up blog post here on something I stumbled on with VMware View 5.0.  A few weeks ago while working with View 5.0 BETA in the lab, I ran into an issue where a Windows 7 virtual machine would not vMotion from one ESXi 5.0 host to another.  The resulting error in the vSphere Client was:

A general system error occurred: Failed to flush checkpoint data

I did a little searching and found similar symptoms in VMware KB 1011971, which speaks to an issue that can arise when Video RAM (VRAM) is greater than 30MB for a virtual machine. In my case it was greater than 30MB, but I could not adjust it due to the fact that it was being managed by the View Connection Server.  At the same time, a VMware source on Twitter volunteered his assistance and quickly came up with some inside information on the issue.  He had me try adding the following line to /etc/vmware/config on the ESXi 5.0 hosts (no reboot required):

migrate.baseCptCacheSize = "16777216"
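
For reference, appending that line from the ESXi shell looks something like this (my shorthand; any method of editing the file works):

~ # echo 'migrate.baseCptCacheSize = "16777216"' >> /etc/vmware/config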

The fix worked and I was able to vMotion the Windows 7 VM back and forth between hosts.  The information was taken back to Engineering for a KB to be released.  That KB is now available: VMware KB 2005741 vMotion of a virtual machine fails with the error: A general system error occurred: Failed to flush checkpoint data! The new KB article lists the following background information and several workarounds:

Cause

Due to new features with Hardware Version 8 for the WDDM driver, the vMotion display graphics memory requirement has increased. The default pre-allocated buffer may be too small for certain virtual machines with higher resolutions. The buffer size is not automatically increased to account for the requirements of those new features if mks.enable3d is set to FALSE (the default).

Resolution

To work around this issue, perform one of these options:

  • Change the resolution to a single screen of 1280×1024 or smaller before the vMotion.
  • Do not upgrade to Virtual Machine Hardware version 8.
  • Increase the base checkpoint cache size. Doubling it from its default 8MB to 16MB (16777216 bytes) should be enough for any single display resolution. If you are using two displays at 1600×1200 each, increase the setting to 20MB (20971520 bytes). To increase the base checkpoint cache size:

    1. Power off the virtual machine.
    2. Click the virtual machine in the Inventory.
    3. On the Summary tab for that virtual machine, click Edit Settings.
    4. In the virtual machine Properties dialog box, click the Options tab.
    5. Under Advanced, select General and click Configuration Parameters.
    6. Click Add Row.
    7. In the new row, add migrate.baseCptCacheSize to the name column and add 16777216 to the value column.
    8. Click OK to save the change.

    Note: If you don’t want to power off your virtual machine to make this change, you can also add the parameter to the /etc/vmware/config file on the target host. This adds the option to every VMX process that is spawned on this host, which happens when vMotion starts a virtual machine on the server.

  • Set mks.enable3d = TRUE for the virtual machine:
    1. Power off the virtual machine.
    2. Click the virtual machine in the Inventory.
    3. On the Summary tab for that virtual machine, click Edit Settings.
    4. In the virtual machine Properties dialog box, click the Options tab.
    5. Under Advanced, select General and click Configuration Parameters.
    6. Click Add Row.
    7. In the new row, add mks.enable3d to the name column and add True to the value column.
    8. Click OK to save the change.
Caution: This workaround increases the overhead memory reservation by 256MB. As such, it may have a negative impact on HA clusters with strict Admission Control. However, this memory is only used if a 3D application is active. If, for example, Aero Basic rather than Aero Glass is used as a window theme, most of the reservation is not used and the memory can be kept available for the ESX host. The reservation still affects HA Admission Control if large multi-monitor setups are used for the virtual machine and if the CPU is older than a Nehalem processor and does not have the SSE 4.1 instruction set. In this case, using 3D is not recommended. The maximum recommended resolution for using 3D, regardless of CPU type and SSE 4.1 support, is 1920×1200 with dual screens.

The permanent fix for this issue did not make it into the recent View 5.0 GA release but I expect it will be included in a future release or patch.

Update 12/23/11: VMware released five (5) non-critical patches last week.  One of those patches is ESXi500-201112401-SG which permanently resolves the issues described above.  Full patch details below:

Summaries and Symptoms

This patch updates the esx-base VIB to resolve the following issues:

  • Updates the glibc third party library to resolve multiple security issues.
    The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the names CVE-2010-0296, CVE-2011-0536, CVE-2011-1071, CVE-2011-1095, CVE-2011-1658 and CVE-2011-1659 to these issues.
  • When a hot spare disk that is added to a RAID group is accessed before the disk instance finishes initialization or if the disk is removed while an instance of it is being accessed, a race condition might occur causing the vSphere Client to not display information about the RAID controllers and the vSphere Client user interface might also not respond for a very long time.
  • vMotion fails with the A general system error occurred: Failed to flush checkpoint data! error message when:
    • The resolution of the virtual machine is higher than 1280×1024 (or lower, if you are using a second screen)
    • The guest operating system is using the WDDM driver (Windows 7, Windows 2008 R2, Windows 2008, Windows Vista)
    • The virtual machine is using Virtual Machine Hardware version 8.
  • Creating host profiles of ESXi 5.0 hosts might fail when the host profile creation process is unable to resolve the hostname and IP address of the host through DNS lookup. An error message similar to the following is displayed:
    Call "HostProfileManager.CreateProfile" for object "HostProfileManager" on vCenter Server "<Server_Name>" failed.
    Error extracting indication configuration: [Errno -2] Name or service not known.
  • In vSphere 5.0, Thin Provisioning is enabled by default on devices that adhere to T10 standards. On such thin provisioned LUNs, vSphere issues SCSI UNMAP commands to help the storage arrays reclaim unused space. Sending UNMAP commands might cause performance issues with operations such as snapshot consolidation or storage vMotion.
    This patch resolves the issue by disabling the space reclamation feature, by default.
  • If a user subscribes for an ESXi server’s CIM indications from more than one client (for example, c1 and c2) and deletes the subscription from the first client (c1), the other clients (c2) might fail to receive any indication notification from the host.

This patch also provides you with the option of configuring the iSCSI initiator login timeout value for software iSCSI and dependent iSCSI adapters.
For example, to set the login timeout value to 10 seconds you can use commands similar to the following:

  • ~ # vmkiscsi-tool -W -a "login_timeout=10" vmhba37
  • ~ # esxcli iscsi adapter param set -A vmhba37 -k LoginTimeout -v 10

The default login timeout value is 5 seconds and the maximum value that you can set is 60 seconds.
We recommend that you change the login timeout value only if suggested by the storage vendor.

Professional VMware BrownBag Group Learning

September 19th, 2011


If you weren’t already aware, VMware vExpert Cody Bunch has been hosting a series of BrownBag learning sessions covering topics from the VCP4, VCAP4-DCA, and VCAP4-DCD exam blueprints, in addition to VCDX topics.  A number of individuals from the VMware community have been lending Cody assistance in leading these sessions.  I’ll be stepping up to the plate this Wednesday evening, 9/21 at 7pm CDT, to help out.  I’ll be covering these VCAP4-DCD exam blueprint objectives:

  • 1.1 Gather and analyze business requirements
  • 1.2 Gather and analyze application requirements
  • 1.3 Determine Risks, Constraints, and Assumptions

If you’re thinking of attempting the VCAP4-DCD exam or if you’re preparing for the VCDX certification, this session is for you.  Again, details above – sign up today, it’s free!

Updated 9/21/11: The live session is complete but you can download the recorded version at the Professional VMware link above.  I’m also embedding a link to the SlideRocket presentation for as long as my trial account is active (through the beginning of October).