Cloning VMs, Guest Customization, & vDS Ephemeral Port Binding

November 25th, 2011 by jason

I spent a lot of time in the lab over the past few days. I had quite a bit of success, but I did run into one issue whose story does not have a very happy ending.

The majority of my work involved networking. I decommissioned all legacy vSwitches in the vSphere 5 cluster and converted all remaining VMkernel port groups to the existing vNetwork Distributed Switch (vDS), where I was already running the majority of the VMs on Static binding port groups. In the process, some critical infrastructure VMs were also moved to the vDS, including the vCenter, SQL, and Active Directory domain controller servers. Because of this, I elected to use Ephemeral – no binding as the port binding configuration for the VM port group that all VMs were connected to, including some powered off VMs I used as sources for cloning new virtual machines. I made this decision in case of a complete outage in the lab.

In some circumstances, Static binding prevents VMs from powering on when the vCenter Server (the control plane of the vDS) is down or unavailable. Configuring the port group for Ephemeral – no binding works around this issue by allowing VMs to power on and claim their vDS ports while the vCenter Server is down. Eric Gray has written a good blog article on this subject.
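
To make the trade-off concrete, here is a toy Python model of the power-on rule described above. It is a sketch under simplifying assumptions, not how ESXi or vCenter actually work: the names `PortGroup`, `VM`, and `power_on` are hypothetical, and the only behavior modeled is that a Static binding VM needs either an already-reserved port or a reachable vCenter, while an Ephemeral port is created by the host at power-on.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Toy model, not VMware code. "static" stands in for Static binding (a port
# reserved through vCenter); "ephemeral" stands in for Ephemeral - no binding
# (the host creates a port when the VM powers on).
_port_ids = count()  # hypothetical source of unique port IDs


@dataclass
class PortGroup:
    name: str
    binding: str  # "static" or "ephemeral"


@dataclass
class VM:
    name: str
    portgroup: PortGroup
    port_id: Optional[int] = None  # None until a port has been assigned


def power_on(vm: VM, vcenter_up: bool) -> bool:
    """Return True if the VM's NIC could be connected at power-on in this model."""
    if vm.portgroup.binding == "ephemeral":
        # The host creates the port itself, so vCenter's state does not matter.
        vm.port_id = next(_port_ids)
        return True
    if vm.port_id is not None:
        # Static binding: a port reserved earlier is simply reused.
        return True
    if not vcenter_up:
        # A new static reservation needs vCenter, the vDS control plane.
        return False
    vm.port_id = next(_port_ids)
    return True


static_pg = PortGroup("dvPG-Static", "static")
ephemeral_pg = PortGroup("dvPG-Ephemeral", "ephemeral")
print(power_on(VM("new-vm", static_pg), vcenter_up=False))     # False
print(power_on(VM("new-vm", ephemeral_pg), vcenter_up=False))  # True
```

The model says nothing about cloning or Guest Customization; it only captures why Ephemeral binding looks attractive for infrastructure VMs during an outage.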

Everything worked well with the new networking configuration until the following day, when I tried deploying new virtual machines by cloning powered off VMs that were bound to the Ephemeral port group. After the cloning process completed, each new VM powered on for the first time, and Guest Customization was then supposed to run. This is where the problems came up. The VMs would essentially hang just after the vCenter Server invoked Guest Customization. The VM’s remote console made it evident that Guest Customization wasn’t starting. At this point, the VM could not be powered off; the following error was displayed:

Cannot power Off vm_name on host_name in datacenter_name: The attempted operation cannot be performed in the current state (Powered on).

DRS also began producing occasional errors for the host:

Unable to apply DRS resource settings on host host_name in datacenter_name. The operation is not allowed in the current state.. This can significantly reduce the effectiveness of DRS.

VMware KB 1004667 describes a similar circumstance in which a blocking task on a VM (in that case, a VMware Tools installation) prevents any other changes to it. This explains why the VM can’t be powered off until the VMware Tools installation or Guest Customization process either ends or times out.

Finally, the following error in the cluster’s Events is what led me to suspect Ephemeral binding as the source of the issues:

Error message on vm_name on host_name in datacenter_name: Failed to connect virtual device Ethernet0.

Error Stack:

Failed to connect virtual device Ethernet0.

Unable to get networkName or devName for ethernet0

Unable to get dvs.portId for ethernet0

I searched the entire vSphere 5 documentation library for issues or limitations related to the use of Ephemeral – no binding, but came up empty. This reinforced my assumption that Ephemeral binding across the board for all VMs was a supported configuration. Perhaps it is for running virtual machines, but in my case it fails when combined with cloning and Guest Customization. For now, I’ve moved from Ephemeral binding back to Static binding. Cloning problem solved.
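
Before cloning, it may help to check which clone sources or templates sit on ephemeral port groups. The sketch below works on a plain inventory of dictionaries rather than a live vCenter connection; the inventory format and the `flag_clone_sources` function are hypothetical. The one real detail it relies on is that the vSphere API reports a distributed port group’s binding type as `earlyBinding` (Static binding), `lateBinding` (Dynamic binding), or `ephemeral` (Ephemeral – no binding).

```python
# vSphere API binding type -> label shown in the vSphere Client.
BINDING_LABELS = {
    "earlyBinding": "Static binding",
    "lateBinding": "Dynamic binding",
    "ephemeral": "Ephemeral - no binding",
}


def flag_clone_sources(portgroups, vms):
    """Return (vm_name, portgroup_name) pairs for clone sources on ephemeral port groups.

    portgroups: {portgroup_name: api_binding_type}
    vms: list of dicts with "name", "portgroup", and "clone_source" keys
    (a hypothetical inventory format, e.g. exported from vCenter beforehand).
    """
    flagged = []
    for vm in vms:
        if not vm.get("clone_source"):
            continue  # running VMs on ephemeral ports were not the problem here
        if portgroups.get(vm["portgroup"]) == "ephemeral":
            flagged.append((vm["name"], vm["portgroup"]))
    return flagged


if __name__ == "__main__":
    pgs = {"dvPG-Infra": "ephemeral", "dvPG-VMs": "earlyBinding"}
    vms = [
        {"name": "w2k8r2-gold", "portgroup": "dvPG-Infra", "clone_source": True},
        {"name": "vcenter01", "portgroup": "dvPG-Infra", "clone_source": False},
        {"name": "rhel6-gold", "portgroup": "dvPG-VMs", "clone_source": True},
    ]
    for name, pg in flag_clone_sources(pgs, vms):
        print(f"{name}: {pg} uses {BINDING_LABELS[pgs[pg]]}; consider Static binding")
```

Only `w2k8r2-gold` is flagged in the sample data: `vcenter01` is on the ephemeral port group but is not a clone source, and `rhel6-gold` is already on Static binding.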


5 comments

  1. Josh Atwell says:

    Great write-up, Jason. I’m curious whether you saw similar behavior when deploying from a template rather than a clone. I know that templates on a DVS can also be quirky, and I wonder if vCenter treats the template differently. I suspect not, though.

    I looked over my VCAP prep notes, and they specifically say that ephemeral is supported for all power states.

  2. Tom Miller says:

    Thanks, Jason,
    I’m trying to get my head around Ephemeral ports so I can switch all port groups to vDS. It gets ugly in a lab when vCenter does not start after a host outage or reboot. Why does autostart so often get “disabled” on VMs even though it was configured?

  3. jason says:

    Unfortunately, VMs don’t maintain their autostart/autoshutdown properties when they are migrated from host to host via vMotion or another migration method, even within a cluster. I’ve had a few issues with vDS in the past (oh.. slight mistake there, it’s now VDS according to VMware Marketing), but since VMware offers the ability to forcefully swing a VDS VMNIC uplink over to a back door vSwitch in order to start a vCenter VM, a full outage is much less of an issue than it used to be. I believe VMware introduced that “lifeboat” feature in vSphere 4.1. Of course, the other option is to use Ephemeral ports, but based on my experience in the lab, they can’t be used reliably with templates or VMs that you might want to clone from (which is what this post was all about). I’m not sure if that’s by design; it seems more like a bug to me. I don’t see much sense in dedicating one VDS solely to infrastructure VMs such as vCenter, SQL, and AD bound to Ephemeral ports, and then an additional VDS for everything else such as running VMs, templates, etc. If VMware were to get that resolved, I think most people would be in pretty good shape for using VDS across the board. VDS and third-party virtual switches such as the Nexus 1000V are the direction VMware is going.

  4. Jose says:

    I hit the same issue using vCenter 5 Update 1 and vSphere ESXi 5 Update 1.
    Template/clone provisioning from a distributed port group works if the port group uses static binding, but fails in ephemeral mode.

  5. Ryan says:

    Interesting. We are using View 5 here with vSphere 5.1 (previously 5.0u1) and a VDS without any issues. It is a requirement in View that the VM be powered off when snapshotted (but it’s always off for us unless we’re actively making a change). After being snapshotted, it is cloned to a replica, then cloned again for each View desktop.

    In this configuration, VMware *requires* ephemeral ports if using VDS. If you use static binding, whenever the View desktops refresh (for us, on every logout), the refresh errors out because it can’t bind the NIC to the VDS. This is because the port is seen as already in use/reserved; static binding assigns the port at the time the VM (or NIC) is created. As such, the port group needs to be configured with ephemeral binding. However, we have never run into this issue despite cloning the powered off VM using ephemeral binding. Then again, we use “QuickPrep”, not the built-in guest customization, when cloning. Maybe that’s where the bug is?

Leave a Reply