Virtualizing vCenter With vDS Catch-22

October 9th, 2009 by jason

I’ve typically been a fan of virtualizing the vCenter management server in most situations. VMware vCenter and Update Manager both make fine virtualization candidates as long as the underlying infrastructure for vCenter stays up. Loss of vCenter in a blackout situation can make things a bit of a hassle, but one can work through it with the right combination of patience and knowledge.

A few nights ago I had decided to migrate my vCenter VM to my vSphere virtual infrastructure. Because my vCenter VM was on a standalone VMware Server 2.0 box, I had to shut down the vCenter VM in order to cold migrate it to one of the ESX4 hosts directly, transfer the files to the SAN, upgrade virtual hardware, etc. Once the files were migrated to the vSphere infrastructure, it was time to configure the VM for the correct network and power it up. This is where I ran into the problem.

vCenter was shut down and unavailable, so I had connected my vSphere client directly to the ESX4 host to which I had transferred the VM. When I tried to configure the vCenter VM to use the vNetwork Distributed Switch (vDS) port group I had set up for all VM traffic, that port group was unavailable in the dropdown list of networks. The vCenter server was powered down and thus the vDS Control Plane was unavailable, eliminating my view of vDS networks.

This is a dilemma. Without a network connection, the vCenter server will not be able to communicate with the back end SQL database on a different box running SQL. This will cause the vCenter server services to not start and thus I’ll never have visibility to the vDS. Fortunately I have a fairly flat network in the lab with just a few subnets. I was able to create a temporary vSwitch and port group locally on the ESX4 host which would grant the vCenter VM the network connectivity it needed so I could then modify the network, changing from a local to a vDS port group on the fly.
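For anyone who would rather script the temporary network than click through the vSphere client, here is a minimal sketch. It assumes the classic `esxcfg-vswitch` service console command is available on the host, and every name in it (`vSwitch1`, `Temp-vCenter`, `vmnic1`, VLAN 10) is a hypothetical placeholder rather than something from my lab. It only builds the command lines so they can be reviewed before being run on the host.

```python
import shlex


def temp_vswitch_commands(vswitch, portgroup, uplink, vlan_id=None):
    """Build esxcfg-vswitch command lines for a temporary standard vSwitch.

    All names passed in are placeholders chosen by the caller; nothing is
    executed here, the commands are only returned as strings.
    """
    commands = [
        ["esxcfg-vswitch", "-a", vswitch],             # create the standard vSwitch
        ["esxcfg-vswitch", "-L", uplink, vswitch],     # link a physical NIC as uplink
        ["esxcfg-vswitch", "-A", portgroup, vswitch],  # add the port group
    ]
    if vlan_id is not None:
        # Tag the port group with a VLAN ID only when one is requested.
        commands.append(
            ["esxcfg-vswitch", "-v", str(vlan_id), "-p", portgroup, vswitch]
        )
    # Quote each argument so names containing spaces survive a shell.
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    for line in temp_vswitch_commands("vSwitch1", "Temp-vCenter", "vmnic1", vlan_id=10):
        print(line)
```

On ESXi, which has no service console, the remote CLI equivalent or the vSphere client would be the route instead. Once vCenter is back up and the VM has been moved to the vDS port group, the temporary vSwitch can be removed.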

Once the vCenter server was back up, I further realized that vDS port groups still cannot be seen when the vSphere client is connected directly to an ESX4 host. The ability to configure a VM to utilize vDS networking requires both that the vCenter server be functional and that the vSphere client be connected to said vCenter server, not to a managed host.
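The rule above can be written down as a tiny decision function. This is only a model of the behavior described in this post, not VMware's actual logic, and the function and enum names are made up for illustration. The ephemeral-binding exception is an assumption based on what commenters on this post report: port groups with ephemeral binding remain visible from a direct host connection.

```python
from enum import Enum


class Binding(Enum):
    STATIC = "static"        # the default binding, per the comments on this post
    DYNAMIC = "dynamic"
    EPHEMERAL = "ephemeral"


def vds_portgroup_selectable(client_target, vcenter_up, binding=Binding.STATIC):
    """Model whether a vDS port group shows up in the VM's network dropdown.

    client_target: "vcenter" if the vSphere client is connected to vCenter,
                   "host" if it is connected directly to an ESX(i) host.
    """
    if client_target == "vcenter":
        # Through vCenter, the port group is offered whenever vCenter is running.
        return vcenter_up
    if client_target == "host":
        # Directly on a host, only ephemeral port groups are reported as visible
        # (assumption taken from reader comments, not from my own test).
        return binding is Binding.EPHEMERAL
    raise ValueError("client_target must be 'vcenter' or 'host'")


if __name__ == "__main__":
    # My situation: vCenter down, client on the host, static binding.
    print(vds_portgroup_selectable("host", vcenter_up=False))
    # Same situation with an ephemeral port group prepared in advance.
    print(vds_portgroup_selectable("host", vcenter_up=False, binding=Binding.EPHEMERAL))
```

Note that the state of vCenter does not matter at all in the direct-to-host case, which is why the problem persisted even after my vCenter server was running again.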

The situation I explained above is the catch-22: the temporary inability to configure VMs for vDS networking while the vCenter server is unavailable. One might call my situation a convergence of circumstances, but with an existing virtualized vCenter server that you’re looking to migrate to a vDS integrated vSphere infrastructure, the scenario is very real. I’d like to note that all VMs that had been running on a vDS port continued to run without a network outage, as the vDS Data Plane is maintained on each host and remained intact.


  1. Jason, this sort of situation can and should be something we can anticipate. Thanks for discovering it. We need clear recovery plans for power outages too.

    How do you turn on the license server when it’s on a VM? Always have a stand alone ESX license.

    How do you know what ESX host the vCenter server is supposed to be on? Use affinity rules.

    And for your situation, you should always have a regular portgroup on a management VLAN and maybe even a multihomed Windows client with the VIC on it so you can RDP into it.

    Like we learned in the Boy Scouts, always be prepared!

    Great work! Keep it up!

    I b e n

  2. I’m really surprised/shocked/disappointed you had this experience. I don’t doubt it happened, just as you explained. There’s supposed to be a local DB and the DvSfolder on a shared volume, and that should maintain the DvSwitch configuration even if vCenter is down. You wouldn’t have thought something as crude as patching a VM to a portgroup would require vCenter to be up. The data plane is still there after all on the host. My worry about this is that folks will use this as one of the many (generally flawed) reasons for not virtualizing vCenter. I’m always horrified by the number of folks you find physicalizing it…

    I guess the moral of this story is: let’s hope it’s a bug that can be fixed, rather than something by design. And this is a great reason to keep your “infrastructure” VMs (the ones that make your vCenter management system work) on good old Standard vSwitches that we know and love…

  3. jason says:

    Iben and Mike, I encourage you to connect your vSphere client directly to an ESX(i)4 host, create a new VM “shell” (custom) and see if you are able to choose a vDS portgroup for the VM during the creation. My tests were on ESXi4 but it shouldn’t make a difference.

    As I had stated, the existing vDS network connections were maintained for running VMs while vCenter was down. What I had lost was the ability to establish new connections to the vDS.

    Thank you,
    Jas

  4. Rob says:

    these issues also exist on the Cisco Nexus 1000v DVS. unfortunately vCenter manages the entire vSwitch so if vCenter is not available you cannot assign VMs to DVS port groups, nor will vMotion (or HA) work correctly. vCenter needs to be up and connected to the network and DVS control networks. this is due to vCenter configuring the VEM on the ESX server on the fly. There is no way around these issues as yet, although i’m not sure why the whole cluster wide network config isn’t replicated to all hosts.
    to minimise the impact i have a standalone vSwitch where my vCenter resides. this way even if the worst happens you can still boot the vCenter server and have network connectivity.

  5. Rob,

    Please clarify…

    – HA should still work even if vCenter is down.
    – HA is initiated during system power on and vCenter helps negotiate the HA pairing between ESX hosts. After that the ESX hosts can carry on HA protection without vCenter.

    Am I missing something?

    Jason,

    I agree with all your points. The conclusion to be drawn from your post is that a non-distributed portgroup should be created on certain key ESX hosts so you don’t have to do this on the fly in the event of a power failure or other sort of disaster.

    Would you agree with this as a “best practice”?

  6. jason says:

    Iben, once the vCenter VM is attached to the vDS port, it should remain there (for life) without issues. If the vCenter server goes down temporarily, it should come back up without issue on the vDS port. The problem arises when trying to adjust network settings of a VM when the vCenter server is unavailable, or when the vSphere client is attached directly to an ESX(i)4 host. It is at this time when a vDS port group is unavailable for selection.

    Mine was a one-off situation that can be planned for. I have a hard time making permanent design considerations around the migration of a vCenter server to vDS which is going to involve at least a dedicated vSwitch, port group, VLAN, and at least 1 PNIC. In my book, that’s kind of hard to justify but that’s just me and each design depends on the topology of environment and maybe the political climate as well.

    Starting with vSphere, we’re seeing a lot more VMware-based bolt ons that require high availability of vCenter. We need to make sure vCenter stays available as much as possible or the add-ons start breaking. In the case of vDS, we lose some native VMware functionality which as Mike Laverick puts it is a disappointment. I expect more out of an Enterprise Plus level feature. Other customers should demand this as well for the price that is paid for VMware’s top tier license. However, this is version 1.0 of vDS so I’ll cut VMware a little slack and give them time to address this. But this should not remain a long term achilles heel.

  7. It seems to me that this problem would have been avoided if you had a Linked VCenter instance. Is that right?

  8. jason says:

    A Linked vCenter would not have solved the problem.

  9. Rob says:

    Iben, I believe HA will work but the restarted VM will not have a network connection until vCenter has reconfigured the VEM on the host. obviously if the restarted VM is the vCenter server and its network is on a DVS it will not be able to reconfigure it.
    i may be wrong here but this is definitely the case with DRS / vMotion: if vCenter is not available or not properly connected to the VSM the VM will still move but will lose its network connection.

  10. Okay, I’m starting to get the problem(s) here…

    vCenter is on dvs
    ESX vCenter is on dies
    vCenter vm starts up on new ESX since it’s protected with HA
    VCenter won’t have network since dvs cannot be reconfigured on new ESX

    Is this correct?

    If it is then it seems like vCenter should have non-dvs portgroup to manage ESX hosts.

  11. gogogo5 says:

    Things can get even stickier if you have enabled Lockdown mode (and still rely on using root as your VIC login) and then your vCenter server goes down. Can you then point your vSphere client to your ESX(i) host that has been configured for Lockdown mode…hhmmmm…..

    Can you disable Lockdown Mode via the console on a per host basis to override this?

    Can someone validate Iben’s assessment above, is his theory correct i.e.:

    vCenter is on dvs
    ESX vCenter is on dies
    vCenter vm starts up on new ESX since it’s protected with HA
    VCenter won’t have network since dvs cannot be reconfigured on new ESX

    Is this correct?

  12. gogogo5 says:

    Calling all vExperts, lots of good questions needing your attention 😉

  13. Tom Howarth says:

    gogogo5:- I believe that the issue only relates to Hosts running Nexus 1000v switches, not basic vNDS switches. Here the data plane is shared across all hosts and highly available.

    Thanks to Rob as that is a Design issue I did not know about.

    To me the biggest issue regarding Virtual Infrastructure is the inability to start a server hosting a license server. This is very much the chicken and egg scenario and something VMware needs to get addressed.

    One thing: I do not have access to a lab at the moment, but how is licensing affected in pure vSphere environments?

  14. Jason,

    FYI I sort of experienced a similar catch-22 when I tried (in the lab) to modify the EVC configuration of my 2-host cluster (vCenter was running as a VM on top of it). Off the top of my head, the problem was that, in order to change the EVC policy, all VMs on the cluster had to be off. But since the policy can only be changed from within VC, and VC was running on the cluster… catch-22.

    Massimo.

  15. Andy Daniel says:

    For this reason and others, I’m recommending most customers use the “Hybrid” approach to vDS, or at least keep a backup SC on a classic standard vSwitch.

    Just ran across this note regarding the Nexus 1kv VEM install…

    “VUM will not install the VEM software on a host where the vCenter resides. The vCenter must be migrated to another host before installing VEM software.”

    I’m assumming that this is to avoid a similar situation to that which you describe. I’m planning on migrating vCenter, installing the VEM and migrating back.

  16. Linus T. says:

    Jason, the solution to this is to turn the port group from the default Static binding to Ephemeral. You'll then be able to access it when connecting directly to the ESX host and thus, assign your vCenter to it. 🙂

    Remember, however, that you need to change the binding BEFORE adding VMs to it (so creating 3 port groups, one of each binding, would help in this scenario)

  17. Doug Youd says:

    Can anyone confirm Linus T’s assertion that changing to Ephemeral binding on the portgroups will make them visible in the instance of vCenter being down?

  18. goldrush says:

    Hi Doug,

    at least I can confirm “seeing” the dvportgroup when connected directly to an ESX host when having enabled ephemeral binding before…

    Greetings

  19. Gary B says:

    Verified the migration of vcenter vm to a distributed switch using the Ephemeral port

  20. Dan says:

    With ESXi41 and vCenter41 when the vCenter is disconnected, a VM that was recovered by HA will respond to ping if it is configured to use vDS41 and Static binding.
    In ‘Edit settings’ the network adapter will display ‘invalid backing’ but the traffic is ok.

  21. Anshad says:

    the fix for this issue is
    1. Rename the existing vCenter to a new name in the inventory
    2. Clone the vCenter to the old vCenter name (destination should be on a host which has vDS configured). The clone's destination datastore should not be the same as the source, to avoid any VMFS VM name issues
    Note: You need two datastores in the ESX to do this.
    3. Change the cloned VM’s network settings to use the desired vDS portgroup
    4. Shutdown the existing vCenter
    5. Connect to the ESX host which has the cloned vCenter VM
    6. using vmkfstools -i command clone the latest disk from old vCenter to a new vCenter (use the same VMDK names for destination)
    7. Bring up the new vCenter and you are good to go

  22. Pawel says:

    Well, seems to be solved in vSphere 5.1 in two ways:

    – automatic rollback of changes made to mgmt network if the host cannot reach vCenter Server;

    – possibility to modify mgmt network on vDS locally on the host.

    More info: http://www.vmware.com/files/pdf/techpaper/Whats-New-VMware-vSphere-51-Network-Technical-Whitepaper.pdf

  23. Also – vShield now has exclusion thing going on – which was another issue when using vCenter in VM with vShield/vCD…