Posts Tagged ‘vCloud Director’

vCloud Director web page portal fails to load

March 25th, 2019

Last week I went through the process to upgrade a vCloud Director for Service Providers environment to version 9.5.0.2. All seemed to go well with the upgrade. However, after all was said and done, the vCloud Director web page portal failed to open. It would partially load… but then failed.

I seem to recall this happening at some point in the past but couldn’t remember the root cause/fix nor could I find it documented on my blog. So… time to dig into the logs.

The watchdog log showed the cell services recycling over and over.

[root@vcdcell1 logs]# tail -f vmware-vcd-watchdog.log
2019-03-22 11:25:25 | WARN | Server status returned HTTP/1.1 404
2019-03-22 11:26:25 | ALERT | vmware-vcd-cell is dead but /var/run/vmware-vcd-cell.pid exists, attempting to restart it
2019-03-22 11:26:33 | INFO | Started vmware-vcd-cell (pid=10238)
2019-03-22 11:26:36 | WARN | Server status returned HTTP/1.1 404
2019-03-22 11:27:36 | ALERT | vmware-vcd-cell is dead but /var/run/vmware-vcd-cell.pid exists, attempting to restart it
2019-03-22 11:27:43 | INFO | Started vmware-vcd-cell (pid=10827)
2019-03-22 11:27:46 | WARN | Server status returned HTTP/1.1 404
2019-03-22 11:28:46 | ALERT | vmware-vcd-cell is dead but /var/run/vmware-vcd-cell.pid exists, attempting to restart it

The cell log showed a problem with Transfer Server Storage. Error starting application: Unable to create marker file in the transfer spooling area

[root@vcdcell1 logs]# tail -f cell.log
Application Initialization: 'com.vmware.vcloud.networking-server' 20% complete. Subsystem 'com.vmware.vcloud.common-cell-impl' started
Application Initialization: 'com.vmware.vcloud.common.core' 12% complete. Subsystem 'com.vmware.vcloud.common-util' started
Application Initialization: 'com.vmware.vcloud.cloud-proxy-server' 42% complete. Subsystem 'com.vmware.vcloud.common-util' started
Application Initialization: 'com.vmware.vcloud.networking-server' 40% complete. Subsystem 'com.vmware.vcloud.common-util' started
Application Initialization: 'com.vmware.vcloud.cloud-proxy-server' 57% complete. Subsystem 'com.vmware.vcloud.cloud-proxy-services' started
Application Initialization: 'com.vmware.vcloud.common.core' 16% complete. Subsystem 'com.vmware.vcloud.api-framework' started
Application Initialization: 'com.vmware.vcloud.cloud-proxy-server' 71% complete. Subsystem 'com.vmware.vcloud.hybrid-networking' started
Application Initialization: 'com.vmware.vcloud.cloud-proxy-server' 85% complete. Subsystem 'com.vmware.vcloud.hbr-aware-plugin' started
Application Initialization: 'com.vmware.vcloud.cloud-proxy-server' 100% complete. Subsystem 'com.vmware.vcloud.cloud-proxy-web' started
Application Initialization: 'com.vmware.vcloud.cloud-proxy-server' complete.
Application Initialization: 'com.vmware.vcloud.common.core' 20% complete. Subsystem 'com.vmware.vcloud.common-vmomi' started
Application Initialization: 'com.vmware.vcloud.common.core' 25% complete. Subsystem 'com.vmware.vcloud.jax-rs-activator' started
Application Initialization: 'com.vmware.vcloud.common.core' 29% complete. Subsystem 'com.vmware.pbm.placementengine' started
Application Initialization: 'com.vmware.vcloud.common.core' 33% complete. Subsystem 'com.vmware.vcloud.vim-proxy' started
Application Initialization: 'com.vmware.vcloud.common.core' 37% complete. Subsystem 'com.vmware.vcloud.fabric.foundation' started
Application Initialization: 'com.vmware.vcloud.common.core' 41% complete. Subsystem 'com.vmware.vcloud.imagetransfer-server' started
Application Initialization: 'com.vmware.vcloud.common.core' 45% complete. Subsystem 'com.vmware.vcloud.fabric.net' started
Application Initialization: 'com.vmware.vcloud.networking-server' 60% complete. Subsystem 'com.vmware.vcloud.fabric.net' started
Application Initialization: 'com.vmware.vcloud.common.core' 50% complete. Subsystem 'com.vmware.vcloud.fabric.storage' started
Application Initialization: 'com.vmware.vcloud.common.core' 54% complete. Subsystem 'com.vmware.vcloud.fabric.compute' started
Application Initialization: 'com.vmware.vcloud.common.core' 58% complete. Subsystem 'com.vmware.vcloud.service-extensibility' started
Application Initialization: 'com.vmware.vcloud.common.core' 62% complete. Subsystem 'com.vmware.vcloud.backend-core' started
Application Initialization: 'com.vmware.vcloud.networking-server' 80% complete. Subsystem 'com.vmware.vcloud.backend-core' started
Application Initialization: 'com.vmware.vcloud.common.core' 66% complete. Subsystem 'com.vmware.vcloud.vapp-lifecycle' started
Application Initialization: 'com.vmware.vcloud.networking-server' 100% complete. Subsystem 'com.vmware.vcloud.networking-web' started
Application Initialization: 'com.vmware.vcloud.networking-server' complete.
Application Initialization: 'com.vmware.vcloud.common.core' 70% complete. Subsystem 'com.vmware.vcloud.content-library' started
Application Initialization: 'com.vmware.vcloud.common.core' 75% complete. Subsystem 'com.vmware.vcloud.presentation-api-impl' started
Application Initialization: 'com.vmware.vcloud.common.core' 79% complete. Subsystem 'com.vmware.vcloud.metrics-core' started
Application Initialization: 'com.vmware.vcloud.ui.h5cellapp' 33% complete. Subsystem 'com.vmware.vcloud.h5-webapp-provider' started
Application Initialization: 'com.vmware.vcloud.common.core' 83% complete. Subsystem 'com.vmware.vcloud.multi-site-core' started
Application Initialization: 'com.vmware.vcloud.common.core' 87% complete. Subsystem 'com.vmware.vcloud.multi-site-api' started
Application Initialization: 'com.vmware.vcloud.ui.h5cellapp' 50% complete. Subsystem 'com.vmware.vcloud.h5-webapp-tenant' started
Application Initialization: 'com.vmware.vcloud.ui.h5cellapp' 66% complete. Subsystem 'com.vmware.vcloud.h5-webapp-auth' started
Application Initialization: 'com.vmware.vcloud.ui.h5cellapp' 83% complete. Subsystem 'com.vmware.vcloud.h5-swagger-doc' started
Application Initialization: 'com.vmware.vcloud.ui.h5cellapp' 100% complete. Subsystem 'com.vmware.vcloud.h5-swagger-ui' started
Application Initialization: 'com.vmware.vcloud.ui.h5cellapp' complete.
Application Initialization: 'com.vmware.vcloud.common.core' 91% complete. Subsystem 'com.vmware.vcloud.rest-api-handlers' started
Application Initialization: 'com.vmware.vcloud.common.core' 95% complete. Subsystem 'com.vmware.vcloud.jax-rs-servlet' started
Application Initialization: 'com.vmware.vcloud.common.core' 100% complete. Subsystem 'com.vmware.vcloud.ui-vcloud-webapp' started
Application Initialization: 'com.vmware.vcloud.common.core' complete.
Successfully handled all queued events.
Error starting application: Unable to create marker file in the transfer spooling area: /opt/vmware/vcloud-director/data/transfer/cells/8a483603-43b8-4215-b33f-48270582f03d

To be honest, the NFS server which hosts Transfer Server Storage in this environment isn’t always reliable but upon checking, it was up and healthy. Furthermore, I was able to manually create a test file within this Transfer Server Storage space from the vCD cell itself.
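For what it's worth, that sanity check was nothing fancier than verifying the mount and touching a file from the cell, along the lines of the sketch below (the transfer path is vCD's default, and the test file is the one that shows up in the directory listings later in this post):

# verify the NFS export backing Transfer Server Storage is actually mounted
mount | grep /opt/vmware/vcloud-director/data/transfer

# prove the cell can write into the transfer spooling area
touch /opt/vmware/vcloud-director/data/transfer/cells/jgbtest.txt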

Walking the directory structure and looking at permissions, a few things didn’t look right.

[root@vcdcell1 data]# ls -l -h
total 4.0K
drwx------. 3 vcloud vcloud 27 Mar 22 11:39 activemq
drwxr-x---. 2 vcloud vcloud 6 Mar 15 04:58 generated-bundles
drwxr-x---. 2 vcloud vcloud 4.0K Mar 15 04:58 transfer
[root@vcdcell1 data]# pwd
/opt/vmware/vcloud-director/data
[root@vcdcell1 data]#
[root@vcdcell1 data]#
[root@vcdcell1 data]#
[root@vcdcell1 data]# cd transfer/
[root@vcdcell1 transfer]# ls -l -h
total 1.0K
drwx------. 2 1002 1002 64 Mar 22 11:38 cells
-rw-------. 1 root root 386 Mar 21 11:51 responses.properties
[root@vcdcell1 transfer]# cd cells/
[root@vcdcell1 cells]# ls -l -h
total 512
-rw-------. 1 1002 1002 0 May 27 2018 8a483603-43b8-4215-b33f-48270582f03d.old
-rw-r--r--. 1 root root 6 Mar 22 11:38 jgbtest.txt

Looking at some of the pieces above, I seem to recall vcloud is supposed to be the owner and group for the vCD file and directory structure. I further verified this by restoring my old vCD cell from a previous snapshot and spot checking. So let’s fix it using the chown example on page 53 of the vCloud Director Installation and Upgrade Guide.

[root@vcdcell1 cells]# chown -R vcloud:vcloud /opt/vmware/vcloud-director
[root@vcdcell1 cells]#
[root@vcdcell1 cells]#
[root@vcdcell1 cells]# ls -l -h
total 512
-rw-------. 1 vcloud vcloud 0 May 27 2018 8a483603-43b8-4215-b33f-48270582f03d.old
-rw-r--r--. 1 vcloud vcloud 6 Mar 22 11:38 jgbtest.txt

The watchdog daemon followed up by restarting the vCD cell. With the correct permissions in place, a new cell marker file was successfully created and the vCD cell started successfully. I deleted the .old cell file and of course my jgbtest.txt file.

[root@vcdcell1 cells]# ls -l -h
total 512
-rw-------. 1 vcloud vcloud 0 Mar 22 12:23 8a483603-43b8-4215-b33f-48270582f03d
-rw-------. 1 vcloud vcloud 0 May 27 2018 8a483603-43b8-4215-b33f-48270582f03d.old
-rw-r--r--. 1 vcloud vcloud 6 Mar 22 11:38 jgbtest.txt
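For the record, that cleanup plus a quick health check amounted to something like this (a sketch; the file names come from the listing above and the exact watchdog message format may vary):

# remove the stale cell marker and my manual test file
rm /opt/vmware/vcloud-director/data/transfer/cells/8a483603-43b8-4215-b33f-48270582f03d.old
rm /opt/vmware/vcloud-director/data/transfer/cells/jgbtest.txt

# confirm the watchdog has stopped flagging the cell as dead (no more 404s)
tail -n 5 /opt/vmware/vcloud-director/logs/vmware-vcd-watchdog.log
service vmware-vcd status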

How did this happen? I’m pretty sure it was my own fault. Last week I was also doing some deployment testing with the vCD appliance. At the time I felt it was safe for this test cell to use the same Transfer Server Storage NFS mount (so that I wouldn’t have to go through the steps to create another one). Upon further investigation, the vCD appliance cell tattooed the folders and files with the 1002 owner and group seen above.

All is well with the vCD world now and I’ve got it documented so the next time my vCD web portal doesn’t load, I’ll know just where to look.

vCloud Director vdnscope-1 could not be found

August 15th, 2015

For whatever reason, I’ve been spending a pretty fair amount of time lately with vCloud Director both at home as well as at the office. It’s a great product. It always has been, beginning with its Lab Manager roots. Like my last blog post, this writing will exhibit another vCloud Director database editing exercise which stemmed from a problem I encountered in the lab.

I was attempting to get away from my VLAN-backed Network Pool by configuring vCloud Director’s Provider vDC-VXLAN-NP Network Pool which is much more dynamic and powerful in nature. The Provider vDC-VXLAN-NP Network Pool is installed by default in vCloud Director but to configure and use it for Organization and vApp networks, one must follow a set of instructions which basically involves configuring upstream physical switch(es) with jumbo frames, a transport VLAN, and multicast settings, preparing the hosts by installing an agent on each of them using vShield Manager, adding VMkernel ports, Network Scopes, Virtual Wires, and so on (Mike Laverick and Rawlinson Rivera both have easy to follow tutorials. The VMware VXLAN Deployment Guide is also a great read). Once it’s all set up and working, VXLAN is pretty effing cool. Anyway, it sounds like a lot of steps and admittedly it requires some reading and attention to detail, but much of it is automated by vCloud Director, with some bumps along the way.

I did run into a few snags which ultimately led me to go through the configuration process start to finish a few times. In the end I had to configure the Network Scope in vShield Manager manually when normally this step is performed automatically by vCloud Director via the Enable VXLAN Provider VDC right-click menu item.

Once I got beyond the installation hurdles, there was some residual impact left in the vCloud Director database and vShield Manager such that it all looked to be working properly, except that at the very end I could not power on a vApp with an isolated vApp network which relied on the use of the VXLAN-backed Network Pool. The error message was:

Cannot deploy organization VDC network  (uuid for that network)
com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (202): The requested object : vdnscope-1 could not be found. Object identifiers are case sensitive.

[ bb505f5e-27f1-419e-9b05-da0d38a7788f ] Unable to deploy network “vApp net1(urn:uuid:7d813867-d3f1-420d-a0a8-a65263369327)”.

com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (202): The requested object : vdnscope-1 could not be found. Object identifiers are case sensitive.

– com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (202): The requested object : vdnscope-1 could not be found. Object identifiers are case sensitive.

– VSM response error (202): The requested object : vdnscope-1 could not be found. Object identifiers are case sensitive.

An object named vdnscope-1 seems to be the obvious problem.

I was not able to make use of the Network Pool Repair function as it was unavailable.

Fortunately I was able to locate a related thread in the VMware Communities which more or less explained what might have happened and what I could try to fix the problem (credit to IamTHEvilONE). This is my interpretation.

Each time a Network Scope is created in vShield Manager, an underlying object reference is tied to the Network Scope with a naming convention of vdnscope-x, where x begins at 1 and is incremented with each create iteration. So the first Network Scope created in vShield Manager by vCloud Director is going to be called vdnscope-1. This object is stored in the vCloud Director database and is referenced each time an Org or vApp network is spun up which leans on the VXLAN-backed Network Pool. This is formally handled at vApp power on. The object is also stored somewhere in vShield Manager, although I was never able to locate it. What happened here is that the Network Scope object known by vCloud Director and the one known by vShield Manager were out of sync and didn't match. vCloud Director dials up vShield Manager and says "I need that vdnscope-1 you have" and vShield Manager responds with "I have no idea what that object is". Obvious problem.

The solution is fairly simple: edit the vCloud Director database with the correct Network Scope object reference. But a small problem still remained: I was never able to locate the correct object name in vShield Manager. However, going back to the VMware Communities discussion, the trick is to increment the vdnscope-x object reference in the vCloud Director database by 1 until the two sides agree and the vApp powers on successfully.

I’ll borrow the same disclaimer from the previous blog post: An obligatory warning on vCloud database editing. Do as I say, not as I do. Editing the vCloud database should be performed only with the guidance of VMware support. Above all, create a point in time backup of the vCloud database with all vCloud Director cell servers stopped (service vmware-vcd stop). There are a variety of methods in which you can perform this database backup. Use the method that is most familiar to and works for you.

So after stopping the vCloud Director services and getting a vcloud database backup…

Step 1: Open Microsoft SQL Server Management Studio and navigate to the [vcloud].[dbo].[network_pool] table. Under the vdn_scope_id column, increment the vdnscope-1 value from 1 to 2.

Step 2: Start the vCloud Director service in all cell servers (service vmware-vcd start) and verify in vShield Manager the Virtual Wire has been created and the vApp can power on successfully. If it fails, stop vCloud services and repeat Step 1 above while incrementing the vdnscope value to 3, then 4, and so on. In my case, vdnscope-5 did the trick.
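For reference, the Step 1 edit is a one-row update. Here's a minimal sketch assuming, as described above, that the vdn_scope_id column stores the literal vdnscope-x string:

-- check what vCloud Director currently references
SELECT * FROM [vcloud].[dbo].[network_pool];

-- bump the reference; repeat with vdnscope-3, vdnscope-4, and so on until the vApp powers on
UPDATE [vcloud].[dbo].[network_pool]
SET vdn_scope_id = 'vdnscope-2'
WHERE vdn_scope_id = 'vdnscope-1';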

vCloud Director is awesome. VXLAN with 16 million networks capability kicks it up a notch.

Updated 8/22/15: I received a tip from Jon Hemming in the form of a blog comment. Jon states he has written a VMware KB article titled Creating an isolated network in VMware vCloud Director reports the error: vdnscope-x does not exist (2065485) which documents a process to get the correct VDN Scope ID via the REST API of vShield as well as update the vCloud Director database. Thank you Jon! I did find the syntax for the curl statement to be slightly off. The KB article calls for the following syntax:

curl -k -u admin:default -X GET https://vshield.boche.lab/api/2.0/vdn/scopes/

The result is HTTP Status 404 The requested resource is not available.

What did work was:

curl -k -u admin:default -X GET https://vshield.boche.lab/api/2.0/vdn/scopes

The only change was removing the trailing forward slash on the URL.
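If you'd rather not eyeball the raw XML response, something like this should pull out just the scope identifiers (a sketch which assumes the response contains the objectId values in vdnscope-x form):

curl -k -u admin:default -X GET https://vshield.boche.lab/api/2.0/vdn/scopes | grep -o 'vdnscope-[0-9]*' | sort -u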

vCloud Director Error Cannot delete network pool

August 15th, 2015

I ran into a small problem this week in vCloud Director whereby I was unable to Delete a Network Pool. The error message stated Cannot delete network pool because It is still in use. It went on to list In use items along with a moref identifier. This was not right because I had verified there were no vApps tied to the Network Pool. Furthermore, the item listed as still in use was a dynamically created dvportgroup which also no longer existed on the vNetwork Distributed Switch in vCenter.

I suspect this situation came about due to running out of available storage space earlier in the week on the Microsoft SQL Server where the vCloud database is hosted. I was performing Network Pool work precisely when that incident occurred and I recall an error message at the time in vCloud Director regarding tempdb.

I tried removing state data from QRTZ tables which I blogged about here a few years ago and has worked for specific instances in the past but unfortunately that was no help here. Searching the VMware Communities turned up sparse conversations about roughly the same problem occurring with Org vDC Networks. In those situations, manually editing the vCloud Director database was required.

An obligatory warning on vCloud database editing. Do as I say, not as I do. Editing the vCloud database should be performed only with the guidance of VMware support. Above all, create a point in time backup of the vCloud database with all vCloud Director cell servers stopped (service vmware-vcd stop). There are a variety of methods in which you can perform this database backup. Use the method that is most familiar to and works for you.

Opening up Microsoft SQL Server Management Studio, there are rows in two different tables which I need to delete to fix this. This has to be done in the correct order or else a REFERENCE constraint conflict occurs in Microsoft SQL Server Management Studio and the statement will be terminated.

So after stopping the vCloud Director services and getting a vcloud database backup…

Step 1: Delete the row referencing the dvportgroup in the [vcloud].[dbo].[network_backing] table:

Step 2: Delete the row referencing the unwanted Network Pool in the [vcloud].[dbo].[network_pool] table:
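Roughly speaking, the two deletes look like the sketch below. Treat the column names and the identifying values (the dvportgroup moref and the pool name) as placeholders I'm using for illustration; run a SELECT against each table first to identify the actual rows before deleting anything:

-- Step 1: remove the backing row tied to the no-longer-existing dvportgroup (moref is a placeholder)
DELETE FROM [vcloud].[dbo].[network_backing]
WHERE backing_ref = 'dvportgroup-1234';

-- Step 2: remove the orphaned network pool row (pool name is a placeholder)
DELETE FROM [vcloud].[dbo].[network_pool]
WHERE name = 'Orphaned Network Pool';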

That should take care of it. Start the vCloud Director service in all cell servers (service vmware-vcd start) and verify the Network Pool has been removed.

vCloud Director 5.6.4 Remote consoleproxy issues

June 12th, 2015

vCloud Director is a wonderful IaaS addition to any lab, development, or production environment. When it’s working properly, it is a very satisfying experience wielding the power of agility, consistency, and efficiency vCD provides. However, like many things tech with upstream and human dependencies, it can and does break. Particularly in lab or lesser maintained environments that don’t get all the care and feeding production environments benefit from. When it breaks, it’s not nearly as much fun.

This week I ran into what seemed like a convergence of issues with vCD 5.6.4 relating to the Remote Console functionality in conjunction with SSL certificates, various browser types, networking, and 32-bit Java. As is the case often, what I’m documenting here is really more for my future benefit as there were a number of sparse areas I covered which I won’t necessarily retain in memory long but as it goes with blogs and information sharing, sharing is caring.

The starting point was a functional vCD 5.6.4-2496071 environment on vSphere 5.5. Everything historically and to date working normally with the exception of the vCD console which had stopped working recently in Firefox and Google Chrome browsers. Opening the console in either browser from seemingly any client workstation yielded the pop out console window with toolbar buttons along the top, but there was no guest OS console painted in the main window area. It was blank. The status of the console would almost immediately change to Disconnected. I’ve dealt with permutations of this in the past and I verified all of the usual suspects: NTP, DNS, LDAP, storage capacity, 32-bit Java version, blocked browser plug-ins, etc. No dice here.

In Firefox, the vCD console status shows Disconnected while the Inspect Element console shows repeated failed attempts to connect to the consoleproxy address.

10:11:30.195 "10:11:30 AM [TRACE] mks-connection: Connecting to wss://172.16.21.151/902;cst-t3A6SwOSPRuUqIz18QAM1Wrz6jDGlWrrTlaxH8k6aYuBKilv/1mc7ap50x3sPiHiSJYoVhyjlaVuf6vKfvDPAlq2yukO7qzHdfUTsWvgiZISK56Q4r/4ZkD7xWBltn15s5AvTSSHKsVbByMshNd9ABjBBzJMcqrVa8M02psr2muBmfro4ZySvRqn/kKRgBZhhQEjg6uAHaqwvz7VSX3MhnR6MCWbfO4KhxhImpQVFYVkGJ7panbjxSlXrAjEUif7roGPRfhESBGLpiiGe8cjfjb7TzqtMGCcKPO7NBxhgqU=-R5RVy5hiyYhV3Y4j4GZWSL+AiRyf/GoW7TkaQg==--tp-B5:85:69:FF:C3:0A:39:36:77:F0:4F:7C:CA:5F:FE:B1:67:21:61:53--"1 debug.js:18:12

10:11:30.263 Firefox can't establish a connection to the server at wss://172.16.21.151/902;cst-t3A6SwOSPRuUqIz18QAM1Wrz6jDGlWrrTlaxH8k6aYuBKilv/1mc7ap50x3sPiHiSJYoVhyjlaVuf6vKfvDPAlq2yukO7qzHdfUTsWvgiZISK56Q4r/4ZkD7xWBltn15s5AvTSSHKsVbByMshNd9ABjBBzJMcqrVa8M02psr2muBmfro4ZySvRqn/kKRgBZhhQEjg6uAHaqwvz7VSX3MhnR6MCWbfO4KhxhImpQVFYVkGJ7panbjxSlXrAjEUif7roGPRfhESBGLpiiGe8cjfjb7TzqtMGCcKPO7NBxhgqU=-R5RVy5hiyYhV3Y4j4GZWSL+AiRyf/GoW7TkaQg==--tp-B5:85:69:FF:C3:0A:39:36:77:F0:4F:7C:CA:5F:FE:B1:67:21:61:53--.1 wmks.js:321:0

tail -f /opt/vmware/vcloud-director/logs/vcloud-container-debug.log |grep consoleproxy revealed:
2015-06-12 10:50:54,808 | DEBUG    | consoleproxy              | SimpleProxyConnectionHandler   | Initiated handling for channel 0x22c9c990 [java.nio.channels.SocketChannel[connected local=/172.16.21.151:443 remote=/172.31.101.6:61719]] |
2015-06-12 10:50:54,854 | DEBUG    | consoleproxy              | ReadOperation                  | IOException while reading data: java.io.IOException: Broken pipe |
2015-06-12 10:50:54,855 | DEBUG    | consoleproxy              | ChannelContext                 | Closing channel java.nio.channels.SocketChannel[connected local=/172.16.21.151:443 remote=/172.31.101.6:61719] |
2015-06-12 10:50:55,595 | DEBUG    | consoleproxy              | SimpleProxyConnectionHandler   | Initiated handling for channel 0xd191a58 [java.nio.channels.SocketChannel[connected local=/172.16.21.151:443 remote=/172.31.101.6:61720]] |
2015-06-12 10:50:55,648 | DEBUG    | pool-consoleproxy-4-thread-289 | SSLHandshakeTask               | Exception during handshake: java.io.IOException: Broken pipe |
2015-06-12 10:50:56,949 | DEBUG    | consoleproxy              | SimpleProxyConnectionHandler   | Initiated handling for channel 0x3f0c025b [java.nio.channels.SocketChannel[connected local=/172.16.21.151:443 remote=/172.31.101.6:61721]] |
2015-06-12 10:50:57,003 | DEBUG    | pool-consoleproxy-4-thread-301 | SSLHandshakeTask               | Exception during handshake: java.io.IOException: Broken pipe |
2015-06-12 10:50:59,902 | DEBUG    | consoleproxy              | SimpleProxyConnectionHandler   | Initiated handling for channel 0x1bda3590 [java.nio.channels.SocketChannel[connected local=/172.16.21.151:443 remote=/172.31.101.6:61723]] |
2015-06-12 10:50:59,959 | DEBUG    | pool-consoleproxy-4-thread-295 | SSLHandshakeTask               | Exception during handshake: java.io.IOException: Broken pipe |

In Google Chrome, the vCD console status shows Disconnected while the Inspect element console (F12) shows repeated failed attempts to connect to the consoleproxy address.

10:26:43 AM [TRACE] init: attempting ticket acquisition for vm vcdclient
10:26:44 AM [TRACE] plugin: Connecting vm
10:26:44 AM [TRACE] mks-connection: Connecting to wss://172.16.21.151/902;cst-f2eeAr8lNU6BTmeVelt9L8VKoe92kJJMxZCC2watauBV6/x…fmI8Xg==--tp-B5:85:69:FF:C3:0A:39:36:77:F0:4F:7C:CA:5F:FE:B1:67:21:61:53--
WebSocket connection to 'wss://172.16.21.151/902;cst-f2eeAr8lNU6BTmeVelt9L8VKoe92kJJMxZCC2watauBV6/x…fmI8Xg==--tp-B5:85:69:FF:C3:0A:39:36:77:F0:4F:7C:CA:5F:FE:B1:67:21:61:53--' failed: WebSocket opening handshake was canceled
10:26:46 AM [ERROR] mks-console: Error occurred: [object Event]
10:26:46 AM [TRACE] mks-connection: Disconnected [object Object]

tail -f /opt/vmware/vcloud-director/logs/vcloud-container-debug.log |grep consoleproxy revealed:
2015-06-12 10:48:35,760 | DEBUG    | consoleproxy              | SimpleProxyConnectionHandler   | Initiated handling for channel 0x55efffb3 [java.nio.channels.SocketChannel[connected local=/172.16.21.151:443 remote=/172.31.101.6:61675]] |
2015-06-12 10:48:39,754 | DEBUG    | consoleproxy              | SimpleProxyConnectionHandler   | Initiated handling for channel 0x3f123a13 [java.nio.channels.SocketChannel[connected local=/172.16.21.151:443 remote=/172.31.101.6:61677]] |
2015-06-12 10:48:42,658 | DEBUG    | consoleproxy              | SimpleProxyConnectionHandler   | Initiated handling for channel 0x7793f0a [java.nio.channels.SocketChannel[connected local=/172.16.21.151:443 remote=/172.31.101.6:61679]] |

If you have an acute attention to detail, you'll notice the time stamps in the cell logs don't correlate closely with the time stamps in the browser Inspect element console. Normally this would indicate time skew or an NTP issue, which does cause major headaches with functionality, but that's not the case here; my various screen captures and log examples simply weren't taken at the exact same point in time. So it's safe to move on.

Looking at the most recent vCloud Director For Service Providers installation documentation, I noticed a few things.

  1. Although I did upgrade vCD a few months ago to the most current build at the time, there’s a newer build available: 5.6.4-2619597
  2. Through repetition, I’ve gotten quite comfortable with the use of Java keytool and its parameters. However, additional parameters have been added to the recommended use of the tool. Noted going forward.
  3. VMware self signed certificates expire within three (3) months. Self signed certificates were in use in this environment. I haven’t noticed this behavior in the past nor has it presented itself as an issue but after a quick review, the self signed certificates generated a few months ago with the vCD upgrade had indeed expired recently.

At this point I was quite sure the expired certificates were the problem, although it seemed strange the vCD portal was still usable while only the consoleproxy was giving me fits. So I went through the two minute process of regenerating and installing new self signed certificates for both http and the consoleproxy. The vCD installation guide more or less outlines this process as it is the same for a new cell installation as it is for replacing certificates. VMware also has a few KB articles which address it as well (1026309, 2014237). For those going through this process, you should really note the keytool parameter changes/additions in the vCD installation guide.
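For reference, regenerating the two self signed certificates boils down to a couple of keytool commands along these lines. This is only a sketch; the keystore name, password, key size, and validity shown here are my own choices, so defer to the installation guide for the currently recommended parameters:

# self signed certificate for the http service
keytool -keystore certificates.ks -storetype JCEKS -storepass passwd -genkey -keyalg RSA -keysize 2048 -validity 1095 -alias http

# self signed certificate for the consoleproxy service
keytool -keystore certificates.ks -storetype JCEKS -storepass passwd -genkey -keyalg RSA -keysize 2048 -validity 1095 -alias consoleproxy

The resulting keystore then gets fed back to the cell with the configure script per the installation guide.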

While I was at it, I also built a new replacement cell on a newer version of RHEL 6.5, performed the database upgrades, extended the self signed certificate default expiration from three months to three years, and I retired the older RHEL 6.4 cell. Fresh new cell. New certs. Ready to rock and roll.

Not so much. I still had the same problem with the console showing Disconnected. However, the Inspect element console in each browser now indicated a new error message (which I don't have handy at the moment); basically the browser couldn't talk to the consoleproxy address at all. I tried to ping the address and it was dead from a remote workstation's point of view although it was quite alive at the RHEL 6.5 command prompt. Peters Virtual Notes had this one covered thankfully. According to https://access.redhat.com/site/solutions/53031, a small change is needed for the file /etc/sysctl.conf.

net.ipv4.conf.default.rp_filter = 1

must be changed to

net.ipv4.conf.default.rp_filter = 2
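To pick up the change without rebooting the cell, something like this should do the trick (sysctl -p simply rereads /etc/sysctl.conf):

# apply immediately
sysctl -w net.ipv4.conf.default.rp_filter=2

# or reload everything defined in /etc/sysctl.conf
sysctl -p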

Success. Surely consoleproxy will work now. Unfortunately it still does not want to work. Back to the java.io.IOException: Broken pipe SSL handshake issues although the new certificate for vCD’s http address is registered and working fine (remembering again each vCD cell has two IP addresses, one for http access and one for consoleproxy functionality – each requires a trusted SSL certificate or an exception).

The last piece of the puzzle was something I have never had to do in the past and that is to manually add an exception (Firefox) for the consoleproxy self signed certificate and install it (Google Chrome). For each browser, this is a slightly different process.

For Firefox, browse to the https:// address of the consoleproxy. Don't worry, nothing visible should be displayed when it does not receive a properly formatted request. The key here is to add an exception for the certificate associated specifically with the consoleproxy address.

Once this certificate exception is added, the consoleproxy certificate is essentially trusted and so is the IP address for the host and the console service it is supposed to provide.

To resolve the consoleproxy issue for Google Chrome, the process is slightly different. Ironically, I found it easiest to use Internet Explorer for this. Open Internet Explorer, and when you do so, be sure to right click on the IE shortcut and Run as administrator (this is key in a moment). Browse to the https:// address of the consoleproxy. Again, don't worry, nothing visible should be displayed when it does not receive a properly formatted request. Continue to this website and then use the Certificate Error status message in the address bar to view the certificate being presented. The self signed consoleproxy certificate needs to be installed. Start this task using the Install Certificate button. This button is typically missing when launching IE normally but it is revealed when launching IE with Run as administrator rights.

Browse for the location to install the self signed certificate. Tick the box Show physical stores. Drill down under Third-Party Root Certification Authorities. Install the certificate in the Local Computer folder. This folder is typically missing when launching IE normally but it is revealed when launching IE with Run as administrator rights.

Once this certificate is installed, the consoleproxy certificate is essentially trusted in Google Chrome. Just as with the Firefox remedy, the Java SSL handshake with the consoleproxy succeeds and the vCD remote console is rendered.

Note that for Google Chrome, there is another quick method to temporarily gain functional console access without installing the consoleproxy certificate via Internet Explorer.

  1. Open a Google Chrome browser and browse to the https:// address of the consoleproxy.
  2. When prompted with Your connection is not private, click the Advanced link.
  3. Click the Proceed to <console proxy IP address> (unsafe) link.
  4. Nothing will visibly happen except Google Chrome will now temporarily trust the consoleproxy certificate and the vCD remote console will function for as long as a Google Chrome tab remains open.
  5. Without closing Google Chrome, now continue into the vCD organization portal and resume business as usual with functional remote consoles.

On the topic of Google Chrome, internet searches will quickly reveal vCloud Director console issues with Google Chrome and NPAPI. VMware discusses this in the vCloud Director 5.5.2.1 Release Notes:

Attempts to open a virtual machine console on Google Chrome fail
When you attempt to open a virtual machine console on a Google Chrome browser, the operation fails. This occurs due to the deprecation of NPAPI in Google Chrome. vCloud Director 5.5.2.1 uses WebMKS instead of the VMware Remote Console to open virtual machine consoles in Google Chrome, which resolves this issue.

I was working with vCD 5.6.x which leverages WebMKS in lieu of NPAPI so the NPAPI issue was not relevant in this case, but if you are running into an NPAPI issue, Google offers How to temporarily enable NPAPI plugins here.

Update 8/8/15: Josiah points out a useful VMware forum thread which may help resolve this issue further when FQDNs are defined in DNS for remote console proxies or where multiple vCloud cells are installed in a cluster behind a front end load balancer, NAT/reverse proxy, or firewall.

Update 7/17/20: The VMware Cloud Director virtual appliance with embedded PostgreSQL database by default uses eth0 for the console proxy address along with port 8443. ie. https://100.88.144.13:8443. This is the URL that must be trusted in order to open a VMware Cloud Director remote console without the dreaded Disconnected message. Find this address and port combination to trust in a Disconnected console browser window by pressing SHIFT + CTRL + J or F12 which opens the Elements window. This information was previously published in VMware KB 2058496 Cannot connect to vCloud Director WebMKS console with Mozilla Firefox or Google Chrome which has been taken down but the cached version of the page still remains.
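A quick way to confirm which certificate the console proxy endpoint is presenting (and that it answers at all) is an openssl probe against that address and port combination; the IP below is just the example from the paragraph above:

# show the certificate served by the appliance console proxy on 8443
echo | openssl s_client -connect 100.88.144.13:8443 2>/dev/null | openssl x509 -noout -subject -dates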

vCloud Director Database Migration

March 20th, 2015

This week I’ve been working on getting some lab infrastructure fitted with much needed updates. One of those components was an aging Microsoft SQL Server 2008 R2 server on Windows Server 2008 R2 which I had been using to host databases for various projects.  Since I had chosen to build the new SQL server in parallel, I’m benefiting with fresh and problem free builds of Microsoft SQL Server 2012 on Windows Server 2012 R2.  The downside is that I’m responsible for dealing with all of the SQL databases and logins and potentially scheduled jobs that must be migrated to the new SQL server.

vCloud Director is one of the last databases left to migrate and fortunately VMware has a KB article published which covers the steps required to migrate a back end SQL database for vCloud Director. The VMware KB article is 2092706 Migrating the VMware vCloud Director SQL database to another server.

Looking at the steps, the migration looks like it will be fairly simple.  VMware even provides the SQL queries to automate many of the tasks.  I’ll migrate my vCloud Director database using these steps in the following video.  I did run into a few issues which mostly boil down to copy/paste problems with the SQL queries as published in the KB article but I’ve provided the necessary corrections and workarounds in the video.

As shown in the video, I ran into a syntax issue with step four.

The SQL query provided by the KB article was:

USE master;
GO
EXEC sp_attach_db @dbname = N’vCD_DB_Name‘,
c:\Program Files\Microsoft SQL Server\MSSQL\Backup\vCD_DB_Name.mdf
c:\Program Files\Microsoft SQL Server\MSSQL\Backup\vCD_DB_Name.ldf
GO

The corrected SQL query syntax according to Microsoft SQL Server Management Studio appears to be:

USE [master]
GO
CREATE DATABASE [vCD_DB_Name] ON 
( FILENAME = N'c:\Program Files\Microsoft SQL Server\MSSQL\Backup\vCD_DB_Name.mdf' ),
( FILENAME = N'c:\Program Files\Microsoft SQL Server\MSSQL\Backup\vCD_DB_Name.ldf' )
 FOR ATTACH
GO

Another issue I'll note that wasn't captured in the video deals with step seven, where the vCloud Director cell server is reconfigured to point to the new database. The first time I ran that step, the process failed because the cell attempted to locate the SQL database in its original location and actually found it there. When this happens, the cell configuration script doesn't prompt for a new SQL instance. In order for step seven to work correctly, I had to drop or delete the database on the SQL 2008 R2 server and rerun the vCloud Director configuration script. The cell then no longer automatically 'finds' the old instance and correctly prompts for the new back end database details. VMware's KB article provides most of the steps required to migrate the database, but it needs a step inserted prior to step seven which calls for the deletion of the original database instance. Step two places the vCloud database in READ_ONLY mode, but the vCloud cell configuration was still able to 'see' it, which causes step seven to fail.
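In other words, before rerunning the configuration script, take the original copy out of play on the old SQL 2008 R2 instance with something along these lines (a sketch; vCD_DB_Name is the same placeholder name used in the KB article):

USE master;
GO
-- take the old copy offline so the cell configuration script can no longer 'find' it...
ALTER DATABASE [vCD_DB_Name] SET OFFLINE WITH ROLLBACK IMMEDIATE;
GO
-- ...or, once the new server is confirmed working, drop it outright
-- DROP DATABASE [vCD_DB_Name];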

Blake Garner (@trodemaster on Twitter) provided a helpful tip which will also work with step seven in lieu of dropping or deleting the database on the original SQL server:

You could also clear DB config from the /opt/vmware/vcloud-director/etc/global.properties and run configure again.
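If you go Blake's route instead, the gist is something like this (a sketch; exactly which database.* properties live in global.properties varies by vCD version, so back the file up first):

service vmware-vcd stop
cp /opt/vmware/vcloud-director/etc/global.properties /opt/vmware/vcloud-director/etc/global.properties.bak
# remove the database connection properties from global.properties, then reconfigure
vi /opt/vmware/vcloud-director/etc/global.properties
/opt/vmware/vcloud-director/bin/configure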

Overall the process was still fairly simple and painless thanks to VMware’s published documentation.

Microsoft Sysprep Change in vCloud Director 5.5

November 18th, 2013

If you’re like me, you still support legacy Windows operating systems from time to time.  Let’s face it, Windows Server 2003 was a great server operating system and will probably remain in some environments for quite a while.  I won’t at all be surprised if the Windows Server 2003 legacy outlasts that of Windows XP.  To that point, even the VCAP5-DCA  exam I sat a few weeks ago used Windows Server 2003 guests in the lab.

All of that being said in what is almost the year 2014, hopefully you are not still deploying Windows Server 2003 as a platform to deliver applications and services in a production environment.  However, if you are and you’re using VMware vCloud Director 5.5, you should be aware of subtle changes which I noticed while reading through the documentation.  Page 31 of the vCloud Director 5.5 Installation and Upgrade Guide to be exact.

In previous versions of vCloud Director including 5.1, Microsoft Sysprep files were placed in a temporary directory within operating system specific folders on the first cloud cell server in the cluster.  The next step was to invoke the /opt/vmware/vcloud-director/deploymentPackageCreator/createSysprepPackage.sh script which bundled all of the Sysprep files into a /opt/vmware/vcloud-director/guestcustomization/windows_deployment_package_sysprep.cab file.  At this point, Sysprep was installed and configured on the first cell server.  It could then optionally be distributed by way of copying the .cab file and the vcloud_sysprep.properties file to the guestcustomization directory of the other cell servers in the cluster.  I call this step optional because not all vCloud deployments will have multiple cell servers.  If multiple cell servers did exist, you’d likely want all of them to be able to perform guest customization duties for legacy Windows operating systems in the catalog and thus this optional step would be required.

So a few things have changed now in 5.5.  First, the Windows operating system specific folder names have changed to match the folder names which vCenter Server has always used historically (see VMware KB 1005593) and on this note, Windows 2000 Server support has been put out to pasture in vCD 5.5.

Version                        pre-vCD 5.5   vCD 5.5
Windows 2000                   /win2000      unsupported
Windows Server 2003 (32-bit)   /win2k3       /svr2003
Windows Server 2003 (64-bit)   /win2k3_64    /svr2003-64
Windows XP (32-bit)            /winxp        /xp
Windows XP (64-bit)            /winxp_64     /xp-64

Next, the method to create the Sysprep package and distribute it to the other cell servers has changed.  The createSysprepPackage.sh script no longer exists and as a result, a bundled .cab file is not created.  Instead, the Sysprep files are copied in their entirety to their new directory names within the directory /opt/vmware/vcloud-director/guestcustomization/default/windows/sysprep.  So what you need to do here is create the directory structure under $VCLOUD_HOME and SCP the Sysprep files to each of the cell servers.  I’ve provided the directory creation commands below:

mkdir -p /opt/vmware/vcloud-director/guestcustomization/default/windows/sysprep/svr2003

mkdir -p /opt/vmware/vcloud-director/guestcustomization/default/windows/sysprep/svr2003-64

mkdir -p /opt/vmware/vcloud-director/guestcustomization/default/windows/sysprep/xp

mkdir -p /opt/vmware/vcloud-director/guestcustomization/default/windows/sysprep/xp-64
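With the directories created, getting the Sysprep binaries onto each cell is a simple copy per OS folder, something like the sketch below (the staging path and cell host name are examples):

# copy the Windows Server 2003 (32-bit) Sysprep files to a cell; repeat for each OS folder and each cell server
scp -r /staging/sysprep/svr2003/* root@vcdcell1:/opt/vmware/vcloud-director/guestcustomization/default/windows/sysprep/svr2003/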

As the documentation reminds us, the Sysprep files must be readable by the user vcloud.vcloud (this user is created on each cell server during the initial vCloud Director installation) and that can be ensured by running the following command:

chown -R vcloud.vcloud $VCLOUD_HOME/guestcustomization

These installation changes are important to note if you're deploying a net new vCloud Director 5.5 environment and there is a need for legacy Windows OS vApp guest customization. A vCloud Director 5.5 upgrade from previous versions will perform the necessary Sysprep migration steps automatically. Lastly, Sysprep won't be needed in vCloud environments where guest customization isn't required or legacy versions of Windows aren't being deployed and customized (beginning with Windows Vista and Windows Server 2008, Sysprep is bundled within the operating system).

vCloud Director, RHEL 6.3, and Windows Server 2012 NFS

July 16th, 2013

One of the new features introduced in vCloud Director 5.1.2 is cell server support on the RHEL 6 Update 3 platform (you should also know that cell server support on RHEL 5 Update 7 was silently removed in the recent past – verify the version of RHEL in your environment using cat /etc/issue).  When migrating your cell server(s) to RHEL 6.3, particularly from 5.x, you may run into a few issues.

First is the lack of the libXdmcp package (required for vCD installation) which was once included by default in RHEL 5 versions.  You can verify this at the RHEL 6 CLI with the following command line:

yum search libXdmcp

or

yum list |grep libXdmcp

Not to worry, the package is easily installable by inserting/mounting the RHEL 6 DVD or .iso, copying the appropriate libXdmcp file to /tmp/ and running either of the following commands:

yum install /tmp/libXdmcp-1.0.3-1.el6.x86_64.rpm

or

rpm -i /tmp/libXdmcp-1.0.3-1.el6.x86_64.rpm

Update 6/22/18: It is really not necessary to point to a package file location or a specific version (this overly complicates the task) when a YUM repository is created. Also… RHEL7 Infrastructure Server base environment excludes the following packages required by vCloud Director 9.1 for Service Providers:

  • libICE
  • libSM
  • libXdmcp
  • libXext
  • libXi
  • libXt
  • libXtst
  • redhat-lsb

If the YUM DVD repository has been created and the RHEL DVD is mounted, install the required packages with the following one liner:

yum install -y libICE libSM libXdmcp libXext libXi libXt libXtst redhat-lsb
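If you haven't already created that DVD-based YUM repository, it only takes a minute; here's a minimal sketch (the repo id and mount point are arbitrary):

# mount the RHEL installation DVD and point a local repo definition at it
mount /dev/cdrom /mnt
cat > /etc/yum.repos.d/rhel-dvd.repo <<'EOF'
[rhel-dvd]
name=RHEL DVD
baseurl=file:///mnt
enabled=1
gpgcheck=0
EOF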

Next up is the use of Windows Server 2012 (or Windows 8) as an NFS server for vCloud Transfer Server Storage in conjunction with the newly supported RHEL 6.3.  Creating the path and directory for the Transfer Server Storage is performed during the initial deployment of vCloud Director using the command mkdir -p /opt/vmware/vcloud-director/data/transfer. When mounting the NFS export for Transfer Server Storage (either manually or via an /etc/fstab entry such as f.q.d.n:/vcdtransfer /opt/vmware/vcloud-director/data/transfer nfs rw 0 0), the mount command fails with the error message mount.nfs: mount system call failed. I ran across this in one particular environment and my search turned up Red Hat Bugzilla – Bug 796352.  In the bug documentation, the problem is identified as follows:

On Red Hat Enterprise Linux 6, mounting an NFS export from a Windows 2012 server failed due to the fact that the Windows server contains support for the minor version 1 (v4.1) of the NFS version 4 protocol only, along with support for versions 2 and 3. The lack of the minor version 0 (v4.0) support caused Red Hat Enterprise Linux 6 clients to fail instead of rolling back to version 3 as expected. This update fixes this bug and mounting an NFS export works as expected.

Further down in the article, Steve Dickson outlines the workarounds:

mount -o v3 # to use v3

or

Set the 'Nfsvers=3' variable in the "[ Server "Server_Name" ]"
section of the /etc/nfsmount.conf file
An Example will be:
[ Server "nfsserver.lab.local" ]
Nfsvers=3

The first option works well at the command line but doesn’t lend itself to /etc/fstab syntax so I opted for the second option which is to establish a host name and NFS version in the /etc/nfsmount.conf file.  With this method, the mount is attempted as called for in /etc/fstab and by reading /etc/nfsmount.conf, the mount operation succeeds as desired instead of failing at negotiation.
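Putting the two pieces together, the working combination looked roughly like this (host name and export path are examples consistent with the snippet above):

# /etc/fstab
nfsserver.lab.local:/vcdtransfer /opt/vmware/vcloud-director/data/transfer nfs rw 0 0

# /etc/nfsmount.conf - pin this server to NFSv3 so the fstab mount negotiates correctly
[ Server "nfsserver.lab.local" ]
Nfsvers=3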

There is a third option which would be to avoid the use of /etc/fstab and /etc/nfsmount.conf altogether and instead establish a mount -o v3 command in /etc/rc.local which is executed at the end of each RHEL boot process.  Although this may work, it feels a little sloppy in my opinion.

Lastly, one could install the kernel update (Red Hat reports as being fixed in kernel-2.6.32-280.el6). The kernel package update is located here.

Update 5/27/18: See also http://www.boche.net/blog/2012/07/03/creating-vcloud-director-transfer-server-storage-on-nfs/ for other new requirements when trying to mount NFS exports with RHEL 7.5.