Archive for June, 2010

ESX and the Service Console Are Going Away

June 17th, 2010

ESX and the Service Console are going away.  Theories on this are evident – plastered all over the internet:  here, here, here, here, here, here, here, etc.

Go to Google and perform the following search:

esx “service console” “going away”

Better yet, let me Google that for you (Thank you Doug for the introduction to this wonderfully funny tool!)

ESXi was first introduced on December 10th, 2007.  We’ve had 2 1/2 years to get familiar with this hypervisor which is minimal in size but as feature rich, as powerful, and as fast as ESX.  The management tools for ESXi have evolved and the platform has been given its due time to prove its stability and viability as an enterprise bare metal hypervisor in the datacenter.

I conducted an informal poll on Twitter this week and a large number of respondents claimed to still be using ESX.  More alarming was the disposition of some who have no plans whatsoever to go to ESXi.  One person went so far as to say the Service Console would have to be pried from his cold dead hands.

If you have not yet broken your dependency on ESX and the Service Console, I suggest you do it soon.  Time is running out.  Don’t wait until the last minute.  Be sure you leave yourself enough time to architect and test a sound ESXi design for your datacenter, as well as get familiar with the tools you’ll be dependent on to manage the ESXi hypervisor.

VMware Tools install – A general system error occurred: Internal error

June 16th, 2010

When you invoke the VMware Tools installation via the vSphere Client, you may encounter the error “A general system error occurred: Internal error“.


One thing to check is that the VM shell has the correct operating system selected for the guest operating system type.  For example, a setting of “Other (32-bit)” will cause the error because VMware cannot determine the correct version of the tools to install when the flavor of guest operating system (i.e., Windows or Linux) is unknown.


Other causes for this error can be found at VMware KB Article 1004718, which says to verify the following:

The virtual machine has a CD-ROM configured.
The windows.iso is present under the /vmimages/tools-iso/ folder.
The virtual machine is powered on.
The correct guest operating system is selected. For example, if the guest operating system is Windows 2000, ensure you have chosen Windows 2000 and not Other.
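
The guest operating system check can be scripted when there are a lot of VMs to look at. What follows is only a rough sketch in Python. It assumes the VM's .vmx file stores the guest type in a guestOS key and that the generic "Other" types appear as "other" and "other-64". Both the key and the identifiers are assumptions to verify against your own .vmx files, and the function names and sample text are made up for illustration.

```python
import re

# Assumed identifiers for the generic "Other" guest types; verify locally.
GENERIC_GUEST_TYPES = {"other", "other-64"}


def guest_os_from_vmx(vmx_text):
    """Return the guestOS value found in .vmx text, or None if it is absent."""
    match = re.search(
        r'^\s*guestOS\s*=\s*"([^"]*)"', vmx_text, re.MULTILINE | re.IGNORECASE
    )
    return match.group(1) if match else None


def is_generic_guest_os(guest_os):
    """True when the guest type is missing or one of the generic types."""
    return guest_os is None or guest_os.lower() in GENERIC_GUEST_TYPES


# Hypothetical .vmx fragment for illustration only.
sample_vmx = 'displayName = "lab-vm01"\nguestOS = "other"\n'

if is_generic_guest_os(guest_os_from_vmx(sample_vmx)):
    print("Guest OS type is generic; set the real OS before installing VMware Tools")
```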

Active Directory Problems

June 13th, 2010

I’ll borrow an introduction from a blog post I wrote a few days ago titled NFS and Name Resolution because it pretty much applies to this blog post as well:

Sometimes I take things for granted. For instance, the health and integrity of the lab environment. Although it is “lab”, I do run some workloads which are key to keep online on a regular basis. Primarily the web server which this blog is served from, the email server which is where I do a lot of collaboration, and the Active Directory Domain Controllers/DNS Servers which provide the authentication mechanisms, mailbox access, external host name resolution to fetch resources on the internet, and internal host name resolution.

The workloads and infrastructure in my lab are 100% virtualized. The only “physical” items I have are type 1 hypervisor hosts, storage, and network. By this point I’ll assume most are familiar with the benefits of consolidation. The downside is that when the wheels come off in a highly consolidated environment, the impacts can be severe as they fan out and tip over downstream dependencies like dominoes.

Due to my focus on VMware virtualization, the Microsoft Active Directory Domain Controllers hadn’t been getting the care and feeding they needed.  Quite honestly, there have been several “lights out” situations in the lab due to one reason or another.  The lab infrastructure VMs and their underlying operating systems have taken quite a beating but continued running.  Occasionally a Windows VM would detect a need for a CHKDSK.  Similarly, Linux VMs wanted an FSCK.  But they would faithfully return to a login prompt.

A week ago today, the DCs succumbed to the long term abuse.  Symptoms were immediately apparent in that I could not connect to the Exchange 2010 server to access my email and calendar.  In addition, I had lost access to the network drives on the file server.  Given the symptoms, I knew the issue was Active Directory related; however, I quickly found out the typical short term remedies weren’t working.  I looked at the Event Logs for both DCs.  Both were a disaster and, looking at the history, they had been ill for quite a long time.  I was going to have to really dig in to resolve this problem.

I spent several of the following evenings trying to resolve the problem.  As each day passed, anxiety was building because I was without email, which is where I do a lot of my work.  I had cleaned up AD metadata on both DCs, removed DCs to narrow the problem down, and examined DNS, checking the integrity of the AD-integrated SRV records.  I had restored the DCs to an isolated network from prior backups to no avail.  Although AD was performing some base authentication, there were a handful of symptoms remaining which indicated AD was still not happy.  A few of the big ones were:

  1. Exchange Services would either not start or would hang on starting
  2. SYSVOL and NETLOGON shares were not online on the DCs
  3. NETDIAG and DCDIAG tests on the DCs both had major failures, primarily inability to locate any DCs, Global Catalog Servers, time servers, or domain names

All of these problems ultimately tied to an error in the File Replication Service log on the DCs:

Event Type: Warning
Event Source: NtFrs
Event Category: None
Event ID: 13566
Date: 6/10/2010
Time: 9:15:56 PM
User: N/A
Computer: OBIWAN
Description:
File Replication Service is scanning the data in the system volume. Computer OBIWAN cannot become a domain controller until this process is complete. The system volume will then be shared as SYSVOL. 

To check for the SYSVOL share, at the command prompt, type:
net share 

When File Replication Service completes the scanning process, the SYSVOL share will appear.

The initialization of the system volume can take some time. The time is dependent on the amount of data in the system volume.

I had waited a long period of time for the scan to complete, but it had become apparent that the scan was never going to complete on its own.  After quite a bit of searching, I came up with Microsoft KB Article 263532 How to perform a disaster recovery restoration of Active Directory on a computer with a different hardware configuration.  Specifically, step 3j provided the answer to solving the root cause of the problem.  There is a registry value called BurFlags located in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup\.  The value needs to be set to d4 to allow SYSVOL to be shared out.
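
For anyone who would rather script the change than click through regedit, here is a minimal sketch in Python that only builds the reg.exe command line for the value described above; it does not run it. The function name is hypothetical, and writing d4 as decimal 212 in a REG_DWORD is my reading of the KB article rather than something it spells out in this form. The key name (Process at Startup) suggests the File Replication Service reads the value when it starts, so expect to restart that service afterward; treat that as an assumption too and follow the KB article for the exact procedure.

```python
import subprocess

# Registry key from the KB article (HKLM is reg.exe shorthand for
# HKEY_LOCAL_MACHINE).
NTFRS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
    r"\Backup/Restore\Process at Startup"
)


def burflags_command(value=0xD4):
    """Build the reg.exe argument list that sets BurFlags as a REG_DWORD."""
    return [
        "reg", "add", NTFRS_KEY,
        "/v", "BurFlags",
        "/t", "REG_DWORD",
        "/d", str(value),  # reg.exe accepts the decimal form of the value
        "/f",              # overwrite the existing value without prompting
    ]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(subprocess.list2cmdline(burflags_command()))
```

Printing the command first keeps a human in the loop, which seems wise for a registry change on a domain controller.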

Once this registry value was set, all of the problems I was experiencing went away. Exchange services started and I had access to my email after a four day inbox vacation.  I had been through a few instances of AD metadata cleanup but this turned out to be a more complex problem than that.  I am thankful for internet search engines because I probably would have never solved this problem without the MS KB Article.  I was actually coming close to wiping my current AD and starting over, although I knew that would be pretty painful considering the integration of other components like Exchange, SQL, Certificate Services, DNS, Citrix, etc. that were tied to it.

NFS and Name Resolution

June 11th, 2010

Sometimes I take things for granted.  For instance, the health and integrity of the lab environment.  Although it is “lab”, I do run some workloads which are key to keep online on a regular basis. Primarily the web server which this blog is served from, the email server which is where I do a lot of collaboration, and the Active Directory Domain Controllers/DNS Servers which provide the authentication mechanisms, mailbox access, external host name resolution to fetch resources on the internet, and internal host name resolution.

The workloads and infrastructure in my lab are 100% virtualized.  The only “physical” items I have are type 1 hypervisor hosts, storage, and network.  By this point I’ll assume most are familiar with the benefits of consolidation.  The downside is that when the wheels come off in a highly consolidated environment, the impacts can be severe as they fan out and tip over downstream dependencies like dominoes.

A few weeks ago I had decided to recarve the EMC Celerra fibre channel SAN storage.  The VMs which were running on the EMC fibre channel block storage were all moved to NFS on the NetApp filer.  Then last week, the Gb switch which supports all the infrastructure died.  Yes it was a single point of failure – it’s a lab.  The timing for that to happen couldn’t have been worse since all lab workloads were running on NFS storage.  All VMs had lost their virtual storage and the NFS connections on the ESX(i) hosts eventually timed out.

The network switch was replaced later that day and since all VMs were down and NFS storage had disconnected, I took the opportunity to gracefully reboot the ESX(i) hosts; good time for a fresh start.  Not surprisingly, I had to use the vSphere Client to connect to each host by IP address since at that point I had no functional DNS name resolution in the lab whatsoever. When the hosts came back online, I was about to begin powering up VMs, but instead I encountered a situation which I hadn’t planned for – all the VMs were grayed out, essentially disconnected.  I discovered the cause of this was that after the host reboot, the NFS storage hadn’t come back online – both NetApp and EMC Celerra – on both hosts.  There’s no way both storage cabinets and/or both hosts were having a problem at the same time so I assumed it was a network or cabling problem. With the NFS mounts in the vSphere Client staring back at me in their disconnected state, it dawned on me – lack of DNS name resolution was preventing the hosts from connecting to the storage.  The hosts could not resolve the FQDN of the EMC Celerra or the NetApp filer storage.  I modified /etc/hosts on each ESX(i) host, adding the TCP/IP address and FQDN for the NetApp filer and Celerra Data Movers.  Shortly after I was back in business.
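
To make the /etc/hosts fix concrete, here is a small sketch in Python that formats the kind of entries I added. The addresses and host names are placeholders, not the real lab values, and the short alias column is an optional convenience rather than something the hosts require.

```python
def hosts_lines(entries):
    """Format (ip, fqdn) pairs as /etc/hosts lines: ip, FQDN, short alias."""
    lines = []
    for ip, fqdn in entries:
        short_name = fqdn.split(".")[0]
        lines.append(f"{ip}\t{fqdn}\t{short_name}")
    return lines


# Placeholder values for illustration only.
storage_targets = [
    ("192.0.2.10", "filer.lab.example"),
    ("192.0.2.20", "datamover.lab.example"),
]

print("\n".join(hosts_lines(storage_targets)))
```

The printed lines can be reviewed and then appended to /etc/hosts on each host.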

What did I learn?  Not much.  It was more a reiteration of important design considerations which I was already aware of:

  1. 100% virtualization/consolidation is great – when it works.  The web of upstream/downstream dependencies makes it a pain when something breaks.  Consolidated dependencies which you might consider leaving physical or placing in a separate failure domain:
    • vCenter Management
    • Update Manager
    • SQL/Oracle back ends
    • Name Resolution (DNS/WINS)
    • DHCP
    • Routing
    • Active Directory/FSMO Roles/LDAP/Authentication/Certification Authorities
    • Mail
    • Internet connectivity
  2. Hardware redundancy is always key but expensive.  Perform a risk assessment and make a decision based on the cost effectiveness.
  3. When available, diversify virtualized workload locations to reduce the failure domain, particularly to split workloads which provide redundant infrastructure support such as Active Directory Domain Controllers and DNS servers.  This can mean placing workloads on separate hosts, separate clusters, separate datastores, separate storage units, maybe even separate networks depending on the environment.
  4. Static entries in /etc/hosts aren’t a bad idea as a fallback if you plan on using NFS in an environment with unreliable DNS, but I think the better point to discuss is the risk and pain which will be realized in deploying virtual infrastructure in an unreliable environment. Garbage In – Garbage Out.  I’m not a big fan of using IP addresses to mount NFS storage unless the environment is small enough.
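
One way to catch the name resolution dependency before it bites is a quick pre-flight check of the storage names from a management workstation. The sketch below is in Python and is only an illustration: the function name and host names are hypothetical, and the resolver is passed in as a parameter so the check can be exercised without a live DNS server.

```python
import socket


def unresolved_hosts(names, resolver=socket.gethostbyname):
    """Return the names that the resolver cannot turn into an address."""
    failed = []
    for name in names:
        try:
            resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed
```

Calling unresolved_hosts(["filer.lab.example", "datamover.lab.example"]) with your own storage FQDNs returns an empty list when everything resolves. Anything left in the list is a candidate for a static /etc/hosts entry or a closer look at DNS.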

New Microsoft .NET Framework Update Breaks vSphere Client

June 10th, 2010

Just a quick heads up to bring attention to an issue which I caught on Twitter.  VMware published KB 1022611 today which describes a new issue that is introduced by a recent Microsoft .NET Framework 2.0 SP2 & 3.5 SP1 update.  Upon installing the update, the vSphere Client stops working.  According to the article, the issue impacts ESX(i) 3.5, 4.0, and vCenter 4.0.  Contrary to the title of this blog post, I am not placing blame on Microsoft.  It remains unclear to me which company’s development staff is responsible for the software incompatibility.  Microsoft obviously issued the update which revealed the problem, but VMware has some skin in this as well in that they need to make sure they are following Microsoft .NET Framework development standards and best practices for their enterprise hypervisor management.

Key details from the VMware KB article:

The vSphere Clients, prior to the Update 1 release, cannot be used to access the vCenter Server or ESX hosts. A Microsoft update that targets the .NET Framework, released on June 9th 2010, is causing this issue. The update http://support.microsoft.com/kb/980773 causes the vSphere Client to stop working. To correct the issue there are two options that can be performed:

  • Remove the MS update from your Windows operating system. The vSphere Client works after the update is removed.

Note: This affects Windows XP, Windows 2003, Windows 2008, Windows Vista, and Windows 7.

Win A Free VMworld Pass From boche.net

June 6th, 2010

The economy has been rough.  Individuals and businesses have felt the impacts in various ways.  Reduction of income or revenue.  Increased operational expenses.  Reduction in valuation of homes or assets.  Downsizing of staff.  The slashing of budgets, including training, conferences, and travel.  Those who are in verticals which tail economic trends by a year or two will begin feeling the impacts soon.

As a reader of this blog, you already know VMworld 2010 in San Francisco is just a few months away.  If you’re like me, you’re wondering “How am I going to get there this year?”  Due to the reasons I’ve outlined above, details are sketchy on whether or not you’ll get to go.  Management says “Ask again in August, we’ll let you know.”  It doesn’t sound promising.  Or maybe you’ve already been told “It’s just too expensive given the economy, sorry.”

boche.net would like to help.  If you can get yourself to the door of the Moscone Center, boche.net will get you in.  This is a $1,895 value if you were to purchase a conference pass at the door.  There is no purchase necessary for this contest other than your own T&E (transportation, hotel, van down by the river, etc.)  On Friday June 18th, 2010, one random and lucky winner who has followed the contest rules completely (detailed below) will be revealed.

The intent here is not to save a company money.  Rather, to make the difference between someone going to VMworld versus not going.  Therefore, I would appreciate it if entries would be limited to those who do not already have budget approval for the VMworld conference pass.  At the same time, should you win, you owe it to yourself and the other contestants to follow through and attend the conference.  It would be a shame for the pass to go to waste.  Perhaps another blogger or vendor would like to co-sponsor airfare or hotel for the winner.  Consider this an open invitation for co-sponsorship.

Be sure to read the VMworld 2010 FAQ so that you thoroughly understand the conference logistics, ensuring you are an eligible candidate to attend.

Update 6/6/10: I’m happy to announce that Gestalt IT has graciously offered to pay for the airfare.  In addition to the VMworld conference pass, Gestalt IT will provide the winner with round trip airfare, up to $500.  All we ask in return is that the recipient provide a post-VMworld write-up of what they learned from attending the conference.  This could be a written document, a blog post, a video, you choose.  Thank you Gestalt IT for your donation!

Contest Rules:

  1. Post one comment/reply and only one comment/reply to this blog article below.
    • Include your first and last name.
    • Provide a valid email address when completing the comment form.
    • Include a short bio about yourself and how you use VMware currently or how you would like to leverage VMware products.
    • Include three (3) things you are looking to gain from attending VMworld 2010 (i.e., why do you want to go?)
    • Contest entry must be received by noon CST Thursday June 17th, 2010.
  2. One (1) random winner will be chosen Thursday evening June 17th, 2010.
    • Winner will be contacted via email address provided above.
    • Winner will receive a VMworld 2010 San Francisco conference pass.
    • Winner will receive airfare up to $500 from Gestalt IT.
    • Winner will provide a post-VMworld write-up of what they learned from attending the conference to Gestalt IT.
  3. Contest results will be posted Friday June 18th, 2010.
  4. The conference pass is non-transferable and non-refundable.
  5. Hotel, meals, and other expenses are not covered by boche.net.
  6. No purchase necessary.

Good Luck!

Update 6/17/10: 

WooHoo!

A name has been randomly drawn and we have a winner! Congratulations to contest winner Greg Stuart who will be receiving a VMworld 2010 conference pass and round trip airfare (up to $500).  Greg’s winning entry and bio are listed below:

I currently work for an organization that has begun to leverage VMware more and more. I’m new to virtualization and would like to gain a better knowledge of the VMware products, attend some hands on sessions and come back with solutions that I can employ in our environment. The ability to discuss scenarios and solutions with vendors in person would be awesome.

I’m pleased with the outcome of the contest.  Greg is new to virtualization and I think there is a lot of valuable information he will be able to pick up at VMworld.  Better yet, VMworld is a 4 day event this year – Monday thru Thursday instead of 3 days as it was in prior years.  This affords Greg the opportunity to take in a whole extra day of content.

Thank you to all who participated in the contest, including Gestalt IT for contributing the round trip airfare.  Although there could ultimately be only one grand prize contest winner, my hope is that you all will make the show this year somehow.  There are nearly 90 comments/replies to this post explaining the values which VMworld can provide. Much of this content could be borrowed to write or improve your own compelling justification, hopefully earning you a trip to VMworld.

Everyone have a great weekend!

SRM Survey – Free SRM Book

June 1st, 2010

The VMware SRM team is conducting a formal survey on the SRM product and they’d like to hear your feedback.  VMware values your time and suggestions – in return for completing the survey, VMware will donate $10 to UNICEF (for the first 1,000 respondents) and you’ll be eligible to download an electronic copy of Mike Laverick’s Administering VMware Site Recovery Manager 4.0 book.

You can read more about this event here.

Complete the survey here.