VMware VCAP4-DCA BETA Exam Experience

June 21st, 2010 by jason

This morning, I sat the VMware VCAP4-DCA BETA exam at a VUE testing facility in Eau Claire, WI – a 110 mile drive from my normal area.  Today was the last day to take the exam and the Wisconsin location was the only available facility as of last week when I scheduled the exam.  This was the first time I had traveled extensively to take an exam.  Although it was not my first preference, I did so for the following reasons:

  1. The exam price was discounted by $300 since it was a beta.  At that price, it was worth a shot at passing.
  2. Declining the location would have meant declining the exam since today was the deadline; I’d have had to wait a few months until the exam went live.
  3. I wanted to get the exam out of the way (hopefully) and help others prep once I had the experience.
  4. I’d never written a beta exam before.
  5. This was my 1st beta invitation from VMware.  I probably wouldn’t receive a 2nd if I had refused the 1st.  <– Godfather reference

I used the exam blueprint as a guide for what to study.  I was bothered by a few of the technologies on the exam blueprint which I didn’t have much experience with:  vShield Zones, Orchestrator, and vCenter Heartbeat.  Might as well add PVLANs to the list too.  I was also a bit bothered by the lack of study time; VMware had scheduled this exam for me just late last week, Thursday or Friday.

The VCAP is an Advanced Professional certification.  As such, I came into the exam expecting it to be similar to the VI3 Enterprise Administration exam and tougher than the VCP exam.  From a challenge standpoint, the VCAP-DCA exam did not disappoint.  It covered several features which are new to vSphere, leaving little room for overlap with previous exams.  Obviously, I cannot go into details on specific questions due to the standard NDA policy around certification exams.  Suffice it to say, the exam blueprint mentioned earlier is a good resource.  The blueprint covers broad objectives.  Expect to dig deeper for each objective listed.  Those who complain about the VCP exam being “too easy” should enjoy the VCAP series of exams if the beta exam is a relevant indicator.

Like the Enterprise Administration exam, the VCAP4-DCA exam has a live lab environment which is used to accumulate points for the questions asked.  Unlike the EA exam, which had 11 lab questions and the remainder written/multiple choice, the DCA exam is 100% lab with no multiple choice.  The exam tests working knowledge of the products rather than memorization.  The beta exam was 41 questions in length with an allotted time of 4 1/2 hours.  I liked the EA exam from the perspective that the lab questions quickly made sense to me and I think I scored a lot of points in the lab.  For this reason, I felt the DCA exam would be right up my alley, being 100% lab.  I was half right.  The DCA exam is very challenging.  If there is something in the lab you don’t understand or didn’t study, there’s no multiple choice answer staring you in the face to give you at least a statistical chance of getting it correct.  To use a made up example, you either know how to enable root SSH access on a Service Console, or you don’t.  If you had to guess, you’d never get it right, and thus you lose the points on the question.  Working in the lab was a fun approach, but the flip side is that not knowing enough of the content will kill you for lack of a multiple choice guess.  Some of the community laughed at the VCP exam.  VMware has answered with the VCAP.
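
For what it’s worth, since I made that example up anyway, here is roughly what the task looks like on a classic ESX Service Console – a minimal sketch, assuming you have logged on at the console (or with another account) and elevated to root first:

    # PermitRootLogin defaults to no in the Service Console SSH daemon config
    sed -i 's/^PermitRootLogin no/PermitRootLogin yes/' /etc/ssh/sshd_config
    # restart the SSH daemon so the change takes effect
    service sshd restart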

Now for the bad news.  The lab testing environment, in my experience, was riddled with issues.  Most notably, “glyphs” painted randomly about the screen due to screen refresh/repaint issues.  They were an incredible distraction and, in many cases, they covered up buttons and hyperlinks in the vSphere Client such that, if you didn’t know the buttons were supposed to be there, you’d never find them to complete your task.  Since I know the vSphere Client fairly well, I knew where to blindly click in an area to force a repaint of the screen.  I had other issues as well which prevented me from answering some questions.  I notified the proctor, who called support while I continued with the exam.  About 30 minutes later, someone rudely took remote control of my screen and logged me out while I was in the middle of a lab.  I was then logged back in and told to continue, problem solved.  The problem was not solved, as it had nothing to do with the VUE equipment; rather, it was internal to the remote lab.  I had the proctors open an incident case with VMware.  At one point later, I was pulled out of the testing room and put on the phone with VMware support.  Suffice it to say, the problem didn’t get resolved and several questions were likely impacted.  In addition, the clock kept ticking during the time spent troubleshooting the lab.  I’m not sure if I was losing time while on the phone with VMware.

The combination of struggling with the previously mentioned issues, coupled with poor time management on some other questions, resulted in me running out of exam time before completing the last question. I wasn’t even close to finishing.  I needed about another hour.  Part of the key to this exam, other than obviously knowing the content, is to be able to digest the information in the questions quickly and accurately.  This is good because it’s a fundamental core competency in the VCDX process as well as in the life of an Architect.  The anal person that I am, I found myself going back and forth between test question and lab to be sure I was doing everything PERFECTLY.  In the long run, I think it cost me.  I noted in a few of my previous exam blog posts that I found myself struggling with time issues on certification exams lately.  This was no exception.  I need to move faster, but not at the expense of accuracy.

I left the exam facility in a stunned, zombie-like state.  I wasn’t pissed.  I was disappointed in myself on several questions – like any exam, it revealed my weaknesses.  The exam was a lot more challenging than I expected.  Lab issues aside, I think VMware did a good job with the difficulty of the questions.  Now I just need to wait a few weeks for the results.  Nothing I’ve experienced compares to the drama and anxiety created by the VCDX defense process and grading period.  If by chance I do not pass the DCA exam, it will be an ego crush, but I will survive, retake it, and the result will be a sharper skillset – which is my primary reason for certification in the first place.  Retaking an electronic exam after a 10-day wait is not a big deal compared to the consequence, wait, and expense of not passing the VCDX defense process.  Knowing this consoles me.  Now that the beta period is over for the DCA exam, others will get their chance at it, hopefully in a month or two, and perhaps I will as well.  I haven’t felt positive about my last few exams and I passed.  We’ll see about this one.

Update 6/22/10:  I failed to mention William Lam and Chris Dearden also have great summaries of their VCAP4-DCA BETA exam experiences.  Be sure to check them out.

Update 10/14/10:  I passed.

vEXPERT 2010

June 20th, 2010 by jason

Friday June 4th, 2010

Hello Jason,

I am pleased to announce that you have been designated as a VMware vExpert 2010 and I invite you to participate in our program this year. This award is based on your advocacy of VMware solutions, your contributions to the community of VMware users, and your willingness to share your expertise with others.

On behalf of everyone here at VMware, thank you.

The excerpt above says it all.  Thank you for this award, VMware.  For vEXPERT last year, I received a very nice vEXPERT pen, folder, and lapel pin.

This year, I would like to once again add VMware NFR licenses to the wish list for vEXPERTs.  Access to products such as vSphere, SRM, vCenter Heartbeat, Chargeback, Lab Manager, View, etc. without having to request new licenses and rebuild every 30-60 days would be much appreciated.

Thanks again and I look forward to another great year working with VMware’s products and excellent staff.

ESX and the Service Console Are Going Away

June 17th, 2010 by jason

ESX and the Service Console are going away.  Theories on this are evident – plastered all over the internet:  here, here, here, here, here, here, here, etc.

Go to Google and perform the following search:

esx “service console” “going away”

Better yet, let me Google that for you (Thank you Doug for the introduction to this wonderfully funny tool!)

ESXi was first introduced on December 10th, 2007.  We’ve had 2 1/2 years to get familiar with this hypervisor, which is minimal in size but as feature rich, as powerful, and as fast as ESX.  The management tools for ESXi have evolved and the platform has been given its due time to prove its stability and viability as an enterprise bare metal hypervisor in the datacenter.

I conducted an informal poll on Twitter this week and a large number of respondents claimed to still be using ESX.  More alarming was the disposition of some who have no plans whatsoever to go to ESXi.  One person went so far as to say the Service Console would have to be pried from his cold dead hands.

If you have not yet broken your dependency on ESX and the Service Console, I suggest you do it soon.  Time is running out.  Don’t wait until the last minute.  Be sure you leave yourself enough time to architect and test a sound ESXi design for your datacenter, as well as get familiar with the tools you’ll be dependent on to manage the ESXi hypervisor.
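
If you’re wondering where to start, the remote vSphere CLI (vCLI) and the vSphere Management Assistant (vMA) cover most of what the esxcfg-* commands did in the Service Console.  A couple of examples (the host name is a placeholder):

    # list physical NICs on a remote ESXi host
    vicfg-nics --server esxi01.lab.local --username root -l
    # list vSwitches and port groups
    vicfg-vswitch --server esxi01.lab.local --username root -l
    # remote esxtop, run from the vMA or the Linux vCLI
    resxtop --server esxi01.lab.local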

VMware Tools install – A general system error occurred: Internal error

June 16th, 2010 by jason

When you invoke the VMware Tools installation via the vSphere Client, you may encounter the error “A general system error occurred: Internal error”.


One thing to check is that the VM shell has the correct operating system selected for the guest operating system type.  For example, a setting of “Other (32-bit)” will cause the error because VMware cannot determine the correct version of the Tools to install when the flavor of the guest operating system (i.e., Windows or Linux) is unknown.
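
For reference, that guest operating system type is stored in the VM’s .vmx file as the guestOS parameter.  A quick way to peek at it from the Service Console looks roughly like this (the datastore path and values are only examples); the setting itself is changed in the vSphere Client under Edit Settings > Options > General Options while the VM is powered off:

    # inspect the guest OS type recorded in the VM's configuration file
    grep -i guestOS /vmfs/volumes/datastore1/myvm/myvm.vmx
    # guestOS = "other"           <- a generic value like this triggers the error
    # guestOS = "winNetStandard"  <- a specific value lets the correct Tools ISO be selected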


Other causes for this error can be found in VMware KB Article 1004718 (a few quick checks follow the list):

  • The virtual machine has a CD-ROM configured.
  • The windows.iso is present under the /vmimages/tools-iso/ folder.
  • The virtual machine is powered on.
  • The correct guest operating system is selected. For example, if the guest operating system is Windows 2000, ensure you have chosen Windows 2000 and not Other.
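
A few of those items can be checked quickly from the Service Console; a rough sketch (the .vmx path is hypothetical):

    # confirm the VM is registered and powered on
    vmware-cmd -l
    vmware-cmd /vmfs/volumes/datastore1/myvm/myvm.vmx getstate
    # confirm the Windows Tools ISO is present where the KB expects it
    ls -l /vmimages/tools-iso/windows.iso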

Active Directory Problems

June 13th, 2010 by jason

I’ll borrow an introduction from a blog post I wrote a few days ago titled NFS and Name Resolution because it pretty much applies to this blog post as well:

Sometimes I take things for granted. For instance, the health and integrity of the lab environment. Although it is “lab”, I do run some workloads which are key to keep online on a regular basis. Primarily the web server which this blog is served from, the email server which is where I do a lot of collaboration, and the Active Directory Domain Controllers/DNS Servers which provide the authentication mechanisms, mailbox access, external host name resolution to fetch resources on the internet, and internal host name resolution.

The workloads and infrastructure in my lab are 100% virtualized. The only “physical” items I have are type 1 hypervisor hosts, storage, and network. By this point I’ll assume most are familiar with the benefits of consolidation. The downside is that when the wheels come off in a highly consolidated environment, the impacts can be severe as they fan out and tip over down stream dependencies like dominos.

Due to my focus on VMware virtualization, the Microsoft Active Directory Domain Controllers hadn’t been getting the care and feeding they needed.  Quite honestly, there have been several “lights out” situations in the lab for one reason or another.  The lab infrastructure VMs and their underlying operating systems have taken quite a beating but continued running.  Occasionally a Windows VM would detect a need for CHKDSK.  Similarly, Linux VMs wanted an fsck.  But they would faithfully return to a login prompt.

A week ago today, the DCs succumbed to the long term abuse.  Symptoms were immediately apparent in that I could not connect to the Exchange 2010 server to access my email and calendar.  In addition, I had lost access to the network drives on the file server.  Given the symptoms, I knew the issue was Active Directory related; however, I quickly found out that the typical short term remedies weren’t working.  I looked at the Event Logs on both DCs.  Both were a disaster and, looking at the history, they had been ill for quite a long time.  I was going to have to really dig in to resolve this problem.

I spent several of the following evenings trying to resolve the problem.  As each day passed, anxiety was building because I was without email, which is where I do a lot of my work.  I had cleaned up AD metadata on both DCs, removed DCs to narrow the problem down, and examined DNS to check the integrity of the AD integrated SRV records.  I had restored the DCs from prior backups to an isolated network, to no avail.  Although AD was performing some basic authentication, a handful of symptoms remained which indicated AD was still not happy.  A few of the big ones were:

  1. Exchange Services would either not start or would hang on starting
  2. SYSVOL and NETLOGON shares were not online on the DCs
  3. NETDIAG and DCDIAG tests on both DCs had major failures, primarily the inability to locate any DCs, Global Catalog servers, time servers, or domain names (see the commands after this list)
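
For reference, these checks were run from a command prompt on each DC (DCDIAG and NETDIAG ship with the Windows Server 2003 Support Tools):

    rem SYSVOL and NETLOGON should appear in the list of shares
    net share
    rem domain controller diagnostics
    dcdiag /v
    rem network and DC locator tests
    netdiag /v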

All of these problems ultimately tied back to an error in the File Replication Service log on the DCs:

Event Type: Warning
Event Source: NtFrs
Event Category: None
Event ID: 13566
Date: 6/10/2010
Time: 9:15:56 PM
User: N/A
Computer: OBIWAN
Description:
File Replication Service is scanning the data in the system volume. Computer OBIWAN cannot become a domain controller until this process is complete. The system volume will then be shared as SYSVOL. 

To check for the SYSVOL share, at the command prompt, type:
net share 

When File Replication Service completes the scanning process, the SYSVOL share will appear.

The initialization of the system volume can take some time. The time is dependent on the amount of data in the system volume.

I had waited a long period of time for the scan to complete, but it had become apparent that the scan was never going to complete on its own.  After quite a bit of searching, I came up with Microsoft KB Article 263532, How to perform a disaster recovery restoration of Active Directory on a computer with a different hardware configuration.  Specifically, step 3j provided the answer to the root cause of the problem.  There is a registry value called BurFlags located at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup\.  The value needs to be set to D4 to allow SYSVOL to be shared out.
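
Roughly, and following the usual FRS recovery procedure from the Microsoft KBs, the fix looks like this from a command prompt on the affected DC (review the KB before trying it; D4 marks that DC’s copy of SYSVOL as authoritative):

    rem stop the File Replication Service before changing BurFlags
    net stop ntfrs
    rem 212 decimal = D4 hex
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup" /v BurFlags /t REG_DWORD /d 212
    rem restart FRS; SYSVOL and NETLOGON should be shared once the scan completes
    net start ntfrs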

Once this registry value was set, all of the problems I was experiencing went away.  Exchange services started and I had access to my email after a four-day inbox vacation.  I had been through a few instances of AD metadata cleanup before, but this turned out to be a more complex problem than that.  I am thankful for internet search engines because I probably would never have solved this problem without the MS KB article.  I was actually coming close to wiping my current AD and starting over, although I knew that would be pretty painful considering the integration of other components like Exchange, SQL, Certificate Services, DNS, Citrix, etc. that were tied to it.

NFS and Name Resolution

June 11th, 2010 by jason

Sometimes I take things for granted.  For instance, the health and integrity of the lab environment.  Although it is “lab”, I do run some workloads which are key to keep online on a regular basis. Primarily the web server which this blog is served from, the email server which is where I do a lot of collaboration, and the Active Directory Domain Controllers/DNS Servers which provide the authentication mechanisms, mailbox access, external host name resolution to fetch resources on the internet, and internal host name resolution.

The workloads and infrastructure in my lab are 100% virtualized.  The only “physical” items I have are type 1 hypervisor hosts, storage, and network.  By this point I’ll assume most are familiar with the benefits of consolidation.  The downside is that when the wheels come off in a highly consolidated environment, the impacts can be severe as they fan out and tip over down stream dependencies like dominos.

A few weeks ago I had decided to recarve the EMC Celerra fibre channel SAN storage.  The VMs which were running on the EMC fibre channel block storage were all moved to NFS on the NetApp filer.  Then last week, the Gb switch which supports all the infrastructure died.  Yes it was a single point of failure – it’s a lab.  The timing for that to happen couldn’t have been worse since all lab workloads were running on NFS storage.  All VMs had lost their virtual storage and the NFS connections on the ESX(i) hosts eventually timed out.

The network switch was replaced later that day and, since all VMs were down and the NFS storage had disconnected, I took the opportunity to gracefully reboot the ESX(i) hosts; a good time for a fresh start.  Not surprisingly, I had to use the vSphere Client to connect to each host by IP address since at that point I had no functional DNS name resolution in the lab whatsoever.  When the hosts came back online, I was about to begin powering up VMs, but instead I encountered a situation which I hadn’t planned for – all the VMs were grayed out, essentially disconnected.  I discovered the cause: after the host reboot, the NFS storage hadn’t come back online – both NetApp and EMC Celerra – on both hosts.  There was no way both storage cabinets and/or both hosts were having a problem at the same time, so I assumed it was a network or cabling problem.  With the NFS mounts in the vSphere Client staring back at me in their disconnected state, it dawned on me – the lack of DNS name resolution was preventing the hosts from connecting to the storage.  The hosts could not resolve the FQDNs of the EMC Celerra or the NetApp filer storage.  I modified /etc/hosts on each ESX(i) host, adding the TCP/IP address and FQDN for the NetApp filer and the Celerra Data Movers.  Shortly after, I was back in business.
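
For anyone who lands in the same spot, the workaround amounted to something like the following on each ESX(i) host (the addresses, names, and export path below are placeholders for my lab; on ESXi this means using Tech Support Mode):

    # map the NFS servers' FQDNs to their addresses locally, bypassing DNS
    echo "192.168.1.20   netapp01.lab.local    netapp01"    >> /etc/hosts
    echo "192.168.1.21   celerra-dm2.lab.local celerra-dm2" >> /etc/hosts
    # confirm the NFS datastores reconnect, or re-add one if necessary
    esxcfg-nas -l
    esxcfg-nas -a -o netapp01.lab.local -s /vol/vm_nfs netapp_vm_nfs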

What did I learn?  Not much.  It was more a reiteration of important design considerations which I was already aware of:

  1. 100% virtualization/consolidation is great – when it works.  The web of upstream/downstream dependencies makes it a pain when something breaks.  Consolidated dependencies which you might consider leaving physical or placing in a separate failure domain:
    • vCenter Management
    • Update Manager
    • SQL/Oracle back ends
    • Name Resolution (DNS/WINS)
    • DHCP
    • Routing
    • Active Directory/FSMO Roles/LDAP/Authentication/Certification Authorities
    • Mail
    • Internet connectivity
  2. Hardware redundancy is always key but expensive.  Perform a risk assessment and make a decision based on the cost effectiveness.
  3. When available, diversify virtualized workload locations to reduce failure domain, particularly to split workloads which provide redundant infrastructure support such as Active Directory Domain Controllers, DNS servers.  This can mean placing workloads on separate hosts, separate clusters, separate datastores, separate storage units, maybe even separate networks depending on the environment.
  4. Static entries in /etc/hosts aren’t a bad idea as a fallback if you plan on using NFS in an environment with unreliable DNS, but I think the better point to discuss is the risk and pain which will be realized in deploying virtual infrastructure in an unreliable environment.  Garbage In – Garbage Out.  I’m not a big fan of using IP addresses to mount NFS storage unless the environment is small enough.

New Microsoft .NET Framework Update Breaks vSphere Client

June 10th, 2010 by jason

Just a quick heads up to bring attention to an issue which I caught on Twitter.  VMware published KB 1022611 today which describes a new issue introduced by a recent Microsoft .NET Framework 2.0 SP2 & 3.5 SP1 update.  Upon installing the update, the vSphere Client stops working.  According to the article, the issue impacts ESX(i) 3.5, 4.0, and vCenter 4.0.  Contrary to the topic of this blog post, I am not placing blame on Microsoft.  It remains unclear to me which company’s development staff is responsible for the software incompatibility.  Microsoft obviously issued the update which revealed the problem, but VMware has some skin in this as well in that they need to make sure they are following Microsoft .NET Framework development standards and best practices for their enterprise hypervisor management tools.

Key details from the VMware KB article:

The vSphere Clients, prior to the Update 1 release, cannot be used to access the vCenter Server or ESX hosts. A Microsoft update that targets the .NET Framework, released on June 9th, 2010, is causing this issue. The update http://support.microsoft.com/kb/980773 causes the vSphere Client to stop working. To correct the issue, there are two options that can be performed:

  • Remove the MS update from your Windows operating system. The vSphere Client works after the update is removed.

Note: This affects Windows XP, Windows 2003, Windows 2008, Windows Vista, and Windows 7.