VMware vSphere 4.1 HA and DRS technical deepdive arrival

December 7th, 2010 by jason

I think Eric “Scoop” Sloof was the first to announce this yesterday, complete with a video and everything! Come on Eric, let some of the other bloggers have your scraps. 8-)

I received a copy of a brand new book hot off the presses titled VMware vSphere 4.1 HA and DRS technical deepdive by Duncan Epping and Frank Denneman.  Having just received it tonight, of course I haven’t had time to finish reading it yet.  This is the pre-game party blog post.  Just by thumbing through the pages, I’m going to draw a few conclusions.  I’ll see if I’m right by the time I actually finish reading the book.

  1. 224 pages and 18 chapters in length.  I’ve seen entire virtual infrastructure books written in as many or fewer pages than this, and this book covers just HA and DRS.
    Conclusion: Even factoring in a fair number of diagrams, this will be the most comprehensive HA and DRS handbook in existence.
  2. HA and DRS are perhaps two of the most misunderstood and misinterpreted technologies in VMware’s suite of virtual infrastructure offerings.  What exactly is confusing about these tools?  First, they are both set-it-and-forget-it automation.  The technologies will more or less “just work” out of the box.  This simplicity instills an overwhelming amount of confidence in cluster configuration because the complexity is masked by an easy-to-use interface.
    Conclusion: There’s a lot going on under the hood in both HA and DRS that administrators should know about to properly configure and tune their environment.  The detail this book goes into should rock your world.
  3. This book covers DPM.
    Conclusion: That is good.
  4. There are many great looking diagrams and flowcharts.
    Conclusion: Very helpful in reinforcing what’s written in detail.

I look forward to relaxing with this book while on vacation the rest of this week.  Nice job from what I’ve seen so far guys!

You can read a review, write a review, or purchase this book on Amazon’s web site here.

Old Games Revisited

December 1st, 2010 by jason

I got the bug tonight to try one of my old PC games.  I still have several of them on my hard drive dating back to the early to mid 1990s.  Each time I re-image my PC, I make sure that I preserve these games by backing up and restoring their directory structures.

I wasn’t sure if they would work under Windows 7, but I decided to give it a try.  I made a few attempts to launch Doom II using various compatibility mode settings, but none worked.

When that failed, I quickly stumbled on skulltag.com.  It’s a free Windows download which lets you play Doom and Doom II on modern Windows platforms.  Not only that, you can play online with other players from the internet.  I downloaded and installed the software and I was literally playing online with another player within a minute.

The following videos bring back a lot of great memories of modem and LAN gaming with old friends in my 20s and are nothing short of amazing!

Doom II finished in 14:41

Quake finished in 17:38

Quake 2 finished in 21:06

Flow Control

November 29th, 2010 by jason

Thanks to blog sponsorship, I’m able to maintain a higher-performing lab environment than I ever have up until this point.  One area I hadn’t invested much in, at least from a lab standpoint, is networking.  In the past, I’ve always had some sort of small- to mid-density unmanaged Ethernet switch.  And this was fine.  Household-name switches like Netgear and SMC from Best Buy and Newegg performed well enough and survived for years in the higher temperature lab environment.  Add to that, by virtue of being unmanaged, they were plug and play.  No time wasted fighting a misconfigured network.

I recently picked up a 3Com SuperStack 3 Switch 3870 (48 1GbE ports).  It’s not 10GbE, but it fits my budget and brings a few networking nice-to-haves like VLANs and Layer 3 routing.  Because this switch is managed, I can now apply some best practices from the IP based storage realm.  One of those best practices is configuring Flow Control for VMware vSphere with network storage.  This blog post is mainly to record some pieces of information I’ve picked up along the way and to open a dialog with network-minded readers who may have some input.

So what is network Flow Control? 

NetApp defines Flow Control in TR-3749 as “the process of managing the rate of data transmission between two nodes to prevent a fast sender from over running a slow receiver.”  NetApp goes on to advise that Flow Control can be set at the two endpoints (ESX(i) host level and the storage array level) and at the Ethernet switch(es) in between.

Wikipedia is in agreement with the above and adds more meat to the discussion, including the following: “The overwhelmed network element will send a PAUSE frame, which halts the transmission of the sender for a specified period of time. PAUSE is a flow control mechanism on full duplex Ethernet link segments defined by IEEE 802.3x and uses MAC Control frames to carry the PAUSE commands. The MAC Control opcode for PAUSE is 0x0001 (hexadecimal). Only stations configured for full-duplex operation may send PAUSE frames.”
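
To make the PAUSE mechanism a little more concrete, here is a minimal Python sketch of what an 802.3x PAUSE frame looks like on the wire. The opcode 0x0001 comes from the quote above; the reserved destination address 01-80-C2-00-00-01, the MAC Control EtherType 0x8808, and the pause time being expressed in quanta of 512 bit times come from the IEEE 802.3 standard rather than the Wikipedia text. The function names are my own, and the frame is built without the trailing FCS, which the NIC normally appends.

```python
import struct

# Reserved multicast destination for MAC Control PAUSE frames (IEEE 802.3x).
PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")
MAC_CONTROL_ETHERTYPE = 0x8808  # EtherType used by all MAC Control frames
PAUSE_OPCODE = 0x0001           # MAC Control opcode for PAUSE
MIN_FRAME_LEN = 60              # minimum Ethernet frame length, excluding the 4-byte FCS


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Build an 802.3x PAUSE frame (no FCS).

    pause_quanta is the requested pause in units of 512 bit times (0-65535).
    A value of 0 tells the sender it may resume immediately.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause_quanta must fit in 16 bits")
    header = PAUSE_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    frame = header + payload
    # Zero-pad up to the minimum frame size.
    return frame + bytes(MIN_FRAME_LEN - len(frame))


def pause_duration_seconds(pause_quanta: int, link_bps: float) -> float:
    """Convert pause quanta to seconds: each quantum lasts 512 bit times."""
    return pause_quanta * 512 / link_bps


if __name__ == "__main__":
    frame = build_pause_frame(bytes.fromhex("001122334455"), 0xFFFF)
    print(len(frame), frame[:18].hex())
    # Longest possible single pause on a 1GbE link, in milliseconds.
    print(round(pause_duration_seconds(0xFFFF, 1e9) * 1000, 3))
```

The second function shows why one PAUSE frame is short-lived: even the maximum of 65535 quanta works out to about 33.55 ms on a 1GbE link, so a receiver that stays congested has to keep sending PAUSE frames to keep the sender quiet.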

What are network Flow Control best practices as they apply to VMware virtual infrastructure with NFS or iSCSI network storage?

Both NetApp and EMC agree that Flow Control should be enabled in a specific way at the endpoints as well as at the Ethernet switches which support the flow of traffic:

  • Endpoints (that’s the ESX(i) hosts and the storage arrays) should be configured with Flow Control send/tx on, and receive/rx off.
  • Supporting Ethernet switches should be configured with Flow Control “Desired” or send/tx off and receive/rx on.
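
If you want to audit a host against the vendor guidance above, something like the following Python sketch could compare `ethtool -a vmnic0` output with the endpoint recommendation (tx on, rx off). It assumes the usual Linux ethtool layout of `Autonegotiate:`, `RX:` and `TX:` lines, which may differ on your ESX build. The `RECOMMENDED` table is just my transcription of the two bullets, not an official NetApp or EMC artifact, and the sample output is hypothetical, modeled on what I saw on my host.

```python
import re

# My transcription of the NetApp/EMC guidance above (not a vendor-supplied file).
RECOMMENDED = {
    "endpoint": {"tx": "on", "rx": "off"},  # ESX(i) hosts and storage arrays
    "switch": {"tx": "off", "rx": "on"},    # Ethernet switch ports
}


def parse_pause_params(ethtool_output: str) -> dict:
    """Parse `ethtool -a <nic>` style text into {'autoneg'|'rx'|'tx': 'on'|'off'}.

    Assumes lines shaped like 'RX:   on'; any other lines are ignored.
    """
    keys = {"autonegotiate": "autoneg", "rx": "rx", "tx": "tx"}
    settings = {}
    for line in ethtool_output.splitlines():
        match = re.match(r"\s*(Autonegotiate|RX|TX):\s*(on|off)\b", line, re.IGNORECASE)
        if match:
            settings[keys[match.group(1).lower()]] = match.group(2).lower()
    return settings


def deviations(settings: dict, role: str) -> dict:
    """Return {flag: (actual, recommended)} for each flag that differs from the guidance."""
    wanted = RECOMMENDED[role]
    return {
        flag: (settings.get(flag), value)
        for flag, value in wanted.items()
        if settings.get(flag) != value
    }


if __name__ == "__main__":
    # Hypothetical output resembling a host that auto-negotiated full Flow Control.
    sample = "Pause parameters for vmnic0:\nAutonegotiate:\ton\nRX:\t\ton\nTX:\t\ton\n"
    current = parse_pause_params(sample)
    print(current)
    print(deviations(current, "endpoint"))
```

Against that sample, rx is the one flag out of line for an endpoint. Setting it by hand would look something like `ethtool -A vmnic0 autoneg off rx off tx on`, with the caveat that ethtool changes on ESX don't survive a reboot.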

One item to point out here is that although both mainstream storage vendors recommend these settings for VMware infrastructures as a best practice, neither of their multiprotocol arrays ships configured this way.  At least, not the units I’ve had my hands on, which include the EMC Celerra NS-120 and the NetApp FAS3050c.  The Celerra is configured out of the box with Flow Control fully disabled, and I found the NetApp configured with Flow Control set to full (duplex?).

Here’s another item of interest.  VMware vSphere hosts are configured out of the box to auto-negotiate Flow Control settings.  What does this mean?  Network interfaces are able to advertise certain features and protocols which they were purpose-built to understand, as defined by the relevant standards (IEEE 802.3 in the case of Ethernet auto-negotiation).  One of these features is Flow Control.  VMware ESX ships with a Flow Control setting which adapts to its environment.  If you plug an ESX host into an unmanaged switch which doesn’t advertise Flow Control capabilities, ESX sets its tx and rx flags to off.  These flags tie specifically to the PAUSE frames mentioned above.  When I plugged my ESX host into the new 3Com managed switch and configured the ports for Flow Control to be enabled, I subsequently found out using the ethtool -a vmnic0 command that both tx and rx were enabled on the host (the 3Com switch has just one Flow Control toggle: enabled or disabled).

NetApp provides a hint to this behavior in their best practice statement, which says “Once these [Flow Control] settings have been configured on the storage controller and network switch ports, it will result in the desired configuration without modifying the flow control settings in ESX/ESXi.”

Jase McCarty pointed out back in January a “feature” of ethtool in ESX.  Basically, ethtool can be used to display current Ethernet adapter settings (including Flow Control as mentioned above), and it can also be used to configure settings.  Unfortunately, when ethtool is used to hard code a vmnic for a specific Flow Control configuration, that config lasts only until the next time ESX is rebooted.  After a reboot, the modified configuration does not persist and reverts to auto/auto/auto.  I tested with ESX 4.1 and the latest patches and the same holds true.  Jase offers a workaround in his blog post which allows the change to persist by embedding it in /etc/rc.local.
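
The auto/auto behavior can be explained by how Ethernet auto-negotiation resolves Flow Control. Each port advertises two bits, PAUSE and ASM_DIR, and IEEE 802.3 (Annex 28B) defines how the pair of advertisements resolves into tx/rx pause on each side. The Python sketch below encodes that resolution table. Which bits the ESX NIC driver and the 3Com switch actually advertise is my assumption, not something I captured from the wire.

```python
from typing import NamedTuple, Tuple


class PauseAdvert(NamedTuple):
    """Flow Control bits a port advertises during Ethernet auto-negotiation."""
    pause: bool    # PAUSE bit: symmetric pause capability
    asm_dir: bool  # ASM_DIR bit: asymmetric pause capability


def resolve_pause(local: PauseAdvert, partner: PauseAdvert) -> Tuple[bool, bool]:
    """Return (tx, rx) for the local port after negotiation.

    tx: the local port may send PAUSE frames.
    rx: the local port honors PAUSE frames it receives.
    Encodes the pause resolution table in IEEE 802.3 Annex 28B.
    """
    if local.pause and partner.pause:
        return (True, True)    # both symmetric: send and honor PAUSE
    if not local.pause and local.asm_dir and partner.pause and partner.asm_dir:
        return (True, False)   # local sends PAUSE but ignores received ones
    if local.pause and local.asm_dir and not partner.pause and partner.asm_dir:
        return (False, True)   # local honors PAUSE but never sends it
    return (False, False)      # every other combination: Flow Control off


if __name__ == "__main__":
    # Hypothetical advertisements, assumed for illustration only.
    host_nic = PauseAdvert(pause=True, asm_dir=True)
    managed_switch = PauseAdvert(pause=True, asm_dir=False)   # Flow Control "enabled"
    unmanaged_switch = PauseAdvert(pause=False, asm_dir=False)
    print(resolve_pause(host_nic, managed_switch))    # (True, True)
    print(resolve_pause(host_nic, unmanaged_switch))  # (False, False)
```

Under those assumptions, the switch's symmetric advertisement explains why the host came up with both tx and rx on, and an unmanaged switch that advertises nothing leaves both flags off. Working the table the other way, the vendor-recommended split (endpoint tx only, switch rx only) only falls out of negotiation if the endpoint advertises ASM_DIR without PAUSE and the switch advertises both bits.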

Third item of interest.  VMware KB 1013413 talks about disabling Flow Control using esxcfg-module for Intel NICs and ethtool for Broadcom NICs.  This article specifically talks about disabling Flow Control when PAUSE frames are identified on the network.  If PAUSE frames are indicative of a large amount of traffic which a receiver isn’t able to handle, it would seem to me we’d want to leave Flow Control enabled (by design, to mediate the congestion) and perform root cause analysis on exactly why we’ve hit a sustained scaling limit (and what to do about it long term).

Fourth.  Flow Control seems to be a simple mechanism which hinges on PAUSE frames to work properly.  If the Wikipedia article is correct that only stations configured for full-duplex operation may send PAUSE frames, then it would seem to me that both network endpoints (in this case ESX(i) and the IP based storage array) should be configured with Flow Control set to full duplex, meaning both tx and rx ON.  This conflicts with the best practice messages from EMC and NetApp, although it does align with the FAS3050 out of box configuration.  The only reasonable explanation is that I’m misinterpreting the meaning of full-duplex here.

Lastly, I’ve got myself all worked up into a frenzy over the proper configuration of Flow Control because I want to be sure I’m doing the right thing from both a lab and infrastructure design standpoint, but in the end Flow Control is like the Shares mechanism in VMware ESX(i):  The values or configurations invoked apply only during periods of contention.  In the case of Flow Control, this means that although it may be enabled, it serves no useful purpose until a receiver on the network says “I can’t take it any more” and sends the PAUSE frames to temporarily suspend traffic.  I may never reach this tipping point in the lab but I know I’ll sleep better at night knowing the lab is configured according to VMware storage vendor best practices.

More VCDX Insight and a New Blog

November 17th, 2010 by jason

Yuri Semenikhin, a Systems Engineer from Tbilisi, Georgia, has recently launched a virtualization blog by the name of vEra of the Virtual Revolution.  Yuri published his VCDX certification attempt experience in his blog post VCDX “be or not to be”. not YET !  His writing is not in English; however, he offers an English translator on the right-hand edge of his blog.

While some compare the VCDX to the Cisco CCIE certification, Yuri contrasts the two by saying the CCIE is a technical certification mapping closer to the VCAP4-DCA while the VCDX is an architect certification.  I would agree the VCAP-DCA exam compares to the CCIE from a hands-on lab approach, and the VCAP-DCA was plenty difficult, but I don’t think the VCAP-DCA requires anywhere near the level of training, preparation, and expense (or investment, depending on your view) that the CCIE lab exam does.  This is merely a difference of opinion, and I’m not saying either is right or wrong.

The purpose of my blog post is to provide some exposure to Yuri and his blog.  Yuri tells his story at great length and in great detail.  I wish him the best of luck with his blog and his next VCDX attempt!

Submit a VMware Feature Request

November 13th, 2010 by jason

If you have a suggestion for how to improve or enhance VMware software, VMware always welcomes your input. Please submit your suggestions through the Feature Request form on VMware’s website. Unless additional information is needed, you will not receive a personal response. Any suggestions for enhancements to VMware software that you submit will become the property of VMware. VMware may use this information for any VMware business purposes, without restriction, including for product support and development. VMware will not use the information in a form that personally identifies you.

http://www.vmware.com/support/policies/feature.html

Provide your input to VMware and help it maintain its status as the most innovative, flexible, and scalable hypervisor on the planet.

VMware VCAP4-DCD BETA Exam Experience

November 10th, 2010 by jason

The ink is still wet on a new chapter in the certification treadmill: I wrote the VCAP4-DCD BETA exam this morning.  Unlike the VCAP-DCA exam, I was able to take this exam locally in Eagan, MN, which is where I took both of the VCDX3 written exams last year.  It’s close to both my office and home, which makes it convenient.

I was a little fired up this morning and playfully gave the VUE testing center staff a hard time for not allowing coffee in the exam room for a nearly 4-hour exam.  I had arrived at the test site early and used the spare time to read the testing center code of conduct in full.  It does not say food and drinks are not allowed in the exam room.  What it says is that food, drinks, gum, and other things are not allowed to distract other test takers.  My argument was that I’m a quiet coffee drinker – let me in.  They wouldn’t budge, and I suspect the person I was talking to didn’t have the authority to make that call anyway.  I used to be able to take coffee in at an exam center in Bloomington, but those days are gone, I guess.  But I digress…

So, the exam: a much better experience this time compared to the VCAP4-DCA BETA.  The interface felt polished, and I felt there was a 300% improvement with the Visio-like tool.  As stated in the blueprint (which was updated in late October), there are three types of exam question interfaces used in the testing engine:

  1. Traditional multiple choice (select one) or multiple select (select many)
  2. Use of a GUI tool to match answers to questions
  3. Use of a Visio-like tool to assemble architecture drawings

There were 131 questions to be answered and an exam duration of 3h 45m (I’m a native English-speaking candidate).  There was also a brief survey at the beginning.  The time spent in the survey doesn’t count against actual exam time, so it is an opportunity to get a few notes or formulas written on the dry erase board before formally starting the real exam.  I knew from reading Chris Dearden’s experience that time management would be critical.  I used this insight to cruise through questions as swiftly as possible without getting caught up in deep thought as I had on my past few written exams.  Although I didn’t manage the time as well in the first half of the exam, I got progressively better.  I was able to get through most of the questions with a reasonable amount of thought.  There were some easier questions, and due to the time constraint, my approach for those was to blow through them with quick answers to regain valuable time for other questions.  Hopefully I didn’t miss any small details which would change the nature of the question.

The Visio tool was pretty solid, with no major complaints on usability (you really do have to have experience with the old VCDX3 Design exam to appreciate the improvement made), but it is easy to get sucked into spending way too much time on architecture drawings for the sake of 100% accuracy.  There were a few design drawings which I was somewhat comfortable with but had to give up on in the interest of time.  Completing all questions in the allotted time is a significant challenge with this exam.  I did run out of time, so I had to quickly guess answers for the last two or three items.  One other test engine item to note, which Chris Dearden first highlighted, is that there was no ability to mark questions or to go back to questions once reaching the end of the exam.

For study materials, I used the exam blueprint referenced above, a few white papers, and the VMware vSphere Design training class I sat a few weeks ago.  Some of the information carried over word for word to the exam.  The vSphere Design classroom training won’t cover everything, though, as some exam questions were specific to vSphere 4.1 whereas the class covered 4.0.  There are some differences which you’ll need to compare and contrast.  I also used vCalendar tips – there was a vCalendar entry from the past few days which applied directly to the exam.  Experience and knowledge gained throughout the VCDX3 process also contributed to my preparation.

The difficulty of the exam didn’t disappoint, but I felt better and more confident walking out of the testing center this time than I did after the VCAP-DCA BETA, which stunned me.  How the different types of questions in this exam are graded is anyone’s guess.  I’m particularly curious about how Visio tool questions are weighted against multiple choice.  I’m hoping for a pass, which will give me both the VCAP4-DCD and VCDX4 (upgrade) certifications.  With any luck, I’ll see results within a few months.

I’m looking forward to what others have to say about their experience with this test.  In addition, I’m curious as to why the cost of the VCAP-DCD BETA exam ($200) was twice that of the VCAP-DCA BETA exam ($100).  For that matter, why the $400 fee to sit the live VCAP exam when comparable exams from other vendors such as Microsoft and Citrix are significantly less?  However I or any other candidate feels about the BETA exam, it’s important not to lose sight of the fact that it IS a BETA exam.  The BETA exam process helps VMware develop a quality and consistent exam experience.  Due to the time constraint, I was only able to leave about five individual question comments where I saw issues.  Hopefully my exam results along with the comments were of value to VMware, and I am thankful that VMware invited me.

Updated 11/11/10:  A VMTN forum discussion on the exam has broken out at http://communities.vmware.com/message/1645177.  You’ll find some helpful tips from others here.  One thing I wanted to point out from the thread dealing with the Visio tool to make sure others aren’t tripped up by this:

Issue:

…never thought I’d long for Visio, my main issue being getting finnished up only to realise some lines didn’t go where I wanted, but the only way to move them was to click ‘Start Over’

Response:

If you put diagram connectors in the wrong place, you didn’t have to “Start Over”. There’s a scissors tool in the lower right corner of the Visio tool which “cuts” individual connectors. I figured that out on my first diagram after running into the same trouble you did. It would have been helpful for Jon Hall of VMware to point that out in his most excellent Flash demo of the Visio tool.

Update 1/11/11:  I passed.

Update 8/18/11:  No VCDX4 certificate or welcome kit received yet.

Q: What’s your Windows template approach?

November 7th, 2010 by jason

Once upon a time, I was a Windows Server administrator.  Most of my focus was on Windows Server deployment and management. VMware virtualization was a large interest but my Windows responsibilities dwarfed the amount of time I spent with VMware.  One place where these roads intersect is Windows templates.  Because a large part of my job was managing the Windows environment, I spent time maintaining “the perfect Windows template”.  Following were the ingredients I incorporated:

Applications
  • Adobe Acrobat Reader
  • Advanced Find & Replace
  • Beyond Compare
  • Diskeeper
  • MS Network Monitor
  • MS Resource Kits
  • NTSEC Tools
  • Latest MS RDP Client
  • Symantec Anti-Virus CE
  • MS UPHClean
  • VMware Tools
  • Windows Admin Pack
  • Windows Support Tools
  • Winzip Pro
  • Sysinternals Suite
  • Windows Command Console
  • BGINFO
  • CMDHERE
  • Windows Perf Advisor
  • MPS Reports
  • GPMC
  • SNMP

Tweaks
  • Remote Desktop enabled
  • Remote Assistance disabled
  • Pagefile
  • Complete memory dump
  • DIRCMD=/O env. variable
  • PATH tweaks
  • taskmgr.exe in startup, run minimized
  • SNMP
  • Desktop prefs.
  • Network icon in System Tray
  • Taskbar prefs.
  • C: 12GB, D: 6GB
  • Display Hardware acceleration to Full*

* = if necessary

VMware virtualization is now, and has been for going on two years, my main focus.  By title, I’m no longer a Windows Server administrator, and I don’t care to spend a lot of time worrying about what’s in my templates.  I don’t have to worry about keeping several applications up to date.  In what I do now, it’s actually more important to consistently work with as generic a Windows template as possible.  This is to ensure that projects I’m working on from the virtualization side of things aren’t garfed up by any of the 30+ changes listed above.  Issues would inevitably appear, and each time I’d counterproductively have to treat the lists above as possible culprits.  As such, I now take a minimalist approach to Windows templates as follows:

Applications
  • VMware Tools

Tweaks
  • C: 20GB
  • VMXNET3 vNIC
  • Activate Windows
  • wddm_video driver*
  • Disk Alignment
  • Display Hardware acceleration to Full*

* = if necessary
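
For the Disk Alignment item, the check boils down to simple arithmetic: a partition is aligned when its starting byte offset is an exact multiple of the boundary the storage cares about. The Python sketch below assumes 512-byte sectors and uses a 64 KB boundary purely as an illustrative value; the right boundary depends on your array. Sector 63 is the well-known legacy Windows Server 2003 default start, and sector 2048 (a 1 MB offset) is the Windows Server 2008 default.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def partition_offset_bytes(start_sector: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Byte offset at which a partition begins."""
    return start_sector * sector_bytes


def is_aligned(start_sector: int, boundary_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the partition's starting offset is an exact multiple of boundary_bytes."""
    return partition_offset_bytes(start_sector, sector_bytes) % boundary_bytes == 0


if __name__ == "__main__":
    boundary = 64 * 1024  # illustrative 64 KB boundary; check your array's guidance
    # Legacy Windows Server 2003 default: partition starts at sector 63.
    print(partition_offset_bytes(63), is_aligned(63, boundary))
    # Windows Server 2008 default: 1 MB offset, i.e. sector 2048.
    print(partition_offset_bytes(2048), is_aligned(2048, boundary))
```

The legacy 32,256-byte offset straddles every power-of-two boundary above 512 bytes, which is why a Windows Server 2003 template disk is usually partitioned by hand, for example with diskpart's `create partition primary align=64`.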

 

In large virtualized environments, templates may be found in various repositories due to network segmentation, firewalls, storage placement, etc.  As beneficial as templates are, keeping them up to date can become a significant chore, and the time spent doing so eats away at the time savings they provide.  Deployment consistency is key to reducing support and incident costs, and making sure templates in distributed locations are consistent is not only a chore but also of paramount importance.  If this is the scenario you’re fighting, automated template and/or storage replication is needed.  Another solution might be to get away from templates altogether and adopt scripted installation, another tried-and-true approach which provides automation and consistency without the hassle of maintaining templates.  The hassle in this case isn’t eliminated completely; it’s shifted into other areas such as maintaining PXE boot services, PXE images, and post build/application installation scripts.  I’ve seen large organizations go the scripted route in lieu of templates.  One reason could simply be that scripted virtual builds are strategically consistent with the organization’s scripted physical builds.  Another could be the burden of maintaining templates, as I discussed earlier.  Is this a hint that templates don’t scale in large distributed environments?

Do you use templates and if so, what is your approach in comparison to what I’ve written about?