Archive for November, 2010

Flow Control

November 29th, 2010

Thanks to the help from blog sponsorship, I’m able to maintain a higher performing lab environment than I ever have before.  One area in which I hadn’t invested much, at least from a lab standpoint, is networking.  In the past, I’ve always had some sort of small to mid density unmanaged Ethernet switch.  And this was fine.  Household name brand switches like Netgear and SMC from Best Buy and NewEgg performed well enough and survived for years in the higher temperature lab environment.  Add to that, by virtue of being unmanaged, they were plug and play.  No time wasted fighting a misconfigured network.

I recently picked up a 3Com SuperStack 3 Switch 3870 (48 1GbE ports).  It’s not 10GbE, but it fits my budget and includes a few networking nice-to-haves like VLANs and Layer 3 routing.  Because this switch is managed, I can now apply some best practices from the IP based storage realm.  One of those best practices is configuring Flow Control for VMware vSphere with network storage.  This blog post is mainly to record some pieces of information I’ve picked up along the way and to open a dialog with network minded readers who may have some input.

So what is network Flow Control? 

NetApp defines Flow Control in TR-3749 as “the process of managing the rate of data transmission between two nodes to prevent a fast sender from overrunning a slow receiver.”  NetApp goes on to advise that Flow Control can be set at the two endpoints (ESX(i) host level and the storage array level) and at the Ethernet switch(es) in between.

Wikipedia is in agreement with the above and adds more meat to the discussion, including the following: “The overwhelmed network element will send a PAUSE frame, which halts the transmission of the sender for a specified period of time. PAUSE is a flow control mechanism on full duplex Ethernet link segments defined by IEEE 802.3x and uses MAC Control frames to carry the PAUSE commands. The MAC Control opcode for PAUSE is 0x0001 (hexadecimal). Only stations configured for full-duplex operation may send PAUSE frames.”
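To make that description concrete, here is a minimal Python sketch that assembles the bytes of an 802.3x PAUSE frame. The field values (reserved destination address 01-80-C2-00-00-01, MAC Control EtherType 0x8808, opcode 0x0001, and a 16-bit pause_time counted in quanta of 512 bit times) come from the standard; the source MAC and the helper names are made up for illustration, and the frame is shown without its 4-byte FCS.

```python
import struct

# IEEE 802.3x MAC Control constants
PAUSE_DEST_MAC = bytes.fromhex("0180C2000001")  # reserved multicast address for PAUSE
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_LEN = 60        # minimum Ethernet frame length, excluding the FCS
QUANTUM_BIT_TIMES = 512   # one pause quantum = 512 bit times


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Return a PAUSE frame (without FCS) asking the peer to stop for pause_quanta."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field")
    header = PAUSE_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    frame = header + payload
    return frame + bytes(MIN_FRAME_LEN - len(frame))  # zero padding up to 60 bytes


def pause_duration_us(pause_quanta: int, link_bps: float) -> float:
    """Convert a pause_time value into microseconds for a given link speed."""
    return pause_quanta * QUANTUM_BIT_TIMES / link_bps * 1e6


frame = build_pause_frame(bytes.fromhex("001122334455"), 0xFFFF)  # hypothetical sender MAC
print(len(frame), frame[12:16].hex())             # 60 88080001
print(round(pause_duration_us(0xFFFF, 1e9), 1))   # 33553.9 (microseconds at 1GbE)
```

Even the largest pause_time a receiver can request only holds a 1GbE sender off for roughly 34 milliseconds, so a congested receiver has to keep sending PAUSE frames to keep the sender quiet.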

What are network Flow Control best practices as they apply to VMware virtual infrastructure with NFS or iSCSI network storage?

Both NetApp and EMC agree that Flow Control should be enabled in a specific way at the endpoints as well as at the Ethernet switches which support the flow of traffic:

  • Endpoints (that’s the ESX(i) hosts and the storage arrays) should be configured with Flow Control send/tx on, and receive/rx off.
  • Supporting Ethernet switches should be configured with Flow Control “Desired” or send/tx off and receive/rx on.
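The recommendations above lend themselves to a quick audit. The Python sketch below parses output in the layout that ethtool -a prints on Linux-style systems (a "Pause parameters" line followed by Autonegotiate, RX, and TX lines) and flags deviations from the endpoint or switch recommendation. The exact output format on the ESX service console is an assumption, the helper names are hypothetical, the "Desired" switch setting isn't modeled, and switch settings would have to come from the switch's own management interface rather than from ethtool.

```python
# Vendor guidance as summarized above, per device role.
# TX = sends PAUSE frames, RX = honors PAUSE frames it receives.
RECOMMENDED = {
    "endpoint": {"TX": "on", "RX": "off"},  # ESX(i) host or storage array
    "switch": {"TX": "off", "RX": "on"},
}


def parse_pause_params(text: str) -> dict:
    """Parse 'Key: value' lines from ethtool -a style output (assumed layout)."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():  # skips the 'Pause parameters for ...:' header
            params[key.strip()] = value.strip().lower()
    return params


def audit(role: str, observed: dict) -> list:
    """Return human-readable mismatches against the recommendation (empty if compliant)."""
    problems = []
    for key, wanted in RECOMMENDED[role].items():
        actual = observed.get(key, "unknown")
        if actual != wanted:
            problems.append(f"{key} is {actual}, recommended {wanted}")
    return problems


sample = """Pause parameters for vmnic0:
Autonegotiate:\ton
RX:\t\ton
TX:\t\ton
"""
print(audit("endpoint", parse_pause_params(sample)))  # ['RX is on, recommended off']
```

The sample mirrors a host that negotiated both tx and rx on, which the endpoint recommendation would flag for rx.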

One item to point out here is that although both mainstream storage vendors recommend these settings as a best practice for VMware infrastructures, neither vendor’s multiprotocol arrays ship configured this way.  At least, not the units I’ve had my hands on, which include the EMC Celerra NS-120 and the NetApp FAS3050c.  The Celerra is configured out of the box with Flow Control fully disabled, and I found the NetApp configured with Flow Control set to full (duplex?).

Here’s another item of interest.  VMware vSphere hosts are configured out of the box to autonegotiate Flow Control settings.  What does this mean?  Network interfaces are able to advertise certain features and protocols which they were purpose built to understand, following the OSI model and the relevant standards, of course.  One of these features is Flow Control.  VMware ESX ships with a Flow Control setting which adapts to its environment.  If you plug an ESX host into an unmanaged switch which doesn’t advertise Flow Control capabilities, ESX sets its tx and rx flags to off.  These flags tie specifically to the PAUSE frames mentioned above.  When I plugged my ESX host into the new 3Com managed switch and configured the ports for Flow Control to be enabled, I subsequently found out using the ethtool -a vmnic0 command that both tx and rx were enabled on the host (the 3Com switch has just one Flow Control toggle: enabled or disabled).  NetApp provides a hint to this behavior in their best practice statement which says “Once these [Flow Control] settings have been configured on the storage controller and network switch ports, it will result in the desired configuration without modifying the flow control settings in ESX/ESXi.”

Jase McCarty pointed out back in January a “feature” of ethtool in ESX.  Basically, ethtool can be used to display current Ethernet adapter settings (including Flow Control, as mentioned above), and it can also be used to configure settings.  Unfortunately, when ethtool is used to hard code a specific Flow Control configuration on a vmnic, that configuration lasts only until the next time ESX is rebooted.  After a reboot, the modified configuration does not persist and reverts to auto/auto/auto.  I tested with ESX 4.1 and the latest patches and the same holds true.  Jase offers a workaround in his blog post which makes the change persist by embedding it in /etc/rc.local.
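One way to picture what “auto” means: during link autonegotiation each port advertises two bits, PAUSE (symmetric pause) and ASM_DIR (asymmetric pause), and both ends resolve the result from the pair of advertisements. The Python sketch below encodes that resolution logic as described in IEEE 802.3 Annex 28B. The assumption that the host advertises both bits, and that the 3Com switch advertises symmetric PAUSE when its single toggle is enabled, is illustrative only; neither is confirmed by the documentation quoted here.

```python
from typing import NamedTuple, Tuple


class PauseAdvert(NamedTuple):
    pause: bool    # PAUSE bit: symmetric pause supported
    asm_dir: bool  # ASM_DIR bit: asymmetric pause supported


def resolve(local: PauseAdvert, partner: PauseAdvert) -> Tuple[bool, bool]:
    """Return (tx, rx) for the local port after autonegotiation.

    tx: the port may send PAUSE frames; rx: the port honors PAUSE frames it receives.
    """
    if local.pause and partner.pause:
        return (True, True)        # symmetric pause in both directions
    if local.asm_dir and partner.asm_dir:
        if local.pause:
            return (False, True)   # local only honors PAUSE from the partner
        if partner.pause:
            return (True, False)   # local only sends PAUSE to the partner
    return (False, False)          # no usable overlap: flow control off


host = PauseAdvert(pause=True, asm_dir=True)  # assumed host advertisement
print(resolve(host, PauseAdvert(pause=True, asm_dir=False)))   # (True, True)
print(resolve(host, PauseAdvert(pause=False, asm_dir=False)))  # (False, False)
```

The two calls line up with the two observations above: a partner advertising symmetric pause yields tx and rx on, while an unmanaged switch advertising nothing yields both off.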

Third item of interest.  VMware KB 1013413 talks about disabling Flow Control using esxcfg-module for Intel NICs and ethtool for Broadcom NICs.  This article specifically talks about disabling Flow Control when PAUSE frames are identified on the network.  If PAUSE frames are indicative of a large amount of traffic which a receiver isn’t able to handle, it would seem to me we’d want to leave Flow Control enabled (by design, to mediate the congestion) and perform root cause analysis on exactly why we’ve hit a sustained scaling limit (and what to do about it long term).

Fourth.  Flow Control seems to be a simple mechanism which hinges on PAUSE frames to work properly.  If the Wikipedia article is correct in that only stations configured for full-duplex operation may send PAUSE frames, then it would seem to me that both network endpoints (in this case ESX(i) and the IP based storage array) should be configured with Flow Control set to full duplex, meaning both tx and rx ON.  This conflicts with the best practice messages from EMC and NetApp although it does align with the FAS3050 out of box configuration.  The only reasonable explanation is that I’m misinterpreting the meaning of full-duplex here.
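In IEEE 802.3 terminology, “full-duplex” describes the link’s operating mode (both directions transmitting simultaneously, without CSMA/CD), which is set separately from the PAUSE tx/rx flags. Under that reading, the Wikipedia sentence and the vendor guidance need not conflict. The small Python model below sketches the distinction; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Port:
    duplex: str     # link operating mode: "full" or "half"
    tx_pause: bool  # may this port send PAUSE frames?
    rx_pause: bool  # does this port obey PAUSE frames it receives?

    def can_send_pause(self) -> bool:
        # 802.3x PAUSE is defined only for full-duplex links.
        return self.duplex == "full" and self.tx_pause

    def obeys_pause(self) -> bool:
        return self.duplex == "full" and self.rx_pause


# Vendor-recommended endpoint and switch settings on a full-duplex 1GbE link.
host = Port(duplex="full", tx_pause=True, rx_pause=False)
switch = Port(duplex="full", tx_pause=False, rx_pause=True)

# The host may ask the switch to pause, and the switch will honor it.
print(host.can_send_pause() and switch.obeys_pause())  # True
# The switch never sends PAUSE toward the host under these settings.
print(switch.can_send_pause())  # False
```

A half-duplex port, by contrast, cannot send PAUSE at all regardless of its tx flag.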

Lastly, I’ve got myself all worked up into a frenzy over the proper configuration of Flow Control because I want to be sure I’m doing the right thing from both a lab and infrastructure design standpoint, but in the end Flow Control is like the Shares mechanism in VMware ESX(i):  The values or configurations invoked apply only during periods of contention.  In the case of Flow Control, this means that although it may be enabled, it serves no useful purpose until a receiver on the network says “I can’t take it any more” and sends the PAUSE frames to temporarily suspend traffic.  I may never reach this tipping point in the lab but I know I’ll sleep better at night knowing the lab is configured according to VMware storage vendor best practices.
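The “only during contention” point can be illustrated with a toy simulation: a sender fills a receiver’s buffer, and the receiver emits a PAUSE whenever the buffer crosses a high-water mark. The rates, buffer threshold, and pause length below are arbitrary illustrative numbers rather than measurements from any switch or array, and the model ignores propagation delay.

```python
def simulate(send_rate: int, drain_rate: int, ticks: int = 1000,
             high_water: int = 800, pause_ticks: int = 5) -> int:
    """Count the PAUSE frames a receiver sends in a simple tick-based model.

    Each tick the sender adds send_rate units unless it is paused, and the
    receiver drains up to drain_rate units from its buffer.
    """
    buffered = 0
    paused_for = 0
    pause_frames = 0
    for _ in range(ticks):
        if paused_for > 0:
            paused_for -= 1  # sender honors the PAUSE and sends nothing
        else:
            buffered += send_rate
        buffered = max(0, buffered - drain_rate)
        if buffered >= high_water and paused_for == 0:
            pause_frames += 1  # receiver: "I can't take it any more"
            paused_for = pause_ticks
    return pause_frames


print(simulate(send_rate=90, drain_rate=100))       # 0: receiver keeps up
print(simulate(send_rate=120, drain_rate=100) > 0)  # True: sustained contention
```

With the sender at or below the receiver’s drain rate, flow control sits idle for the whole run; only a sustained rate mismatch produces a steady stream of PAUSE frames.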

More VCDX Insight and a New Blog

November 17th, 2010

Yuri Semenikhin, a Systems Engineer from Tbilisi, Georgia, has recently launched a virtualization blog by the name of vEra of the Virtual Revolution.  Yuri published his VCDX certification attempt experience in his blog post VCDX “be or not to be”. not YET !  His blog isn’t written in English; however, he offers an English translator on the right-hand edge of his blog.

While some compare the VCDX to the Cisco CCIE certification, Yuri contrasts the two by saying the CCIE is a technical certification mapping closer to the VCAP4-DCA while the VCDX is an architect certification.  I would agree the VCAP-DCA exam compares to the CCIE from a hands on lab approach, and the VCAP-DCA was plenty difficult, but I don’t think the VCAP-DCA requires nearly the level of training, preparation, and expense (or investment, depending on your view) that the CCIE lab exam does.  This is merely a difference in opinion and I’m not saying either is right or wrong.

The purpose of my blog post is to provide some exposure to Yuri and his blog.  Yuri tells his story at great length and in great detail.  I wish him the best of luck with his blog and his next VCDX attempt!

Submit a VMware Feature Request

November 13th, 2010

If you have a suggestion for how to improve or enhance VMware software, VMware always welcomes your input. Please submit your suggestions through the Feature Request form on VMware’s website. Unless additional information is needed, you will not receive a personal response. Any suggestions for enhancements to VMware software that you submit will become the property of VMware. VMware may use this information for any VMware business purposes, without restriction, including for product support and development. VMware will not use the information in a form that personally identifies you.

http://www.vmware.com/support/policies/feature.html

Provide your input to VMware and help them maintain their status as the most innovative, flexible, and scalable hypervisor on the planet.

VMware VCAP4-DCD BETA Exam Experience

November 10th, 2010

The ink is still wet on a new chapter in the certification treadmill as I wrote the VCAP4-DCD BETA exam this morning.  Unlike the VCAP-DCA exam, I was able to take this exam locally in Eagan, MN which is where I took both of the VCDX3 written exams last year.  It’s close to both my office and home and therefore it is convenient. 

I was a little fired up this morning and playfully gave the VUE testing center staff a hard time for not allowing coffee in the exam room for a nearly 4 hour exam.  I had arrived at the test site early and used the spare time to read the testing center code of conduct in full.  It does not say food and drinks are not allowed in the exam room.  What it says is that food, drinks, gum, and other things are not allowed to distract other test takers.  My argument was that I’m a quiet coffee drinker – let me in.  They wouldn’t budge, and I suspect the person I was talking to didn’t have the authority to make that decision anyway.  I used to be able to take coffee in at an exam center in Bloomington, but those days are gone I guess.  But I digress…

So, the exam: a much better experience this time compared to the VCAP4-DCA BETA.  The interface felt polished, and I felt the Visio-like tool was a 300% improvement.  As stated in the blueprint (which was updated in late October), there are three types of exam question interfaces used in the testing engine:

  1. Traditional multiple choice (select one) or multiple select (select many)
  2. Use of a GUI tool to match answers to questions
  3. Use of a Visio-like tool to assemble architecture drawings

There were 131 questions to be answered and an exam duration of 3h 45m (I’m a native English speaking candidate).  There was also a brief survey at the beginning.  The time spent in the survey doesn’t count against actual exam time, so this is an opportunity to get a few notes or formulas written on the dry erase board before formally starting the real exam.  I knew from reading Chris Dearden’s experience that time management would be critical.  I used this insight to cruise through questions as swiftly as possible without getting caught up in deep thought as I have on my past few written exams.  Although I didn’t manage the time as well in the first half of the exam, I got progressively better.  I was able to get through most of the questions with a reasonable amount of thought.  There were some easier questions, and due to the time constraint, my approach for those was to blow through them with quick answers to regain valuable time for other questions.  Hopefully I didn’t miss any small details which would change the nature of the question.

The Visio tool was pretty solid, with no major complaints on usability (you really do have to have experience with the old VCDX3 Design exam to appreciate the improvement made), but it is easy to get sucked into spending way too much time on architecture drawings for the sake of 100% accuracy.  There were a few design drawings which I was somewhat comfortable with but had to abandon in the interest of time.  Completing all questions in the allotted time is a significant challenge with this exam.  I did run out of time, so I had to quickly guess answers for the last two or three items.  One other test engine item to note, which Chris Dearden first highlighted, is that there was no ability to mark questions or to go back to questions once reaching the end of the exam.
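The time-management warning is easy to put into numbers. This Python sketch works out the average per-question budget for this sitting (131 questions in 3h 45m, with survey time excluded since it doesn’t count) and a simple checkpoint target. The function names and the checkpoint approach are only an illustration of pacing, not official exam guidance.

```python
def seconds_per_question(total_minutes: float, questions: int) -> float:
    """Average time budget per question, in seconds."""
    return total_minutes * 60 / questions


def target_question(elapsed_minutes: float, total_minutes: float, questions: int) -> int:
    """Question number you should have reached by elapsed_minutes to finish on time."""
    return min(questions, int(elapsed_minutes * questions / total_minutes))


TOTAL_MINUTES = 3 * 60 + 45  # 3h 45m
QUESTIONS = 131

print(round(seconds_per_question(TOTAL_MINUTES, QUESTIONS)))  # 103 (about 1m 43s)
for checkpoint in (60, 120, 180):
    # prints 60 34, then 120 69, then 180 104
    print(checkpoint, target_question(checkpoint, TOTAL_MINUTES, QUESTIONS))
```

Since the drawing items take well over the average, the quicker multiple choice items need to come in under budget to leave room for them.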

For study materials, I used the exam blueprint referenced above, a few white papers, as well as the VMware vSphere Design training class I sat a few weeks ago.  Some of the information carried over word for word to the exam.  The vSphere Design classroom training won’t cover it all as some exam questions were specific to vSphere 4.1 whereas the class covered 4.0.  There are some differences which you’ll need to compare and contrast.  I also used vCalendar tips – there was a vCalendar entry from the past few days which applied directly to the exam.  Experience and knowledge gained throughout the VCDX3 process also contributed to preparation.

The difficulty of the exam didn’t disappoint, but I felt better and more confident walking out of the testing center this time than I did after the VCAP-DCA BETA, which left me stunned.  How the different types of questions in this exam are graded is anyone’s guess.  I’m particularly curious about the Visio tool vs. multiple choice weighting.  I’m hoping for a pass which will give me both the VCAP4-DCD and VCDX4 (upgrade) certifications.  With any luck, I’ll see results within a few months.

I’m looking forward to what others have to say about their experience with this test.  In addition, I’m curious as to why the cost of the VCAP-DCD BETA exam ($200) was twice that of the VCAP-DCA BETA exam ($100).  For that matter, why the $400 fee to sit the live VCAP exam when comparable exams from other vendors such as Microsoft and Citrix are significantly less?  However I or any other candidate feels about the BETA exam, it’s important to not lose sight that it IS a BETA exam.  The BETA exam process assists VMware in developing a quality and consistent exam experience. Due to the time constraint, I was only able to leave about five individual question comments where I saw issues.  Hopefully my exam results along with the comments were of value to VMware and I am thankful that VMware invited me.

Updated 11/11/10:  A VMTN forum discussion on the exam has broken out at http://communities.vmware.com/message/1645177.  You’ll find some helpful tips from others here.  One thing I wanted to point out from the thread dealing with the Visio tool to make sure others aren’t tripped up by this:

Issue:

…never thought I’d long for Visio, my main issue being getting finnished up only to realise some lines didn’t go where I wanted, but the only way to move them was to click ‘Start Over’

Response:

If you put diagram connectors in the wrong place, you didn’t have to “Start Over”. There’s a scissors tool in the lower right corner of the Visio tool which “cuts” individual connectors. I figured that out on my first diagram after running into the same trouble you did. It would have been helpful for Jon Hall of VMware to point that out in his most excellent Flash demo of the Visio tool.

Update 1/11/11:  I passed.

Update 8/18/11:  No VCDX4 certificate or welcome kit received yet.

Q: What’s your Windows template approach?

November 7th, 2010

Once upon a time, I was a Windows Server administrator.  Most of my focus was on Windows Server deployment and management.  VMware virtualization was a large interest, but my Windows responsibilities dwarfed the amount of time I spent with VMware.  One place where these roads intersect is Windows templates.  Because a large part of my job was managing the Windows environment, I spent time maintaining “the perfect Windows template”.  The following were the ingredients I incorporated:

Applications
  • Adobe Acrobat Reader
  • Advanced Find & Replace
  • Beyond Compare
  • Diskeeper
  • MS Network Monitor
  • MS Resource Kits
  • NTSEC Tools
  • Latest MS RDP Client
  • Symantec Anti-Virus CE
  • MS UPHClean
  • VMware Tools
  • Windows Admin Pack
  • Windows Support Tools
  • Winzip Pro
  • Sysinternals Suite
  • Windows Command Console
  • BGINFO
  • CMDHERE
  • Windows Perf Advisor
  • MPS Reports
  • GPMC
  • SNMP

Tweaks
  • Remote Desktop enabled
  • Remote Assistance disabled
  • Pagefile
  • Complete memory dump
  • DIRCMD=/O env. variable
  • PATH tweaks
  • taskmgr.exe in startup, run minimized
  • SNMP
  • Desktop prefs.
  • Network icon in System Tray
  • Taskbar prefs.
  • C: 12GB
  • D: 6GB
  • Display Hardware acceleration to Full*

* = if necessary

VMware virtualization has been my main focus for going on two years now.  By title, I’m no longer a Windows Server administrator, and I don’t care to spend a lot of time worrying about what’s in my templates.  I don’t have to worry about keeping several applications up to date.  In what I do now, it’s actually more important to consistently work with as generic a Windows template as possible.  This ensures that the projects I’m working on from the virtualization side aren’t garfed up by any of the 30+ changes listed above.  Issues would inevitably appear, and each time I’d have to waste time ruling out items on the lists above as possible culprits.  As such, I now take a minimalist approach to Windows templates as follows:

Applications
  • VMware Tools

Tweaks
  • C: 20GB
  • VMXNET3 vNIC
  • Activate Windows
  • wddm_video driver*
  • Disk Alignment
  • Display Hardware acceleration to Full*

* = if necessary

In large virtualized environments, templates may be found in various repositories due to network segmentation, firewalls, storage placement, etc.  As beneficial as templates are, keeping them up to date can become a significant chore, and the time spent doing so eats away at the time savings they provide.  Deployment consistency is key to reducing support and incident costs, so making sure templates in distributed locations are consistent is not only a chore, it is of paramount importance.  If this is the scenario you’re fighting, automated template and/or storage replication is needed.  Another solution might be to get away from templates altogether and adopt scripted installation, another tried and true approach which provides automation and consistency without the hassle of maintaining templates.  The hassle in this case isn’t eliminated completely, though.  It’s shifted into other areas such as maintaining PXE boot services, PXE images, and post-build/application installation scripts.  I’ve seen large organizations go the scripted route in lieu of templates.  One reason could simply be that scripted virtual builds are strategically consistent with the organization’s scripted physical builds.  Another could be the burden of maintaining templates, as I discussed earlier.  Is this a hint that templates don’t scale in large distributed environments?
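Short of full automated replication, even a simple consistency check helps spot drift between distributed copies. This Python sketch describes each template copy as a small dictionary (a hypothetical, hand-maintained description; nothing here reads from vCenter or the actual VMs), hashes a canonical JSON form of it, and reports which repositories differ from a chosen reference copy. The repository names are made up.

```python
import hashlib
import json


def fingerprint(template: dict) -> str:
    """Hash a template description so copies can be compared cheaply."""
    canonical = json.dumps(template, sort_keys=True)  # key order doesn't matter
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(copies: dict, reference: str) -> list:
    """Return the locations whose template copy differs from the reference copy."""
    expected = fingerprint(copies[reference])
    return sorted(loc for loc, tpl in copies.items() if fingerprint(tpl) != expected)


# Hypothetical repositories, loosely based on the minimalist template above.
base = {"apps": ["VMware Tools"], "c_drive_gb": 20, "vnic": "VMXNET3"}
copies = {
    "datacenter-a": base,
    "datacenter-b": dict(base),
    "dmz": {**base, "vnic": "E1000"},  # a copy that fell out of sync
}
print(find_drift(copies, reference="datacenter-a"))  # ['dmz']
```

The fewer ingredients a template has, the shorter this kind of description gets, which is one more argument for the minimalist approach.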

Do you use templates and if so, what is your approach in comparison to what I’ve written about?

EMC Celerra Network Server Documentation

November 6th, 2010

EMC has updated their documentation library for the Celerra to version 6.0.  If you work with the Celerra or the UBER VSA, this is good reference documentation to have.  The updated Celerra documentation library on EMC’s Powerlink site is here: Celerra Network Server Documentation (User Edition) 6.0 A01.  The document library includes the following titles:

  • Celerra Network Server User Documents
    • Celerra CDMS Version 2.0 for NFS and CIFS
    • Celerra File Extension Filtering
    • Celerra Glossary
    • Celerra MirrorView/Synchronous Setup on CLARiiON Backends
    • Celerra Network Server Command Reference Manual
    • Celerra Network Server Error Messages Guide
    • Celerra Network Server Parameters Guide
    • Celerra Network Server System Operations
    • Celerra Security Configuration Guide
    • Celerra SMI-S Provider Programmer’s Guide
    • Configuring and Managing CIFS on Celerra
    • Configuring and Managing Celerra Network High Availability
    • Configuring and Managing Celerra Networking
    • Configuring Celerra Events and Notifications
    • Configuring Celerra Naming Services
    • Configuring Celerra Time Services
    • Configuring Celerra User Mapping
    • Configuring iSCSI Targets on Celerra
    • Configuring NDMP Backups on Celerra
    • Configuring NDMP Backups to Disk on Celerra
    • Configuring NFS on Celerra
    • Configuring Standbys on Celerra
    • Configuring Virtual Data Movers for Celerra
    • Controlling Access to Celerra System Objects
    • Getting Started with Celerra Startup Assistant
    • Installing Celerra iSCSI Host Components
    • Installing Celerra Management Applications
    • Managing Celerra for a Multiprotocol Environment
    • Managing Celerra Statistics
    • Managing Celerra Volumes and File Systems Manually
    • Managing Celerra Volumes and File Systems with Automatic Volume Management
    • Problem Resolution Roadmap for Celerra
    • Using Celerra AntiVirus Agent
    • Using Celerra Data Deduplication
    • Using Celerra Event Enabler
    • Using Celerra Event Publishing Agent
    • Using Celerra FileMover
    • Using Celerra Replicator (V2)
    • Using EMC Utilities for the CIFS Environment
    • Using File-Level Retention on Celerra
    • Using FTP on Celerra
    • Using International Character Sets with Celerra
    • Using MirrorView Synchronous with Celerra for Disaster Recovery
    • Using MPFS on Celerra
    • Using Multi-Protocol Directories with Celerra
    • Using NTMigrate with Celerra
    • Using ntxmap for Celerra CIFS User Mapping
    • Using Quotas on Celerra
    • Using SnapSure on Celerra
    • Using SNMPv3 on Celerra
    • Using SRDF/A with Celerra
    • Using SRDF/S with Celerra for Disaster Recovery
    • Using TFTP on Celerra Network Server
    • Using the Celerra nas_stig Utility
    • Using the Celerra server_archive Utility
    • Using TimeFinder/FS, NearCopy, and FarCopy with Celerra
    • Using Windows Administrative Tools with Celerra
    • Using Wizards to Configure Celerra
  • NS-120
    • Celerra NS-120 System (Single Blade) Installation Guide
    • Celerra NS-120 System (Dual Blade) Installation Guide
  • NS-480
    • Celerra NS-480 System (Dual Blade) Installation Guide
    • Celerra NS-480 System (Four Blade) Installation Guide
  • NS20
    • Celerra NS20 Read Me First
    • Setting Up the EMC Celerra NS20 System
    • Celerra NS21 Cabling Guide
    • Celerra NS21FC Cabling Guide
    • Celerra NS22 Cabling Guide
    • Celerra NS22FC Cabling Guide
    • Celerra NS20 System (Single Blade) Installation Guide
    • Celerra NS20 System (Single Blade with FC Option Enabled) Installation Guide
    • Celerra NS20 System (Dual Blade) Installation Guide
    • Celerra NS20 System (Dual Blade with FC Option Enabled) Installation Guide
  • NX4
    • Celerra NX4 System Single Blade Installation Guide
    • Celerra NX4 System Dual Blade Installation Guide
  • Regulatory Documents
    • C-RoHS HS/TS Substance Concentration Chart Technical Note

If you’re looking for more Celerra documentation, check out the Celerra Network Server General Reference page.

Performance charts fail after Daylight Savings changes are applied

November 5th, 2010

The end of Daylight Saving Time this weekend gives many folks an extra hour of sleep.  However, a VMware vSphere 4.1 bug has surfaced which may spoil the fun.

VMware has published KB 1030305 (Performance charts fail after Daylight Savings changes are applied) which serves as a reminder that the pitfalls and treachery of mixing daylight savings changes and million dollar datacenters are not behind us yet.  Those who are on vSphere 4.1 and observe the weekend time change will run into problems come Sunday morning:

After Daylight Savings settings are applied:

  • Performance charts do not display data
  • Past week, month, and year performance overview charts are not displayed
  • Datastore performance/space data charts are not displayed
  • You receive the error: The chart could not be loaded
  • This occurs when clocks are set back 1 hour from Daylight Savings Time to Standard Time
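The KB doesn’t describe the root cause, but the last symptom hints at why falling back is the troublesome direction: when clocks are set back, an hour of local wall-clock times occurs twice, so anything keyed on local time can see duplicate or out-of-order timestamps. The Python sketch below demonstrates the repeated hour. It assumes US Central time (UTC-5 during daylight time, UTC-6 afterward) and the 2010 change at 07:00 UTC on November 7 (02:00 CDT becomes 01:00 CST); the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

CDT = timezone(timedelta(hours=-5), "CDT")
CST = timezone(timedelta(hours=-6), "CST")
# Assumed 2010 fall-back instant for US Central: 02:00 CDT == 07:00 UTC on November 7.
CHANGE_UTC = datetime(2010, 11, 7, 7, 0, tzinfo=timezone.utc)


def local_wall_time(instant_utc: datetime) -> str:
    """Render a UTC instant as local wall-clock time, the way a chart axis might."""
    zone = CDT if instant_utc < CHANGE_UTC else CST
    return instant_utc.astimezone(zone).strftime("%Y-%m-%d %H:%M")


# Sample every 20 minutes from 05:00 UTC to 08:40 UTC (00:00 CDT to 02:40 CST).
start = datetime(2010, 11, 7, 5, 0, tzinfo=timezone.utc)
samples = [start + timedelta(minutes=20 * i) for i in range(12)]
labels = [local_wall_time(s) for s in samples]
duplicates = sorted({label for label in labels if labels.count(label) > 1})
print(duplicates)  # ['2010-11-07 01:00', '2010-11-07 01:20', '2010-11-07 01:40']
```

Twelve distinct instants produce only nine distinct local labels, which is the kind of ambiguity a time-range query written in local time has to cope with.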

VMware offers the following workaround:

Use Advanced Chart Options:

  1. Click Performance
  2. Click Advanced
  3. Click Chart Options and then choose the chart you want to review

Use a custom time range when viewing performance charts after clocks are set back:

  1. Click Performance
  2. Click the Time Range dropdown
  3. Choose Custom
  4. Specify From and To options that exclude the hours for when the time change occurred

For example:

If Standard Time settings were applied on November 7, at 01:00 AM, you could use these ranges (dates are in DD/MM/YYYY format):
Before the time change:
From 1/11/2010 12:00 AM To 7/11/2010 12:00 AM
After the time change:
From 7/11/2010 03:00 AM To 8/11/2010 03:00 PM
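The custom range workaround boils down to simple date arithmetic. Here is a Python sketch that reproduces the KB example; the function name and the one-hour and two-hour margins are illustrative choices inferred from that example, not values VMware specifies.

```python
from datetime import datetime, timedelta


def split_around_change(start: datetime, end: datetime, change: datetime,
                        before: timedelta = timedelta(hours=1),
                        after: timedelta = timedelta(hours=2)):
    """Return (before_range, after_range) custom ranges that skip the time change.

    before/after are the margins trimmed on each side of the change; the
    defaults reproduce the KB example (stop at 12:00 AM, resume at 03:00 AM).
    """
    if not start < change - before < change + after < end:
        raise ValueError("range must extend past the excluded window on both sides")
    return (start, change - before), (change + after, end)


change = datetime(2010, 11, 7, 1, 0)  # Standard Time applied at 01:00 AM
pre, post = split_around_change(datetime(2010, 11, 1, 0, 0),
                                datetime(2010, 11, 8, 15, 0), change)
for label, (frm, to) in (("Before", pre), ("After", post)):
    print(f"{label}: From {frm:%d/%m/%Y %H:%M} To {to:%d/%m/%Y %H:%M}")
```

Printing with a 24-hour clock sidesteps the AM/PM mix-up entirely; the second range ends at 08/11/2010 15:00, which is 03:00 PM.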

Have a great weekend!