VMware vSphere Design Workshop – Day 2

October 28th, 2010 by jason

Day 2 of 3 is in the books.  We started the morning with Module 4, VMware vSphere Virtual Datacenter Design.  The discussion included topics such as:

  • vCenter Server requirements, sizing, placement, and high availability
  • vCenter and VUM database sizing and placement
  • Clusters
    • Size
    • HA, failover, isolation, design
    • DRS
    • FT
    • DPM
    • Resource Pools
    • Shares
    • Reservations
  • A lot of networking, including the standard vSwitch, vNetwork Distributed Switch, and the Cisco Nexus 1000V
    • FCoE
    • VLANs
    • PVLANs
    • Load balancing policies
    • Link State and Beacon Probing network failure detection (beacons are sent once per second per pNIC per VLAN; beacons are sent whether or not beacon probing is enabled – an advanced VMkernel setting permanently disables beacons)
    • 1Gb/10Gb Ethernet
    • Security
    • Firewalls
    • Port communication
    • Spanning Tree/PortFast
    • Jumbo frames
    • IPv6
    • DNS
  • VM DirectPath I/O (1 VM per PCI slot = no sharing multi-port adapters between VMs; no vMotion, DRS, HA, or hot-add)

We accomplished a lab or two today as well.  We made some design decisions around vCenter, databases, and networking.  Along with those design decisions, we provided justifications and impacts.  This process is very familiar to me as I spent a lot of time providing information like this when filling out the VCDX defense application.

One thing I noticed tonight which I hadn’t seen before is that VMware posted an Adobe Flash demonstration of the new VCAP-DCD exam.  Take a look.  This will help candidates be better prepared overall for the exam experience.  Exam time is valuable – you don’t want to waste it trying to learn the UI.

Tomorrow we start with Module 6 VMware vSphere Storage Design.  I expect a lot of time spent here as the options for storage are vast.  The instructor hails from EMC and I’m sure he has plenty to say about storage.  On the subject of storage, the instructor passed along some tidbits on NAS device offerings from QNAP.  In particular, take a look at the TS-239 PRO II Turbo NAS.  At 83.6MB/s throughput, it beats the pants off any other consumer-based NAS appliance on the market (even Iomega).  Cisco also rebrands this NAS model as the NSS322, so you can find it there as well.  Lastly, take a look at smallnetbuilder.com.  This site reviews wireless equipment as well as NAS appliances for the public.  They have a nice chart rating most of the NAS appliances out there.  It is here where you can see how fast the QNAP unit above screams compared to the competition.

Reducing FT logging traffic for disk read intensive workloads

October 28th, 2010 by jason

I was researching FT documentation to find out more about asymmetric logging traffic between primary and secondary FT VMs when I stumbled onto a KB article referenced in the document. VMware KB 1011965 talks about changing the traffic pattern on the FT logging network. This is particularly helpful for an FT protected VM with high read disk I/O. Normally, all disk I/O is going to traverse the FT logging network from the primary to the secondary VM. For FT protected VMs with read-intensive disk I/O patterns, the FT logging network may become saturated depending on the bandwidth (1Gb vs. 10Gb) and the number of protected VMs on that network, not only between two hosts, but between all the hosts in the cluster or perhaps spanning clusters, depending on how far the FT network is stretched. The workaround makes the secondary VM issue disk reads directly against the shared disk (out of band) instead of receiving that data over the FT logging network, while still staying within vLockstep tolerances.

Given the many restrictions for FT, particularly the 1 vCPU requirement, you may not have run into FT logging network saturation. However, once some of these FT restrictions are lifted, I expect disk I/O to scale up on FT protected VMs. FT will become more popular, and I can see where this tweak may come in handy, particularly for those looking to get more mileage out of the 1Gb network infrastructure that FT networks are tied to.

The workaround in the KB article is applied at the VM level by adding the following line to the .vmx configuration:

replay.logReadData = checksum

In addition, the VM must be powered off before making the .vmx change, and then unregistered and re-registered on the host afterward.
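
For those who prefer to script the change, here is a minimal Python sketch of the same edit.  It simply appends the key to the .vmx file; the datastore path is hypothetical, and the power-off, unregister, and re-register steps from the KB still apply:

# Minimal sketch: append the FT logging workaround from KB 1011965 to a
# powered-off VM's .vmx file. The path below is hypothetical.
vmx_path = "/vmfs/volumes/datastore1/ftvm/ftvm.vmx"
setting = "replay.logReadData = checksum\n"

with open(vmx_path, "r") as f:
    contents = f.read()

# Only append the key if it is not already present.
if "replay.logReadData" not in contents:
    with open(vmx_path, "a") as f:
        f.write(setting)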

I wouldn’t call the configuration itself very scalable, as it’s hidden and could become an administrative burden to document and track.  Perhaps we’ll see this tweak move to a spot somewhere in the GUI, and maybe the option of making it a host/cluster level configuration.

VMware vSphere Design Workshop – Day 1

October 27th, 2010 by jason

Today was day 1 of 3 for my VMware vSphere Design Workshop training.  I’ve been looking forward to this training since spring of this year when I scheduled it.  The timing couldn’t be better since I’m scheduled to sit the VMware VCAP-DCD BETA exam in November.  I’m told by the instructor, an EMC employee of eight years as well as a CLARiiON and SRM specialist, that this is the VMware recommended classroom training for the VCAP-DCD exam.  To be perfectly honest, I haven’t looked at the exam blueprint yet but I intend to tomorrow.  My hope and expectation at this point is that the class is going to cover the blueprint objectives.  Beyond the introductions, I don’t think we were 30 minutes into the class and conversation had already turned to Duncan Epping and Chad Sakac, along with their respective blogs.  By then, I knew I was in for a great three days.

The scope of the course covers vSphere 4.0 Update 1.  I was slightly disappointed by this in that it’s covering a release that is nearly one year old; however, if the exam objectives and the exam itself are based on 4.0 Update 1, then the training is appropriate.  That said, the instructor is willing to notify the class of any changes through the current version – 4.1.  Looking more closely at the scope, the following areas will be covered:

  • ESX
  • ESXi
  • Storage
  • Networks
  • Virtual Machines
  • vCenter Server and related databases
  • DRS
  • HA
  • FT
  • Resource Pools
  • Design Process
  • Design Decisions
  • Best Practices
  • Two comprehensive design case studies to apply knowledge in the lab:
    • SMB
    • Enterprise

Design is a different discipline than Administration.  Administration focuses on tactical things like installation, configuration, tools, CLI, Service Console, clients, etc.  Having said that, there is ample opportunity for working in a vSphere lab to master the various administrative tasks covered by the VCAP-DCA blueprint.  In fact, as most may know by now, the DCA exam is lab based.  Design is different.  It’s a step higher than the tools and the CLI, which are generally abstracted from the logical design discussion.  The focus shifts to the big virtual datacenter picture and the components involved in architecting a solution which meets customer requirements and the other variables used as design criteria input for the engagement.  As mentioned above, there are a series of paper-based labs which follow 2 design case studies: SMB and Enterprise.

It is just a three-day class and we covered quite a bit of ground today.  Much of the time was spent on Design Methodology, Criteria, Approach, and VMware’s Five-Step Design Process:

  1. Initial Design Meeting
  2. Current-State Analysis
  3. Stakeholder and SME training
  4. Design Sessions
  5. Design Deliverables

Having years of consulting experience under his belt, the instructor volunteered helpful insight toward what he often referred to as the consultative approach to discovery.  We talked about phases of the engagement, design meetings to hold, who to invite, who not to invite, and the value and persuasion power of food.  We got into some conversations about hypervisor choices (ESX vs. ESXi), with a sprinkling of hardware tangents (NUMA, PCIe, processors, storage, etc.).  We closed the day with discussions on resource planning, peaks, and averages, as well as our first lab exercise, which was to decide on a hardware standard (blade versus rack mount) and plan for capacity in terms of number of hosts and cluster sizes given data from customer interviews.

I’ll close here with an infrastructure design growth formula and practical application:

The scenario:  Contoso, Inc. has a consolidation ratio of 30:1 on an existing cluster.  Contoso expects 25 percent annual growth of a 200 VM cluster over the next four years.

The growth formula:  % Growth Rate x # VMs Growing x Term ÷ Consolidation Ratio = Growth Hosts Needed

The growth formula applied:  25% x 200 x 4 ÷ 30 = Growth Hosts Needed

The growth formula applied:  50 x 4 ÷ 30 = Growth Hosts Needed

The growth formula applied:  200 ÷ 30 = Growth Hosts Needed

The growth formula applied:  7 Growth Hosts Needed (round up)
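
For those who like to see the arithmetic scripted, here is a minimal Python sketch of the same formula (the variable names are mine, not from the course material):

import math

# Minimal sketch of the growth formula above, using the Contoso example.
growth_rate = 0.25        # 25% annual growth
vms_growing = 200         # VMs in the cluster today
term_years = 4            # planning horizon in years
consolidation_ratio = 30  # 30:1 consolidation ratio (VMs per host)

growth_vms = growth_rate * vms_growing * term_years          # 200 new VMs
growth_hosts = math.ceil(growth_vms / consolidation_ratio)   # round up

print(growth_hosts)  # 7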

I’m looking forward to Thursday!

Request for UI Consistency

October 25th, 2010 by jason

Sometimes it’s the little things that can make life easier.  This post is actually a fork from the one I previously wrote on a DPM issue.  Shortly after that post, it was pointed out that I was reading DPM Priority Recommendations incorrectly.  Indeed that was the case.

Where did I go wrong?  Look at the priority descriptions between DRS and DPM, where both slide bars are configured with the same aggressiveness:

 “Apply priority 1, priority 2, priority 3, and priority 4 recommendations”

[Screenshot: DRS migration threshold slider]

 “Apply priority 4 or higher recommendations”

[Screenshot: DPM power management threshold slider]

My brief interpretation was that a higher recommendation meant a higher number (i.e. priority 4 is a higher recommendation than priority 3).  It’s actually the opposite that is true: a higher recommendation is a lower number.

I believe there is too much left open to interpretation on the DPM screen as to what is higher and what is lower.  The DRS configuration makes sense because it’s clear what is going to be applied; there is no definition of high or low to be (mis-)interpreted.  The fix?  Make the DPM configuration screen mirror the DRS configuration screen.  Development consistency goes a long way.  As a frequent user of the tools, I expect it.  I view UI inconsistency as sloppy.

If you are a VMware DPM product manager, please see my previous post VMware DPM Issue.

VMware DPM Issue

October 24th, 2010 by jason

I’ve been running into a DPM issue in the lab recently.  Allow me to briefly describe the environment:

  • 3 vCenter Servers 4.1 in Linked Mode
  • 1 cluster with 2 hosts
    • ESX 4.1, 32GB RAM, ~15% CPU utilization, ~65% Memory utilization, host DPM set to Disabled, meaning the host should never be placed in standby by DPM.
    • ESXi 4.1, 24GB RAM, ~15% CPU utilization, ~65% Memory utilization, host DPM set to Automatic, meaning the host is always a candidate to be placed in standby by DPM.
  • Shared storage
  • DRS and DPM enabled for full automation (both configured at Priority 4, almost the most aggressive setting)

Up until recently, the ESX and ESXi hosts weren’t as loaded and DPM was working reliably.  Each host had 16GB RAM installed.  When aggregate load was light enough, all VMs were moved to the ESX host and the ESXi host was placed in standby mode by DPM.  Life was good.

There has been much activity in the lab recently.  The ESX and ESXi host memory was upgraded to 32GB and 24GB respectively.  Many VMs were added to the cluster and powered on for various projects.  The DPM configuration remained as is.  Now what I’m noticing is that, with a fairly heavy memory load on both hosts in the cluster, DPM moves all VMs to the ESX host and places the ESXi host in standby mode.  This places a tremendous amount of memory pressure and overcommit on the solitary ESX host.  This extreme condition is observed by the cluster and, nearly as quickly, the ESXi host is taken back out of standby mode to balance the load.  Then maybe about an hour later, the process repeats itself.

I then configured DPM for manual mode so that I could examine the recommendations being made by the calculator.  The VMs were being evacuated for the purposes of DPM via a Priority 3 recommendation, which is halfway between the Conservative and Aggressive settings.

[Screenshot: DPM Priority 3 recommendation]

What is my conclusion?  I’m surprised at the perceived increase in aggressiveness of DPM.  In order to avoid the extreme memory overcommit, I’ll need to configure the DPM slide bar for Priority 2.  In addition, I’d like to get a better understanding of the calculation.  I have a difficult time believing the amount of memory overcommit being deemed acceptable in a neutral configuration (Priority 3) which falls halfway between conservative and aggressive.  In addition to that, I’m not a fan of a host continuously entering and exiting standby mode, along with the flurry of vMotion activity which results.  This tells me that the calculation isn’t accounting for the amount of memory pressure which actually occurs once a host goes into standby mode, or coincidentally there are significant shifts in the workload patterns shortly after each DPM operation.

If you are a VMware DPM product manager, please see my next post Request for UI Consistency.

vCenter Server Linked Mode Configuration Error

October 23rd, 2010 by jason

As of vCenter Server 4.1, VMware supports Windows Server 2008 R2 as a vCenter platform (remember 2008 R2 is 64-bit only).  With this, I expect many environments will be configured with vCenter Server on Microsoft’s newest Server operating system.

While working in the lab with vCenter Server 4.1 on Windows Server 2008 R2, I ran into an issue configuring Linked Mode via the vCenter Server Linked Mode Configuration shortcut: “Error 28035. Setup failed to copy LDIFDE.EXE from System folder to ‘%windir%\ADAM’ folder.”

[Screenshot: Error 28035 dialog]

After no success relaxing Windows NTFS permissions, I remembered it’s a Windows Server 2008 R2 permissions issue.  The resolution is quite simple and is often the solution when running into similar errors on Windows 7 and Windows Server 2008 R2.  In addition, I found the workaround documented in VMware KB 1025637.  Rather than launching the vCenter Server Linked Mode Configuration shortcut as you normally would by clicking on the icon, right-click the shortcut and choose Run as administrator.

[Screenshot: Run as administrator context menu]

You should find that launching the shortcut in the administrator context grants the installer the permissions necessary to complete Linked Mode configuration.

VCDX Talk with Jason Boche and David Davis

October 22nd, 2010 by jason

Train Signal was nice enough to invite me to an on camera chat at VMworld 2010 regarding VMware’s VCDX certification.  The video, along with a flattering introduction, is located here.  David Davis asked some good questions and I had a great time talking with him.  Train Signal provided me with the right to reproduce the video and you’ll find it embedded below:

Update 10/23/10: Just one day later, a video was released where David Davis interviews VCDX001 John Arrasjid at VMworld Europe.  This is an outstanding resource for those going down the path of the VCDX.  Follow the link below:

VIDEO: VMware’s John Arrasjid, VCDX001, Interviewed by David Davis at VMworld

Update 10/26/10: Here’s a link to another great video which has emerged from VMworld Europe where John Troyer interviews John Arrasjid, VCDX001:

VCDX Program – John Arrasjid, Principal Architect, VMware, Inc.