Archive for May, 2011

Co-scheduling Visualized

May 21st, 2011

I stumbled onto this time lapse video of 51 airplanes taking off (and others taxiing) at Boston’s Logan International Airport.  One thought immediately popped into my mind: co-scheduling, which is a function of the VMware vSphere CPU scheduler.  The accelerated speed of the video really underscores the precision the scheduler is responsible for; in this case the scheduler is the air traffic controller (or controllers).

http://www.youtube.com/watch?v=3k-xG8XX1EM

How does this video relate to co-scheduling?

  • Imagine the planes represent CPU execution (or more accurately CPU execution requests).
  • Imagine the various runways & taxiways represent the number of vCPUs in a VM.

The scheduler is responsible for managing the traffic, making sure there’s a clear path for each plane to move forward and to be on time. 

  • With fewer runways and taxiways (vCPUs in a VM), scheduling complexity is reduced.
  • Adding runways and taxiways (vCPUs in a VM) increases scheduling complexity, but with a limited number of planes (guest OS CPU execution requests), scheduling will still be manageable and planes will arrive on time.
  • Now add a significant number of planes (execution requests from 4 vCPU and 8 vCPU VMs) to our multitude of crisscrossing runways and taxiways.  Achieving the precision required to avoid accidents and maintain fairness becomes extremely complex.  The result is high %RDY time for VMs on the host.

How do we deal with scheduling complexity?

  1. Right-size VMs, whether they are new builds or P2V.  A minimalist approach to resource guarantees is the best place to start when we’re working with consolidated infrastructure and shared resources.
  2. If you’ve already right-sized VMs and you’re running into high %RDY times:
    • Balance workloads by mixing VMs with lower and higher vCPU counts on the same host/cluster
    • Add cores to the host/cluster by:
      • Scaling up (increasing the core count in the host)
      • Scaling out (increasing the number of hosts in the cluster)
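To put a number on %RDY when working from vCenter performance charts rather than esxtop, the CPU Ready summation value (reported in milliseconds) can be converted to a percentage of the sampling interval.  The following is a minimal Python sketch, assuming the 20-second sampling interval of the real-time charts; the function name and the sample value are hypothetical and only for illustration:

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU Ready summation value (ms) to a percentage of the interval.

    Assumption: interval_s is the chart's sampling interval in seconds
    (20 s for real-time charts); historical rollups use longer intervals.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Multiply before dividing so round numbers stay exact.
    return ready_ms * 100.0 / (interval_s * 1000.0)


if __name__ == "__main__":
    # Hypothetical sample: 1,000 ms of ready time within one 20 s sample.
    print(f"{cpu_ready_percent(1000):.1f}%")  # prints "5.0%"
```

The same value means something different on a daily or weekly chart, so pass the matching interval rather than relying on the default.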

(Video source: @GuyKawasaki’s Holy Kaw!)

He is serious, and don’t call him Scott

May 20th, 2011

Happy Friday!  Today’s treat is the announcement of a new tech blog by my friend in VMware virtualization, Microsoft SQL, and the occasional fine cigar, Todd Scalzott (@tscalzott).  I love the title of his blog: Don’t Call Me Scott.  Content focus will be Tech ramblings from a guy named Todd, too often called Scott.  I’m looking forward to what you have to share, Todd!

Howdy Partner

May 17th, 2011

I started my IT career working as a contractor in both short and long term engagements at medium to large customer sites.  Since then, and for the past 13+ years, I’ve grown my career in a customer role.  Along the way, I’ve picked up a tremendous amount of experience and expertise across several technologies.  VMware virtualization came onto the scene and I was drawn to specialize in… well, you know the story there. 

At present, I work for a great company and on a daily basis I’m at the helm of the largest vSphere implementation I’ve ever seen and possibly one of the largest in the region.  I’ve networked, made a lot of friends, maybe a few enemies, and I’ve been the recipient of an immeasurable amount of opportunity, kindness, and generosity available only to customers in the VMware community.  However, from a role and operational aspect, I feel I’ve reached the peak of the mountain and I’ve seen and experienced all of the challenges that this mountain has to offer.  It’s time to try another mountain.

I’m hanging up my customer hat.  On Monday of next week, I begin a new role with Dell Compellent, a VMware Technology Alliance Partner.  I’ll have two titles:  Tactical Marketing Senior Advisor and Virtualization Product Specialist.  Each speaks to a degree of what my various responsibilities will entail.  My VMware experience will be leveraged continuously as I provide SME technical expertise to Storage Architects, Business Partners, and Customers on design, planning, and integration.  In addition, I’ll be involved with consulting, product demos, solution certification, white papers, and reference architectures.  In summary, I’ll be splitting my time between colleagues, customers, and more lab infrastructure than I might know what to do with, and at the same time exercising more of my design muscles.

So what does all this mean and how is it going to change Jason?  Let’s go through the list of things which come to my mind:

  • The VMware Virtualization Evangelist stays, though independent of this news I have been thinking about shortening the title to VMware vEvangelist (thoughts?).  That said, I’ll need to give extra thought to what and how I write.  It is my underlying intent to deliver this news not from the standpoint of “hey, I got a new job”, but more importantly to establish the necessary transparency and disclosure from this point on.  This blog (and my Twitter account @jasonboche) has always been and will continue to be mine.  I’ve made it quite clear in the past that my writing is my own and not the opinion or view of my employer.  This carries forward and I will continue to be an independent voice as much as possible, but the fact that I work for a VMware Partner in the future will be inescapable.  Which brings me to the next point…
  • VMware’s policy is that, other than a few people who were grandfathered in, VMware Partners cannot be VMware User Group (VMUG) leaders.  I’ve been the Minneapolis VMUG leader for close to 5 years.  I’ve been involved with the group since the beginning when it was founded by @tbecchetti.  Although Dell Compellent was allowing me to continue carrying the VMUG torch, VMware forbids it.  It’s a fair policy and I agree 100% with it.  The Minneapolis VMUG members own and operate the group and this is clearly what’s best for the charter and its members.  A few weeks ago, I began the transition plan with the help of VMware and have talked with several potential candidates for taking over the VMUG leader role.  If I haven’t talked to you yet and you’re interested in leading or co-leading the group, please contact me via email expressing your interest.  Be sure to leave your name and contact information.  Our group has a quarterly meeting coming up this Friday, at which I’ll be conducting business as usual.  Our Q3 meeting in September is where I’ll likely be stepping down and introducing the new leader(s).
  • I’m still attending Gestalt Tech Field Day 6 evening activities in Boston 6/8 – 6/11, but I will not formally be a delegate nor will I be a delegate going forward as I’m no longer considered independent.  Again, these are Gestalt IT guidelines and I completely get it; it’s what is best for the group.  I’m looking forward to seeing some old friends as well as new faces from **I can’t let the cat out of the bag just yet, area locals will find out soon**.
  • I’m going to get my hands on kit which I’ve not had the chance to work with in the past.  Don’t be completely surprised if future discussion involves Dell Compellent.  At the same time, don’t automatically jump to a conclusion that I’ve transformed into a puppet.  Cool technology motivates me and is ultimately responsible for where I am today.  I enjoy sharing the knowledge with peers when and where I can.  I believe that by sharing, everyone wins.
  • VMworld – you’ll probably see me at the booth.
  • Partner Exchange – I may be there as well.
  • VMworld Europe – I hope so, but I’m not counting on it.  I didn’t ask.

I think that covers everything.  Compellent is a local (to me) storage company which I like.  I think Dell will add a lot of strength, opportunity, and growth.  I’m excited to say the least!

Jas

Cisco Discovery Protocol (CDP) Tag Team

May 15th, 2011

For this blog post, I collaborated with Dawn Theirl (@KokopeIIi on Twitter) who is a Network Engineer in the San Francisco Bay Area.  Dawn performs a lot of hands-on work in her day to day role as a wired and wireless network guru.  We understand that CDP provides benefits for both the network and virtualization platform teams.  However, in larger or siloed environments, our two teams don’t necessarily know what the other is seeing in their dashboard.  Curiosity prevailed and here we are.  In this writing, Dawn and I will discuss CDP, its implementation, and what exactly is seen in each of our siloed roles using our respective management tools, as well as the benefits provided by both having and sharing this information.

CDP is a useful troubleshooting tool in networking.  When you are given the IP of a host that someone has questions about and are tracing the IP and MAC from a distribution layer switch down to the access layer, CDP info can tell you which switch to look at next.  If you don’t have an accurate network map, it is also useful for getting an idea of how a network is physically laid out by learning which devices are physically connected to each other.  CDP operates at Layer 2 (Data Link) of the OSI model.  CDP packets are non-routable.

By default, CDP is enabled (and advertising) on Cisco switches and routers.  On ESX(i) vSwitches, CDP is enabled and effectively configured for listen mode.  This benefits VMware administrators: the CDP properties of each vmnic in the vSphere Client show information about the upstream switch.  The most useful information, highlighted in yellow, is the name of the switch which the vmnic is cabled to as well as the port number on the switch that the network cable is connected to.  In access port configurations where 802.1Q VLANs are enabled, the VLAN field will also contain useful information:

SnagIt Capture

From the Cisco switch point of view in the default configuration, we don’t see any information about the ESXi host or its vmnics.  This is because the vSwitch tied to the vmnic uplinks is in listen mode only (no advertising).  # show cdp neighbors is the command which would display information about other devices advertising information by way of CDP:

SnagIt Capture

So out of the box, ESXi is configured to pull CDP information about the upstream network and this is quite valuable to have for implementation and troubleshooting.  However, there is an additional configuration which can be made on the ESXi host which will allow it to provide its own intrinsic data to the Cisco switch via CDP and that is by enabling CDP advertising.  This information is useful for troubleshooting which benefits both the network and virtual infrastructure teams by providing a method for close collaboration.  Let’s make the additional configuration change and note the additional information which is exposed by the ESXi host.

At the ESXi host DCUI, we can examine the CDP status of a vSwitch by issuing the command # esxcfg-vswitch -b vSwitch0.  Shown here, vSwitch0 is in listen only mode:

SnagIt Capture

Now let’s change the CDP mode for vSwitch0 to both (meaning both listen and advertise) by issuing the command # esxcfg-vswitch -B both vSwitch0 and then verify the configuration change:

5-15-2011 11-30-24 AM
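For anyone scripting this change across many hosts, the get and set commands can be assembled programmatically.  The sketch below is only an illustration in Python: the helper names are hypothetical, the accepted modes are taken from Appendix A of this post, and the commands are printed rather than executed because esxcfg-vswitch only exists on an ESX(i) host:

```python
from __future__ import annotations

import shlex

# Modes accepted by esxcfg-vswitch -B, as listed in Appendix A of this post.
CDP_MODES = ("down", "listen", "advertise", "both")


def cdp_set_command(vswitch: str, mode: str) -> list[str]:
    """Return the argv that sets a vSwitch's CDP mode (hypothetical helper)."""
    if mode not in CDP_MODES:
        raise ValueError(f"mode must be one of {CDP_MODES}, got {mode!r}")
    return ["esxcfg-vswitch", "-B", mode, vswitch]


def cdp_get_command(vswitch: str) -> list[str]:
    """Return the argv that prints a vSwitch's current CDP mode."""
    return ["esxcfg-vswitch", "-b", vswitch]


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(shlex.join(cdp_set_command("vSwitch0", "both")))  # esxcfg-vswitch -B both vSwitch0
    print(shlex.join(cdp_get_command("vSwitch0")))          # esxcfg-vswitch -b vSwitch0
```

Validating the mode before building the command keeps a typo such as "bothh" from ever reaching a production host.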

At this point, both the Cisco switch and the ESXi host are listening and advertising, which is mutually beneficial to the network and virtual infrastructure teams.  Nothing changes visibility-wise on the ESXi side.  However, the network team is now able to receive and view CDP advertisements on their Cisco gear from the ESXi hosts.  Let’s take a look by issuing the > show cdp neighbors command on the Cisco switch.  Note one difference from when I ran this command earlier: CDP neighbor information can be viewed in either user or privileged mode on the switch.  With CDP advertisements enabled on the ESXi host, we’re able to see ESXi host information as well as the host vmnic uplinks and the respective ports they’re cabled to on the Cisco switch:

5-15-2011 11-42-58 AM

From the switch side I can see what ports the VMs are on.  This can be useful because, unless you put a description with the host name on a port every time something gets installed (and then moved), you don’t know what is connected to any given port without a lot of effort to backtrack a MAC address to an IP to a hostname.  Lots of information… you get the host name, what port it’s connected to on the switch, and which NIC the host is using for that connection.  Very useful for troubleshooting when a systems admin is questioning whether there are problems on the network when a particular host is having issues.  Usually the most the sys admin can tell you is what network the host is on, and the network admin has to trace the IP and then the MAC address to find what port the host is on.  With the CDP exchange, once you narrow down what switch the host is on, just issuing the “show cdp neighbors” command will tell you what port to focus on.  One interesting note is that the host advertises itself as a switch instead of a host.

> show cdp neighbors detail provides some additional information about the host such as the build number and CDP version.  This detail is not quite as valuable for troubleshooting but nonetheless could come in handy for either a large enterprise or a smaller environment with consolidated roles:

5-15-2011 11-43-56 AM

Looking at the [advertised] Cisco Discovery Protocol output from the VM, the important information is the switch name, IP address, VLAN, and port the host is connected to.  Other things I can see are that the port is set to full duplex, and that it’s a switch vs. a router (don’t laugh, I’ve seen a router with a blade with a small number of ports used for a very small office).

With the implementation details and benefits out of the way, let’s focus a bit on CDP strategy.  There are a few approaches to CDP which can be evaluated from labor, change management, and security perspectives:

  1. Infrastructure implementation with default configurations – No changes required at implementation time providing the easiest and fastest deployment of ESXi in addition to providing CDP listen mode benefits from the virtual platform point of view.  The virtual platform remains secure while upstream network information is advertised to neighbors.
  2. Disable CDP globally, enable only as needed for the short term – Requires disabling CDP at implementation time in addition to change management time spent temporarily enabling and disabling CDP later on to aid troubleshooting.  Most secure from the network and virtual platform standpoint.
  3. Enable bidirectional CDP globally, always on – Requires enabling CDP both (listen and advertise) at implementation time thereby providing comprehensive information for troubleshooting later on.  Least secure; both network and virtual platform information is exposed by CDP advertisements to neighbors.

I’ve worked with organizations who implement one of, or a combination of, all three.  As with many design decisions, philosophy and justifications will vary.  A decision here could be made based on the size of the datacenter, distribution of roles, security approach, or the vertical which the business operates in (think regulatory compliance).  CDP is of course beneficial to network and virtual platform owners but it can also aid a hacker who has penetrated the environment, thereby becoming a sharing recipient of the same network information.  Speaking for myself, I’ve gotten a lot of operational benefits while leveraging CDP for troubleshooting.  Network engineers often ask me to configure CDP for advertising on the host side.  What helps them ultimately helps me in a troubleshooting scenario and can ultimately shorten the time we spend focusing on an issue.  In customer facing or production environments, every minute of downtime costs and therefore counts.  My preference is to operate with CDP configured for listen on the host side.  This configuration provides the most bang for the buck as it is the default out-of-box configuration on both the Cisco and VMware side.  In other words, if you do nothing at all, you can reap major benefits with the native configuration when it comes time to troubleshoot or provide capacity and/or SPOF planning for network resources.  That’s my preference.  That said, I get the security side of the discussion and of course I’m not opposed to disabling CDP when compelling requirements or constraints exist.

Aside from the design decisions above, I would be remiss if I did not also mention a potential stability issue (categorize it as a potential risk in your design) I came across from Cisco.  When enabling CDP or leaving CDP enabled in an environment, there is a known CDP issue which should be taken into consideration because it can cause a disruption of the network: CDP Can Consume All Router Memory.  When a large number of CDP neighbor announcements are sent, it is possible to consume all available memory of a device.  This causes a crash or other abnormal behavior.  Refer to Cisco’s Response to the CDP Issue (Document ID: 13621) for more details.  This issue is quite old and may no longer be a threat with modern versions of IOS and NX-OS.

CDP is a wonderful tool.  However, one obvious weakness in the heterogeneous datacenter is that it is vendor specific to Cisco switches and routers.  Other networking vendors don’t support CDP and therefore cannot integrate with it.  A newer and similar vendor neutral protocol called LLDP (Link Layer Discovery Protocol) appears to fill the need for the other vendors which choose to support it.  At this time, however, VMware is not supporting LLDP, though at least one source claims it is on the VMware roadmap, which is a good thing.

In closing, I’d like to leave the audience with an Appendix style list of VMware and Cisco CDP commands, as well as a few links to additional Cisco resources on the web.  I would also like to thank Dawn for her contribution and eager willingness to collaborate with me on this article.

Update 11/17/11: Link Layer Discovery Protocol (LLDP) has been published

Appendix A: ESX(i) esxcfg-vswitch (or vicfg-vswitch) parameters:

-B or --set-cdp    Set the CDP status for a given virtual switch. To set, pass one of "down", "listen", "advertise", "both".
-b or --get-cdp    Print the current CDP setting for this switch.

Appendix B: Cisco switch commands (some require privileged mode):

cdp run                             Enables CDP globally (on by default).
cdp enable                          Enables CDP on an interface.
cdp advertise-v2                    Enables CDP Version-2 advertising functionality on a device.
clear cdp counters                  Resets the traffic counters to zero.
clear cdp table                     Deletes the CDP table of information about neighbors.
debug cdp adjacency                 Monitors CDP neighbor information.
show cdp                            Displays global CDP information such as the interval between transmissions of CDP advertisements, the number of seconds the CDP advertisement is valid for a given port, and the version of the advertisement.
show cdp neighbors                  Displays information about neighbors.
show cdp neighbors detail           Displays more detail about neighboring devices.
show cdp entry *                    Displays information about all devices.
show cdp interface [type number]    Displays information about interfaces on which CDP is enabled.
show cdp traffic                    Displays CDP counters, including the number of packets sent and received and checksum errors.
cdp timer seconds                   Specifies frequency of transmission of CDP updates.
cdp holdtime seconds                Specifies the amount of time a receiving device should hold the information sent by your device before discarding it.
no cdp run                          Turns off CDP globally.

Appendix C: Helpful CDP resources from Cisco and VMware:

Configuring Cisco Discovery Protocol (CDP)

Configuring Cisco Discovery Protocol on Cisco Routers and Switches Running Cisco IOS (Document ID: 43485)

Cisco Discovery Protocol (CDP) network information

Configuring the Cisco Discovery Protocol (CDP) with ESX

Application Troubleshooting Tools and Tips for VMware ThinApp

May 11th, 2011

Well over a year ago, I was introduced to a fantastic repository of VMware ThinApp tools, tips, and troubleshooting methods.  While some of the content may be dated (it was created nearly a year before I came across it), I suspect the bulk of it is still relevant to some degree today.  It comes to us from VMware’s ThinApp blog by a gentleman named Dean Flaming.

Application Troubleshooting Tools and Tips for VMware ThinApp

See also:

How to Make a ThinApp Application Package

Application virtualization is an integral VDI component with the encapsulation power, streaming, and flexibility it has to offer.  VMware was not first to market with the technology but they recognize these benefits and have integrated ThinApp application delivery into VMware View, strengthening the desktop portfolio.  Give it a try – it’s pretty cool stuff!

Performance Overview charts fail with STATs Report Service internal error

May 11th, 2011

A few months ago I was troubleshooting a problem with the Overview charts in the Performance tab of the vSphere Client.  This was a vSphere 4.0 Update 1 environment but I believe the root cause will impact other vSphere versions as well.

Instead of displaying the dashboard of charts in the Overview display, an error was displayed:

STATs Report service internal error
or
STATs Report application initialization is not completed successfully

One unique aspect of this environment was that the vCenter database was hosted on a Microsoft SQL Server which used a port other than the default of TCP 1433.  VMware KB Article 1012812 identified this as the root cause of the issue.

To resolve the issue, I was required to stop the vCenter Server service and modify the statsreport.xml file located on the vCenter Server in the \Program Files\VMware\Infrastructure\tomcat\conf\Catalina\localhost\ directory by inserting the url attribute shown at the end of the block below.  Note that the SQL server name, instance name, database name, alternate TCP port, and integratedSecurity value in the url will vary and are environment specific (integratedSecurity is false for SQL authentication or true for Windows integrated authentication):

<Resource
   name="jdbc/StatsDS"
   type="javax.sql.DataSource"
   factory="org.apache.tomcat.dbcp.dbcp.BasicDataSourceFactory"
   initialSize="3"
   maxActive="10"
   maxIdle="3"
   maxWait="10000"
   defaultReadOnly="true"
   defaultTransactionIsolation="READ_COMMITTED"
   removeAbandoned="true"
   removeAbandonedTimeout="60"
   url="jdbc:sqlserver://sqlservername:1601;instanceName=sqlservername;databaseName=sqldatabasename;integratedSecurity=false;"
/>

Don’t forget to restart the vCenter Server service after saving the statsreport.xml file.
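Because the url value is easy to mistype, it can help to assemble it from its parts before pasting it into statsreport.xml.  Here is a minimal Python sketch that reproduces the format used above; the function name and its parameters are hypothetical, and the sample values are the same placeholders used in this post:

```python
from __future__ import annotations


def stats_jdbc_url(server: str, port: int, database: str,
                   instance: str | None = None,
                   integrated_security: bool = False) -> str:
    """Build a SQL Server JDBC URL in the format shown in this post.

    Hypothetical helper: integrated_security=False corresponds to SQL
    authentication, True to Windows integrated authentication.
    """
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    parts = [f"jdbc:sqlserver://{server}:{port}"]
    if instance:
        parts.append(f"instanceName={instance}")
    parts.append(f"databaseName={database}")
    parts.append(f"integratedSecurity={'true' if integrated_security else 'false'}")
    # Each property, including the last one, ends with a semicolon.
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Placeholder values from the post; substitute your own environment's values.
    print(stats_jdbc_url("sqlservername", 1601, "sqldatabasename",
                         instance="sqlservername"))
```

With the placeholder values, the output matches the url attribute shown in the statsreport.xml block, on a single line with no embedded whitespace.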

VMware vSphere SiteSurvey Plug-in

May 10th, 2011

VMware SiteSurvey is a free add-on utility which analyzes vSphere ESX and ESXi hosts for VMware Fault Tolerance (FT) compatibility.  My good friend Eric Siebert wrote in depth about this piece of software and its capabilities just after the GA launch of VMware vSphere in 2009.

In June of 2010, VMware released SiteSurvey version 2.5.0.  What was unique about this particular release was that VMware transformed it from a standalone Windows application to a vSphere Client Plug-in.  Today, version 2.5.2 (released 12/10/10) of this SiteSurvey Plug-in is available as a free download from VMware’s site.

Installation of the plug-in is as simple as they come.  Exit the vSphere Client if it is currently running and launch the SiteSurvey-2.5.2.msi executable file.  SiteSurvey is a client side plug-in and as such needs to be installed on each machine which has a vSphere Client in order to use the plug-in.

Click Next:

SnagIt Capture

Accept the license agreement and click Next:

SnagIt Capture

Click Next:

SnagIt Capture

After the installation routine completes, click Close:

SnagIt Capture

Now open the vSphere Client and choose Plug-ins | Manage Plug-ins.  Note the new SiteSurvey Plugin and VMware’s inconsistent spelling of the Plug-in phrase:

SnagIt Capture

With the plug-in installed and enabled, you’ll now see a SiteSurvey tab in the cluster and host inventory views which will help you identify the FT capabilities of both hosts and virtual machines.  Remember, there is a lengthy list of requirements which must be met for hosts, VMs, clusters, and vCenter to enable FT.  Information about FT requirements can be found here, here, and here:

SnagIt Capture