Posts Tagged ‘Networking’

Howdy Partner

May 17th, 2011

I started my IT career working as a contractor in both short and long term engagements at medium to large customer sites.  Since then, and for the past 13+ years, I’ve grown my career in a customer role.  Along the way, I’ve picked up a tremendous amount of experience and expertise across several technologies.  VMware virtualization came onto the scene and I was drawn to specialize in… well, you know the story there. 

At present, I work for a great company and on a daily basis I’m at the helm of the largest vSphere implementation I’ve ever seen and possibly one of the largest in the region.  I’ve networked, made a lot of friends, maybe a few enemies, and I’ve been the recipient of an immeasurable amount of opportunity, kindness, and generosity available only to customers in the VMware community.  However, from a role and operational perspective, I feel I’ve reached the peak of the mountain and I’ve seen and experienced all of the challenges that this mountain has to offer.  It’s time to try another mountain.

I’m hanging up my customer hat.  On Monday of next week, I begin a new role with Dell Compellent, a VMware Technology Alliance Partner.  I’ll have two titles:  Tactical Marketing Senior Advisor and Virtualization Product Specialist.  Each speaks to a degree of what my various responsibilities will entail.  My VMware experience will be leveraged continuously as I provide SME technical expertise to Storage Architects, Business Partners, and Customers on design, planning, and integration.  In addition, I’ll be involved with consulting, product demos, solution certification, white papers, and reference architectures.  In summary, I’ll be splitting my time between colleagues, customers, and more lab infrastructure than I might know what to do with, and at the same time exercising more of my design muscles.

So what does all this mean and how is it going to change Jason?  Let’s go through the list of things which come to my mind:

  • The VMware Virtualization Evangelist stays, though independent of this news I have been thinking about shortening the title to VMware vEvangelist (thoughts?).  That said, I’ll need to give extra thought to what and how I write.  My underlying intent is to deliver this news not from the standpoint of “hey, I got a new job”, but more importantly to establish the necessary transparency and disclosure from this point on.  This blog (and my twitter account @jasonboche) has always been and will continue to be mine.  I’ve made it quite clear in the past that my writing is my own and not the opinion or view of my employer.  This carries forward and I will continue to be an independent voice as much as possible, but the fact that I work for a VMware Partner will from now on be inescapable.  Which brings me to the next point…
  • VMware’s policy is that, other than a few people who were grandfathered in, VMware Partners cannot be VMware User Group (VMUG) leaders.  I’ve been the Minneapolis VMUG leader for close to 5 years.  I’ve been involved with the group since the beginning when it was founded by @tbecchetti.  Although Dell Compellent would have allowed me to continue carrying the VMUG torch, VMware forbids it.  It’s a fair policy and I agree with it 100%.  The Minneapolis VMUG members own and operate the group and this is clearly what’s best for the charter and its members.  A few weeks ago, I began the transition plan with the help of VMware and have talked with several potential candidates for taking over the VMUG leader role.  If I haven’t talked to you yet and you’re interested in leading or co-leading the group, please contact me via email expressing your interest.  Be sure to leave your name and contact information.  Our group has a quarterly meeting coming up this Friday, at which I’ll be conducting business as usual.  Our Q3 meeting in September is where I’ll likely be stepping down and introducing the new leader(s).
  • I’m still attending Gestalt Tech Field Day 6 evening activities in Boston 6/8 – 6/11, but I will not formally be a delegate, nor will I be a delegate going forward, as I’m no longer considered independent.  Again, these are Gestalt IT guidelines and I completely get it; it’s what’s best for the group.  I’m looking forward to seeing some old friends as well as new faces from **I can’t let the cat out of the bag just yet, area locals will find out soon**.
  • I’m going to get my hands on kit which I’ve not had the chance to work with in the past.  Don’t be completely surprised if future discussion involves Dell Compellent.  At the same time, don’t automatically jump to a conclusion that I’ve transformed into a puppet.  Cool technology motivates me and is ultimately responsible for where I am at today.  I enjoy sharing the knowledge with peers when and where I can.  I believe that by sharing, everyone wins.
  • VMworld – you’ll probably see me at the booth.
  • Partner Exchange – I may be there as well.
  • VMworld Europe – I hope but not counting on it.  I didn’t ask.

I think that covers everything.  Compellent is a local (to me) storage company which I like.  I think Dell will add a lot of strength, opportunity, and growth.  I’m excited to say the least!

Jas

Cisco Discovery Protocol (CDP) Tag Team

May 15th, 2011

 

For this blog post, I collaborated with Dawn Theirl (@KokopeIIi on Twitter) who is a Network Engineer in the San Francisco Bay Area.  Dawn performs a lot of hands-on work in her day to day role as a wired and wireless network guru.  We understand that CDP provides benefits for both the network and virtualization platform teams.  However, in larger or siloed environments, our two teams don’t necessarily know what the other is seeing in their dashboard.  Curiosity prevailed and here we are.  In this writing, Dawn and I will discuss CDP, its implementation, what exactly is seen in each of our siloed roles using our respective management tools, and the benefits provided by both having and sharing this information.

CDP is a useful troubleshooting tool in networking.  When you’re given the IP of a host someone has questions about and are tracing that IP and its MAC address from a distribution layer switch down to the access layer, CDP information can tell you which switch to look at next.  It is also useful, when you don’t have an accurate network map, for getting an idea of how a network is physically laid out by learning which devices are physically connected to each other.  CDP operates at Layer 2 (Data Link) of the OSI model.  CDP packets are non-routable.

By default, CDP is enabled (and advertising) on Cisco switches and routers.  On ESX(i) vSwitches, CDP is enabled and effectively configured as listen.  The value added by CDP benefits VMware administrators.  Looking at the CDP properties of each vmnic from the vSphere Client, CDP information is displayed.  The most useful information, highlighted in yellow, is the name of the switch the vmnic is cabled to and the port number on that switch where the network cable is connected.  In access port configurations where 802.1Q VLANs are enabled, the VLAN field will also contain useful information:

SnagIt Capture

 

 

From the Cisco switch point of view in the default configuration, we don’t see any information about the ESXi host or its vmnics.  This is because the vSwitch tied to the vmnic uplinks is in listen mode only (no advertising).  # show cdp neighbors is the command which would display information about other devices advertising information by way of CDP:

SnagIt Capture

So out of the box, ESXi is configured to pull CDP information about the upstream network, and this is quite valuable to have for implementation and troubleshooting.  However, an additional configuration can be made on the ESXi host which allows it to provide its own data to the Cisco switch via CDP: enabling CDP advertising.  This information is useful for troubleshooting and benefits both the network and virtual infrastructure teams by giving them a basis for close collaboration.  Let’s make the configuration change and note the additional information which is exposed by the ESXi host.

At the ESXi host console, we can examine the CDP status of a vSwitch by issuing the command # esxcfg-vswitch -b vSwitch0.  Shown here, vSwitch0 is in listen only mode:

SnagIt Capture

Now let’s change the CDP mode for vSwitch0 to both (meaning both listen and advertise) and then verify the configuration change:

5-15-2011 11-30-24 AM
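For reference, here’s a quick recap of the commands involved, as a rough sketch (run from the ESXi console; the vicfg-vswitch equivalents listed in Appendix A behave the same way from the vCLI/vMA):

# esxcfg-vswitch -b vSwitch0          (report the current CDP mode; listen is the default)
# esxcfg-vswitch -B both vSwitch0     (set CDP to both listen and advertise)
# esxcfg-vswitch -b vSwitch0          (verify the change took effect)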

At this point, both the Cisco switch and the ESXi host are listening and advertising, which is mutually beneficial to the network and virtual infrastructure teams.  Nothing changes visibility-wise on the ESXi side.  However, the network team is now able to receive and view CDP advertisements on their Cisco gear from the ESXi hosts.  Let’s take a look by issuing the > show cdp neighbors command on the Cisco switch.  Note one difference from when I ran this command earlier: CDP neighbor information can be viewed in either user or privileged mode on the switch.  With CDP advertisements enabled on the ESXi host, we’re able to see ESXi host information as well as the host vmnic uplinks and the respective ports they’re cabled to on the Cisco switch:

5-15-2011 11-42-58 AM

From the switch side I can see what ports the ESXi hosts are on.  This can be useful because, unless you put a description on a port with the host name every time something gets installed (and then moved), you don’t know what is connected on any given port without a lot of effort backtracking a MAC address to an IP to a hostname.  Lots of information: you get the host name, what port it’s connected to on the switch, and which NIC the host is using for that connection.  Very useful for troubleshooting when a systems admin is questioning whether there are problems on the network when a particular host is having issues.  Usually the most the sys admin can tell you is what network the host is on, and the network admin has to trace the IP and then the MAC address to find what port the host is on.  With the CDP exchange, once you narrow down what switch the host is on, simply issuing the show cdp neighbors command will tell you what port to focus on.  One interesting note is that the host advertises itself as a switch instead of a host.

> show cdp neighbors detail provides some additional information about the host such as the build number and CDP version.  This detail is not quite as valuable for troubleshooting but nonetheless could come in handy for either a large enterprise or a smaller environment with consolidated roles:

5-15-2011 11-43-56 AM

Looking at the Cisco Discovery Protocol output advertised to the VMware side, the important information seen is the switch name, IP address, VLAN, and the port the host is connected to. Other things I can see are that the port is set to full duplex, and that it’s a switch vs. a router (don’t laugh, I’ve seen a router with a blade with a small number of ports used for a very small office).

With the implementation details and benefits out of the way, let’s focus a bit on CDP strategy.  There are a few approaches to CDP which can be evaluated from labor, change management, and security perspectives:

  1. Infrastructure implementation with default configurations – No changes required at implementation time providing the easiest and fastest deployment of ESXi in addition to providing CDP listen mode benefits from the virtual platform point of view.  The virtual platform remains secure while upstream network information is advertised to neighbors.
  2. Disable CDP globally, enable only as needed for the short term – Requires disabling CDP at implementation time in addition to change management time spent temporarily enabling and disabling CDP later on to aid troubleshooting.  Most secure from the network and virtual platform standpoint.
  3. Enable bidirectional CDP globally, always on – Requires enabling CDP both (listen and advertise) at implementation time thereby providing comprehensive information for troubleshooting later on.  Least secure; both network and virtual platform information is exposed by CDP advertisements to neighbors.

I’ve worked with organizations that implement one of, or a combination of, all three.  As with many design decisions, philosophy and justifications will vary.  A decision here could be made based on the size of the datacenter, distribution of roles, security approach, or the vertical the business operates in (think regulatory compliance).  CDP is of course beneficial to network and virtual platform owners, but it can also aid a hacker who has penetrated the environment and thereby becomes a recipient of the same shared network information.  Speaking for myself, I’ve gotten a lot of operational benefit while leveraging CDP for troubleshooting.  Network engineers often ask me to configure CDP for advertising on the host side.  What helps them ultimately helps me in a troubleshooting scenario and can shorten the time we spend focusing on an issue.  In customer facing or production environments, every minute of downtime costs and therefore counts.  My preference is to operate with CDP configured for listen on the host side.  This configuration provides the most bang for the buck as it is the default out-of-box configuration on both the Cisco and VMware side.  In other words, if you do nothing at all, you can reap major benefits with the native configuration when it comes time to troubleshoot or provide capacity and/or SPOF planning for network resources.  That’s my preference.  That said, I get the security side of the discussion and of course I’m not opposed to disabling CDP when compelling requirements or constraints exist.
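For those leaning toward option 2 or 3 in the list above, here’s a rough sketch of the knobs involved on each side.  The interface name and vSwitch number below are made up for illustration, and exact syntax can vary by IOS version:

Cisco side (privileged/config mode):
switch# configure terminal
switch(config)# no cdp run                        (option 2: disable CDP globally)
switch(config)# interface GigabitEthernet0/10
switch(config-if)# no cdp enable                  (or disable CDP on a single interface only)

ESXi side:
# esxcfg-vswitch -B down vSwitch0                 (option 2: disable CDP on the vSwitch)
# esxcfg-vswitch -B both vSwitch0                 (option 3: bidirectional CDP, always on)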

Aside from the design decisions above, I would be remiss if I did not also mention a potential stability issue (categorize it as a potential risk in your design) I came across from Cisco.  When enabling CDP or leaving CDP enabled in an environment, there is a known CDP issue which should be taken into consideration because it can cause a disruption of the network: CDP Can Consume All Router Memory.  When a large number of CDP neighbor announcements are sent, it is possible to consume all memory of an available device.  This causes a crash or other abnormal behavior.  Refer to Cisco’s Response to the CDP Issue (Document ID: 13621) for more details.  This issue is quite old and may no longer be a threat with modern versions of IOS and NX-OS.

CDP is a wonderful tool.  However, one obvious weakness in the heterogeneous datacenter is that it is specific to Cisco switches and routers.  Other networking vendors don’t support CDP and therefore cannot integrate with it.  A newer and similar vendor neutral protocol called LLDP (Link Layer Discovery Protocol) appears to fill the need for the other vendors which choose to support it.  At this time, however, VMware does not support LLDP, though at least one source claims it is on the VMware roadmap, which is a good thing.

In closing, I’d like to leave the audience with an Appendix style list of VMware and Cisco CDP commands, as well as a few links to additional Cisco resources on the web.  I would also like to thank Dawn for her contribution and eager willingness to collaborate with me on this article.

Update 11/17/11: Link Layer Discovery Protocol (LLDP) has been published

Appendix A: ESX(i) esxcfg-vswitch (or vicfg-vswitch) parameters:

-B or --set-cdp Set the CDP status for a given virtual switch. To set, pass one of “down”, “listen”, “advertise”, “both”.
-b or --get-cdp Print the current CDP setting for this switch.

Appendix B: Cisco switch commands (some require privileged mode):

cdp run Enables CDP globally (on by default).
cdp enable Enables CDP on an interface.
cdp advertise-v2 Enables CDP Version-2 advertising functionality on a device.
clear cdp counters Resets the traffic counters to zero.
clear cdp table Deletes the CDP table of information about neighbors.
debug cdp adjacency Monitors CDP neighbor information.
show cdp Displays global CDP information such as the interval between transmissions of CDP advertisements, the number of seconds the CDP advertisement is valid for a given port, and the version of the advertisement.
show cdp neighbors  Displays information about neighbors.
show cdp neighbors detail  Displays more detail about neighboring devices.
show cdp entry * Displays information about all devices.
show cdp interface [type number] Displays information about interfaces on which CDP is enabled.
show cdp traffic Displays CDP counters, including the number of packets sent and received and checksum errors.
cdp timer seconds Specifies frequency of transmission of CDP updates.
cdp holdtime seconds Specifies the amount of time a receiving device should hold the information sent by your device before discarding it.
no cdp run Turns off CDP globally.

Appendix C: Helpful CDP resources from Cisco and VMware:

Configuring Cisco Discovery Protocol (CDP)

Configuring Cisco Discovery Protocol on Cisco Routers and Switches Running Cisco IOS (Document ID: 43485)

Cisco Discovery Protocol (CDP) network information

Configuring the Cisco Discovery Protocol (CDP) with ESX

Q2 2011 Minneapolis Area VMware Users Group meeting

May 9th, 2011

Event: Q2 2011 Minneapolis Area VMware Users Group meeting

Spring is upon us and Minnesotans know what that means… Q2 Minneapolis VMUG time!

Friday May 20th, 2011 1:00 – 5:00 PM
Jason Boche, Minneapolis area VMUG leader – Email:  jason@boche.net

1:00 – 1:30 General business, Updates, Open Floor Discussions
   
1:30 – 2:15 Presentation:  Jeff Whitman, Senior SE and Tony MacDonald, Senior SE, VMware, Inc.: “VMware Update, vCenter Operations presentation and demo”
   
2:15 – 2:25 Break
   
2:25 – 3:10 Presentation:  Greg Schmidt, Storage Architect, Hewlett Packard, Inc.: “Leveraging HP Storage Technology in a VMware Environment”
   
3:10 – 3:20 Break
   
3:20 – 4:05 Presentation:  Matt Urbanowicz, Senior Systems Engineer and Josh Verhelst, Technical Architect, N’compass, Inc.: “Do You know Your Cost to Compute?”
   
4:05 – 4:15 Break
   
4:15 – 5:00 Q & A, Door Prizes, Closing

 

Stick around to win great door prizes!
(Please bring business cards to enter your name in the door prize drawings)

Meeting Location, Snacks, and door prizes provided by Hewlett Packard, Inc., N’compass, Inc., and VMware, Inc.

 

Meeting Location:
Hilton Hotel 494 & France Ave.
3900 American Boulevard West
Bloomington, MN  55437
952-893-9500
Map: http://mapq.st/dIyDOF

VMware User Group Event Registration:
http://www.myvmug.org/e/in/eid=36&source=5

VMware User Group Membership Registration (subscribe):
http://info.vmware.com/forms/UserGroupSubscribe?session=Minneapolis

network bandwidth transfer.xlsx

March 19th, 2011

SnagIt Capture

Many years ago, before I got involved with VMware, before VMware existed in fact, I was a Systems Engineer supporting Microsoft Windows Servers.  I also dabbled in technology-related things such as running game servers like Quake II and Half-Life Counter-Strike on the internet.  One area where these responsibilities intersected was the need to know the rate at which data could traverse a rated network segment, in addition to the amount of time it would take for said data to travel from point A to point B.

At that point in time, there weren’t half a dozen free web-based calculators which could be found via a Google search.  As a result, I started an Excel spreadsheet.  It started out as a tool which would allow me to enter a value in KiloBytes, MegaBytes, or GigaBytes.  From there, it would calculate the amount of time it would take that data to travel across the wire.  This data was useful in telling me how many players the Counter-Strike server could scale to, and it would provide an estimate for how much the bandwidth utilization was going to cost me per month.  I also used this information in the office to plan backup strategies, data transfer, and data replication.

I’ve expanded its capabilities slightly over the years as well as scaled it up to handle the volume of data we deal with, which has increased exponentially.  In addition to the functions it performed in the past, I added a data conversion section which translates anything to anything within the range of bits to YottaBytes.  It performs both Base 2 (binary) and Base 10 (decimal) calculations which are maintained on their own respective worksheet tabs.  I prefer to work with Base 2 because it’s old school and I believe it is the most accurate measure of data and conversion.  To this point, Wikipedia explains:

The relative difference between the values in the binary and decimal interpretations increases, when using the SI prefixes as the base, from 2.4% for kilo to over 20% for the yotta prefix.  This chart shows the growing percentage of the shortfall of decimal interpretations from binary interpretations of the unit prefixes plotted against the logarithm of storage size.

SnagIt Capture

However, Base 10 is much easier for the human brain to work with as the numbers are nice and round.  I believe this is how and why Base 10 became known as “Salesman Bytes” way back when.  I’ll be darned if I can find a reference to this term any longer in Google.
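As a quick illustration of the kind of math the workbook does (my own numbers, assuming an ideal 1Gb link with no latency or overhead):

10 GB (decimal) = 10 x 10^9 x 8 = 80,000,000,000 bits, or roughly 80 seconds at 1Gbps
10 GB (binary)  = 10 x 2^30 x 8 = 85,899,345,920 bits, or roughly 86 seconds at 1Gbps

The roughly 7.4% gap between the two results is exactly the binary vs. decimal difference at the giga prefix described in the excerpt above.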

Long boring story short, this is a handy storage/network data conversion tool I still use from time to time today when working with large or varying numbers.  For those who don’t have a preferred tool for whatever use case, you’re welcome to use the one I created.  A few notes:

  • Due to the extreme length of two of the formulas in the workbook, I had to upgrade it to Excel 2007 format at a minimum which is the reason for the file extension of .xlsx.
  • The data transfer section assumes the most optimal of conditions, no latency, etc.

Download network bandwidth transfer.xlsx (22.6KB)

Tiny Core Linux and Operational Readiness

February 28th, 2011

When installing, configuring, or managing VMware virtual infrastructure, one of the steps which should be performed before releasing a host (back) to production is to perform operational readiness tests.  One test which is quite critical is that of testing virtual infrastructure networking.  After all, what good is a running VM if it has no connectivity to the rest of the network?  Each ESX or ESXi host pNIC should be individually tested for internal and upstream connectivity, VLAN tagging functionality if in use (quite often it is), in addition to proper failover and fail back, and jumbo frames at the guest level if used.

There are several types of VMs or appliances which can be used to generate basic network traffic for operational readiness testing.  One that I’ve been using recently (introduced to me by a colleague) is Tiny Core Linux.  To summarize:

Tiny Core Linux is a very small (10 MB) minimal Linux GUI Desktop. It is based on Linux 2.6 kernel, Busybox, Tiny X, and Fltk. The core runs entirely in ram and boots very quickly. Also offered is Micro Core a 6 MB image that is the console based engine of Tiny Core. CLI versions of Tiny Core’s program allows the same functionality of Tiny Core’s extensions only starting with a console based system.

TCL carries with it a few benefits, some of which are tied to its small stature:

  • The minimalist approach makes deployment simple.
  • At just 10MB, it’s extremely portable and boots fast.
  • As a Linux OS, it’s freely distributable without the complexities of licensing or activation.
  • It’s compatible with VMware hardware 7 and the Flexible or E1000 vNIC making it a good network test candidate.
  • No installation is required.  It runs straight from an .ISO file or can boot from a USB drive.
  • Point and click GUI interface provides ease of use and configuration for any user.
  • When deployed with internet connectivity, it has the ability to download and install useful applications from an online repository such as Filezilla or Firefox.  There are tons of free applications in the repository.

As I mentioned before, deployment of TCL is pretty easy.  Create a VM shell with the following properties:

  • Other Linux (32-bit)
  • 1 vCPU
  • 256MB RAM
  • Flexible or E1000 vNIC
  • Point the virtual CD/DVD ROM drive to the bootable .ISO
  • No HDD or SCSI storage controller required

First boot splash screen.  Nothing really exciting here other than optional boot options which aren’t required for the purposes of this article.  Press Enter to continue the boot process:

SnagIt Capture

After pressing Enter, the boot process is briefly displayed:

SnagIt Capture

Once booted, the first step would be to configure the network via the Panel applet at the bottom of the Mac-like menu:

SnagIt Capture

If DHCP is enabled on the subnet, an address will be automatically acquired by this point.  Otherwise, give eth0 a static TCP/IP configuration.  Name Servers are optional and not required for basic network connectivity unless you would like to test name resolution in your virtual infrastructure:

SnagIt Capture

Once TCP/IP has been configured, a Terminal can be opened up and a basic ping test can be started.  Change the IP address and vNIC portgroup to test different VLANs, but my suggestion would be to spawn multiple TCL instances, one per VLAN to be tested, because you’ll need to vMotion the TCL VMs to each host being tested.  You don’t want to be continuously modifying the TCP/IP configuration:

SnagIt Capture
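The tests run from the TCL terminal are nothing fancy; they look roughly like the following (the addresses are placeholders for whatever lives on the VLAN being tested, and note that TCL’s BusyBox ping may not support every flag a full Linux distribution does):

ping -c 4 192.168.10.1            (basic reachability to the test VLAN's gateway)
ping -c 4 192.168.10.25           (reachability to another host on the same VLAN)
ping -c 4 -s 8972 192.168.10.1    (large payload; on a guest that supports a don't-fragment flag, such as -M do on full Linux, this also sanity checks jumbo frames end to end)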

What else of interest is in the Panel applet besides Network configuration?  Some ubiquitous items such as date/time configuration, disk and terminal services tools, and wallpaper configuration:

SnagIt Capture

The online application repository is packed with what seems like thousands of apps:

SnagIt Capture

After installing FileZilla, it’s available as an applet:

SnagIt Capture

FileZilla is fully functional:

SnagIt Capture

So far I’ve only been using Tiny Core Linux as a network testing appliance, but clearly it has some other uses when paired with extensible applications.  A few other things that I’ll point out are:

  1. TCL can be suspended in order to move it to other clusters (with compatible CPUs) so that both a host and a storage migration can be performed in a single step.  Once TCL reaches its destination cluster, resume it.
  2. During my tests, TCL will continue to run without issue after being severed from its boot .ISO.  This is possible because it is booted into RAM where it continues to run from that point on.

I’ve been watching Tiny Core Linux for several months and the development efforts appear fairly aggressive and backed by an individual or group with a lot of talent and energy which is good to see.  As of this writing, version 3.5 is available.  Give Tiny Core Linux a try.

Jumbo Frames Comparison Testing with IP Storage and vMotion

January 24th, 2011

Are you thinking about implementing jumbo frames with your IP storage based vSphere infrastructure?  Have you asked yourself why, or thought about whether the benefits are guaranteed?  Various credible sources discuss it (here’s a primer).  Some will highlight jumbo frames as a best practice, but the majority of what I’ve seen and heard talks about the potential advantages of jumbo frames and what the technology might do to make your infrastructure more efficient.  But be careful not to interpret that as an order of magnitude increase in performance for IP based storage.  In almost all cases, that’s not what is being conveyed, or at least, that shouldn’t be the intent.  Think beyond SPEED NOM NOM NOM.  Think efficiency and reduced resource utilization which lends itself to driving down overall latency.  There are a few stakeholders when considering jumbo frames.  In no particular order:

  1. The network infrastructure team: They like network standards, best practices, a highly performing and efficient network, and zero down time.  They will likely have the most background knowledge and influence when it comes to jumbo frames.  Switches and routers have CPUs which will benefit from jumbo frames because processing fewer frames but more payload overall makes the network device inherently more efficient while using less CPU power and consequently producing less heat.  This becomes increasingly important on 10Gb networks.
  2. The server and desktop teams: They like performance and unlimited network bandwidth provided by magic stuff, dark spirits, and friendly gnomes.  These teams also like a positive end user experience.  Their platforms, which include hardware, OS, and drivers, must support jumbo frames.  The effort required to configure for jumbo frames increases with a rising number of different hardware, OS, and driver combinations.  Any systems which don’t support network infrastructure requirements will be a showstopper.  Server and desktop network endpoints benefit from jumbo frames in much the same way network infrastructure does: efficiency and less overhead, which can lead to slightly measurable amounts of performance improvement.  The performance gains more often than not won’t be noticed by the end users except for processes that historically take a long time to complete.  These teams will generally follow infrastructure best practices as instructed by the network team.  In some cases, these teams will embark on an initiative which recommends or requires a change in network design (NIC teaming, jumbo frames, etc.).
  3. The budget owner:  This can be a project sponsor, departmental manager, CIO, or CEO.  They control the budget and thus spending.  Considerable spend thresholds require business justification.  This is where the benefit needs to justify the cost.  They are removed from most of the technical persuasions.  Financial impact is what matters.  Decisions should align with current and future architectural strategies to minimize costly rip and replace.
  4. The end users:  Not surprisingly, they are interested in application uptime, stability, and performance.  They couldn’t care less about the underlying technology except for how it impacts them.  Reduction in performance or slowness is highly visible.  Subtle increases in performance are rarely noticed.  End user perception is reality.

The decision to introduce jumbo frames should be carefully thought out and there should be a compelling reason, use case, or business justification which drives the decision.  Because of the end to end requirements, implementing jumbo frames can bring with it additional complexity and cost to an existing network infrastructure.  Possibly the single best one size fits all reason for a jumbo frames design is a situation where jumbo frames is already a standard in the existing network infrastructure.  In this situation, jumbo frames becomes a design constraint or requirement.  The evangelistic point to be made is VMware vSphere supports jumbo frames across the board.  Short of the previous use case, jumbo frames is a design decision where I think it’s important to weigh cost and benefit.  I can’t give you the cost component as it is going to vary quite a bit from environment to environment depending on the existing network design.  This writing speaks more to the benefit component.  Liberal estimates claim up to 30% performance increase when integrating jumbo frames with IP storage.  The numbers I came up with in lab testing are nowhere close to that.  In fact, you’ll see a few results where IO performance with jumbo frames actually decreased slightly.  Not only do I compare IO with or without jumbo frames, I’m also able to compare two storage protocols with and without jumbo frames which could prove to be an interesting sidebar discussion.

I’ve come across many opinions regarding jumbo frames.  Now that I’ve got a managed switch in the lab which supports jumbo frames and VLANs, I wanted to see some real numbers.  Although this writing is primarily regarding jumbo frames, by way of the testing regimen, it is in some ways a second edition to a post I created one year ago where I compared IO performance of the EMC Celerra NS-120 among its various protocols. So without further ado, let’s get onto the testing.

 

Lab test script:

To maintain as much consistency and integrity as possible, the following test criteria was followed:

  1. One Windows Server 2003 VM with IOMETER was used to drive IO tests.
  2. A standardized IOMETER script was leveraged from the VMTN Storage Performance Thread which is a collaboration of storage performance results on VMware virtual infrastructure provided by VMTN Community members around the world.  The thread starts here, was locked due to length, and continues on in a new thread here.  For those unfamiliar with the IOMETER script, it basically goes like this: each run consists of a two minute ramp up followed by five minutes of disk IO pounding.  Four different IO patterns are tested independently.
  3. Two runs of each test were performed to validate consistent results.  A third run was performed if the first two were not consistent.
  4. One ESXi 4.1 host with a single IOMETER VM was used to drive IO tests.
  5. For the mtu1500 tests, IO tests were isolated to one vSwitch, one vmkernel portgroup, one vmnic, one pNIC (Intel NC360T PCI Express), one Ethernet cable, and one switch port on the host side.
  6. For the mtu1500 tests, IO tests were isolated to one cge port, one datamover, one Ethernet cable, and one switch port on the Celerra side.
  7. For the mtu9000 tests, IO tests were isolated to the same vSwitch, a second vmkernel portgroup configured for mtu9000 (a configuration sketch follows this list), the same vmnic, the same pNIC (Intel NC360T PCI Express), the same Ethernet cable, and the same switch port on the host side.
  8. For the mtu9000 tests, IO tests were isolated to a second cge port configured for mtu9000, the same datamover, a second Ethernet cable, and a second switch port on the Celerra side.
  9. Layer 3 routes between host and storage were removed to lessen network burden and to isolate storage traffic to the correct interfaces.
  10. 802.1Q VLANs were used to isolate traffic and categorize standard traffic versus jumbo frame traffic.
  11. RESXTOP was used to validate storage traffic was going through the correct vmknic.
  12. Microsoft Network Monitor and Wireshark were used to validate frame lengths during testing.
  13. Activities known to introduce large volumes of network or disk activity were suspended such as backup jobs.
  14. Dedupe was suspended on all Celerra file systems to eliminate datamover contention.
  15. All storage tests were performed on thin provisioned virtual disks and datastores.
  16. The same group of 15 spindles were used for all NFS and iSCSI tests.
  17. The uncached write mechanism was enabled on the NFS file system for all NFS tests.  You can read more about that in the following EMC best practices document: VMware ESX Using EMC Celerra Storage Systems.
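Regarding items 7 and 8, here’s a rough sketch of how a mtu9000 vmkernel portgroup was typically configured on an ESX(i) 4.1 host from the console.  The vSwitch number, portgroup name, and addresses below are made up for illustration, and the physical switch ports and the Celerra cge port need their MTU raised as well:

# esxcfg-vswitch -m 9000 vSwitch1                                        (raise the vSwitch MTU to 9000)
# esxcfg-vswitch -A IPStorage9000 vSwitch1                               (create the portgroup)
# esxcfg-vmknic -a -i 10.0.0.10 -n 255.255.255.0 -m 9000 IPStorage9000   (create a vmkernel NIC with a 9000 MTU on that portgroup)
# esxcfg-vmknic -l                                                       (verify the vmknic and its MTU)
# vmkping -s 8972 10.0.0.50                                              (validate end to end; add -d for don't-fragment where the vmkping build supports it)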

Lab test hardware:

SERVER TYPE: Windows Server 2003 R2 VM on ESXi 4.1
CPU TYPE / NUMBER: 1 vCPU / 512MB RAM (thin provisioned)
HOST TYPE: HP DL385 G2, 24GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K / 3x RAID5 5×146
SAN TYPE: / HBAs: NFS / swiSCSI / 1Gb datamover ports (sorry, no FCoE)
OTHER: 3Com SuperStack 3 3870 48x1Gb Ethernet switch

 

Lab test results:

NFS test results.  How much better is NFS performance with jumbo frames by IO workload type?  The best result seen here is about a 7% performance increase by using jumbo frames; however, 100% read is a rather unrealistic representation of a virtual machine workload.  For NFS, I’ll sum it up as a 0-3% IOPS performance improvement by using jumbo frames.

SnagIt Capture

SnagIt Capture

iSCSI test results.  How much better is iSCSI performance with jumbo frames by IO workload type?  Here we see that iSCSI doesn’t benefit from the move to jumbo frames as much as NFS.  In two workload pattern types, performance actually decreased slightly.  Discounting the unrealistic 100% read workload as I did above, we’re left with a 1% IOPS performance gain at best by using jumbo frames with iSCSI.

SnagIt Capture

SnagIt Capture

NFS vs iSCSI test results.  Taking the best results from each protocol type, how do the protocol types compare by IO workload type?  75% of the best results came from using jumbo frames.  The better performing protocol is a 50/50 split depending on the workload pattern.  One interesting observation to be made in this comparison is how much better one protocol performs over the other.  I’ve heard storage vendors state that the IP protocol debate is a snoozer, that they perform roughly the same.  I’ll grant that in two of the workload types below, but in the other two, iSCSI pulls a significant performance lead over NFS, particularly in the Max Throughput-50%Read workload where iSCSI blows NFS away.  That said, I’m not outright recommending iSCSI over NFS.  If you’re going to take anything away from these comparisons, it should be “it depends”.  In this case, it depends on the workload pattern, among a handful of other intrinsic variables.  I really like the flexibility in IP based storage and I think it’s hard to go wrong with either NFS or iSCSI.

SnagIt Capture

SnagIt Capture

vMotion test results.  Up until this point, I’ve looked at the impact of jumbo frames on IP based storage with VMware vSphere.  For curiosity’s sake, I wanted to address the question “How much better is vMotion performance with jumbo frames enabled?”  vMotion utilizes a VMkernel port on ESXi just as IP storage does, so the groundwork had already been established, making this a quick test.  I followed roughly the same lab test script outlined above so that the most consistent and reliable results could be produced.  This test wasn’t rocket science.  I simply grabbed a few different VM workload types (Windows, Linux) with varying sizes of RAM allocated to them (2GB, 3GB, 4GB).  I then performed three batches of vMotions of two runs each on non jumbo frames (mtu1500) and jumbo frames (mtu9000).  Results varied.  The first two batches showed that jumbo frames provided a 7-15% reduction in elapsed vMotion time.  But then the third and final batch contrasted previous results with data revealing a slight decrease in vMotion efficiency with jumbo frames.  I think there are more variables at play here and this may be a case where more data sampling is needed to form any kind of reliable conclusion.  But if you want to go by these numbers, vMotion is quicker on jumbo frames more often than not.

SnagIt Capture

SnagIt Capture

The bottom line:

So what is the bottom line on jumbo frames, at least today?  First of all, my disclaimer:  My tests were performed on an older 3Com network switch.  Mileage may vary on newer or different network infrastructure.  Unfortunately I did not have access to a 10Gb lab network to perform this same testing.  However, I believe my findings are consistent with the majority of what I’ve gathered from the various credible sources.  I’m not sold on jumbo frames as a provider of significant performance gains.  I wouldn’t break my back implementing the technology without an indisputable business justification.  If you want to please the network team and abide by the strategy of an existing jumbo frames enabled network infrastructure, then use jumbo frames with confidence.  If you want to be doing everything you possibly can to boost performance from your IP based storage network, use jumbo frames.  If you’re betting the business on IP based storage, use jumbo frames.  If you need a piece of plausible deniability when IP storage performance hits the fan, use jumbo frames.  If you’re looking for the IP based storage performance promised land, jumbo frames doesn’t get you there by itself.  If you come across a source telling you otherwise, that jumbo frames is the key or sole ingredient to a Utopia of incomprehensible speeds, challenge the source.  Ask to see some real data.  If you’re in need of a considerable performance boost for your IP based storage, look beyond jumbo frames.  Look at optimizing, balancing, or upgrading your back end disk array.  Look at 10Gb.  Look at fibre channel.  Each of these alternatives is likely to get you better overall performance gains than jumbo frames alone.  And of course, consult with your vendor.

Flow Control

November 29th, 2010

Thanks to the help from blog sponsorship, I’m able to maintain a higher performing lab environment than I ever had up until this point.  One area which I hadn’t invested much in, at least from a lab standpoint, is networking.  In the past, I’ve always had some sort of small to mid density unmanaged Ethernet switch.  And this was fine.  Household name brand switches like Netgear and SMC from Best Buy and NewEgg performed well enough and survived for years in the higher temperature lab environment.  Add to that, by virtue of being unmanaged, they were plug and play.  No time wasted fighting a misconfigured network.

I recently picked up a 3Com SuperStack 3 Switch 3870 (48 1GbE ports).  It’s not 10GbE but it does fit my budget along with a few other networking nice-to-haves like VLANs and Layer 3 routing.  Because this switch is managed, I can now apply some best practices from the IP based storage realm.  One of those best practices is configuring Flow Control for VMware vSphere with network storage.  This blog post is mainly to record some pieces of information I’ve picked up along the way and to open a dialog with network minded readers who may have some input.

So what is network Flow Control? 

NetApp defines Flow Control in TR-3749 as “the process of managing the rate of data transmission between two nodes to prevent a fast sender from overrunning a slow receiver.”  NetApp goes on to advise that Flow Control can be set at the two endpoints (the ESX(i) host level and the storage array level) and at the Ethernet switch(es) in between.

Wikipedia is in agreement with the above and adds more meat to the discussion, including the following: “The overwhelmed network element will send a PAUSE frame, which halts the transmission of the sender for a specified period of time. PAUSE is a flow control mechanism on full duplex Ethernet link segments defined by IEEE 802.3x and uses MAC Control frames to carry the PAUSE commands. The MAC Control opcode for PAUSE is 0X0001 (hexadecimal). Only stations configured for full-duplex operation may send PAUSE frames.”

What are network Flow Control best practices as they apply to VMware virtual infrastructure with NFS or iSCSI network storage?

Both NetApp and EMC agree that Flow Control should be enabled in a specific way at the endpoints as well as at the Ethernet switches which support the flow of traffic:

  • Endpoints (that’s the ESX(i) hosts and the storage arrays) should be configured with Flow Control send/tx on, and receive/rx off.
  • Supporting Ethernet switches should be configured with Flow Control “Desired” or send/tx off and receive/rx on.

One item to point out here is that although both mainstream storage vendors recommend these settings for VMware infrastructures as a best practice, neither of their multi protocol arrays ships configured this way.  At least not the units I’ve had my hands on, which include the EMC Celerra NS-120 and the NetApp FAS3050c.  The Celerra is configured out of the box with Flow Control fully disabled, and I found the NetApp configured for Flow Control set to full (duplex?).

Here’s another item of interest.  VMware vSphere hosts are configured out of the box to auto negotiate Flow Control settings.  What does this mean?  Network interfaces are able to advertise certain features and protocols which they were purpose built to understand, following the OSI model and RFCs of course.  One of these features is Flow Control.  VMware ESX ships with a Flow Control setting which adapts to its environment.  If you plug an ESX host into an unmanaged switch which doesn’t advertise Flow Control capabilities, ESX sets its tx and rx flags to off.  These flags tie specifically to the PAUSE frames mentioned above.  When I plugged my ESX host into the new 3Com managed switch and configured the ports for Flow Control to be enabled, I subsequently found out using the ethtool -a vmnic0 command that both tx and rx were enabled on the host (the 3Com switch has just one Flow Control toggle: enabled or disabled).  NetApp provides a hint to this behavior in their best practice statement which says “Once these [Flow Control] settings have been configured on the storage controller and network switch ports, it will result in the desired configuration without modifying the flow control settings in ESX/ESXi.”  Jase McCarty pointed out back in January a “feature” of the ethtool in ESX.  Basically, ethtool can be used to display current Ethernet adapter settings (including Flow Control as mentioned above) and it can also be used to configure settings.  Unfortunately, when ethtool is used to hard code a vmnic for a specific Flow Control configuration, that config lasts only until the next time ESX is rebooted.  After a reboot, the modified configuration does not persist and it reverts back to auto/auto/auto.  I tested with ESX 4.1 and the latest patches and the same holds true.  Jase offers a workaround in his blog post which allows the change to persist by embedding it in /etc/rc.local.
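For completeness, here’s a sketch of what that looks like from the ESX console, following the vendor best practice of send/tx on and receive/rx off on the host (as noted above, the setting won’t survive a reboot unless the ethtool line is also added to /etc/rc.local per Jase’s workaround):

# ethtool -a vmnic0                            (display the current pause/Flow Control settings for the vmnic)
# ethtool -A vmnic0 autoneg off rx off tx on   (hard code Flow Control to send on, receive off)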

Third item of interest.  VMware KB 1013413 talks about disabling Flow Control using esxcfg-module for Intel NICs and ethtool for Broadcom NICs.  This article specifically talks about disabling Flow Control when PAUSE frames are identified on the network.  If PAUSE frames are indicative of a large amount of traffic which a receiver isn’t able to handle, it would seem to me we’d want to leave Flow Control enabled (by design to mediate the congestion) and perform root cause analysis on exactly why we’ve hit a sustained scaling limit (and what do we do about it long term).

Fourth.  Flow Control seems to be a simple mechanism which hinges on PAUSE frames to work properly.  If the Wikipedia article is correct in that only stations configured for full-duplex operation may send PAUSE frames, then it would seem to me that both network endpoints (in this case ESX(i) and the IP based storage array) should be configured with Flow Control set to full duplex, meaning both tx and rx ON.  This conflicts with the best practice messages from EMC and NetApp although it does align with the FAS3050 out of box configuration.  The only reasonable explanation is that I’m misinterpreting the meaning of full-duplex here.

Lastly, I’ve got myself all worked up into a frenzy over the proper configuration of Flow Control because I want to be sure I’m doing the right thing from both a lab and infrastructure design standpoint, but in the end Flow Control is like the Shares mechanism in VMware ESX(i):  The values or configurations invoked apply only during periods of contention.  In the case of Flow Control, this means that although it may be enabled, it serves no useful purpose until a receiver on the network says “I can’t take it any more” and sends the PAUSE frames to temporarily suspend traffic.  I may never reach this tipping point in the lab but I know I’ll sleep better at night knowing the lab is configured according to VMware storage vendor best practices.