Posts Tagged ‘ESXi’

Veeam Reporter 4.0 Free Edition

August 16th, 2010

Today, Veeam has launched a new free version of an existing product you may already be familiar with: Veeam Reporter Free Edition.  Veeam Reporter is an enterprise virtual infrastructure tool, best described by Veeam on their product page:

Veeam Reporter™ discovers, documents and analyzes your entire virtual infrastructure. It maintains a complete history of all objects, settings and changes. And it trends performance and utilization. So you can really understand your virtual infrastructure—past, present and future.

When it comes to documenting and reporting on your virtual infrastructure, Reporter does it all.

This new free version contains most of the features of the full version.  The free edition can easily be upgraded to the full version of Veeam Reporter to gain these additional capabilities (a feature comparison can be found here):

  • Capacity planning (report pack)
  • Historical change management (beyond the most recent 24 hours)
  • Microsoft Visio reports for multipathing, network, vMotion, and datastore utilization
  • Full access to archive data—to create custom reports or update your configuration management database (CMDB)
  • Full dashboard capabilities
  • Automatic report distribution

I was invited by Veeam to take a look at the beta version of Veeam Reporter Free Edition.  I’ve captured some of my experience and documented it here.

Installation

Installation of Veeam Reporter Free Edition is fairly straightforward, but I should disclose that I’m working with a beta (pre-GA) version.  I installed it on Microsoft Windows Server 2008 R2 Standard (which is 64-bit only), my preferred platform whenever the vendor supports it (Veeam Reporter does).  Veeam Reporter requires Microsoft .NET Framework 3.5.1, which in Windows Server 2008 R2 is installed as a Feature.


If you install Veeam Reporter’s Web UI (the default), the IIS role is also required during the .NET Framework installation, plus a few extra roles.


During the beta, I ran into a JavaScript error message after the installation was complete.


As it turns out, the issue had nothing to do with JavaScript; rather, the Static Content role service must be installed for IIS.


During the Veeam Reporter installation routine, I also installed the optional Microsoft PowerShell component.


The Veeam Reporter PowerShell snap-in enables users to perform reporting tasks by running single cmdlets or custom automation scripts from the command line.  The ReporterDBSnapIn PowerShell snap-in is installed, which adds the following Veeam Reporter-specific cmdlets to the PowerShell environment:

CONNECT-VRVISERVER
DISCONNECT-VRVISERVER
GET-VRVM
GET-VRVMHOST
GET-VRDATASTORE
GET-VRRESOURCEPOOL
GET-VRCLUSTER
GET-VRSNAPSHOT
GET-VRCURRENTDATE
SET-VRCURRENTDATE

As is quite common with virtualization management tools, including VMware vCenter itself, a back-end database is required for storing datacenter information.  Veeam Reporter can leverage an existing Microsoft SQL Server.  In the absence of a dedicated SQL server, Veeam Reporter will install Microsoft SQL Express and integrate with it locally.  Installing a local SQL Express instance takes quite some time because the necessary SQL binaries (including SP1) are downloaded during setup, which also means the Veeam Reporter server needs internet connectivity.


A logoff/logon is required at the end of the installation, as opposed to a system reboot.


Configuration

Now that the installation is complete, the next step is to configure Veeam Reporter Free Edition.  There’s really not much to the initial configuration or data collection.  On top of that, the installation and data collection process is agentless, a definite plus.

So before any data can be displayed, it needs to be collected from the vCenter Server(s).  This is handled by creating a Collection Job which points at the vCenter Server and pulls in the data that Veeam uses.  A collection job should be scheduled to run periodically so that it grabs updated data at regular intervals.  I set up a Collection Job to run automatically once per day at midnight.  For the purposes of instant gratification, I also ran the job manually to get some data.


In addition to configuring a Collection Job, I also set up a few of the ancillary items one would commonly find in reporting and management applications such as an Email server.

Now that I have some data, I can start creating useful reports and that’s where the fun begins.  I will cover some of the reports in the next update so stay tuned.

In the meantime, download your copy of Veeam Reporter Free Edition today and get started!

 

vSphere 4.1: Multicore Virtual CPUs

July 25th, 2010

With the release of vSphere 4.1, VMware has introduced Multicore Virtual CPU technology to its bare metal flagship hypervisor.  This is an interesting feature which had already existed in current versions of VMware Workstation.  VMware has consistently baked in new features in its Type 2 hypervisor products, such as Workstation, Player, Fusion, etc., more or less as a functionality/stability test before releasing the same features in ESX(i).  VMware highlights this new feature as follows:

User-configurable Number of Virtual CPUs per Virtual Socket: You can configure virtual machines to have multiple virtual CPUs reside in a single virtual socket, with each virtual CPU appearing to the guest operating system as a single core. Previously, virtual machines were restricted to having only one virtual CPU per virtual socket. See the vSphere Virtual Machine Administration Guide.

VMware multicore virtual CPU support lets you control the number of cores per virtual CPU in a virtual machine. This capability lets operating systems with socket restrictions use more of the host CPU’s cores, which increases overall performance.

Using multicore virtual CPUs can be useful when you run operating systems or applications that can take advantage of only a limited number of CPU sockets. Previously, each virtual CPU was, by default, assigned to a single-core socket, so that the virtual machine would have as many sockets as virtual CPUs.

You can configure how the virtual CPUs are assigned in terms of sockets and cores. For example, you can configure a virtual machine with four virtual CPUs in the following ways:

  • Four sockets with one core per socket (legacy, this is how we’ve always done it prior to vSphere 4.1)
  • Two sockets with two cores per socket (new in vSphere 4.1)
  • One socket with four cores per socket (new in vSphere 4.1)
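To make the combinations concrete, here is a small Python sketch (my own illustration, not a VMware tool) that lists every sockets-by-cores split for a given vCPU count. It only checks that the product works out; the power-of-2 rule for cores per socket covered further down is a separate constraint.

```python
def socket_core_layouts(total_vcpus: int) -> list[tuple[int, int]]:
    """Return every (sockets, cores_per_socket) pair whose product equals total_vcpus."""
    if total_vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    layouts = []
    for cores in range(1, total_vcpus + 1):
        if total_vcpus % cores == 0:  # cores must divide the vCPU count evenly
            layouts.append((total_vcpus // cores, cores))
    return layouts


for sockets, cores in socket_core_layouts(4):
    print(f"{sockets} socket(s) x {cores} core(s) per socket = {sockets * cores} vCPUs")
```

For 4 vCPUs this yields the same three layouts VMware lists: 4x1, 2x2, and 1x4.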

VMware defines a CPU as:

The portion of a computer system that carries out the instructions of a computer program and is the primary element carrying out the computer’s functions.

VMware defines a Core as:

A logical execution unit containing an L1 cache and functional units needed to execute programs. Cores can independently execute programs or threads.

VMware defines a Socket as:

A physical connector on a computer motherboard that accepts a single physical chip. Many motherboards can have multiple sockets that can in turn accept multicore chips.

One of the benefits multicore brought to physical computing was increased hardware density.  VMs do not share this advantage, as they are virtual to begin with and have no rack footprint to speak of.

VMware’s benefit statement for this feature is a legitimate one and is the primary use case.  It’s the same benefit that applied when multicore (and, to some extent, hyperthreading) technology was introduced to physical servers.  What VMware doesn’t advertise is that the limitation being discussed usually revolves around software licensing: specifically, the per-socket license model that many software vendors still use.  For example, if I own a piece of software with a single-socket license, traditionally I was only able to use it inside a single vCPU VM.  With Multicore Virtual CPUs, virtual machines have now caught up with their physical hardware counterparts in that a single-socket VM can be created with 4 cores per socket.  Using the working example, I can now run my application inside a VM which still has 1 socket but 4 cores, for a net result of 4 vCPUs instead of just 1 vCPU, and I didn’t have to pay my software vendor additional money for the added CPU power.  To show how this translates into dollars and cents, let’s assume a per-socket license cost of $1,000 for my application and extrapolate using VMware’s example above of how CPUs can be assigned in terms of sockets and cores:

  • Four sockets with one core per socket = $1,000 x 4 sockets = $4,000 net license cost, 4 CPUs
  • Two sockets with two cores per socket = $1,000 x 2 sockets = $2,000 net license cost, 4 CPUs
  • One socket with four cores per socket = $1,000 x 1 socket = $1,000 net license cost, 4 CPUs
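The arithmetic is simple enough to script. A minimal sketch, assuming a strictly per-socket license at the hypothetical $1,000 price used above; real license terms vary by vendor, so treat this as illustration only.

```python
PRICE_PER_SOCKET = 1_000  # hypothetical per-socket license price from the example


def license_cost(sockets: int, price_per_socket: int = PRICE_PER_SOCKET) -> int:
    """Per-socket licensing: cost scales with sockets, not cores."""
    return sockets * price_per_socket


# (sockets, cores per socket) layouts for a 4 vCPU VM
for sockets, cores in [(4, 1), (2, 2), (1, 4)]:
    print(f"{sockets} x {cores} = {sockets * cores} vCPUs, license ${license_cost(sockets):,}")
```

Every layout delivers 4 vCPUs; only the socket count, and therefore the bill, changes.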

    Now, all of this said, the responsibility is on the end user to be in license compliance with his or her software vendors.  Just because you can do this doesn’t mean you’re legally entitled to do so.  Be sure to read your EULA and check with your software vendor or reseller before implementing VMware Multicore Virtual CPUs.

    Implementation of Multicore Virtual CPUs was quite straightforward in VMware Workstation.  Upon creating a new VM or editing an existing VM’s settings, the following interface was presented for configuring vCPUs and cores per vCPU.  In this example, a 2xDC (Dual Core) configuration is being applied, which results in a total of 4 CPU cores to serve the VM’s operating system, applications, and users.  Note that here, the term “processors” on the first line translates to “sockets”.


    Making the same 2xDC CPU configuration in vSphere 4.1 isn’t difficult, but it is done differently.  Total vCPUs and cores per vCPU are configured in two different areas of the VM configuration.  Combined, the two settings determine the layout: the total number of vCPUs divided by the cores per socket gives the number of virtual sockets the guest sees.

    First of all, the total number of cores (processors) is selected in the VM’s CPU configuration.  This hasn’t changed and should be familiar to you.  The number of cores (processors) available for selection here is going to be 1 thru 4, or 1 thru 8 if you have Enterprise Plus licensing.  I’ve purposely included the notation of VM hardware version 7, which is required.  An inconsistency here compared to VMware Workstation is that the term “virtual processors” translates to “cores”, not “sockets”.


    Configuring the number of cores per processor is where VMware has deviated from the VMware Workstation implementation.  In ESX and ESXi, this is set as an advanced setting in the .vmx file.  Edit the VM settings, navigate to the Options tab, and choose General in the Advanced options list.  Click the Configuration Parameters button, which allows you to edit the .vmx file on a row-by-row basis.  Click the Add Row button and add the line item cpuid.coresPerSocket.  For the value, you’re going to supply the number of cores per processor, which is generally going to be 2, 4, or 8 (8 requires Enterprise Plus licensing).  Note that a value of 1 here would serve no practical purpose because it would configure a single-core vCPU, which is what we’ve had all along up until this point.

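If you prep templates or VMs from scripts rather than through the Configuration Parameters dialog, the same row can be written into the .vmx text. The sketch below is a hypothetical helper (not a VMware tool or API) that sets or replaces a key = "value" entry; it assumes keys match case-insensitively and that you only touch the file while the VM is powered off, per the requirements below.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return .vmx text with `key` set to `value`, replacing an existing row or appending one."""
    new_line = f'{key} = "{value}"'
    lines = vmx_text.splitlines()
    for i, line in enumerate(lines):
        # .vmx rows look like: key = "value"; compare keys case-insensitively (assumption)
        if line.split("=", 1)[0].strip().lower() == key.lower():
            lines[i] = new_line
            break
    else:  # no existing row for this key, so append one
        lines.append(new_line)
    return "\n".join(lines) + "\n"


vmx = 'virtualHW.version = "7"\nnumvcpus = "4"\n'
vmx = set_vmx_option(vmx, "cpuid.coresPerSocket", "2")  # 4 vCPUs as 2 sockets x 2 cores
print(vmx, end="")
```

Replacing rather than blindly appending keeps a second run from leaving two conflicting cpuid.coresPerSocket rows.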

    As a supplement, here are the requirements for implementing Multicore Virtual CPUs:

    • VMware vSphere 4.1 (vCenter 4.1, ESX 4.1 or ESXi 4.1).
    • Virtual Machine hardware version 7 is required.
    • The VM must be powered off to configure Multicore Virtual CPUs.
    • The total number of vCPUs for the VM divided by the number of cores per socket must be a positive integer.
    • The cpuid.coresPerSocket value must be a power of 2.  The documentation explicitly states a value of 2, 4, or 8 is required, but 1 works as well, although as stated before it would serve no practical purpose.
      • 2^0=1 (any nonzero number to the power of 0 equals 1)
      • 2^1=2 (any number to the power of 1 equals itself)
      • 2^2=4
      • 2^3=8
    • When you configure multicore virtual CPUs for a virtual machine, CPU hot Add/Remove is disabled (previously called CPU hot plug).
    • You must be in compliance with the requirements of the operating system EULA.
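Several of these rules are mechanical, so they can be checked before touching a VM. A Python sketch, where the function name and parameters are hypothetical and the EULA rule obviously can’t be automated:

```python
def multicore_problems(num_vcpus: int, cores_per_socket: int,
                       hw_version: int, powered_on: bool) -> list[str]:
    """Check a proposed cpuid.coresPerSocket value against the vSphere 4.1 rules listed above."""
    problems = []
    if hw_version < 7:
        problems.append("virtual machine hardware version 7 is required")
    if powered_on:
        problems.append("power off the VM before configuring multicore vCPUs")
    # A power of 2 has exactly one bit set, so n & (n - 1) == 0 (1, 2, 4, 8, ...)
    if cores_per_socket < 1 or cores_per_socket & (cores_per_socket - 1):
        problems.append("cpuid.coresPerSocket must be a power of 2")
    elif num_vcpus < cores_per_socket or num_vcpus % cores_per_socket:
        problems.append("total vCPUs / cores per socket must be a positive integer")
    return problems


print(multicore_problems(4, 2, hw_version=7, powered_on=False))  # []
print(multicore_problems(6, 3, hw_version=7, powered_on=True))
```

An empty list means the proposed layout passes every rule the function encodes.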

    This feature rocks and I think customers have been waiting a long time for it.  Duncan mentioned it quite some time ago, but obviously it was unsupported at that time.  I am a little puzzled by the implementation mechanism, mainly configuring the .vmx to specify cores per CPU.  I suppose it lends itself to scriptability and thus automation, but in that sense, we lack the flexibility to configure cores per CPU with guest customization when deploying VMs from a template.  Essentially, this means cores per CPU needs to be hard coded in each of my templates or manually tuned after deploying each VM from a template.  When I take a step back, I guess that’s no different than any other virtual hardware configuration stored in templates, but with the cores per CPU setting buried in the .vmx as an advanced setting, it’s that much more of a manual/administrative burden to configure for each VM deployed than simply changing the number of CPUs or amount of RAM.  It would be nice if the guest customization process offered a quick way to configure cores per processor.

    OVF? OVA? WTF?

    July 2nd, 2010

    If you’ve worked with recent versions of VMware virtual infrastructure, Converter, or Workstation, you may be familiar with the fact that these products can natively work with virtual machines in the Open Virtualization Format, or OVF for short.  OVF is a specification governed by the DMTF (Distributed Management Task Force).  To me, that sounds a lot like the RFCs which provide standards for protocols and communication across compute platforms; basically SOPs for how content is delivered on the internet as we know it today.

    So if there’s one standard, why is it that when I choose to create an OVF (Export OVF Template in the vSphere Client), I’m prompted to create either an OVF or an OVA?  If the OVF is an OVF, then what’s an OVA?


    Personally, I’ve seen both formats, typically when deploying packaged appliances.  The answer is simple: both the OVF and the OVA formats roll up into the Specification defined by the DMTF.  The difference between the two is in the presentation and encapsulation.  The OVF is a construct of a few files, all of which are essential to its definition and deployment.  The OVA, on the other hand, is a single file with all of the necessary information encapsulated inside of it.  Think of the OVA as an archive file.  The single-file format makes it easy to move around.  From a size or bandwidth perspective, neither format has an advantage over the other, as they tend to be the same size when all is said and done.


    The DMTF explains the two formats on pages 12 through 13 in the PDF linked above:

    An OVF package may be stored as a single file using the TAR format. The extension of that file shall be .ova (open virtual appliance or application).

    An OVF package can be made available as a set of files, for example on a standard Web server.
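Since an OVA is just the OVF package’s files in a TAR archive, Python’s standard tarfile module is enough to illustrate the relationship. This is a sketch, not a replacement for Export OVF Template: it assumes the descriptor (.ovf) should be the first member, which is my reading of the spec, with the remaining files following in the order the descriptor lists them. The file names are placeholders.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_ova(ovf_file: Path, other_files: list[Path], ova_file: Path) -> None:
    """Bundle an OVF package into a single .ova file (an uncompressed TAR archive)."""
    with tarfile.open(ova_file, "w") as tar:
        # Descriptor first, then the rest in the order given; store bare file names
        for path in [ovf_file, *other_files]:
            tar.add(path, arcname=path.name)


def ova_members(ova_file: Path) -> list[str]:
    """List the files inside an .ova, in archive order."""
    with tarfile.open(ova_file, "r") as tar:
        return tar.getnames()


# Demo with placeholder files standing in for a real export
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    names = ["appliance.ovf", "appliance.mf", "appliance-disk1.vmdk"]
    for name in names:
        (folder / name).write_text("placeholder\n")
    ova = folder / "appliance.ova"
    pack_ova(folder / names[0], [folder / n for n in names[1:]], ova)
    members = ova_members(ova)

print(members)
```

Unpacking the same .ova with any TAR tool gives you back the multi-file OVF form, which is why the two formats end up about the same size.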

    Do keep in mind that whichever file type you choose to work with, if you plan on hosting the files on a web server, MIME types will need to be set up for .ovf, .ova, or both in order for a client to download them for deployment onto your hypervisor.
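As an illustration of the MIME-type point, here is a sketch that uses Python’s built-in http.server as a stand-in for IIS or Apache. The handler maps the two extensions explicitly. The specific types chosen are assumptions (XML for the .ovf descriptor, a generic binary download for the .ova), so check what your clients and appliance vendors expect.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Assumed MIME types; adjust to whatever your deployment clients expect.
OVF_MIME_TYPES = {
    ".ovf": "application/xml",           # the descriptor is an XML document
    ".ova": "application/octet-stream",  # TAR archive, served as a plain download
}


class ApplianceHandler(SimpleHTTPRequestHandler):
    """Static file handler that also knows the OVF/OVA extensions."""
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, **OVF_MIME_TYPES}


def serve(port: int = 8000) -> None:
    """Serve the current directory; run from a folder holding your exported packages."""
    ThreadingHTTPServer(("", port), ApplianceHandler).serve_forever()
```

Calling serve() blocks until interrupted with Ctrl+C. On a production web server, the equivalent is a MIME mapping entry per extension in the server’s own configuration.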

    At 41 pages, the OVF Specification contains a surprising amount of detail.  There’s more to it than you might think, and for good reason:

    The Open Virtualization Format (OVF) Specification describes an open, secure, portable, efficient and extensible format for the packaging and distribution of software to be run in virtual machines.

    Open, meaning cross platform (bring your own hypervisor).  Combined with Secure and Portable attributes, OVF may be one of the key technologies for intracloud and intercloud mobility.  The format is a collaborative effort spawned from a variety of contributors:

    Simon Crosby, XenSource
    Ron Doyle, IBM
    Mike Gering, IBM
    Michael Gionfriddo, Sun Microsystems
    Steffen Grarup, VMware (Co-Editor)
    Steve Hand, Symantec
    Mark Hapner, Sun Microsystems
    Daniel Hiltgen, VMware
    Michael Johanssen, IBM
    Lawrence J. Lamers, VMware (Chair)
    John Leung, Intel Corporation
    Fumio Machida, NEC Corporation
    Andreas Maier, IBM
    Ewan Mellor, XenSource
    John Parchem, Microsoft
    Shishir Pardikar, XenSource
    Stephen J. Schmidt, IBM
    René W. Schmidt, VMware (Co-Editor)
    Andrew Warfield, XenSource
    Mark D. Weitzel, IBM
    John Wilson, Dell

    Take a look at the OVF Specification document as well as some of the other work going on at the DMTF.

    Have a great and safe July 4th weekend, and congratulations to the Dutch on their win today in World Cup Soccer.  I for one will be glad when it’s all over with and our Twitter APIs can return to normal again.

    vSphere Cluster Showing Noncompliant on the Profile Compliance Tab

    June 24th, 2010

    To troubleshoot a vSphere cluster showing Noncompliant on the Profile Compliance tab, check the following:

    FT logging NIC speed is at least 1000 Mbps
    At least one shared datastore exists
    FT logging is enabled
    VMotion NIC speed is at least 1000 Mbps
    All the hosts in the cluster have the same build for Fault Tolerance
    The host hardware supports Fault Tolerance
    VMotion is enabled
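If you gather these facts per host (by hand or from your own scripts), the checklist reduces to a few comparisons. The HostFacts fields below are hypothetical names of my own, not a vSphere API, and the sketch only encodes the items listed above.

```python
from dataclasses import dataclass


@dataclass
class HostFacts:
    """Hypothetical per-host facts for the FT-related compliance checks."""
    name: str
    build: int
    ft_logging_enabled: bool
    ft_nic_mbps: int
    vmotion_enabled: bool
    vmotion_nic_mbps: int
    hardware_supports_ft: bool


def compliance_issues(hosts: list[HostFacts], shared_datastores: int) -> list[str]:
    """Return the checklist items that fail for this cluster."""
    issues = []
    if shared_datastores < 1:
        issues.append("cluster: no shared datastore")
    if len({h.build for h in hosts}) > 1:
        issues.append("cluster: hosts are not on the same build")
    for h in hosts:
        if not h.ft_logging_enabled:
            issues.append(f"{h.name}: FT logging is disabled")
        if h.ft_nic_mbps < 1000:
            issues.append(f"{h.name}: FT logging NIC below 1000 Mbps")
        if not h.vmotion_enabled:
            issues.append(f"{h.name}: VMotion is disabled")
        if h.vmotion_nic_mbps < 1000:
            issues.append(f"{h.name}: VMotion NIC below 1000 Mbps")
        if not h.hardware_supports_ft:
            issues.append(f"{h.name}: hardware does not support FT")
    return issues


hosts = [
    HostFacts("esx01", 260247, True, 1000, True, 1000, True),  # example build numbers
    HostFacts("esx02", 208167, True, 1000, True, 100, True),
]
for issue in compliance_issues(hosts, shared_datastores=2):
    print(issue)
```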

    Read more at: http://kb.vmware.com/kb/1017471

    Disable Copy and Paste for a VM

    June 23rd, 2010

    Security Tip: Disable Copy and Paste operations between the guest VM operating system and remote console by providing the following advanced parameters for the VM’s configuration (stored in the .vmx file):

    isolation.tools.copy.disable = true
    isolation.tools.paste.disable = true
    isolation.tools.setGUIOptions.enable = false
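To audit a batch of VMs for these three settings, a simple text check of each .vmx is enough for illustration. Here is a Python sketch. The parser is deliberately minimal: it expects one key = "value" per line and compares keys case-insensitively, which is my assumption. The sample file contents are made up.

```python
# The three isolation settings from the tip above, with their required values
HARDENING = {
    "isolation.tools.copy.disable": "true",
    "isolation.tools.paste.disable": "true",
    "isolation.tools.setGUIOptions.enable": "false",
}


def parse_vmx(text: str) -> dict[str, str]:
    """Minimal .vmx reader: lowercased key -> lowercased, unquoted value."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip().strip('"').lower()
    return settings


def missing_hardening(vmx_text: str) -> list[str]:
    """Names of the copy/paste settings that are absent or set to the wrong value."""
    settings = parse_vmx(vmx_text)
    return [key for key, wanted in HARDENING.items() if settings.get(key.lower()) != wanted]


sample = 'displayName = "web01"\nisolation.tools.copy.disable = "TRUE"\n'
print(missing_hardening(sample))
```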

    Read more at: http://www.vmware.com/files/pdf/vi35_security_hardening_wp.pdf

    ESX and the Service Console Are Going Away

    June 17th, 2010

    ESX and the Service Console are going away.  Theories on this are plastered all over the internet: here, here, here, here, here, here, here, etc.

    Go to Google and perform the following search:

    esx “service console” “going away”

    Better yet, let me Google that for you (Thank you Doug for the introduction to this wonderfully funny tool!)

    ESXi was first introduced on December 10th, 2007.  We’ve had 2 1/2 years to get familiar with this hypervisor which is minimal in size but as feature rich, as powerful, and as fast as ESX.  The management tools for ESXi have evolved and the platform has been given its due time to prove its stability and viability as an enterprise bare metal hypervisor in the datacenter.

    I conducted an informal poll on Twitter this week and a large number of respondents claimed to still be using ESX.  More alarming was the disposition of some who have no plans whatsoever to go to ESXi.  One person went so far as to say the Service Console would have to be pried from his cold dead hands.

    If you have not yet broken your dependency on ESX and the Service Console, I suggest you do it soon.  Time is running out.  Don’t wait until the last minute.  Be sure you leave yourself enough time to architect and test a sound ESXi design for your datacenter, as well as get familiar with the tools you’ll be dependent on to manage the ESXi hypervisor.

    VMware Tools install – A general system error occurred: Internal error

    June 16th, 2010

    When you invoke the VMware Tools installation via the vSphere Client, you may encounter the error “A general system error occurred: Internal error“.


    One thing to check is that the VM shell has the correct guest operating system type selected.  For example, a setting of “Other (32-bit)” will cause the error: VMware cannot determine which version of the tools to install because the flavor of guest operating system (i.e., Windows or Linux) is unknown.


    VMware KB Article 1004718 lists other items to verify:

    The virtual machine has a CD-ROM drive configured.
    The windows.iso is present under the /vmimages/tools-iso/ folder.
    The virtual machine is powered on.
    The correct guest operating system is selected. For example, if the guest operating system is Windows 2000, ensure you have chosen Windows 2000 and not Other.
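The guest OS type is stored in the VM’s .vmx as the guestOS entry, so the first check can be scripted across many VMs. This sketch assumes “Other (32-bit)” and “Other (64-bit)” are saved as the guestOS values other and other-64, which you should confirm against your own files. It covers only that one item from the list.

```python
from typing import Optional

# Assumed .vmx guestOS values for the generic "Other (32-bit)/(64-bit)" choices
GENERIC_GUEST_IDS = {"other", "other-64"}


def guest_os_id(vmx_text: str) -> Optional[str]:
    """Return the guestOS value from .vmx text, or None if the row is missing."""
    for line in vmx_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "guestos":
            return value.strip().strip('"').lower()
    return None


def tools_type_unknown(vmx_text: str) -> bool:
    """True when the guest OS type gives no hint whether Windows or Linux tools apply."""
    guest = guest_os_id(vmx_text)
    return guest is None or guest in GENERIC_GUEST_IDS  # missing row treated as unknown


print(tools_type_unknown('guestOS = "other"'))             # True
print(tools_type_unknown('guestOS = "winNetEnterprise"'))  # False (example Windows ID)
```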

    NFS and Name Resolution

    June 11th, 2010

    Sometimes I take things for granted, for instance the health and integrity of the lab environment.  Although it is a “lab”, I do run some workloads that are important to keep online, primarily the web server this blog is served from, the email server where I do a lot of collaboration, and the Active Directory Domain Controllers/DNS Servers, which provide authentication, mailbox access, external host name resolution to fetch resources on the internet, and internal host name resolution.

    The workloads and infrastructure in my lab are 100% virtualized.  The only “physical” items I have are type 1 hypervisor hosts, storage, and network.  By this point I’ll assume most are familiar with the benefits of consolidation.  The downside is that when the wheels come off in a highly consolidated environment, the impact can be severe as it fans out and tips over downstream dependencies like dominoes.

    A few weeks ago I had decided to recarve the EMC Celerra fibre channel SAN storage.  The VMs which were running on the EMC fibre channel block storage were all moved to NFS on the NetApp filer.  Then last week, the Gb switch which supports all the infrastructure died.  Yes it was a single point of failure – it’s a lab.  The timing for that to happen couldn’t have been worse since all lab workloads were running on NFS storage.  All VMs had lost their virtual storage and the NFS connections on the ESX(i) hosts eventually timed out.

    The network switch was replaced later that day, and since all VMs were down and NFS storage had disconnected, I took the opportunity to gracefully reboot the ESX(i) hosts; good time for a fresh start.  Not surprisingly, I had to use the vSphere Client to connect to each host by IP address, since at that point I had no functional DNS name resolution in the lab whatsoever.  When the hosts came back online, I was about to begin powering up VMs, but instead I encountered a situation I hadn’t planned for: all the VMs were grayed out, essentially disconnected.  I discovered the cause was that after the host reboot, the NFS storage (both NetApp and EMC Celerra) hadn’t come back online on either host.  There’s no way both storage cabinets and/or both hosts were having a problem at the same time, so I assumed it was a network or cabling problem.  With the NFS mounts in the vSphere Client staring back at me in their disconnected state, it dawned on me: lack of DNS name resolution was preventing the hosts from connecting to the storage.  The hosts could not resolve the FQDN of the EMC Celerra or the NetApp filer.  I modified /etc/hosts on each ESX(i) host, adding the IP address and FQDN for the NetApp filer and Celerra Data Movers.  Shortly after, I was back in business.

    What did I learn?  Not much.  It was more a reiteration of important design considerations which I was already aware of:

    1. 100% virtualization/consolidation is great – when it works.  The web of upstream/downstream dependencies makes it a pain when something breaks.  Consolidated dependencies which you might consider leaving physical or placing in a separate failure domain:
      • vCenter Management
      • Update Manager
      • SQL/Oracle back ends
      • Name Resolution (DNS/WINS)
      • DHCP
      • Routing
      • Active Directory/FSMO Roles/LDAP/Authentication/Certification Authorities
      • Mail
      • Internet connectivity
    2. Hardware redundancy is always key but expensive.  Perform a risk assessment and make a decision based on the cost effectiveness.
    3. When available, diversify virtualized workload locations to reduce the failure domain, particularly to split workloads which provide redundant infrastructure support, such as Active Directory Domain Controllers and DNS servers.  This can mean placing workloads on separate hosts, separate clusters, separate datastores, separate storage units, maybe even separate networks depending on the environment.
    4. Static entries in /etc/hosts aren’t a bad idea as a fallback if you plan on using NFS in an environment with unreliable DNS, but I think the better point to discuss is the risk and pain which will be realized in deploying virtual infrastructure in an unreliable environment.  Garbage in, garbage out.  I’m not a big fan of using IP addresses to mount NFS storage unless the environment is small enough.
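The static-entry idea in item 4 is easy to picture in code. This Python sketch uses made-up names and documentation-range addresses (192.0.2.x). It parses hosts-file text and checks the static table before asking DNS. That ordering mirrors how a typical resolver consults /etc/hosts first, though that is an assumption about your host’s resolver configuration.

```python
import socket

# Hypothetical static entries, in /etc/hosts format: IP, FQDN, then aliases
STATIC_HOSTS = """\
192.0.2.10   filer01.lab.local      filer01
192.0.2.20   celerra-dm2.lab.local  celerra-dm2   # Celerra Data Mover
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Map every name and alias (lowercased) to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # keep the first mapping seen
    return table


def resolve(name: str, static: dict[str, str]) -> str:
    """Use a static entry if one exists; otherwise fall back to the system resolver/DNS."""
    ip = static.get(name.lower())
    return ip if ip is not None else socket.gethostbyname(name)


table = parse_hosts(STATIC_HOSTS)
print(resolve("filer01.lab.local", table))  # answered from the static table, no DNS query
```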

    vSphere Upgrade Prerequisites Checklist

    May 27th, 2010

    Upgrading your virtual infrastructure to vSphere?  Be sure to check out this handy reference from VMware:  vSphere Upgrade Prerequisites Checklist.  There are several areas which need to be considered and this document covers them all, including both requirements and recommendations.  If you’re a consultant who visits new customer sites on a regular basis, it wouldn’t be a bad idea to bring this with you to each engagement, or at least a condensed version of it.

    Happy Birthday vSphere!

    May 21st, 2010


    I was reminded by today’s vCalendar page that vSphere was launched by VMware one year ago today.  Happy Birthday Buddy, you set the bar that all other hypervisors aspire to reach one day.

    On this day in 2009, VMware vSphere, the next generation datacenter virtualization product and successor to Virtual Infrastructure 3 (VI3), was released boasting approximately 150 new features, new license tiers, and an amazing 350,000 I/O operations per second (IOPS). vSphere is a 64-bit only ESX host OS.

    Don’t have a vCalendar yet?  Get one!

    Top 10 Free vSphere ESX Tools and Utilities by KendrickColeman.com

    May 19th, 2010

    KendrickColeman.com has compiled a nice list of no-cost VMware vSphere utilities. A grading scale was disclosed to provide a value ranking of the utilities.  Information like this is valuable because I often see questions raised in the virtualization community about low-cost or no-cost ways to do this or that with VMware virtual infrastructure (backup is a frequent request).  I will be the first to admit that lab time is precious.  KendrickColeman.com has used their free time to install, test, and summarize each application for the benefit of the community.  Nice job and on behalf of the virtualization community, Thank You!

    Speaking of free, KendrickColeman.com has also pointed to a VMTN forum member who stumbled onto a way to use a free ESXi 4.0 license key to permanently license ESX 4.0.  Interesting find there.

    P2V Milestone

    May 15th, 2010

    If you’re reading this, that’s good news because it means last night’s P2V completed successfully.  I took the last remaining non-virtualized physical infrastructure server in the lab and made it a virtual machine.  Resource and role wise, this was the largest physical lab server next to the ESX hosts themselves.

    Resources:

    • HP ProLiant DL380 G3
    • Dual Intel P4 2.8GHz processors
    • 6GB RAM
    • 1/2 TB local storage
    • Dual Gb NICs
    • Dual fibre channel HBAs

    Roles:

    • Windows Server 2003 R2 Enterprise Edition SP2
    • File server
      • binaries
      • isos
      • my documents
      • thousands of family pictures
      • videos
    • Print server
    • IIS web server
      • WordPress blog
      • ASP.NET based family web site
      • other hosted sites
    • DHCP server
    • SQL 2005 server
      • vCenter
      • VUM
      • Citrix Presentation Server
    • MySQL server
      • WordPress blog
    • Backup Server
    • SAN management

    I’m shutting down this last remaining physical server as well as the tape library.  They’ll go in the pile of other physical assets which are already for sale, or they will be donated, as sales of 32-bit server hardware are slow right now.  This is a milestone because this server, named SKYWALKER (you may have heard me mention it from time to time), has been a physical staple in the lab for as long as the lab has existed (circa 1995).  Granted, it has gone through several physical hardware platform migrations, but its logical role is historic and its composition has always been physical.  To put it into perspective, at one point in time SKYWALKER was a Compaq Prosignia 300 server with a Pentium Pro processor and a single internal Barracuda 4.3GB SCSI drive.  Before I was able to acquire server-class hardware, it was built from hand-me-down whitebox parts from earlier gaming rigs.

    The P2V (using VMware Converter) took a little over 5 hours for 500GB of storage.  So the only physical servers remaining in the lab are the ESX hosts themselves: 2 DL385 G2s, and 2 DL385s which typically remain powered down, earmarked for special projects.  A successful P2V is a great start to a weekend if you ask me.  Now I’m off to my daughter’s T-ball game. :)

    QuickPress – VMs Per…

    May 7th, 2010

    I’m trying out my first QuickPress. Let’s see how this turns out.
    Right off the bat, I’m missing the autocomplete feature for Tags. As it turns out, typing more than three lines in the small content box isn’t much fun.

    On with the VMware content… This all comes from the VMware vSphere Configuration Maximums document.  I’ve bolded some of what I’d call core stats which capacity planners or architects would need to be aware of on a regular basis:

    15,000 VMs registered per Linked-mode vCenter Server
    10,000 powered on VMs per Linked-mode vCenter Server
    4,500 VMs registered per 64-bit vCenter Server
    4,000 VMs concurrently scanned by VUM (64-bit)
    3,000 powered on VMs per 64-bit vCenter Server
    3,000 VMs registered per 32-bit vCenter Server
    3,000 VMs connected per Orchestrator
    2,000 powered on VMs per 32-bit vCenter Server
    1,280 powered on VMs per DRS cluster
    320 VMs per host (standalone)
    256 VMs per VMFS volume
    256 VMs per host in a DRS cluster
    200 VMs concurrently scanned by VUM (32-bit)
    160 VMs per host in HA cluster with 8 or fewer hosts (vSphere 4.0 Update 1)
    145 powered on Linux VMs concurrently scanned per host
    145 powered on Linux VMs concurrently scanned per VUM server
    145 VMs per host scanned for VMware Tools
    145 VMs per host scanned for VMware Tools upgrade
    145 VMs per host scanned for virtual machine hardware
    145 VMs per host scanned for virtual machine hardware upgrade
    145 VMs per VUM server scanned for VMware Tools
    145 VMs per VUM server scanned for VMware Tools upgrade
    100 VMs per host in HA cluster with 8 or fewer hosts (vSphere 4.0)
    72 powered on Windows VMs concurrently scanned per VUM server
    40 VMs per host in HA cluster with 9 or more hosts
    10 powered off Windows VMs concurrently scanned per VUM server
    6 powered on Windows VMs concurrently scanned per host
    6 powered off Windows VMs concurrently scanned per host
    5 VMs per host concurrently remediated
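    For quick sizing, the HA and DRS rows above can be combined into a per-cluster ceiling on powered-on VMs.  This is a minimal sketch under stated assumptions: the cluster has both HA and DRS enabled, the release labels “4.0” and “4.0U1” are my own shorthand, the limit of 40 VMs per host for 9 or more hosts is applied to both releases because the list doesn’t qualify it, and vCenter, VMFS, and VUM limits are not modeled.

```python
# Limits taken from the vSphere 4.0 configuration maximums listed above.
HA_SMALL_CLUSTER_MAX_HOSTS = 8           # "HA cluster with 8 or fewer hosts"
HA_PER_HOST_SMALL = {"4.0": 100, "4.0U1": 160}
HA_PER_HOST_LARGE = 40                   # "HA cluster with 9 or more hosts"
DRS_PER_HOST = 256                       # "VMs per host in a DRS cluster"
DRS_CLUSTER_POWERED_ON = 1280            # "powered on VMs per DRS cluster"


def ha_per_host_limit(hosts: int, release: str) -> int:
    """Per-host VM limit in an HA cluster of the given size and release."""
    if hosts <= HA_SMALL_CLUSTER_MAX_HOSTS:
        return HA_PER_HOST_SMALL[release]  # KeyError for an unknown release label
    return HA_PER_HOST_LARGE


def cluster_vm_ceiling(hosts: int, release: str) -> int:
    """Upper bound on powered-on VMs in an HA + DRS cluster."""
    # The DRS per-host limit is checked too, although the HA limits are lower.
    per_host = min(ha_per_host_limit(hosts, release), DRS_PER_HOST)
    return min(hosts * per_host, DRS_CLUSTER_POWERED_ON)


for hosts, release in [(2, "4.0"), (8, "4.0U1"), (9, "4.0U1")]:
    print(f"{hosts} hosts, {release}: {cluster_vm_ceiling(hosts, release)} VMs")
```

    With 8 hosts on 4.0 Update 1, 8 × 160 = 1,280, which lands exactly on the DRS cluster limit.  Adding a ninth host drops the per-host limit to 40 and the ceiling to 360.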

    Got all that?

    Update 5/10/10: Added the row “160 VMs per host in HA cluster with 8 or fewer hosts (vSphere 4.0 Update 1)”. Thanks for the catch, Matt & Joe!

    VKernel Capacity Analyzer

    May 6th, 2010

    Last month, I attended Gestalt IT Tech Field Day in Boston.  This is an independent conference made up of hand-selected delegates and sponsored by the technology vendors we were visiting.  All of the vendors boast products which tie into a virtualized datacenter, which made the event particularly exciting for me!

    One of the vendors we met with is VKernel.  If you’re a long-time follower of my blog, you may recall a few of my prior VKernel posts, including VKernel CompareMyVM.  Our VKernel briefing covered Capacity Analyzer.  This is a product I actually looked at in the lab well over a year ago, but it was time to take another peek to see what improvements have been made.

    Before I get into the review, some background information on VKernel:

    VKernel helps systems administrators manage server and storage capacity utilization in their virtualized datacenters so they can:

    • Get better utilization from existing virtualization resources
    • Avoid up to 1/2 the cost of expanding their virtualized datacenter
    • Find and fix or avoid capacity related performance problems

    VKernel provides easy to use, highly affordable software for systems managers that:

    • Integrates with their existing VMware systems
    • Discovers their virtualized infrastructure and
    • Determines actual utilization vs. provisioned storage, memory, and CPU resources

     And the VKernel Capacity Analyzer value proposition:

    Capacity Analyzer proactively monitors shared CPU, memory, network, and disk (storage and disk I/O) utilization trends in VMware and Hyper-V environments across hosts, clusters, and resource pools enabling you to:

    • Find and fix current and future capacity bottlenecks
    • Safely place new VMs based on available capacity
    • Easily generate capacity utilization alerts

    Capacity Analyzer lists for $299/socket; however, VKernel was nice enough to provide each of the delegates with a 10-socket, 2-year license, which was more than adequate for evaluation in the lab.  From this point forward, I will refer to Capacity Analyzer as CA.

    One thing another delegate and I noticed right away was the quick integration and immediate results.  CA 4.2 Standard Edition ships as a virtual appliance in OVF or Converter format.  The 32-bit SLES VM is pre-built, pre-configured, and pre-optimized for its role in the virtual infrastructure.  The 600MB appliance deploys in just minutes.  The minimum deployment tasks consist of network configuration (including DHCP support), licensing, and pointing the appliance at a VI3 or vSphere virtual infrastructure.

    CA is managed through an HTTP web interface, which has seen noticeable improvement and polish since the last time I reviewed the product.  The management and reporting interface is presented in a dashboard layout which makes use of the familiar stoplight colors.  Shortly after deployment, I was already seeing data being collected.  I should note that the product supports management of multiple infrastructures: I pointed CA at VI3 and vSphere vCenters simultaneously.

    One of the dashboard views in CA is “Top VM Consumers” for metrics such as CPU, Memory, Storage, CPU Ready, Disk Bus Resets, Disk Commands Aborted, Disk Read, and Disk Write.  The dashboard view shows the top 5; however, a detailed drilldown is available which lists all the VMs in my inventory.
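    Conceptually, a “top 5” tile and its drilldown are the same data sorted two ways.  The sketch below uses made-up VM names and CPU figures and is not CA’s implementation; heapq.nlargest picks the top N without sorting the whole inventory.

```python
import heapq

# Hypothetical CPU usage (MHz) per VM; names and numbers are made up.
cpu_mhz = {"vm01": 850, "vm02": 120, "vm03": 2300, "vm04": 40,
           "vm05": 610, "vm06": 1500, "vm07": 95}


def top_consumers(samples: dict[str, float], n: int = 5) -> list[tuple[str, float]]:
    """Dashboard tile: the n largest consumers of one metric, largest first."""
    return heapq.nlargest(n, samples.items(), key=lambda item: item[1])


def drilldown(samples: dict[str, float]) -> list[tuple[str, float]]:
    """Drilldown view: every VM in the inventory, largest consumer first."""
    return sorted(samples.items(), key=lambda item: item[1], reverse=True)


print(top_consumers(cpu_mhz))
print(len(drilldown(cpu_mhz)), "VMs in drilldown")
```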

    Prior to deploying CA, I felt I had a pretty good handle on the capacity and utilization in the lab.  After letting CA digest the information available, I thought it would be interesting to compare its results with my own perception and experience.  I was puzzled by the initial findings.  Consider the following physical two-node cluster information from vCenter.  Each node is configured identically with two quad-core AMD Opteron processors and 16GB RAM, and each host is running about 18 powered-on VMs.  Host memory is, and always has been, my limiting resource, and that’s evident here.  However, with HA admission control disabled, there is still capacity to register and power on several more “like” VMs.

    So here’s where things get puzzling for me.  Looking at the Capacity Availability Map, CA states that:
    1) Memory is my limiting resource – correct
    2) There is no VM capacity left on the DL385 G2 Cluster – that’s not right

    After further review, the discrepancy is revealed.  The Calculated VM Size (slot size, if you will) for memory is 3.5GB.  I’m not sure where CA is coming up with this number.  It’s not the HA-calculated slot size; I checked.  3.5GB is nowhere near the average VM memory allocation in the lab.  Most of my lab VMs are thinly provisioned from a memory standpoint since host memory is my limiting resource.  I’ll need to see if this can be adjusted, because these numbers are inaccurate and therefore unreliable.  I wouldn’t want to base a purchasing decision on this information.
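    To see why an inflated Calculated VM Size reports no remaining capacity, here is a minimal sketch of slot-style headroom math.  The free-memory figure and the 512MB VM size are hypothetical, and the sketch ignores per-VM memory overhead and HA reservations; it is not how CA actually builds its map.

```python
def remaining_vms(free_memory_mb: int, vm_size_mb: int) -> int:
    """Whole VMs of vm_size_mb that still fit in free_memory_mb.

    Ignores per-VM memory overhead and HA reservations.
    """
    if vm_size_mb <= 0:
        raise ValueError("vm_size_mb must be positive")
    return free_memory_mb // vm_size_mb


free_mb = 3 * 1024  # hypothetical free host memory across the cluster
print(remaining_vms(free_mb, 3584))  # CA's 3.5GB Calculated VM Size -> 0
print(remaining_vms(free_mb, 512))   # hypothetical 512MB lab VM -> 6
```

    The same free memory reads as “no capacity left” or “room for six more VMs” depending entirely on the assumed VM size, which is why the 3.5GB figure matters so much.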

    Here’s an example of a drilldown.  Again, I like the presentation, although this screen seems to have some text alignment inconsistencies (right vs. center).  Reports in CA can be saved in .PDF or .CSV format, making them ideal for sharing, collaboration, or archiving.  Another value add is a recommendation section, stated in plain English in case the reader is unable to interpret the numbers.  What I’m somewhat confused about is the fact that the information provided in different areas is contradictory.  In this case, the summary reports that VM backupexec “is not experiencing problems with memory usage… the VM is getting all required memory resources”.  However, it goes on to say there is a problem in that there exists a Memory usage bottleneck… the VM may experience performance degradation if memory usage increases.  Finally, it recommends increasing the VM memory size to almost double the currently assigned value, and this Priority is ranked as High.

    It’s not clear to me from the drilldown report whether there is a required action here or not.  With the high priority status, there is a sense of urgency, but to do what?  The analysis states performance could suffer if memory usage increases.  That will typically be the case for virtual and physical machines alike.  The problem, as I see it, is that the analysis is concerned with a future event which may or may not occur.  If the VM has shown no prior history of higher memory consumption and there is no change to the application running in the VM, I would expect the memory utilization to remain constant.  VKernel is on the right track, but I think the out-of-box logic needs tuning so that it is more intuitive.  Otherwise this is a false alarm: either it causes me to overutilize host capacity, or I learn to ignore it, which is dangerous and provides no return on investment in a management tool.
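    The kind of history-aware check I have in mind could look something like the hypothetical rule below: it only escalates to High when a VM is both close to its allocation and trending upward, using a least-squares slope over evenly spaced samples.  The thresholds (90% of allocation, 1MB of growth per sample) are arbitrary assumptions, not VKernel settings.

```python
from statistics import mean


def slope(values: list[float]) -> float:
    """Least-squares slope of evenly spaced samples (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def memory_priority(usage_mb: list[float], allocated_mb: float,
                    near_limit: float = 0.9, min_growth_mb: float = 1.0) -> str:
    """Hypothetical rule: 'high' only when near the allocation AND trending up."""
    close_to_limit = max(usage_mb) >= near_limit * allocated_mb
    growing = slope(usage_mb) > min_growth_mb
    if close_to_limit and growing:
        return "high"
    if close_to_limit or growing:
        return "watch"
    return "ok"


print(memory_priority([700, 710, 705, 700, 708], 1024))   # steady, well below
print(memory_priority([960, 965, 958, 962, 960], 1024))   # steady, near limit
print(memory_priority([800, 850, 900, 950, 1000], 1024))  # climbing toward limit
```

    A VM like backupexec with flat memory history would land in “ok” or “watch” under this rule rather than a High priority recommendation.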

    I’ve got more areas to explore with VKernel Capacity Analyzer, and I welcome input, clarifications, and corrections from VKernel.  Overall I like the direction of the product, and I think VKernel has the potential to serve capacity planning needs for virtual infrastructures of all sizes.  The ease of deployment provides a rapid return.  As configuration maximums and VM densities increase, capacity planning becomes more challenging.  Larger VMs make significant dents in the virtual infrastructure, depleting shared resources more rapidly per instance than in years past.  Additional capacity takes time to procure.  We need to be able to lean on tools like these to provide the automated analysis and alarms that keep us ahead of capacity requests so we are not caught short on infrastructure resources.

    No Failback Bug in ESX(i)

    April 7th, 2010

    A few weeks ago, I came across a networking issue with VMware ESX(i) 4.0 Update 1.  Configuring a vSwitch or Portgroup for Failback: No doesn’t work as expected in conjunction with a Network Failover Detection type of Link Status Only.

    For the simplest of examples:

    1. Configure a vSwitch with 2 VMNIC uplinks with both NICs in an Active configuration.  I’ll refer to the uplinks as VMNIC0 and VMNIC1.
    2. Configure the vSwitch and/or a Portgroup on the vSwitch for Failback: No.
    3. Create a few test VMs with outbound TCP/IP through the vSwitch. 
    4. Power on the VMs and begin a constant ping to each of the VMs from the network on the far side of the vSwitch.
    5. Pull the network cable from VMNIC0.  You should see little to no network connectivity loss on the constant pings.
    6. With VMNIC0 failed, all network traffic is now riding over VMNIC1.  When VMNIC0 is recovered, the expected behavior with No Failback is that all traffic will continue to traverse VMNIC1.
    7. Now provide VMNIC0 with a link signal by connecting it to a switch port which has no route to the physical network.  For example, simply connect VMNIC0 to a portable Netgear or Linksys switch.
    8. What you should see now is that at least one of the VMs is unpingable.  It has lost network connectivity because ESX has actually failed its network path back to VMNIC0.  In the failback mechanism, VMware appears to balance the traffic evenly: in a 2 VM test, 1 VM will fail back to the recovered VMNIC; in a 4 VM test, 2 VMs will fail back to the recovered VMNIC; and so on.
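    To make the expected vs. observed behavior in the steps above concrete, here is a toy model of a two-uplink vSwitch.  It is an assumption-laden sketch, not ESX’s teaming code: ports are assumed to start out alternating between VMNIC0 and VMNIC1, and the bug is modeled as traffic returning to its original uplink as though failback were enabled, which matches the 1-of-2 and 2-of-4 results described.

```python
def uplinks_after_recovery(num_vms: int, failback: bool, bug: bool) -> list[str]:
    """Toy model of VM port placement on a vSwitch with two active uplinks.

    Assumption: ports start out alternating between vmnic0 and vmnic1.
    vmnic0 then loses link, so every port moves to vmnic1. When vmnic0
    regains link, correct 'Failback: No' leaves every port on vmnic1; the
    bug is modeled as ports returning to their original uplink anyway.
    """
    original = ["vmnic0" if i % 2 == 0 else "vmnic1" for i in range(num_vms)]
    during_failure = ["vmnic1"] * num_vms
    if failback or bug:
        return original
    return during_failure


def unreachable(placement: list[str], dead_uplink: str = "vmnic0") -> int:
    """VMs whose traffic rides the uplink that has link but no upstream route."""
    return placement.count(dead_uplink)


for n in (2, 4):
    expected = unreachable(uplinks_after_recovery(n, failback=False, bug=False))
    observed = unreachable(uplinks_after_recovery(n, failback=False, bug=True))
    print(f"{n} VMs: expected {expected} unreachable, observed {observed}")
```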

    The impact extends to any traffic carried by the vSwitch, not just VM networking: Service Console, Management Network, IP-based storage, VMotion, and FT.  The scope of the bug includes both the standard vSwitch and the vSphere Distributed Switch (vDS).

    Based on the following VMTN forum thread, it would appear this bug has existed since August of 2008.  Unfortunately, documentation of the bug never made it to VMware support:

    http://communities.vmware.com/thread/165302

    You may be asking yourself at this point: who really cares?  At the very least, we have on our hands a feature which does not work as documented on page 41 of the following manual: http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_esxi_server_config.pdf.  Organizations which have made a design decision for no failback have done so for a reason and rely on the feature to work as documented.

    Why would my VMNIC ever come up with no routing capabilities to the physical network?  Granted, my test was simplistic in nature and likely doesn’t represent an actual datacenter design, but its purpose was merely to show that the problem does exist.  The issue actually does present a real-world problem for at least one environment I’ve seen.  Consider an HP C-class blade chassis fitted with redundant 10GbE Flex10 Ethernet modules.  Upstream of the Flex10 modules are Cisco Nexus 5k switches and then Nexus 7k switches.

    When a Flex10 module fails and is recovered (for instance, when it is rebooted, which you can test yourself if you have one), it has an unfortunate habit of bringing up the blade-facing network ports (in this case, VMNIC0, labeled 1 in the diagram) up to 20 seconds before a link is established with the upstream Cisco Nexus 5k (labeled 2 in the diagram), which grants network routing to other infrastructure components.  So what happens here?  VMNIC0 shows a link, and ESX fails traffic back to it up to 20 seconds before the link to the Nexus 5k is established.  The result is a network outage for Service Console, Management Network, IP-based storage, VMotion, and FT.  Some may say they can tolerate this much of an outage for their VM traffic, but most people I have talked to say even an outage of 2 or more seconds is unacceptable.  And what about IP-based storage?  Can you afford a 20-second interruption?  What about Management Network and Service Console?  Think about HA and the isolation response: VMs could be shut down as a result.  It’s a nasty chain of events.  In such a case, a decision can be made to enable no failback as a policy on the vSwitch and Portgroups.  However, due to the VMware bug, this doesn’t work.  Some day you may experience an outage which you did not expect.
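    The Flex10 timing problem boils down to simple arithmetic: if ESX fails traffic back when the blade-facing port comes up, that traffic is blackholed until the upstream link is ready.  This is a minimal sketch using the up-to-20-second gap and the 2-second tolerance mentioned above; the function and parameter names are my own.

```python
def blackhole_seconds(downlink_up_s: float, upstream_up_s: float,
                      fails_back: bool) -> float:
    """Seconds traffic is sent to a port that has link but no upstream path.

    downlink_up_s: when the blade-facing port (VMNIC0) shows link.
    upstream_up_s: when the module's link to the upstream switch is usable.
    """
    if not fails_back:
        return 0.0  # traffic stays on the surviving uplink
    return max(0.0, upstream_up_s - downlink_up_s)


TOLERANCE_S = 2.0  # "even an outage of 2 or more seconds is unacceptable"

for label, fails_back in [("no failback honored", False), ("bug: traffic fails back", True)]:
    outage = blackhole_seconds(0.0, 20.0, fails_back)
    print(f"{label}: {outage:.0f} s outage, acceptable: {outage < TOLERANCE_S}")
```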

    As pointed out in the VMTN forum thread above, there is a workaround which I have tested and which does work: force at least one VMNIC to act as Standby.  This is not by VMware design; it just happens to make the no failback behavior work correctly.  The impact of this design decision, of course, is that one VMNIC now stands idle and there are no load balancing opportunities over it.  In addition, with no failback enabled, network traffic will tend to become polarized to one side, again impacting network load balancing.
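    The polarization effect can be illustrated with another toy model, assuming two active uplinks, no failback working as documented, and hypothetical helper names: once VMNIC0 fails and recovers, every port stays on VMNIC1 until some later event moves it.

```python
from collections import Counter


def other_uplink(nic: str) -> str:
    return "vmnic1" if nic == "vmnic0" else "vmnic0"


def place_after(ports: dict[str, str], events: list[tuple[str, str]]) -> Counter:
    """Count ports per uplink after ('fail' | 'recover', nic) events.

    Two active uplinks and working no failback: a failure moves that uplink's
    ports to the other one, and a recovery moves nothing back.
    """
    placement = dict(ports)  # don't mutate the caller's mapping
    for action, nic in events:
        if action == "fail":
            for port, current in list(placement.items()):
                if current == nic:
                    placement[port] = other_uplink(nic)
        # 'recover' is deliberately a no-op: that is what no failback means.
    return Counter(placement.values())


# Hypothetical starting point: six ports spread evenly across both uplinks.
start = {f"port{i}": ("vmnic0" if i % 2 == 0 else "vmnic1") for i in range(6)}
print(place_after(start, []))
print(place_after(start, [("fail", "vmnic0"), ("recover", "vmnic0")]))
```

    In this model, only a later failure of VMNIC1 would move ports back to VMNIC0, so traffic tends to collect on whichever uplink survived most recently.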

    An SR has been opened with VMware on this issue.  They have confirmed it is a bug and will be working to resolve the issue in a future patch or release.

    Update 4/27/10:  The no failback issue has been root-caused by VMware and a fix has been constructed and is being tested. It will be triaged for a future release.