Posts Tagged ‘DPM’

Request for UI Consistency

October 25th, 2010

Sometimes it’s the little things that can make life easier.  This post is actually a fork from the one I wrote previously on a DPM issue.  Shortly after that post, it was pointed out that I was reading DPM Priority Recommendations incorrectly.  Indeed, that was the case.

Where did I go wrong?  Compare the priority descriptions for DRS and DPM, where both slide bars are configured with the same aggressiveness:

 “Apply priority 1, priority 2, priority 3, and priority 4 recommendations”

[Screenshot: DRS migration threshold slider]

 “Apply priority 4 or higher recommendations”

[Screenshot: DPM threshold slider]

My brief interpretation was that a higher recommendation meant a higher number (i.e. priority 4 is a higher recommendation than priority 3).  It’s actually the opposite that is true: a higher recommendation is a lower number, with priority 1 being the highest.

I believe too much is left open to interpretation on the DPM screen as to what is higher and what is lower.  The DRS configuration makes sense because it’s clear what is going to be applied; there is no definition of high or low to be (mis-)interpreted.  The fix?  Make the DPM configuration screen mirror the DRS configuration screen.  Development consistency goes a long way.  As a frequent user of the tools, I expect it.  I view UI inconsistency as sloppy.

If you are a VMware DPM product manager, please see my previous post VMware DPM Issue.

VMware DPM Issue

October 24th, 2010

I’ve been running into a DPM issue in the lab recently.  Allow me to briefly describe the environment:

  • 3 vCenter Servers 4.1 in Linked Mode
  • 1 cluster with 2 hosts
    • ESX 4.1, 32GB RAM, ~15% CPU utilization, ~65% memory utilization, host DPM set to Disabled, meaning the host should never be placed in standby by DPM.
    • ESXi 4.1, 24GB RAM, ~15% CPU utilization, ~65% memory utilization, host DPM set to Automatic, meaning the host is always a candidate to be placed in standby by DPM.
  • Shared storage
  • DRS and DPM enabled for full automation (both configured at Priority 4, almost the most aggressive setting)

Up until recently, the ESX and ESXi hosts weren’t as loaded and DPM was working reliably.  Each host had 16GB RAM installed.  When aggregate load was light enough, all VMs were moved to the ESX host and the ESXi host was placed in standby mode by DPM.  Life was good.

There has been much activity in the lab recently.  The ESX and ESXi host memory was upgraded to 32GB and 24GB respectively.  Many VMs were added to the cluster and powered on for various projects.  The DPM configuration remained as is.  What I’m noticing now is that with a fairly heavy memory load on both hosts in the cluster, DPM moves all VMs to the ESX host and places the ESXi host in standby mode.  This places a tremendous amount of memory pressure and over commit on the solitary ESX host.  This extreme condition is observed by the cluster and, nearly as quickly, the ESXi host is taken back out of standby mode to balance the load.  Then, maybe about an hour later, the process repeats itself.

I then configured DPM for manual mode so that I could examine the recommendations being made by the calculator.  The VMs were being evacuated for the purposes of DPM via a Priority 3 recommendation, which is halfway between the Conservative and Aggressive settings.

[Screenshot: DPM Priority 3 recommendation]

What is my conclusion?  I’m surprised at the perceived increase in aggressiveness of DPM.  In order to avoid the extreme memory over commit, I’ll need to configure the DPM slide bar for Priority 2.  In addition, I’d like to get a better understanding of the calculation.  I have a difficult time believing the amount of memory over commit being deemed acceptable in a neutral configuration (Priority 3) which falls halfway between conservative and aggressive.  In addition to that, I’m not a fan of a host continuously entering and exiting standby mode, along with the flurry of vMotion activity which results.  This tells me that the calculation isn’t accounting for the amount of memory pressure which actually occurs once a host goes into standby mode, or that, coincidentally, there are significant shifts in the workload patterns shortly after each DPM operation.

If you are a VMware DPM product manager, please see my next post Request for UI Consistency.

Preferential Treatment for DPM Hosts

February 7th, 2010

Here’s a tip that’s so simple and probably well known that it could be categorized as a stupid pet trick.

As I’ve mentioned in the past, I leverage VMware DPM (an Enterprise licensing feature) in the lab so that during periods of lesser activity (while I’m at work or sleeping, or both), ESX hosts in the lab can be placed in standby mode to cut electricity consumption and save on the energy bill.  I haven’t taken the time to research how hosts in the cluster are chosen for standby mode.  Over the course of time, the pattern I have witnessed tells me it’s more of a round-robin type selection.  For instance, today host A will be chosen for standby mode, tomorrow host B will be chosen, and the next day, again host A will be chosen.  Perhaps load is factored into the calculation.  I don’t honestly know.  It’s not important right now.

I’ve also mentioned in the past that I run both ESX and ESXi in the same vSphere cluster.  This is a VMware-supported configuration. I do this so that I can get a daily dose of both host platform experiences.  I’m not shy in saying my platform preference is still ESX because of its Service Console. What can I say… old habits are hard to break, but I’m trying, I really am.  More often than not, I need ESX Service Console access for whatever reason.  When I pop into the lab and find out that the ESX host is in standby mode, it takes a good 5 minutes to wake it up before I can work on the things I need to get done.

Enter DPM Host Options.  This feature lets me apply some rules to the host selection process for DPM.  In this case, I want DPM to do its thing and save me money, but I don’t want it to shut down the ESX host.  Rather, I want it to shut down the ESXi host instead.  Doing this is simple: modify the cluster settings and disable DPM for the ESX host as shown below.

With this rule in place, DPM will always choose solo.boche.mcse, the ESXi host, for standby mode.  The ESX host, lando.boche.mcse, has been disabled for DPM and should always remain powered on and ready for action.

Tame Electrical and Heating Costs with CPU Power Management

November 11th, 2009

A casual Twitter tweet about my power savings through the use of VMware Distributed Power Management (DPM) found its way to VMware Senior Product Manager for DPM, Ulana Legedza, and Andrei Dorofeev. Ulana was interested in learning more about my situation. I explained how VMware DPM had evaluated workloads between two clustered vSphere hosts in my home lab, and proceeded to shut down one of the hosts for most of the month of October, saving me more than $50 on my energy bill.

Ulana and Andrei took the conversation to the next level and asked me if I was using vSphere’s Advanced CPU Power Management feature (see the vSphere Resource Management Guide, page 22). I was not; in fact, I was unaware of its existence. Power Management is a new feature in ESX(i) 4 available to processors supporting Enhanced Intel SpeedStep or Enhanced AMD PowerNow! power management technologies. To quote the .PDF article:

“To improve CPU power efficiency, you can configure your ESX/ESXi hosts to dynamically switch CPU frequencies based on workload demands. This type of power management is called Dynamic Voltage and Frequency Scaling (DVFS). It uses processor performance states (P-states) made available to the VMkernel through an ACPI interface.”

A quick look at the Quad Core AMD Opteron 2356 processors in my HP DL385 G2 showed they support Enhanced AMD PowerNow! Power Management Technology:

There are two steps to enabling this power management feature. The first step is to ensure it is enabled in the server BIOS. On an HP DL385 G2, CPU power management is enabled by default. In this particular server model, it is configured via the BIOS setup, entered at the end of the POST (which would obviously require a reboot).

A slightly easier method might be to verify and/or configure the policy through HP’s out-of-band (OOB) iLO 2; however, the iLO 2 will request a reboot for a policy change to take effect. On an HP server, configure for OS Control mode, but again, this appears to be the default for the HP DL385 G2, so hopefully no reboot is required for you to implement this power saving measure in your environment:

After enabling power management in the BIOS, the second step is to modify the Power Management Policy on each ESX(i) host from the default of static to dynamic. The definitions of these two settings can be found in the .PDF linked above and are as follows:

static – The default. The VMkernel can detect power management features available on the host but does not actively use them unless requested by the BIOS for power capping or thermal events.

dynamic – The VMkernel optimizes each CPU’s frequency to match demand in order to improve power efficiency but not affect performance. When CPU demand increases, this policy setting ensures that CPU frequencies also increase.

You might be asking yourself by this point, “Ok, this is nice, but what’s the trade-off?” Note the wording in the dynamic definition above: “improve power efficiency but not affect performance”. This is a win/win configuration change!

This step can be performed in one of a few ways on each host (again, no reboot is required for this change):

  1. Using the vSphere Client, change the Advanced host setting Power.CpuPolicy from static to dynamic
  2. Scriptable: Via the ESX service console, PuTTY, or script, issue the command esxcfg-advcfg -s dynamic /Power/CpuPolicy (a quick sketch follows below)
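
For reference, here is a rough sketch of option #2 from the ESX Service Console.  The get/set flags below are how I’d check and change the value; verify them against your own ESX build before relying on them:

  # Check the current CPU power management policy (the default is "static")
  esxcfg-advcfg -g /Power/CpuPolicy

  # Switch the policy to "dynamic" so the VMkernel scales CPU frequency with demand
  esxcfg-advcfg -s dynamic /Power/CpuPolicy

  # Confirm the change took effect (no reboot required)
  esxcfg-advcfg -g /Power/CpuPolicy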

The impact on my home lab was quite visible. After 12 hours, the blue area in the following 24-hour graph shows that average electrical consumption was reduced from 337 Watts down to 292 Watts. All things being equal and CPU loads balanced by DRS, that’s a 45 Watt drop, or a reduction in energy consumption of over 13% per host:

An alternate graph shows Btu output dropped from 1,135 Btu/hr to about 1,000 Btu/hr. All things being equal, that’s a reduction of about 135 Btu/hr per host (which tracks the wattage drop, since 1 Watt is roughly 3.4 Btu/hr):

A Btu is a unit of heat – explained more at wiseGEEK’s What is a Btu? Heat is a byproduct of technology in the datacenter and in most cases is viewed as an overhead expense because it requires cooling (additional cost) to maintain optimal operating conditions for the equipment running in the environment. If we can eliminate heat, we eliminate the associated cost of removing the heat. This is known as cost avoidance.

Eliminating heat is as much of an interest to me as reducing my energy bill. The excessive heat generated in the basement eventually finds its way upstairs, making the rest of the house a little uncomfortable. The air conditioner in my home wasn’t built to handle the excessive heat. Now, I live in the Midwest where we have some frigid winters. Heat in the home is welcomed during the winter months. I could turn off CPU Power Management, raising the Btu output as well as my energy bill, in favor of reducing my natural gas heating bill. I don’t know which is more expensive. This could be a great experiment for the January/February time frame.

In summary, we can attack operating costs from two sides by using VMware CPU Power Management:

  1. Reduction in excess electricity used by idle CPU cycles
  2. Reduction in cooling costs by reducing Btu output

I’m excited to see what next month’s energy bill looks like.

Update 11-17-09:  I was just made aware that Simon Seagrave wrote an earlier article on CPU power management here.  Sorry Simon, I was unaware of your article and I did not intentionally copy your topic.  Your article covered the topic well.  I hope we’re still friends :)

DPM best practices. Look before you leap.

March 16th, 2009

It has previously been announced that VMware’s Distributed Power Management (DPM) technology will be fully supported in vSphere. Although today DPM is for experimental purposes only, virtual infrastructure users with VI Enterprise licensing can nonetheless leverage it to power down ESX infrastructure during non-peak periods where they see fit.

Before enabling DPM, there are a few precautionary steps I would go through to test each ESX host in the cluster for DPM compatibility, which will help mitigate risk and ensure success. Assuming most, if not all, hosts in the cluster will be identical in hardware make and model, you may choose to perform these tests on only one of the hosts in the cluster. More on testing scope a little further down.

This first step is optional, but personally I’d go through the motions anyway. Remove the host to be tested from the cluster. If the host has running VMs, place it in maintenance mode first to displace the running VMs onto other hosts in the cluster:

[Screenshot: host command menu – Enter Maintenance Mode]

If the step above was skipped, or if the host wasn’t in a cluster to begin with, then the first step is to place the host into maintenance mode. The following step would be to manually place the host in Standby Mode. This is going to validate whether or not vCenter can successfully place the host into Standby Mode automatically when DPM is enabled. One problem I’ve run into is the inability to place a host into Standby Mode because the NIC doesn’t support Wake On LAN (WOL) or WOL isn’t enabled on the NIC:

[Screenshot: Enter Standby Mode error – Wake On LAN not supported]
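
Incidentally, a quick way to sanity check WOL before relying on DPM is from the ESX Service Console. This is just a rough sketch – it assumes ethtool is present in the Service Console and recognizes the vmnic devices, and if memory serves, it’s the vMotion NIC that DPM uses to wake a host, so that’s the one to check:

  # List the physical NICs to identify which vmnic backs the vMotion network
  esxcfg-nics -l

  # "Supports Wake-on" shows what the NIC can do; "Wake-on: g" means magic packet
  # wake is currently enabled, while "Wake-on: d" means WOL is disabled
  ethtool vmnic0 | grep -i wake-on

  # If WOL is supported but disabled, it can often be enabled with ethtool
  # (and/or in the NIC option ROM or server BIOS)
  ethtool -s vmnic0 wol g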

Assuming the host has successfully been placed into Standby Mode, use the host command menu (similar in look to the menu above) to take the host out of Standby Mode. I don’t have the screen shot for that because the particular hosts I’m working with right now don’t support the WOL type that VMware needs.

Once the host has successfully entered and left Standby Mode, it can be removed from maintenance mode and added back into the cluster. Now would not be a bad time to take a look around some of the key areas such as networking and storage to make sure those subsystems are functioning properly and are able to “see” their respective switches, VLANs, LUNs, etc. Add some VMs to the host and power them on. Again, perform some cursory validation to ensure the VMs have network connectivity, storage, and the correct consumption of CPU and memory.

My point in all of this is that ESX has been brought back from a deep slumber. A twelve-point health inspection is the least amount of effort we can put forth on the front side to assure ourselves that, once automated, DPM will not bite us down the road. The steps I’m recommending have more to do with DPM compatibility with the different types of server and NIC hardware than they have to do with VMware’s DPM technology in and of itself. That said, at a minimum I’d recommend these preliminary checks on each of the different hardware types in the datacenter. On the other end of the spectrum, if you are very cautious, you may choose to run through these steps for each and every host that will participate in a DPM-enabled cluster.

After all the ESX hosts have been “Standby Mode verified”, the cluster settings can be configured to enable DPM. Similar to DRS, DPM can be enabled in a manual mode where it will make suggestions but it won’t act on them without your approval, or it can be set for fully automatic, dynamically making and acting on its own decisions:

[Screenshot: cluster settings – enabling VMware DPM]

DPM is an interesting technology, but I’ve always felt in the back of my mind that it conflicts with capacity planning (including the accounting for N+1 or N+2, etc.) and the ubiquitous virtualization goal of maximizing the use of server infrastructure. In a perfect world, we’ll always be teetering on our own perfect threshold of “just enough infrastructure” and “not too much infrastructure”. Having infrastructure beyond what availability constraints and admission control require is where DPM fits in. That said, if you have a use for DPM, in theory you have excess infrastructure. Why? I can think of several compelling reasons why this might happen, but again, in that perfect world, none could excuse the capital virtualization sin of excess hardware not being utilized to its fullest potential (let alone powered off and doing nothing). In a perfect world, we always have just enough hardware to meet cyclical workload peaks but not too much during the valleys. In a perfect world, virtual server requests come planned so well in advance that any new infrastructure needed is added the day the VM is spun up to maintain that perfect balance. In a perfect world, we don’t purchase larger blocks or cells of infrastructure than we actually need, because there are no such things as lead times for channel delivery, change management, and installation that we need to account for.

If you don’t live in a perfect world (like me), DPM offers those of us with an excess of infrastructure and excuses an environmentally friendly and responsible alternative to at least cut the consumption of electricity and cooling while maintaining capacity on demand if and when needed. Options and flexibility through innovation are good. That is why I choose VMware.

Putting some money where my VMware mouth is

February 15th, 2009

I came home this afternoon from a Valentine’s Day wedding in North Dakota to find that my one and only workstation in the house (other than the work laptop) had a belated Valentine’s Day present for me:  It would no longer boot up.  No Windows.  No POST.  No video signal.  No beep codes.

[Photo: the dead workstation]

I was feeling adventurous and I needed a relatively quick and inexpensive fix.  I decided to take one of the thin clients I received from Chip PC via VMworld 2008 plus a freshly deployed Windows XP template on the Virtual Infrastructure and promote this VDI solution to main household workstation status for the next few weeks.  The timing on this could not have been better.  The upcoming Minnesota VMUG on Wednesday March 11th is going to be VDI focused.  I guess I’ll have more to contribute at that meeting than I had originally planned on.  With any luck, Chip PC will be in attendance and we can discuss some things.

The thin client:  Chip PC Xtreme PC NG-6600 (model: EX6600N, part number: CPN04209).

Specs:

  • RMI – Alchemy Au 1550, 500MHz RISC processor (equivalent to 1.2GHz x86 TC processors)
  • 128MB DDR RAM
  • 64MB Disk-On-Chip with TFS
  • 128-bit 3D graphics acceleration engine with separate 2x8MB display memory SDRAM
  • Dual DVI ports each supporting 1920×1200 16-bit color.  Supports quad displays up to 1024×768
  • Audio I/O
  • 4 USB 2.0 ports
  • 10/100 Ethernet NIC
  • Power draw:  3.5W work mode, .35W sleep mode
  • OS:  Enhanced Microsoft Windows CE (6.00 R2 Professional)
  • Integrated applications (Plugins – note plugins are downloaded at no charge from the Chip PC website and are not, by default, embedded or included with the thin client – just enough OS concept)
    • Citrix ICA
    • RDP 5.2 and 6
    • Internet Explorer 6.0
    • VDM Client
    • VDI Client
    • Media Player
    • VPN Client
    • Ultra VNC
    • Pericom (Team Talk) Terminal Emulation
    • LPD Printer
    • ELO Touch Screen
  • Compatibility
    • Citrix WinFrame, MetaFrame, and Presentation Server 4.5
    • MS Windows Server 2000/2003
    • MS Windows NT 4.0 – TS Edition
    • VMware Virtual Desktop Infrastructure (VDI) using RDP
  • Full support of both local and network printers:  LPD, LPR, SMB, LPT, USB, COM
  • Support for USB mass storage (thumb drives – deal breaker for me)
  • Support for wireless USB NIC (not included)
  • etc. etc. etc.

[Photo: the Chip PC Xtreme PC NG-6600 thin client]

Truth be told, this isn’t really a promotion in the sense that the device had already earned it through extensive testing.  I hadn’t even taken the thing out of the box yet other than to register it for the extended warranty.  I’ve had only a little experience with these devices, as I have an identical unit in the lab at work which I’ve spent a total of 30 minutes on.  To the best of my knowledge, this is the Cadillac unit from Chip PC.

I don’t have any fancy VDI brokering solutions here in the home lab and I’m not up to speed on VMware View so the plan is to leverage Thin Client -> RDP -> Windows XP desktop on VMware Virtual Infrastructure 3.5.

I think this is going to be a good test.  A trial by fire of VDI (granted, a fairly simple variation).  I spout a lot about the goodness that is VMware and now I’ll be eating some of my own dog food from the desktop workspace.  I’m a power user.  I’ve got my standard set of applications that I use on a regular basis and I’ve got a few hardware devices such as a flatbed scanner, iPod Shuffle, USB thumb drives, digital cameras, etc.  I should know within a short period of time whether or not this will be a viable solution for the short term.  Also add to the mix my wife’s career.  She uses our home computer to access her servers at work on a fairly regular basis.  Lastly, my wife sometimes works from home while I’m away at the office or traveling.  It’s going to be critical that this solution stays up and running and continues to be viable for my wife while I’m remote and not able to provide computer support.

So where am I at now?  I’ve got the VDI session patched, along with my most critical applications installed to get me by in the short term:  Quicken, SnagIt, a network printer, and Citrix clients.  I’ll install MS Office later, but for now I can use the published application version of Office on my virtualized Citrix servers.  I’ve been listening to some Electro House on www.di.fm in the VDI session and the music quality is as good as it was on my PC before it died, although it doesn’t completely drive my 5.1 surround in the den.  Pretty sure I’m getting 2.1 right now.  Oh well, at least the sub is thumpin’.  Shhhh… the thin client is sleeping:

[Photo: the sleeping thin client]

So what else?  As long as I’m throwing caution to the wind, I think it’s time to take the training wheels off VMware DPM (Distributed Power Management) and see what happens in a two-node cluster.

[Screenshot: enabling DPM on the two-node cluster]

Based on the environment below, what do you think will happen?  CPU load is very low; however, memory utilization is close to being over committed in a one-host scenario.  Will DPM kick in?

[Screenshot: cluster CPU and memory utilization]

Most of my infrastructure at home is virtual including all components involving internet access both incoming and outgoing.  If the blog becomes unavailable for a while in the near future, I’ll give you one guess as to what happened.  :)

No matter what the outcome, vmwarenews.de aka Roman Haug – you are no longer welcome to republish my blog articles.  Albeit flattering, the fact that you have not even so much as asked in the first place has officially pissed me off.  You publish my content as if it were your own, written by you, as indicated by the “by Roman” header preceding each duplicated post.  Please remove my content from your site and refrain from syndicating my content going forward.  Thank you in advance.

Update: Roman Haug has offered an apology and I believe we have reached an understanding.  Thank you Roman!

Make VirtualCenter highly available with VMware Virtual Infrastructure

November 17th, 2008

A few days ago I posted some information on how to make VirtualCenter highly available with Microsoft Cluster Services.

Monday Night Football kickoff is coming up, but I wanted to follow up quickly with another option (as suggested by Lane Leverett): deploy the VirtualCenter Management Server (VCMS) on a Windows VM hosted on a VMware Virtual Infrastructure cluster. Why is this a good option? Here are a few reasons:

  1. It’s fully supported by VMware.
  2. You probably already have a VI cluster in your environment you can leverage. Hit the ground running without spending the time to set up MSCS.
  3. Removing MSCS removes a 3rd party infrastructure complexity and dependency which requires an advanced skill set to support.
  4. Removing MSCS removes at least one Windows Server license cost and also removes the need for the more expensive Windows Enterprise Server licensing and the special hardware required by MSCS.
  5. Green factor: Let VCMS leverage the use of VMware Distributed Power Management (DPM).

How does it work? It’s pretty simple. A virtualized VCMS shares the same advantages any other VM inherently has when running on a VMware cluster:

  1. Resource balancing of the four food groups (vProcessor, vRAM, vDisk, and vNIC) through VMware Distributed Resource Scheduler (DRS) technology
  2. Maximum uptime and quick recovery via VMware High Availability (HA) in the event of a VI host failure or isolation condition (yes, HA will still work if the VCMS is down. HA is a VI host agent)
  3. Maximum uptime and quick recovery via VMware High Availability (HA) in the event of a VMware Tools heartbeat failure (i.e. the guest OS croaks)
  4. Ability to perform host maintenance without downtime of the VCMS

A few things to watch out for (I’ve been there and done that, more than once):

  1. If you’re going to virtualize the VCMS, be sure you do so on a cluster with the necessary licensed options to support the benefits I outlined above (DRS, HA, etc.). This means VI Enterprise licensing is required (see the licensing/pricing chart on page 4 of this document). I don’t want to hide the fact that a premium is paid for VI Enterprise licensing, but as I pointed out above, if you’ve already paid for it, the bolt-ons are unlimited use, so get more use out of them.
  2. If your VCMS (and Update manager) database is located on the VCMS, be sure to size your virtual hardware appropriately. Don’t go overboard though. From a guest OS perspective, it’s easier to grant additional virtual resources from the four food groups than it is to retract them.
  3. If you have a power outage and your entire cluster goes down (and your VCMS along with it), it can be difficult to get things back on their feet while you don’t have the use of the VCMS, particularly if you’ve lost the use of other virtualized infrastructure components such as Microsoft Active Directory.  Initially it’s going to be command line city, so brush up on your CLI.  It really all depends on how bad the situation is once you get the VI hosts back up.  One example I ran into is that host A wouldn’t come back up and host B wasn’t the registered owner of the VM I needed to bring up.  This requires running the vmware-cmd command to register the VM and bring it up on host B (a rough sketch follows below).
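
For reference, here’s roughly what that looks like from the ESX Service Console.  Treat it as a sketch – the datastore and VM names below are placeholders, so substitute your own paths:

  # See which VMs host B already has registered
  vmware-cmd -l

  # Register the orphaned VM's .vmx file from shared storage on host B
  # (the "shared_lun01/vcms01" path is a made-up example)
  vmware-cmd -s register /vmfs/volumes/shared_lun01/vcms01/vcms01.vmx

  # Power the VM on
  vmware-cmd /vmfs/volumes/shared_lun01/vcms01/vcms01.vmx start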

Well, I missed the first few minutes of Monday Night Football, but everyone who reads (tolerates) my ramblings is totally worth it.

Go forth and virtualize!