CPU Ready to %RDY Conversion

October 21st, 2010 by jason

Most customers expect a certain level of performance from their virtual machines, and that level is usually defined by established Service Level Agreements (SLAs).  Capacity planning, tuning, and monitoring performance all play a role in meeting customer SLAs.  When questioning the performance of a physical machine, one of the first metrics that comes to mind is CPU utilization.  Server administrators are naturally inclined to look at this metric on virtual machines as well.  However, when looking at VM performance, Ready time is an additional metric to be examined from a CPU standpoint.  This metric tells us how much time the guest VM is waiting for its share of CPU execution from the host.

I began learning ESX in 2005 on version 2.0.  At that time, the VMware ICM class focused a lot on leveraging the Service Console.  vCenter Server 1.x was brand new and as such, ESXTOP was king for performance monitoring.  In particular, the %RDY metric in ESXTOP was used to reveal CPU bottlenecks as described above.  %RDY provides statistics in a % format.  I learned what acceptable tolerances were, I learned when to be a little nervous, and I could pretty well predict when the $hit was hitting the fan inside a VM from a CPU standpoint.  Duncan Epping at Yellow Bricks dedicates a page to ESXTOP statistics on his blog, and at the very beginning you’ll see a threshold he has published which you should keep in the back of your mind.

Today, ESXTOP still exists fortunately (it’s one of my favorite old-school-go-to tools).  The Service Console is all but gone; however, you’ll still find resxtop in VMware’s vMA appliance, which is used to remotely manage ESXi (and ESX as well).  But what about the vSphere Client and vCenter Server?  With the introduction of vCenter Server, the disappearance of the Service Console, and the inclination of Windows based administrators to prefer GUI based tools, notable focus has moved away from the CLI approach in favor of the vSphere Client (in conjunction with vCenter Server).

Focusing on a VM in the vSphere Client, you’ll find a performance metric called CPU Ready.  This is the vSphere Client metric which tells us how much time the guest VM is waiting for its share of CPU execution from the host, just as %RDY did in ESXTOP.  But when you look at the statistics, you’ll notice a difference.  %RDY in ESXTOP provides us with metrics in a % format.  CPU Ready in the vSphere Client provides metrics in a millisecond summation format.  I learned way back from the ICM class and through trench experience that ~10% RDY (per vCPU) is a threshold to watch out for.  How does a % value from ESXTOP translate to a millisecond value in the vSphere Client?  It doesn’t seem to be widely known or published, but I’ve found it explained in a few places: a VMware communities document and a Josh Townsend blog post.

There’s a little math involved.  To convert the vSphere Client CPU Ready metric to the ESXTOP %RDY metric, divide the CPU Ready value by the length of the rollup interval (both values in milliseconds) and multiply by 100 to get a percentage.  What does this mean?  Say for instance you’re looking at the overall CPU Ready value for a VM in Real-time.  Real-time is refreshed every 20 seconds, and each value represents a rollup over a 20 second period (that’s 20,000 milliseconds).  Therefore…

  • If the CPU Ready value for the VM is, say, 500 milliseconds, we divide 500 milliseconds by 20,000 milliseconds and arrive at 2.5% RDY.  Hardly anything to be concerned about.
  • If the CPU Ready time were 7,500, we divide 7,500 milliseconds by 20,000 milliseconds and arrive at 37.5% RDY or $hit hitting the fan assuming a 1 vCPU VM. 
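
The arithmetic in the two bullets above can be sketched in a few lines of Python.  This is only an illustration: the function name and the interval_s parameter are my own, not part of ESXTOP or the vSphere Client, and the 20 second default assumes the Real-time chart described above.

```python
def ready_ms_to_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU Ready summation (milliseconds) into a %RDY-style value.

    ready_ms   -- CPU Ready summation reported for one sample, in milliseconds
    interval_s -- length of the rollup interval for that sample, in seconds
                  (assumed to be 20 for the Real-time chart)
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    interval_ms = interval_s * 1000.0
    # Share of the interval spent waiting for CPU, expressed as a percentage.
    return ready_ms * 100.0 / interval_ms


if __name__ == "__main__":
    print(ready_ms_to_percent(500))    # 2.5
    print(ready_ms_to_percent(7500))   # 37.5
```

For a chart other than Real-time, the same function should apply as long as interval_s is set to that chart's rollup interval in seconds.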

What do I mean above by 1 vCPU VM?  The overall VM CPU Ready metric is the aggregate total of CPU Ready for each vCPU.  This should sound familiar: if you know how %RDY works in ESXTOP, then you’re armed with the knowledge needed to understand what I’m explaining.  The %RDY value in ESXTOP is the aggregate total of CPU Ready for each vCPU.  In other words, if you saw a 20% RDY value in ESXTOP for a 4 vCPU VM, the average %RDY per vCPU is 5%, which is well under the 10% threshold we generally watch for.  In the vSphere Client, not only can you look at the overall aggregate CPU Ready for a particular VM (which should be divided by the number of vCPUs assigned to the VM), but you can also look at the CPU Ready values for the individual vCPUs themselves.  It is the per-vCPU Ready value which should be compared with published and commonly known thresholds.  When looking at Ready values, it’s important to interpret the data correctly in order to compare the right data to thresholds.
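
As a rough sketch of that per-vCPU normalization, again in Python: the function names are hypothetical, and the 10% default is simply the rule-of-thumb threshold mentioned in this post, not an official limit.  Dividing the aggregate evenly gives an average per vCPU; an individual vCPU could still be higher or lower.

```python
def per_vcpu_ready_percent(aggregate_percent: float, vcpus: int) -> float:
    """Average %RDY per vCPU, given the VM-level aggregate %RDY."""
    if vcpus < 1:
        raise ValueError("a VM has at least one vCPU")
    return aggregate_percent / vcpus


def ready_looks_high(aggregate_percent: float, vcpus: int,
                     threshold_percent: float = 10.0) -> bool:
    """True when the average per-vCPU %RDY exceeds the chosen threshold.

    threshold_percent defaults to the ~10% rule of thumb; adjust to taste.
    """
    return per_vcpu_ready_percent(aggregate_percent, vcpus) > threshold_percent


if __name__ == "__main__":
    print(per_vcpu_ready_percent(20.0, 4))   # 5.0
    print(ready_looks_high(20.0, 4))         # False
    print(ready_looks_high(37.5, 1))         # True
```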

I’ve often heard the question “how do I convert millisecond values in the vSphere Client to % values in ESXTOP?”  I’ve provided a working example using CPU Ready data.  Understand that the same approach can be applied to other summation metrics as well.  Hopefully this helps.



  1. Erik Bussink says:

    Excellent summary, thanks for the info. Wish they could also have a %RDY graph using directly the ESXTOP data… for those old school persons…

  2. AFidel says:

    Veeam Monitor (even free) will allow you to see %RDY for multiple VMs at once, which AFAIK you can’t do from the VIC. I found it to be quite useful during a recent deployment.

  3. Great write-up, Jason. You provide great clarity on this. Thanks for linking to my article.

  4. Orth Otic says:

    So, what %RDY is considered a fail?

  5. Brandon says:

    Generally, greater than 10% ready per vCPU is considered high in the community. VMware documentation says that performance may be impacted if it is above 20%, but most experts generally recommend a lower figure. See Duncan Epping’s blog for a list of values considered thresholds by the community within esxtop/resxtop. http://www.yellow-bricks.com/esxtop/

  6. jason says:

    @Brandon 20% RDY is generally higher than what I’m comfortable with but that said, %RDY doesn’t necessarily need to be a static threshold. As they say, mileage may vary. It really depends on end user experience and perception. As far as Duncan’s ESXTOP article, I had referenced it in this blog post.

  7. John Gannon says:

    Jason – great coverage of a hot topic. CPU ready / co-scheduling issues are among the most common complaints we hear from our customers. How would you recommend administrators address this symptom if they see it in their environment?

  8. Ed Fran says:

    Jason, your post clarified my mind about CPU Ready to %RDY conversion.
    But I have one problem: in vCenter I can export CPU Ready information for one VM for the whole month of October. By summation, the result that I got is 150325 (ms) for just one day. Okay, no problem here. But is there a way to calculate the %RDY for that day? And for the month? Thanks

  9. Michal Cz says:

    hello,

    Thanks for the explanation, I find it very useful. However, I am concerned about the threshold value for %RDY.
    In your example, a %RDY of 3% came from a 500ms CPU Ready value. 500ms is definitely a wrong number (waiting half a second for a vCPU to become available?!!). I thought CPU Ready (ms) shouldn’t exceed 50-100 ms per VM in order to keep performance of the VM at a good level.

    Real life example of mine: I have a 2 vCPU VM on a CPU-overcommitted host. I can notice the poor performance of this VM, and the only counter that points to a problem is the CPU Ready value (Virtual Center), which on average (past hour) shows 700 ms. At the same time, when I look at the %CPU ready value in Veeam Monitor (past hour), it shows 2%. According to you I shouldn’t be worried, but it seems that it actually is a problem…

  10. Miroslav says:

    You can perform the calculation at http://www.vmcalc.com, no need to memorize the formulas.