VMware has a storage whitepaper titled Scalable Storage Performance. It is an oldie but a goodie. In fact, next to VMware’s Configuration Maximums document, it is one of my favorites and I’ve referenced it often. I like it because it covers block storage LUN queue depth and SCSI reservations efficiently and specifically. It was written pre-VAAI, but I feel the concepts are still quite relevant in the block storage world.
One of the interrelated components of queue depth on the VMware side is the advanced VMkernel parameter Disk.SchedNumReqOutstanding. This setting determines the maximum number of active storage commands (IO) the VMkernel allows at any given time. In essence, this is queue depth at the hypervisor layer. Queue depth can be configured at several points along the path of an IO: the VMkernel, as just mentioned, plus the HBA hardware layer, the kernel module (driver) layer, and the guest OS layer.
Getting back to Disk.SchedNumReqOutstanding, I’ve always lived by the definition I felt was most clear in the Scalable Storage Performance whitepaper. Disk.SchedNumReqOutstanding is the maximum number of active commands (IO) per LUN. Clustered hosts don’t collaborate on this value, which implies this queue depth is per host. In other words, each host has its own independent queue depth, again, per LUN. How does Disk.SchedNumReqOutstanding impact multiple VMs living on the same LUN (again, same host)? The whitepaper states each VM will evenly share the queue depth (assuming each VM has identical shares from a storage standpoint):
When virtual machines share a LUN, the total number of outstanding commands permitted from all virtual machines to that LUN is governed by the Disk.SchedNumReqOutstanding configuration parameter that can be set using VirtualCenter. If the total number of outstanding commands from all virtual machines exceeds this parameter, the excess commands are queued in the ESX kernel.
I was recently challenged by a statement agreeing to all of the above but with one critical exception: Disk.SchedNumReqOutstanding provides an independent queue depth for each VM on the LUN. In other words, if Disk.SchedNumReqOutstanding is left at its default value of 32, then VM1 has a queue depth of 32, VM2 has a queue depth of 32, and VM3 has its own independent queue depth of 32. Stack those three VMs and we arrive at a sum total of 96 outstanding IOs on the LUN. A few sources were provided to me to support this:
Fibre Channel SAN Configuration Guide:
You can adjust the maximum number of outstanding disk requests with the Disk.SchedNumReqOutstanding parameter in the vSphere Client. When two or more virtual machines are accessing the same LUN, this parameter controls the number of outstanding requests that each virtual machine can issue to the LUN.
VMware KB Article 1268 (Setting the Maximum Outstanding Disk Requests per Virtual Machine):
You can adjust the maximum number of outstanding disk requests with the Disk.SchedNumReqOutstanding parameter. When two or more virtual machines are accessing the same LUN (logical unit number), this parameter controls the number of outstanding requests each virtual machine can issue to the LUN.
The problem with the two statements above is that I feel they are poorly worded and, as a result, misinterpreted. I understand what each statement is trying to say, but it implies something quite different depending on how a person reads it. Each statement is correct in that Disk.SchedNumReqOutstanding will gate the amount of active IO possible per LUN and ultimately per VM. However, the wording implies that the value assigned to Disk.SchedNumReqOutstanding applies individually to each VM, which is not the case. The reason I’m pointing this out is the number of misinterpretations I’ve subsequently discovered via Google, which I gather are the result of reading one of the two sources above.
The scenario can be quickly proven in the lab. Disk.SchedNumReqOutstanding is configured for the default value of 32 active IOs. Using resxtop, I see my three VMs cranking out IO with IOMETER. Each VM is configured with IOMETER to create 32 active IOs. If what I’m being told by the challenge is true, I should be seeing 96 active IO being generated to the LUN from the combined activity of the three VMs.
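To make the two competing readings concrete, here is a minimal sketch of what each one predicts for this test. It is a simplified model I am assuming for illustration only: the function names are hypothetical, and it ignores the HBA and array-side queues entirely.

```python
def predict_shared(outstanding_per_vm, dsnro):
    """Whitepaper reading: one limit shared by all VMs on the LUN (per host)."""
    total = sum(outstanding_per_vm)
    active = min(total, dsnro)
    return active, total - active  # (active, queued in the kernel)


def predict_per_vm(outstanding_per_vm, dsnro):
    """Challenged reading: every VM gets its own independent limit."""
    total = sum(outstanding_per_vm)
    active = sum(min(ios, dsnro) for ios in outstanding_per_vm)
    return active, total - active


# Three VMs, each driving 32 outstanding IOs, with the default limit of 32.
vms = [32, 32, 32]
print(predict_shared(vms, 32))  # (32, 64)
print(predict_per_vm(vms, 32))  # (96, 0)
```

The two readings are easy to tell apart in resxtop: one predicts roughly 32 active IOs with the remainder queued, the other predicts 96 active IOs and nothing queued.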

But that’s not what’s happening. Instead, what I see is approximately 32 ACTV (active) IOs on the LUN, with another 67 IOs waiting in queue (by the way, ESXTOP statistic definitions can be found here). In my opinion, the Scalable Storage Performance whitepaper most accurately defines the behavior of the Disk.SchedNumReqOutstanding value.

Going back to the idea of Disk.SchedNumReqOutstanding stacking per VM, LUN utilization could get out of hand rapidly with 10, 15, 20, or 25 VMs per LUN. We’d quickly exceed the maximum supported value of Disk.SchedNumReqOutstanding (and of all HBAs I’m aware of), which is 256. HBA ports themselves typically support a few thousand IOPS. Stacking the queue depths for each VM could quickly saturate an HBA, meaning we’d get a lot less mileage out of those ports as well.
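The arithmetic behind that concern is simple enough to sketch. This assumes the hypothetical stacking behavior (which the lab test contradicts) and the default value of 32; the 256 ceiling is the maximum cited above, and the helper name is mine.

```python
MAX_SUPPORTED = 256  # maximum value cited above


def stacked_outstanding(vm_count, dsnro=32):
    """Outstanding IOs per LUN if every VM had its own independent limit."""
    return vm_count * dsnro


for vm_count in (10, 15, 20, 25):
    total = stacked_outstanding(vm_count)
    print(vm_count, total, total > MAX_SUPPORTED)
```

Under that model, anything past eight busy VMs on a LUN (8 x 32 = 256) would already exceed the ceiling.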
While having a queue depth discussion, it’s also worth noting the %USD value is at 100% and LOAD is approximately 3. The LOAD statistic corroborates the 3:1 ratio of total IO:queue depth and both figures paint the picture of an oversubscribed LUN from an IO standpoint.
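For reference, both statistics can be reproduced from the other counters. The sketch below follows the esxtop definitions as I understand them: LOAD is active plus queued commands divided by queue depth, and %USD is active commands as a percentage of queue depth. The function names are my own, and I am assuming a LUN queue depth (DQLEN) of 32 to match the Disk.SchedNumReqOutstanding setting in this test.

```python
def load(actv, qued, dqlen):
    """esxtop LOAD: (active + queued commands) / queue depth."""
    return (actv + qued) / dqlen


def pct_used(actv, dqlen):
    """esxtop %USD: share of the queue depth consumed by active commands."""
    return 100.0 * actv / dqlen


# Approximate figures from the lab run: 32 active, 67 queued, depth 32.
print(round(load(32, 67, 32), 2))  # 3.09
print(pct_used(32, 32))            # 100.0
```

A LOAD above 1 means more IO is being offered than the queue can hold, which is exactly what the queued commands show.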
In conclusion, I’d like to see VMware modify the wording in their documentation to provide a better understanding, leaving nothing open to interpretation.
Update 6/23/11: Duncan Epping at Yellow Bricks responded with a great follow-up, “Disk.SchedNumReqOutstanding the story.”