The VMTN Storage Performance Thread is a collection of storage performance results on VMware virtual infrastructure, contributed by VMTN Community members around the world. The thread starts here, was locked due to length, and continues on in a new thread here. There’s even a Google Spreadsheet version, although activity in that data repository appears to have diminished long ago. The spirit of the testing is outlined by thread creator and VMTN Virtuoso christianZ:
“My idea is to create an open thread with uniform tests whereby the results will be all inofficial and w/o any warranty. If anybody shouldn’t be agreed with some results then he can make own tests and presents his/her results too. I hope this way to classify the different systems and give a “neutral” performance comparison. Additionally I will mention that the performance [and cost] is one of many aspects to choose the right system.”
Testing standards are defined by christianZ so that results from each submission are consistent and comparable. A pre-defined template is used in conjunction with IOMETER to generate the disk I/O and capture the performance metrics. The test lab environment and the results are then appended to the thread discussion linked above. The performance metrics measured are:
- Average Response Time (in milliseconds, lower is better) – also known as latency, for which VMware cites a potential problem threshold of 50ms in its Scalable Storage Performance whitepaper
- Average I/O per Second (number of I/Os, higher is better)
- Average MB per Second (in MB, higher is better)
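The three metrics are not independent: at a fixed transfer size, throughput is just the I/O rate multiplied by the block size. Here is a minimal Python sketch of that relationship, checked against two of the Fibre Channel rows reported further down. The 32 KiB and 8 KiB transfer sizes are my assumption, inferred from the published numbers rather than stated in the test template description, and the function names are purely illustrative.

```python
# Relationship between the three IOMETER metrics.
# Assumptions: "MB" in the results means MiB (1024 KiB), and the transfer
# sizes used below are inferred from the numbers, not stated in the post.

KIB_PER_MIB = 1024
LATENCY_THRESHOLD_MS = 50.0  # potential-problem threshold cited in the list above


def throughput_mib_per_s(iops: float, block_kib: float) -> float:
    """Throughput implied by an I/O rate at a fixed transfer size."""
    return iops * block_kib / KIB_PER_MIB


def exceeds_threshold(avg_response_ms: float) -> bool:
    """True when average latency is above the 50 ms threshold."""
    return avg_response_ms > LATENCY_THRESHOLD_MS


# Fibre Channel rows: (IOPS, assumed block size in KiB)
max_read = throughput_mib_per_s(35261.29, 32)
real_life = throughput_mib_per_s(2805.43, 8)

print(f"Max Throughput 100% Read: {max_read:.2f} MB/s")
print(f"Real Life 60% Rand / 65% Read: {real_life:.2f} MB/s")
print("1.62 ms over threshold?", exceeds_threshold(1.62))
```

Under those assumptions the computed values line up with the reported 1,101.92 and 21.92 MB per second, which suggests the three columns in each table are internally consistent.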
Following are my results with the EMC Celerra NS-120 Unified Storage array:
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / RAID 5
SAN TYPE / HBAs: Emulex dual port 4Gb Fiber Channel, HP StorageWorks 2Gb SAN switch
OTHER: Disk.SchedNumReqOutstanding and HBA queue depth set to 64
Fibre Channel SAN Fabric Test
| Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second |
| --- | --- | --- | --- |
| Max Throughput – 100% Read | 1.62 | 35,261.29 | 1,101.92 |
| Real Life – 60% Rand / 65% Read | 16.71 | 2,805.43 | 21.92 |
| Max Throughput – 50% Read | 5.93 | 10,028.25 | 313.38 |
| Random 8K – 70% Read | 11.08 | 3,700.69 | 28.91 |
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: swISCSI
OTHER: Shared NetGear 1Gb SoHo Ethernet switch
swISCSI Test
| Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second |
| --- | --- | --- | --- |
| Max Throughput – 100% Read | 17.52 | 3,426.00 | 107.06 |
| Real Life – 60% Rand / 65% Read | 14.33 | 3,584.53 | 28.00 |
| Max Throughput – 50% Read | 11.33 | 5,236.50 | 163.64 |
| Random 8K – 70% Read | 15.25 | 3,335.68 | 22.06 |
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: NFS
OTHER: Shared NetGear 1Gb SoHo Ethernet switch
NFS Test
| Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second |
| --- | --- | --- | --- |
| Max Throughput – 100% Read | 17.18 | 3,494.48 | 109.20 |
| Real Life – 60% Rand / 65% Read | 121.85 | 480.81 | 3.76 |
| Max Throughput – 50% Read | 12.77 | 4,718.29 | 147.45 |
| Random 8K – 70% Read | 123.41 | 478.17 | 3.74 |
Please read further below for further NFS testing results after applying EMC Celerra best practices.
Fibre Channel Summary
Not surprisingly, Celerra over SAN fabric beats the pants off of the shared storage solutions I’ve had in the lab previously, HP MSA1000 and Openfiler 2.2 swISCSI before that, in all four IOMETER categories. I was, however, pleasantly surprised to find that Celerra over fibre channel was one of the top performing configurations among a sea of HP EVA, Hitachi, NetApp, and EMC CX series frames.
swISCSI Summary
Celerra over swISCSI was only slightly faster than the Openfiler 2.2 swISCSI on HP Proliant ML570 G2 hardware I had in the past on the Max Throughput-100%Read test. In the other three test categories, however, the Celerra left the Openfiler array in the dust.
NFS Summary
Moving on to Celerra over NFS, performance results were consistent with swISCSI in two test categories (Max Throughput-100%Read and Max Throughput-50%Read), but NFS performance numbers really dropped in the remaining two categories as compared to swISCSI (RealLife-60%Rand-65%Read and Random-8k-70%Read).
What’s worth noting is that both the iSCSI and NFS datastores are backed by the same logical Disk Group and physical disks on the Celerra. I did this purposely to compare the iSCSI and NFS protocols, with everything else being equal. The differences in two out of the four categories are obvious. The question came to mind: Does the performance difference come from the Celerra, the VMkernel, or a combination of both? Both iSCSI and NFS have evolved into viable protocols for production use in enterprise datacenters; therefore, I’m leaning AWAY from the theory that the performance degradation over NFS stems from the VMkernel. My initial conclusion here is that Celerra over NFS doesn’t perform as well with Random Read disk I/O patterns. I welcome your comments and experience here.
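To put numbers on the protocol gap, here is a small Python sketch that computes how far NFS sits above or below swISCSI in each category, using the I/O per second figures from the initial swISCSI and NFS tests above. The dictionary layout and helper name are just illustrative.

```python
# Initial results from the swISCSI and NFS tests above:
# test name -> (swISCSI IOPS, NFS IOPS)
initial_iops = {
    "Max Throughput - 100% Read": (3426.00, 3494.48),
    "Real Life - 60% Rand / 65% Read": (3584.53, 480.81),
    "Max Throughput - 50% Read": (5236.50, 4718.29),
    "Random 8K - 70% Read": (3335.68, 478.17),
}


def nfs_vs_iscsi_percent(iscsi_iops: float, nfs_iops: float) -> float:
    """Percent difference of NFS relative to swISCSI (negative = NFS slower)."""
    return (nfs_iops - iscsi_iops) / iscsi_iops * 100.0


gaps = {
    test: nfs_vs_iscsi_percent(iscsi, nfs)
    for test, (iscsi, nfs) in initial_iops.items()
}

for test, gap in gaps.items():
    print(f"{test:33s} {gap:+7.1f}%")
```

By this measure NFS is within about 10% of swISCSI on the two Max Throughput tests, but roughly 86% lower on the two tests with a random component.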
Please read further below for further NFS testing results after applying EMC Celerra best practices.
CIFS
Although I did not test CIFS, I would like to take a look at its performance. CIFS isn’t used directly by VMware virtual infrastructure, but it can be a handy protocol to leverage with NFS storage. File management (i.e. .ISOs, templates, etc.) on ESX NFS volumes becomes easier and more mobile, and fewer tools are required, when the NFS volumes are presented as CIFS shares on a predominantly Windows client network. Providing adequate security through CIFS will be a must to protect the ESX datastore on NFS.
If you’re curious about storage array configuration and its impact on performance, cost, and availability, take a look at this RAID triangle which VMTN Master meistermn posted in one of the performance threads:

The Celerra storage is currently carved out in the following way:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DAE 2 | FC | FC | FC | FC | FC | FC | FC | FC | FC | FC | FC | FC | FC | FC | FC | DAE 2 |
| DAE 1 | NAS | NAS | NAS | NAS | NAS | Spr | Spr | | | | | | | | | DAE 1 |
| DAE 0 | Vlt | Vlt | Vlt | Vlt | Vlt | NAS | NAS | NAS | NAS | NAS | NAS | NAS | NAS | NAS | NAS | DAE 0 |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | |
FC = fibre channel Disk Group
NAS = iSCSI/NFS Disk Groups
Spr = Hot Spare
Vlt = Celerra Vault drives
I’m very pleased with the Celerra NS-120. With the first batch of tests complete, I’m starting to formulate ideas on when, where, and how to use the various storage protocols with the Celerra. My goal is not to eliminate use of the slowest performing protocol in the lab. I want to work with each of them on a continual basis to test future design and integration with VMware virtual infrastructure.
Update 1/30/10: New NFS performance numbers. I’ve begun working with EMC vSpecialists to troubleshoot the performance discrepancies between the swISCSI and NFS protocols. A few key things have been identified, and a new set of performance metrics has been posted below after making some changes:
- The first thing that the EMC vSpecialists (and others on the blog post comments) asked about was whether or not the file system uncached write mechanism was enabled. The uncached write mechanism is designed to improve performance for applications with many connections to a large file, such as a virtual disk file of a virtual machine. This mechanism can enhance access to such large files through the NFS protocol. Out of the box, the uncached write mechanism is disabled on the Celerra. EMC recommends this feature be enabled with ESX(i). The beauty here is that the feature can be toggled while the NFS file system is mounted on cluster hosts with VMs running on it. VMware ESX Using EMC Celerra Storage Systems pages 99-101 outlines this recommendation.
- Per VMware ESX Using EMC Celerra Storage Systems pages 73-74, NFS send and receive buffers should be divisible by 32k on the ESX(i) hosts. Again, these advanced settings can be adjusted on the hosts while VMs are running and the settings do not require a reboot. EMC recommended a value of 64 (presumably for both).
- Use the maximum amount of write cache possible for Storage Processors (SPs). Factory defaults here: 598MB total read cache size, 32MB read cache size, 598MB total write cache size, 566MB write cache size.
- Specific to this test – verify that the ramp up time is 120 seconds. Without the ramp up, the results can be skewed. The tests I originally performed were with a 0 second ramp up time.
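The "divisible by 32k" rule for the NFS buffers is easy to check by hand, but here is a minimal Python sketch of it anyway. The two default values are the ones reported with the buffer test further down in this post. Treating the settings as plain integers in KB is my assumption, and the helper names are made up for illustration. This only does the arithmetic; it does not read or change anything on an ESX(i) host.

```python
ALIGNMENT_KB = 32  # buffers should be a multiple of 32k per the EMC guidance


def is_aligned(size_kb: int) -> bool:
    """True when the buffer size is an exact multiple of 32."""
    return size_kb % ALIGNMENT_KB == 0


def nearest_aligned_below(size_kb: int) -> int:
    """Largest multiple of 32 that does not exceed size_kb."""
    return (size_kb // ALIGNMENT_KB) * ALIGNMENT_KB


# Default values as reported later in this post (assumed to be in KB).
host_settings = {"NFS.SendBufferSize": 264, "NFS.ReceiveBufferSize": 128}

for name, size_kb in host_settings.items():
    if is_aligned(size_kb):
        print(f"{name} = {size_kb}: OK")
    else:
        suggestion = nearest_aligned_below(size_kb)
        print(f"{name} = {size_kb}: not a multiple of 32, consider {suggestion}")
```

Rounding the send buffer default down to the nearest multiple gives 256, which is the value used in the buffer test below.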
The new NFS performance tests are below, using some of the recommendations above:
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: NFS
OTHER: Shared NetGear 1Gb SoHo Ethernet switch
New NFS Test After Enabling the NFS file system Uncached Write Mechanism
VMware ESX Using EMC Celerra Storage Systems pages 99-101
| Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second |
| --- | --- | --- | --- |
| Max Throughput – 100% Read | 17.39 | 3,452.30 | 107.88 |
| Real Life – 60% Rand / 65% Read | 20.28 | 2,816.13 | 22.00 |
| Max Throughput – 50% Read | 19.43 | 3,051.72 | 95.37 |
| Random 8K – 70% Read | 19.21 | 2,878.05 | 22.48 |
Significant improvement here!
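For a sense of scale, this short Python sketch compares the original NFS run with the run after enabling the uncached write mechanism, using the I/O per second figures from the two NFS tables. The structure and names are only illustrative.

```python
# NFS IOPS: test name -> (before uncached write enabled, after)
nfs_iops = {
    "Max Throughput - 100% Read": (3494.48, 3452.30),
    "Real Life - 60% Rand / 65% Read": (480.81, 2816.13),
    "Max Throughput - 50% Read": (4718.29, 3051.72),
    "Random 8K - 70% Read": (478.17, 2878.05),
}


def change_factor(before_iops: float, after_iops: float) -> float:
    """Ratio of the new I/O rate to the old one (above 1.0 = faster)."""
    return after_iops / before_iops


factors = {
    test: change_factor(before, after)
    for test, (before, after) in nfs_iops.items()
}

for test, factor in factors.items():
    print(f"{test:33s} {factor:5.2f}x")
```

By these numbers the two tests with a random component improved by close to a factor of six, the 100% read test was essentially unchanged, and the Max Throughput 50% Read test came in at roughly 65% of its earlier figure.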
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: NFS
OTHER: Shared NetGear 1Gb SoHo Ethernet switch
New NFS Test After Configuring
NFS.SendBufferSize = 256 (this was set at the default of 264 which is not divisible by 32k)
NFS.ReceiveBufferSize = 128 (this was already at the default of 128)
VMware ESX Using EMC Celerra Storage Systems pages 73-74
| Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second |
| --- | --- | --- | --- |
| Max Throughput – 100% Read | 17.41 | 3,449.05 | 107.78 |
| Real Life – 60% Rand / 65% Read | 20.41 | 2,807.66 | 21.93 |
| Max Throughput – 50% Read | 18.25 | 3,247.21 | 101.48 |
| Random 8K – 70% Read | 18.55 | 2,996.54 | 23.41 |
Slight change
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: NFS
OTHER: Shared NetGear 1Gb SoHo Ethernet switch
New NFS Test After Configuring IOMETER for 120 second Ramp Up Time
| Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second |
| --- | --- | --- | --- |
| Max Throughput – 100% Read | 17.28 | 3,472.43 | 108.51 |
| Real Life – 60% Rand / 65% Read | 21.05 | 2,726.38 | 21.30 |
| Max Throughput – 50% Read | 17.73 | 3,338.72 | 104.34 |
| Random 8K – 70% Read | 17.70 | 3,091.17 | 24.15 |
Slight change
Due to the commentary received on the 120 second ramp up, I re-ran the swISCSI test to see if that changed things much. To fairly compare protocol performance, the same parameters must be used across the board in the tests.
SERVER TYPE: Windows Server 2003 R2 VM ON ESXi 4.0 U1
CPU TYPE / NUMBER: VCPU / 1 / 1GB Ram (thin provisioned)
HOST TYPE: HP DL385 G2, 16GB RAM; 2x QC AMD Opteron 2356 Barcelona
STORAGE TYPE / DISK NUMBER / RAID LEVEL: EMC Celerra NS-120 / 15x 146GB 15K 4Gb FC / 3x RAID 5 5×146
SAN TYPE / HBAs: swISCSI
OTHER: Shared NetGear 1Gb SoHo Ethernet switch
New swISCSI Test After Configuring IOMETER for 120 second Ramp Up Time
| Test Name | Avg. Response Time (ms) | Avg. I/O per Second | Avg. MB per Second |
| --- | --- | --- | --- |
| Max Throughput – 100% Read | 17.79 | 3,351.07 | 104.72 |
| Real Life – 60% Rand / 65% Read | 14.74 | 3,481.25 | 27.20 |
| Max Throughput – 50% Read | 12.17 | 4,707.39 | 147.11 |
| Random 8K – 70% Read | 15.02 | 3,403.39 | 26.59 |
swISCSI is still performing slightly better than NFS on the Random Reads; however, the margin is much narrower.
At this point I am content, stroke, happy, (borrowing UK terminology there) with NFS performance. I am now moving on to ALUA, Round Robin, and PowerPath/VE testing. I set up NPIV over the weekend with the Celerra as well – look for a blog post coming up on that.
Thank you to EMC and to the folks who replied in the comments below for your help tackling best practices and NFS optimization/tuning!