Recently I had a discussion about storage latency. The question was: "Is high storage latency always a bad thing?"
When we look at latency in a vSphere environment, the esxtop counters CMDS/s, DAVG/cmd, KAVG/cmd and GAVG/cmd are particularly interesting:
| Counter | Description |
|---|---|
| CMDS/s | The total number of commands per second. This includes IOPS (Input/Output Operations Per Second) and other SCSI commands, such as SCSI reservations, locks, vendor string requests and unit attention commands, sent to or coming from the device or virtual machine being monitored. In most cases CMDS/s = IOPS, unless there are a lot of metadata operations (such as SCSI reservations). |
| DAVG/cmd | The average response time, in milliseconds, per command sent to the device. |
| KAVG/cmd | The amount of time a command spends in the VMkernel. |
| GAVG/cmd | The response time as perceived by the guest operating system, calculated as GAVG = DAVG + KAVG. |
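To make the relationships in the table concrete, here is a minimal Python sketch that derives GAVG from DAVG and KAVG and estimates how many commands per second are not plain reads or writes. It assumes you have already copied the values for one device out of esxtop by hand; the `DeviceSample` class and its field names are hypothetical, and it also assumes the esxtop READS/s and WRITES/s counters for that device are available to subtract from CMDS/s.

```python
from dataclasses import dataclass


@dataclass
class DeviceSample:
    """One esxtop disk-device sample (hypothetical field names)."""
    cmds_per_s: float    # CMDS/s
    reads_per_s: float   # READS/s (assumed available in the same view)
    writes_per_s: float  # WRITES/s (assumed available in the same view)
    davg_ms: float       # DAVG/cmd
    kavg_ms: float       # KAVG/cmd

    @property
    def gavg_ms(self) -> float:
        # Latency seen by the guest: device latency plus time spent in the VMkernel.
        return self.davg_ms + self.kavg_ms

    @property
    def iops(self) -> float:
        # Read and write operations only.
        return self.reads_per_s + self.writes_per_s

    @property
    def non_io_cmds_per_s(self) -> float:
        # Commands that are neither reads nor writes (reservations, inquiries, ...).
        return max(self.cmds_per_s - self.iops, 0.0)


sample = DeviceSample(cmds_per_s=1250.0, reads_per_s=800.0, writes_per_s=400.0,
                      davg_ms=12.5, kavg_ms=0.5)
print(sample.gavg_ms)            # 13.0
print(sample.non_io_cmds_per_s)  # 50.0
```

If `non_io_cmds_per_s` stays close to zero, CMDS/s and IOPS are effectively the same number, which is the normal case described above.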
The KAVG value should always remain low, let's say 0 or maybe 1 ms. KAVG includes VMM, vSCSI and ESX storage stack latency. DAVG includes driver, HBA, fabric, array storage processor and device latency, so its value is determined by delays in your HBA (queueing), the fabric, the array storage processor and the array itself. Frank Denneman wrote a good article on this, including a nice diagram.
So the question is: what is an acceptable DAVG latency? I think most admins will accept about 20-30 ms. In fact, the default SIOC congestion thresholds are around 20-30 ms, depending on the storage technology used.
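The rules of thumb above (KAVG around 0-1 ms, DAVG up to roughly 30 ms) can be turned into a first-pass triage check. This is only a sketch: the `triage` function is hypothetical, and the default limits are simply the figures quoted in this post, not official VMware thresholds.

```python
from typing import List


def triage(davg_ms: float, kavg_ms: float,
           kavg_limit_ms: float = 1.0, davg_limit_ms: float = 30.0) -> List[str]:
    """Return hints about where latency originates (hypothetical helper).

    The default limits are the rules of thumb from the text, not VMware defaults.
    """
    findings = []
    if kavg_ms > kavg_limit_ms:
        # Time spent in the VMkernel: VMM, vSCSI, ESX storage stack.
        findings.append("KAVG high: commands are waiting in the VMkernel")
    if davg_ms > davg_limit_ms:
        # Time spent below the VMkernel: driver, HBA, fabric, storage processor, array.
        findings.append("DAVG high: check the HBA, fabric, storage processor and array")
    return findings


print(triage(davg_ms=45.0, kavg_ms=0.2))
```

A non-empty result only says that a counter crossed a line; as the rest of this post argues, it does not by itself prove there is a problem.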
So, let's say we have a latency greater than 30 ms. Do we have a problem?
Not necessarily. The SCSI block size also influences the latency value, and the block size depends on the type of IOs the guest OS is sending to disk. In a "normal" situation you might observe 4 KB, 8 KB or 16 KB IO blocks, but in some situations the guest OS sends much bigger IOs; Windows 2008, for example, might send 1 MB (!) SCSI blocks to disk. vSphere measures latency for the whole SCSI block, so the size of an IO directly affects the latency value: the bigger the IO block, the longer the transfer takes and the higher the measured latency.
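A toy model helps show why a large block inflates per-command latency even when the storage is healthy. This sketch assumes that each command costs a fixed overhead plus the time needed to transfer the block at a constant throughput; both `overhead_ms` and `throughput_mb_s` are made-up illustrative values, and real arrays do not behave this linearly.

```python
def modeled_latency_ms(block_kb: float, overhead_ms: float = 0.5,
                       throughput_mb_s: float = 100.0) -> float:
    """Toy model: fixed per-command overhead plus transfer time for the block.

    overhead_ms and throughput_mb_s are hypothetical, illustrative values.
    """
    # Time to move block_kb at throughput_mb_s (1 MB = 1024 KB), in milliseconds.
    transfer_ms = block_kb / (throughput_mb_s * 1024.0) * 1000.0
    return overhead_ms + transfer_ms


for size_kb in (4, 8, 16, 64, 256, 1024):
    print(f"{size_kb:>5} KB -> {modeled_latency_ms(size_kb):6.2f} ms")
```

With these made-up numbers a 1 MB IO reports roughly 20 times the latency of a 4 KB IO while moving 256 times as much data, so the "high" latency reflects more work per command rather than a slow device.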
Nathan Small wrote an article on this topic that is worth reading. You can emulate this behavior with Iometer, which lets you set the SCSI IO block size and shows you the relationship between block size and latency.
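If Iometer is not at hand, a crude stand-in can be run inside the guest to see the same trend. This sketch is not Iometer and does not reproduce its behavior: it writes blocks of a given size to a temporary file and calls `os.fsync` after each write so the OS has to push the data to the virtual disk before the clock stops. Filesystem overhead and caches in the guest, host and array still influence the numbers, so treat them as relative, not absolute. The `measure_write_latency` function is hypothetical.

```python
import os
import tempfile
import time
from typing import Optional


def measure_write_latency(block_kb: int, count: int = 16,
                          directory: Optional[str] = None) -> float:
    """Average latency in ms of synchronous writes of block_kb each (rough sketch)."""
    block = os.urandom(block_kb * 1024)
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        total_s = 0.0
        for _ in range(count):
            start = time.perf_counter()
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)  # force the data out to the storage device
            total_s += time.perf_counter() - start
        return total_s / count * 1000.0
    finally:
        os.close(fd)
        os.remove(tmp_path)


for size_kb in (4, 64, 1024):
    print(f"{size_kb:>5} KB -> {measure_write_latency(size_kb):8.3f} ms")
```

Pointing `directory` at a path on the virtual disk you want to test, and comparing the small and large block sizes, should show latency rising with block size.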
Next time you face high latency, always try to determine what is happening in your guest OS. The impact of that latency may not be as big as it looks, and it may be explained by the SCSI block size the guest OS is using.
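One quick way to find out what the guest is sending is to divide throughput by the IO rate, for example esxtop MBWRTN/s by WRITES/s (or MBREAD/s by READS/s), which gives the average IO size. The sketch below assumes 1 MB = 1024 KB for the esxtop MB counters, and the `large_kb` and `latency_limit_ms` thresholds in the hypothetical `latency_explained_by_block_size` helper are illustrative choices, not recommendations.

```python
def average_io_size_kb(mb_per_s: float, ios_per_s: float) -> float:
    """Average IO size in KB from throughput (MB/s) and IO rate (assumes 1 MB = 1024 KB)."""
    if ios_per_s <= 0:
        return 0.0
    return mb_per_s * 1024.0 / ios_per_s


def latency_explained_by_block_size(avg_kb: float, latency_ms: float,
                                    latency_limit_ms: float = 30.0,
                                    large_kb: float = 256.0) -> bool:
    """True when latency is over the limit but the IOs are large (hypothetical thresholds)."""
    return latency_ms > latency_limit_ms and avg_kb >= large_kb


write_kb = average_io_size_kb(mb_per_s=120.0, ios_per_s=120.0)  # 1024.0
read_kb = average_io_size_kb(mb_per_s=25.0, ios_per_s=3200.0)   # 8.0
print(write_kb, latency_explained_by_block_size(write_kb, latency_ms=40.0))
print(read_kb, latency_explained_by_block_size(read_kb, latency_ms=40.0))
```

High latency combined with a large average IO size points toward the block-size effect described above; high latency with small IOs deserves a closer look at the storage path.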