A while ago I wrote an article on high storage latency, which discussed the relationship between the amount of latency and the I/O block size used by the OS running in your virtual machine.
In this new article I want to discuss a latency message I recently encountered in several vSphere 5 environments. The alert reads:
“Device x performance has deteriorated. I/O latency increased from average value of y microseconds to z microseconds.”
The alert shows up both in the vSphere Client and in the vmkernel.log. Here x is the naa ID of the storage device, and y and z are latency values in microseconds. The alert indicates that latency increased from the normal value y (let’s say 10000 microseconds = 10 milliseconds) to values as high as 500000 microseconds (= 500 ms), and sometimes even higher.
Of course latency values of 500 ms are not desired, so there might be a problem with the storage device. A good way to determine what’s happening on your storage is to run an analysis using esxtop in batch mode. After collecting performance data you can analyze it using Windows perfmon. More information on that is available in this article.
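As a starting point, a batch-mode collection could look like the sketch below. The interval, iteration count, and output path are example values of my own choosing, not prescribed ones; size the window so it covers the period around the alert.

```shell
# Run esxtop in batch mode (-b) with all counters enabled (-a),
# sampling every 2 seconds (-d 2) for 300 iterations (-n 300) -- roughly 10 minutes.
# Redirect the CSV output to a file you can later open in Windows perfmon.
esxtop -b -a -d 2 -n 300 > /tmp/esxtop-results.csv
```

Copy the resulting CSV off the host and open it in perfmon to browse the storage device counters.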
Now look for the time of the alert and compare this with the actual values that the esxtop results show. Important: vmkernel.log and esxtop results always show UTC time, while the vSphere Client uses local time for messages and alerts.
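When comparing timestamps, a quick way to convert a UTC time from vmkernel.log into local time is the GNU date command (the timestamp below is just a made-up example):

```shell
# Convert an example UTC timestamp (as found in vmkernel.log) to local time,
# so it can be matched against the vSphere Client alert time.
date -d '2012-03-15 14:05:32 UTC'
```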
My experience is that the values displayed in the “performance has deteriorated” alert (converted to milliseconds) usually do not match the values observed in esxtop. In fact, the esxtop values are usually two to six times lower than the values reported in the alert. Important: don’t investigate only the latency value; the behavior of the latency graph matters just as much. For example, a latency spike is something different from an increase in latency that slowly decreases. A latency spike is usually nothing to worry about, since it indicates an increase in latency for a very short period of time. An increase in storage latency that persists for a while, however, could indicate a performance problem.
I’ve seen various “Device x performance has deteriorated. I/O latency increased from average value of y microseconds to z microseconds” alerts that could usually be linked to latency spikes. Always be careful, but in these cases there was not much to worry about.
Moreover, when using perfmon to analyze esxtop results, you might also notice latency values fluctuating slightly depending on the timeframe you’re analyzing. This has to do with the way perfmon works: perfmon shows an average value in the graph, and this value depends on the esxtop samples and the timeframe you’re observing. At the maximum zoom level, perfmon shows the actual esxtop values (which are still lower than those in the “performance has deteriorated” alert).
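To see why averaging can hide a spike, consider ten hypothetical 2-second samples: nine at 10 ms and one spiking to 500 ms. Averaged over the whole window they come out at 59 ms, far below the peak. A quick awk sketch with these made-up numbers:

```shell
# Average ten hypothetical latency samples (in ms): one short 500 ms spike
# among nine 10 ms samples averages out to 59 ms over the window.
echo '10 10 10 10 500 10 10 10 10 10' |
  awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print sum / NF }'
```

This is exactly what happens when perfmon is zoomed out: the wider the timeframe per data point, the more a short spike is flattened into the average.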
I hope this post helps you think about storage latency. Good luck resolving the issues you might face! Also check this VMware KB article.
Important remark: although the alert discussed in this article can be a “false positive”, there are also numerous posts about a networking bug in vSphere 5 that can affect software iSCSI. More information is available on vmtoday.com.
One Comment
Manish
Please include the link to KB 1008205 to use esxtop for finding storage performance issues as well.