I was planning for a long time to write this post but time is never enough and timing is always not the right one π
This time I managed to gather all the info and present a simple to follow, brief and explanatory list of most important performance counters used in SQL monitoring. I came up with it, by studying a lot of other similar documents from important SQL professionals and adding personal insight on each of them.
Let’s begin by saying that these counters are helpful when are collected throughout the whole day with a refresh interval set at 1 minute. It can be done easily if you go to: Control Panel\Administrative Tools\Performance Monitor\Data Collector Sets\User defined. Right click and open: New–> Data Collector Set –> check on Create Manually –> check on Performance counters –> start adding performance counters.
- Disk Transfers/sec
- slow, needs attention : < (less than) than 80 I/Oβs per second when "Physical Disk\Avg. Disk sec/Transfer" is > (greater than) 25ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.
- Avg. Disk sec/Read – average time, in seconds, of a read of data to the disk
β very good < 10 ms β okay 10-20 ms β slow, needs attention 20-50 ms β Serious I/O bottleneck > 50 ms
β very good < 10 ms β okay 10-20 ms β slow, needs attention 20-50 ms β Serious I/O bottleneck > 50 ms
To determine the impact of excessive paging on disk activity, multiply the values of the "Physical Disk\Avg. Disk sec/Transfer" and "Memory\Pages/sec counters. If the product of these counters exceeds 0.1, paging is taking more than 10 percent of disk access time, which indicates that you need more RAM. If this occurs over a long period, you probably need more memory. However, "Page Life Expectancy" counter must be checked as well.
If the Perfmon capture interval is set for anything greater than one second, the average of the values captured is presented.
The above 5 are useful in analyzing any ups or downs for a certain period. It will help you know the nr of reads/writes in normal behavior and if a problem occurs you can link several values and see if the difference is notable.
- Available MBytes – amount of physical memory available to processes running on the computer, in Megabytes
- lowβ < than 10% available - very low β less than 5% available
- good - < 70 %
- good - < 10 - bad - > 15
- slow on a slow disk subsystem - 100 - 600 - slow on a powerful disk system - 600 - 2500 - very slow - 2500 - 5000 - extremely slow - > 5000
- Memory Grants pending - Number of waits on processes
should stay close to 0
It should be = (Total amount of RAM in the server) - (RAM used by Windows Server) - (RAM used by SQL Server) - (and any utilities you have running on the server). If much lower, than it may be possible that you aren't allowing SQL Server to dynamically allocate RAM, and instead have accidentally specified that SQL Server uses less RAM than it should have for optimal performance. This is not a critical counter for memory leak but it's worth looking at it.
- Buffer cache hit ratio - how oftem SQL Server can get data from buffer rather than disk
- good - > 90 % OLAP systems - good - > 95% OLTP systems
- good - < 2
- good - > 640
- good - < 20
-good - < 100
- good - < 90
- good - < 90
- good - > 300
Hint: Less than that on a regular basis is a clear indication that your SQL Server needs more RAM allocated to function properly.
- Processor Queue Length
- good - < 4 per cpu - also good - 4-6 - fair - 7 - 12 - bad - > 12 is
- high - > 5000 - very high - > 15000
- Processor time % (total & per process)
- Fair - <80%
- Fair - < 75% (higher bottleneck)
- bad - > 30 % (hardware or driver problem) - very bad - > 50%
- Forwarded records /sec - Occurs when a column's data gets expanded more (for varchar types) than the initially defined space. It is happening only for tables without clustered indexes.
- good - < 10 per 100 batch requests/sec
Hint: Evaluate creating clustered indexes for heap tables.
- bad - > 1000 and CPU high
Hint: Queries with missing indexes or too many rows requested will have a large number of logical reads and an increased CPU time.
For OLTP applications, optimize for more index searches and less scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations.
- fair - < 20 per 100 batch requests/sec
Hint: If higher, than you have a problem that can be solved by clustered index, good fill factor.
An increased fill factor helps to reduce page splits because there is more room in data pages before it fills up and a page split has to occur.
- Batch requests/sec - shows you how busy is your server.
- busy server - > 1000 - very busy - > 3000
Hint: A typical 100 Mbs network card is only able to handle about 3000 batch requests per second. If you have a server that is this busy, you may need
to have two or more network cards, or go to a 1 Gbs network card.
- good - as low as possible. - bad - > 100
- good - as low as possible. - bad - > 100
Hint: Evaluate the number of (re)compilations linked to the number of Batch requests/sec, and check if the resulting ratio is less than 30% or higher. If higher than you should start checking which Stored procedures, or ad-hoc queries are getting compiled / recompiled over and over again and causing CPU usage.
I wrote about this in detail at How to find out which Queries have non-reusable Query Plans? and Top 5 Steps To Determine Bad Performance of Queries
So basically these counters should give you an overall view if your system is low on CPU, Memory or Disk resources or if you should re-design some your queries, tables, indexes. The perfmon counters are a very good indication of how tuned is your system. However, these should be seen together with SQL statistics that can be gathered from the multitude of dynamic views available starting SQL 2005 + editions. In a next post I will write about those important SQL counters, how I prefer to call them. If you start collecting windows counters and SQL counters you should have no problem in identifying a certain bottleneck happened in a timeframe you where not at office or skipped the critical period and couldn't analyze on time everything.
Have a great time until then!