
Key SQL performance counters – identify bottlenecks

I had been planning to write this post for a long time, but time is never enough and the timing never seems right 🙂
This time I finally managed to gather all the info and put together a brief, easy-to-follow, explanatory list of the most important performance counters used in SQL Server monitoring. I came up with it by studying similar documents from respected SQL professionals and adding personal insight to each counter.

Let’s begin by saying that these counters are most helpful when they are collected throughout the whole day with a refresh interval set at 1 minute. This is easy to set up: go to Control Panel\Administrative Tools\Performance Monitor\Data Collector Sets\User Defined, then right click and choose: New –> Data Collector Set –> check Create Manually –> check Performance counters –> start adding performance counters.
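The same kind of collector can also be created from the command line with logman, the scriptable counterpart of the Perfmon UI steps above. Here is a minimal Python sketch that assembles such a command; the counter paths, collector name and output folder are just examples you would adapt:

```python
# Assemble a logman command that creates a counter collector similar to the
# Perfmon UI steps above. Counter paths, name and output folder are examples.
counters = [
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Read",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Write",
    r"\Memory\Available MBytes",
    r"\SQLServer:Buffer Manager\Page life expectancy",
]

def build_logman_command(name, counters, interval="00:01:00",
                         out=r"C:\PerfLogs\SQLBaseline"):
    """Return a 'logman create counter' command line that samples the given
    counters every `interval` (hh:mm:ss) and logs them to `out`."""
    quoted = " ".join(f'"{c}"' for c in counters)
    return f'logman create counter {name} -c {quoted} -si {interval} -o "{out}"'

cmd = build_logman_command("SQLBaseline", counters)
print(cmd)
```

Running the printed command on the server (followed by `logman start SQLBaseline`) gives you the same 1-minute sampling as the UI route.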

Analyze Disk
1. Logical Disk

  • Disk Transfers/sec
  • - slow, needs attention : < (less than) 80 I/Os per second when
    "Physical Disk\Avg. Disk sec/Transfer" is > (greater than) 25 ms.
    This may indicate too many virtual LUNs using the same physical disks on a SAN.

2. Physical Disk

  • Avg. Disk sec/Read – average time, in seconds, of a read of data from the disk
  • – very good               - < 10 ms
    – okay                    - 10 - 20 ms
    – slow, needs attention   - 20 - 50 ms
    – serious I/O bottleneck  - > 50 ms
  • Avg. Disk sec/Write – average time, in seconds, of a write of data to the disk
  • – very good               - < 10 ms
    – okay                    - 10 - 20 ms
    – slow, needs attention   - 20 - 50 ms
    – serious I/O bottleneck  - > 50 ms
  • Avg. Disk sec/Transfer – Disk latency
  • To determine the impact of excessive paging on disk activity,
    multiply the values of the "Physical Disk\Avg. Disk sec/Transfer"
    and "Memory\Pages/sec" counters. If the product of these counters
    exceeds 0.1, paging is taking more than 10 percent of disk access time;
    if this occurs over a long period, you probably need more RAM.
    However, the "Page Life Expectancy" counter must be checked as well.
  • Disk Bytes/sec – total number of bytes sent to the disk (write) and retrieved from the disk (read) over a period of one second.
    If the Perfmon capture interval is set for anything greater than one second, the average of the values captured is presented.
  • Disk Read Bytes/sec
  • Disk Write Bytes/sec
  • Disk Reads/sec
  • Disk Writes/sec

The above 5 counters are useful for analyzing ups and downs over a certain period. They help you learn the number of reads/writes under normal behavior, so that if a problem occurs you can correlate several values and see whether the difference is notable.
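The paging rule of thumb from the Physical Disk section (Avg. Disk sec/Transfer multiplied by Pages/sec exceeding 0.1 means paging consumes more than 10 percent of disk access time) is easy to script. A minimal sketch, using hypothetical sample values:

```python
def paging_share_of_disk_time(avg_disk_sec_per_transfer, pages_per_sec):
    """Fraction of disk access time consumed by paging:
    "Physical Disk\\Avg. Disk sec/Transfer" x "Memory\\Pages/sec"."""
    return avg_disk_sec_per_transfer * pages_per_sec

# Hypothetical samples: 8 ms per transfer, 20 hard page reads/sec.
share = paging_share_of_disk_time(0.008, 20)
print(f"{share:.2f}")        # 0.16 -> above the 0.1 threshold
needs_more_ram = share > 0.1
print(needs_more_ram)        # True -> check Page Life Expectancy too
```

Remember the caveat above: treat this as a prompt to look at "Page Life Expectancy", not as a verdict on its own.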

Analyze Memory
1. Memory

  • Available MBytes – amount of physical memory available to processes running on the computer, in megabytes
    - low      – less than 10% available
    - very low – less than 5% available
  • Page file usage %
  • - good - < 70 % 
  • Pages input/sec - the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk.
  • - good - < 10
    - bad  - > 15 

  • Pages/sec - rate at which pages are read/written to disk
  • - slow on a slow disk subsystem  - 100 - 600 
    - slow on a powerful disk system - 600 - 2500 
    - very slow                      - 2500 - 5000
    - extremely slow                 - > 5000 
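The memory thresholds above translate into a simple classifier. This is a sketch using the band edges straight from the list (treat them as rough guides, not hard limits); the sample values are hypothetical:

```python
def available_memory_status(available_mb, total_mb):
    """Classify "Memory\\Available MBytes" as a share of total RAM,
    using the <5% (very low) and <10% (low) bands above."""
    pct = 100.0 * available_mb / total_mb
    if pct < 5:
        return "very low"
    if pct < 10:
        return "low"
    return "ok"

# Hypothetical samples on a 16 GB server:
print(available_memory_status(512, 16384))   # 3.1%  -> "very low"
print(available_memory_status(1200, 16384))  # 7.3%  -> "low"
print(available_memory_status(2048, 16384))  # 12.5% -> "ok"
```

The same pattern extends naturally to the Pages input/sec and Pages/sec bands.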

2. SQL Server: Memory Manager

  • Memory Grants Pending - number of processes waiting for a workspace memory grant
  • should stay close to 0
  • SQL Cache Memory (KB) - how much physical RAM is devoted to SQL Server’s data cache
  • It should be = (total amount of RAM in the server) - (RAM used by Windows Server)
    - (RAM used by SQL Server) - (RAM used by any utilities running on the server).
    If it is much lower, then it may be that you aren't allowing SQL Server to
    dynamically allocate RAM, and have instead accidentally specified that SQL Server
    use less RAM than it should have for optimal performance. This is not a critical
    counter for memory leaks but it's worth looking at.

3. SQL Server : Buffer Manager

  • Buffer cache hit ratio - how often SQL Server can get data from the buffer rather than from disk
  • - good - > 90 % OLAP systems 
    - good - > 95% OLTP systems
  • Free list stalls/sec - the frequency with which requests for database buffer pages are suspended because no buffers are available
  • - good - < 2 
  • Free pages - total number of 8 KB data pages on all free lists
  • - good -  > 640 
  • Lazy writes/sec - number of times per second that the lazy writer moves dirty pages from the buffer to disk in order to free buffer space
  • - good - < 20 
  • Page lookups/sec - number of requests to find a page in the buffer. This counter should be analyzed in relation to the "SQL Statistics\Batch Requests/sec" counter.

  • (Page Lookups/sec) / (Batch Requests/sec) - reveals whether query plans are efficient or inefficient
  • - good - < 100
  • Page reads/sec - can reveal index or memory problems
  • - good - < 90
  • Page writes/sec
  • - good - < 90
  • Page Life Expectancy - the time, in seconds, that a data page is kept in buffer memory.
  • - good - > 300 

    Hint: Less than that on a regular basis is a clear indication that your SQL Server needs more RAM allocated to function properly.
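Two of the buffer manager checks above lend themselves to a quick calculation: the plan-efficiency ratio (Page Lookups/sec over Batch Requests/sec, good under 100) and the Page Life Expectancy floor of 300 seconds. A sketch with hypothetical sample values:

```python
def plan_efficiency_ratio(page_lookups_per_sec, batch_requests_per_sec):
    """(Page Lookups/sec) / (Batch Requests/sec); a result under 100
    suggests reasonably efficient query plans."""
    return page_lookups_per_sec / batch_requests_per_sec

# Hypothetical samples:
ratio = plan_efficiency_ratio(45_000, 600)
print(ratio)                 # 75.0 -> under the 100 guideline
print(ratio < 100)           # True

ple_seconds = 240            # hypothetical Page Life Expectancy sample
print(ple_seconds >= 300)    # False -> the server likely needs more RAM
```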

Analyze CPU
1. System

  • Processor Queue Length
  • - good         - < 4 per CPU
    - also good    - 4 - 6
    - fair         - 7 - 12
    - bad          - > 12
  • Context Switches/sec - a context switch happens when a higher priority thread preempts a lower priority thread that is currently running, or when a high priority thread blocks
    - high        - > 5000 
    - very high   - > 15000 

2. Processor

  • Processor time % (total & per process)
  • - Fair - <80% 
  • Privileged time % (total & per process)
  • - Fair - < 75% (higher indicates a bottleneck)
  • Interrupt Time % - indicates the percentage of time the processor spends receiving and servicing hardware interrupts
  • - bad         - > 30 % (hardware or driver problem) 
    - very bad    - > 50%

Analyze Query Performance
1. SQL Server : Access Methods IO Intensive Operations

  • Forwarded Records/sec - forwarded records occur when a row's variable-length (varchar) data grows beyond the initially allocated space and the row has to be moved. This happens only in tables without clustered indexes (heaps).
  • - good - < 10 per 100 batch requests/sec

    Hint: Evaluate creating clustered indexes for heap tables.

  • (Index Searches/sec) / (Full Scans/sec) - monitors the ratio of Index Searches (good) to Full Scans (bad) on base tables or indexes
  • - bad - < 1000 while CPU is high

    Hint: Queries with missing indexes or too many rows requested will have a large number of logical reads and an increased CPU time.
    For OLTP applications, optimize for more index searches and less scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations.

  • Page Splits/sec - occurs when an 8 KB page fills and must be split into a new 8 KB page
  • - fair - < 20 per 100 batch requests/sec 

    Hint: If higher, then you have a problem that can be solved with a clustered index and a good fill factor.
    An increased fill factor helps to reduce page splits because there is more room in data pages before they fill up and a page split has to occur.
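The Access Methods guidelines above are all stated relative to load, so it helps to normalize the raw counters before judging them. A sketch of the arithmetic, with hypothetical sample values:

```python
def per_100_batches(counter_per_sec, batch_requests_per_sec):
    """Normalize a per-second counter to 'per 100 Batch Requests/sec',
    the unit used by the Forwarded Records and Page Splits guidelines."""
    return 100.0 * counter_per_sec / batch_requests_per_sec

def searches_per_scan(index_searches_per_sec, full_scans_per_sec):
    """Index searches per full scan; for OLTP aim for roughly 1000:1."""
    return index_searches_per_sec / full_scans_per_sec

# Hypothetical samples at 150 batch requests/sec:
print(per_100_batches(18, 150))       # forwarded records: 12.0 -> above the 10 guideline
print(per_100_batches(25, 150))       # page splits: ~16.7 -> under the 20 guideline
print(searches_per_scan(40_000, 25))  # 1600.0 searches per scan -> healthy
```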

2. SQL Statistics

  • Batch Requests/sec - shows how busy your server is.
  • - busy server  - > 1000 
    - very busy    - > 3000 

    Hint: A typical 100 Mbps network card is only able to handle about 3,000 batch requests per second. If you have a server that is this busy, you may need
    to add a second network card, or move to a 1 Gbps network card.

  • SQL Compilations/Sec - The number of times per second that SQL Server compilations have occurred.
  • - good - as low as possible.
    - bad  -  > 100 
  • SQL ReCompilations/Sec - The number of times per second that SQL Server recompilations have occurred.
  • - good - as low as possible.
    - bad  -  > 100 

    Hint: Evaluate the number of (re)compilations relative to the number of Batch Requests/sec, and check whether the resulting ratio stays below 30%. If it is higher, then you should start checking which stored procedures or ad-hoc queries are getting compiled/recompiled over and over again and causing CPU usage.
    I wrote about this in detail at How to find out which Queries have non-reusable Query Plans? and Top 5 Steps To Determine Bad Performance of Queries
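The 30% rule of thumb above boils down to one division. A sketch with hypothetical counter samples:

```python
def compilation_ratio(compilations_per_sec, recompilations_per_sec,
                      batch_requests_per_sec):
    """Percentage of batch requests that trigger a compile or recompile;
    per the guideline above, keep it under roughly 30%."""
    return (100.0 * (compilations_per_sec + recompilations_per_sec)
            / batch_requests_per_sec)

# Hypothetical samples: 250 compilations/sec, 80 recompilations/sec,
# 900 batch requests/sec.
pct = compilation_ratio(250, 80, 900)
print(round(pct, 1))   # 36.7 -> above 30%, worth hunting for non-reusable plans
print(pct < 30)        # False
```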

So basically these counters should give you an overall view of whether your system is low on CPU, memory or disk resources, or whether you should re-design some of your queries, tables and indexes. The Perfmon counters are a very good indication of how well tuned your system is. However, they should be viewed together with SQL statistics that can be gathered from the multitude of dynamic management views available starting with SQL Server 2005. In a future post I will write about those important SQL counters, as I prefer to call them. If you collect both Windows counters and SQL counters, you should have no problem identifying a bottleneck that happened in a timeframe when you were not at the office, or when you skipped the critical period and couldn't analyze everything in time.

Have a great time until then!