Category Archives: Performance & Tuning

sysprocesses table – a handy tool for monitoring when you are not there

There are several native SQL queries you can use to collect information about the server when confronting a bottleneck, namely:

  • sp_who2 active – check active processes and if there are blockages
  • dbcc inputbuffer (spid) – see what the process behind a specific spid number is doing. This command is similar to using the ::fn_get_sql function, which converts the sql_handle into human-readable text.
  • querying sys.dm_exec_requests and the other related DMVs
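
For quick reference, a usage sketch of these three (SPID 55 is just a placeholder):

EXEC sp_who2 'active'            -- active processes and blocking (BlkBy column)

DBCC INPUTBUFFER (55)            -- last statement sent by SPID 55

-- currently executing requests together with their statement text
SELECT r.session_id, r.status, r.blocking_session_id, r.wait_type, t.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id > 50          -- skip most system sessions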

But all of these essentially present, in a nicer form, the output of the master.dbo.sysprocesses table.
This table holds a lot of useful information, but some of it may not be used to its real value by all of you. I want to show how you can use most of it and really understand its contents. I will not speak about the obvious columns displayed by the sp_who2 procedure call, such as: SPID, Status, Login, BlkBy, DBName, Command, CPUTime, DiskIO, LastBatch, ProgramName. These are all very nice and give you a fast overview of your busy system if executed when the problem is noticed. I personally use it each time somebody complains about a heavy workload. But what if the workload happens when you are not at work? Or not in front of a computer receiving alerts? Or maybe you found that sometime between 1 AM and 4 AM, during the off hours, your system gets very busy, and you don’t have any extra expensive tool (Diagnostic Manager, Spotlight, Redgate etc.) installed that will show you the next day what really happened when your system was overused? So how do you explain to your IT director what happened and what can be done to prevent future problems?

Here I will show you what you can do. It is free and uses only SQL tools, meaning the output from sysprocesses.
So, basically, when you execute sp_who2 you will not see the underlying query behind a specific SPID. I bet you know that this information is held inside sysprocesses under the same SPID number, but in binary form, something like: 0x01000100099BA83AA0E4BEAD3100000000000000. You probably also know that you can retrieve the SQL text using the above-mentioned function ::fn_get_sql. There are 2 other columns that can give you the exact statement if the statement that is executing and causing problems is inside a Stored Procedure; then you’ll know what part of your stored procedure needs more attention. These are stmt_start and stmt_end. They mark the starting and ending offset of the current SQL statement for the specified sql_handle.

What I am trying to say is that you can create a job inside SQL Agent that starts at a specified hour and executes a stored procedure containing a time-based loop (from 1 AM to 4 AM) with a waiting delay of 3-4 seconds, which inserts the output from sysprocesses into a user table, available for you to query and see the history of all queries. Be careful, though, when scripting the insert logic, because a SPID can execute one query at a given time and, after completion, the same SPID can be taken by another thread on another database. In order not to end up with a bunch of duplicate values that make the history table very hard to query, my advice is to first insert all the SPID numbers found in sysprocesses which don’t exist in your table, then start the cursor based on the existing SPID numbers from your table and insert only the rows where Status, loginame, hostname, cmd, dbid or blocked have changed in sysprocesses for the same SPID.

Example: the skeleton below MUST BE FILLED with the right values for your environment.


/*
CREATE TABLE [dbo].[your_table](
	[date] datetime,
	[spid] [smallint] NOT NULL,
	[kpid] [smallint] NOT NULL,
	[blocked] [smallint] NOT NULL,
	[waittype] [binary](2) NOT NULL,
	[waittime] [bigint] NOT NULL,
	[lastwaittype] [nchar](32) NOT NULL,
	[waitresource] [nchar](256) NOT NULL,
	[dbid] [smallint] NOT NULL,
	[uid] [smallint] NULL,
	[cpu] [int] NOT NULL,
	[physical_io] [bigint] NOT NULL,
	[memusage] [int] NOT NULL,
	[login_time] [datetime] NOT NULL,
	[last_batch] [datetime] NOT NULL,
	[ecid] [smallint] NOT NULL,
	[open_tran] [smallint] NOT NULL,
	[status] [nchar](30) NOT NULL,
	[sid] [binary](86) NOT NULL,
	[hostname] [nchar](128) NOT NULL,
	[program_name] [nchar](128) NOT NULL,
	[hostprocess] [nchar](10) NOT NULL,
	[cmd] [nchar](16) NOT NULL,
	[nt_domain] [nchar](128) NOT NULL,
	[nt_username] [nchar](128) NOT NULL,
	[net_address] [nchar](12) NOT NULL,
	[net_library] [nchar](12) NOT NULL,
	[loginame] [nchar](128) NOT NULL,
	[context_info] [binary](128) NOT NULL,
	[sql_handle] [binary](20) NOT NULL,
	[stmt_start] [int] NOT NULL,
	[stmt_end] [int] NOT NULL,
	[request_id] [int] NOT NULL
) ON [PRIMARY]

*/
declare @datefrom as datetime 
declare @dateto as datetime
set @datefrom = getdate () 
set @dateto = dateadd(second, 120, @datefrom) -- the loop will run for 2 minutes

WHILE @datefrom < @dateto 

BEGIN

waitfor delay '00:00:03'

Insert into your_table ([date], column_list) select getdate(), column_list from sysprocesses
where SPID not in (select SPID from your_table) -- you can add filters for background processes

DECLARE spid_cursor CURSOR FOR 
select SPID from sysprocesses where SPID in (select SPID from your_table) 


[...] -- open, fetch, while clauses

Insert into your_table (column_list)
select column_list
    from sysprocesses a
    where spid = @spid and
    (a.status not in (select status from your_table b where b.spid = @spid) or
    [...] ) -- the rest of the columns that must be compared

[...] -- fetch next, close, deallocate clauses

set @datefrom = GETDATE ()
END

But there is one catch, of course. sql_handle contains the MD5 hash of the entire batch text and is stored in the SQL Manager Cache (SQLMGR). So you could build a simple cursor that inserts the output from “select * from sysprocesses”, but the actual SQL query might or might not still be in cache when you arrive at work and analyze the data.
What you need to do is convert the sql_handle into readable SQL text as it is collected. One way to do it is to create 2 user functions (one for the statement, one for the Stored Procedure name). These will serve as formulas for 2 new computed columns inside your table.

create FUNCTION [dbo].[text_handle] (@handle binary(20), @start int, @end int)
RETURNS nvarchar (max) 
AS
BEGIN
declare @text nvarchar(max) 
set @text = (SELECT  substring(text, (@start + 2)/2, 
CASE @end WHEN -1 THEN (datalength(text)) ELSE (@end -@start + 2)/2 END) 
FROM ::fn_get_sql(@handle))
return @text
END
go

create FUNCTION [dbo].[proc_handle] (@handle binary(20), @db int) 
RETURNS nvarchar (max) 
AS
BEGIN
declare @proc nvarchar(max) 
set @proc = (SELECT 'DB:' + convert (nvarchar(4),@db) + ', SP: ' +  convert (nvarchar(max),text) FROM ::fn_get_sql(@handle))
return @proc
END
go 

ALTER TABLE your_table
    ADD sql_text AS dbo.text_handle (sql_handle,stmt_start,stmt_end)

ALTER TABLE your_table	
    ADD sql_proc AS dbo.proc_handle (sql_handle,dbid)

That should be enough for you to start narrowing down the problem, and maybe you will want to collect some Perfmon counters as well, because a bottleneck usually doesn’t come from a single source. There might be disk pressure, CPU pressure or a memory leak that makes the SQL processes run poorly.

7 Common Design Errors – that can be avoided

Being a DBA usually means fixing a lot of design issues when confronted with performance problems. If the design is done with good planning ahead, then the DBA doesn’t have much to do at the design stage. Developers, in the absence of a Database Architect, are preoccupied mostly with functional scripting rather than functional + optimal scripting. It is not the developers’ fault; a database design should be performed together with a Database Specialist who can figure out the possible issues at the design phase. A lot of firms don’t invest in this position but rely on the developers’ professional experience. I came up with a list of possible and frequent developer errors that should be avoided in order to prevent SQL from thinking twice or more before doing a simple operation.

  1. When filtering results in the WHERE clause with equality or inequality operators (=, LIKE, <>), pay attention when comparing non-Unicode columns (varchar, char, text) against values of Unicode type (nvarchar, nchar, ntext). Doing so makes SQL add a CONVERT_IMPLICIT operation to the execution plan, which is a pointless thing to do. Example:
    DECLARE @table AS TABLE
     (varchar_column varchar(50) PRIMARY KEY)
     INSERT INTO @table VALUES ('Maria')
     
    DECLARE @varchar_param AS varchar(50) = 'Maria'
     
    SELECT *
    FROM @table
    WHERE varchar_column = @varchar_param
    
    DECLARE @Nvarchar_param AS Nvarchar(50) = N'Maria'  
    
    SELECT *
    FROM @table
    WHERE varchar_column = @Nvarchar_param
    
    

  2. For the same reason as above (CONVERT_IMPLICIT), always JOIN on columns of the same type, not on different ones which SQL can auto-convert without an explicit CAST or CONVERT in your code. Here are all implicit and explicit conversions done by SQL: Data Type Conversion
  3. Avoid using functions in WHERE clauses and JOIN conditions. If a function is called on a column that has an index on it, then the index might not be used as it should be. If you do use one, analyze the execution plan to check that optimal index usage is made. The query below is one example of bad usage of a function inside the WHERE clause:
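
    A hypothetical illustration (the table and column names are made up): wrapping the indexed column in a function makes the filter non-sargable, while the rewritten range filter lets an index on OrderDate be used for a seek.

    -- bad: the function applied to the indexed column prevents an index seek
    SELECT OrderId
    FROM dbo.Orders
    WHERE YEAR(OrderDate) = 2011
    
    -- better: a sargable range filter on the same indexed column
    SELECT OrderId
    FROM dbo.Orders
    WHERE OrderDate >= '20110101' AND OrderDate < '20120101'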

  4. NOLOCK hint is overused. Please be aware that NOLOCK causes dirty reads. This means that at the beginning of the transaction you can retrieve one set of data which by the end of the transaction might get modified but you don’t care about it and go along with it. In the end, the transaction conditions might or might not apply to some data that shouldn’t be touched after all. An alternative to NOLOCK hint is to enable Snapshot Isolation in SQL Server.
  5. Some Stored Procedures are created with parameters whose datatype differs from the assigned value. The same is valid for variable declarations. Keep the datatype identical for each type of equality operation or assignment.
  6. You can choose some datatypes in favor of others considering storage and the data inserted. Sometimes there is no point in using INT if you will have values up to 255, just as there isn't in using INT when you expect a lot of data – more than 2 billion values. An ALTER COLUMN later on is not a valid solution. This means you should use the TINYINT datatype in the first case and BIGINT in the second.
    Also, if you will not handle strings with special characters or Chinese/Japanese words, then you should use one of the non-Unicode datatypes (char, varchar, text) rather than the Unicode ones, because of the doubled storage size. The same string stored as NVARCHAR will be twice as big as the VARCHAR version, because SQL uses one extra byte to encode each character. It causes more disk writes and therefore more disk reads – in the end, less performance.

  7. Usage of Common Table Expressions: ;WITH (CTE). They are very simple to write and easy to understand, but there is something to think about before using them everywhere (a small sketch follows this list).
    First of all, all recursive table definitions must be evaluated for each following operation, each time. The data is not evaluated once for later re-use and is visible only inside the scope of the query. SQL treats the whole expression more like a view held in memory; it is not treated like a temporary table. The advantage of the #temp table is that the data gets evaluated once and then the results are re-used in the join.
    Second, you cannot create indexes on recursive table definitions used in subsequent joins or filter operations. In conclusion, a CTE is advisable when you have a small data set to use in joins, but when you want to join millions of rows, a CTE is not a good option. You should choose temporary tables. They will almost always run faster.
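
    A minimal sketch of the temporary-table alternative (table and column names are hypothetical): the intermediate result is materialized once and can be indexed before the join.

    -- materialize the intermediate result once
    SELECT CustomerId, SUM(Amount) AS TotalAmount
    INTO #customer_totals
    FROM dbo.Sales
    GROUP BY CustomerId
    
    -- an index is possible here, unlike on a CTE
    CREATE CLUSTERED INDEX ix_customer_totals ON #customer_totals (CustomerId)
    
    SELECT c.CustomerName, t.TotalAmount
    FROM dbo.Customers c
    INNER JOIN #customer_totals t ON t.CustomerId = c.CustomerId
    
    DROP TABLE #customer_totals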

Analyze Indexes – with or without them?

SQL Indexes – what a great subject to discuss on! I just love them. It is incredible to find out each time, what a proper index can do to your database…

As far as I can think of there are 3 issues related to Indexes:

  1. Fragmentation
  2. Query Optimization
  3. Storage problems – I already blogged about it and if you follow the link you will see the whole story.

I. Fragmentation

Every DBA’s maintenance tasks should include a step that collects fragmentation data on all of Production's most critical databases.
You can schedule the index part once per month, or as frequently as you consider necessary.

The SQL query is rather simple:

  1. Create a table with structure:
     

    use tempdb
    go
    CREATE TABLE #indexes (
       ObjectName CHAR (255),
       ObjectId INT,
       IndexName CHAR (255),
       IndexId INT,
       Lvl INT,
       CountPages INT,
       CountRows INT,
       MinRecSize INT,
       MaxRecSize INT,
       AvgRecSize INT,
       ForRecCount INT,
       Extents INT,
       ExtentSwitches INT,
       AvgFreeBytes INT,
       AvgPageDensity INT,
       ScanDensity DECIMAL,
       BestCount INT,
       ActualCount INT,
       LogicalFrag DECIMAL,
       ExtentFrag DECIMAL)
    
  2. Open a cursor with 2 parameters – schema name and table name, selected from INFORMATION_SCHEMA.TABLES – and execute the following for each pair (a minimal cursor sketch is shown after the snippet):
     

     
     INSERT INTO #indexes
     EXEC ('DBCC SHOWCONTIG (''' + @schemaname + '.' + @tablename + ''') WITH FAST, TABLERESULTS, ALL_INDEXES, NO_INFOMSGS')
    
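    A minimal cursor sketch for this step (run it while connected to the database being analyzed; #indexes is the table created in step 1):

     DECLARE @schemaname sysname, @tablename sysname, @cmd nvarchar(500)
     
     DECLARE table_cursor CURSOR FAST_FORWARD FOR
         SELECT TABLE_SCHEMA, TABLE_NAME
         FROM INFORMATION_SCHEMA.TABLES
         WHERE TABLE_TYPE = 'BASE TABLE'
     
     OPEN table_cursor
     FETCH NEXT FROM table_cursor INTO @schemaname, @tablename
     WHILE @@FETCH_STATUS = 0
     BEGIN
         SET @cmd = 'DBCC SHOWCONTIG (''' + @schemaname + '.' + @tablename
                  + ''') WITH FAST, TABLERESULTS, ALL_INDEXES, NO_INFOMSGS'
         INSERT INTO #indexes
         EXEC (@cmd)
         FETCH NEXT FROM table_cursor INTO @schemaname, @tablename
     END
     CLOSE table_cursor
     DEALLOCATE table_cursor
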
  3. The execution output will be stored in a table with the same structure as the output’s columns, used for later analysis.
    I only focus on indexes that have fragmentation higher than 5% and more than 10000 pages. In theory, there is no point in rebuilding an index that occupies less than 24 pages. In practice, it’s generally recommended not to do anything for indexes with fewer than 10000 pages.
    Next step:
     

    delete from #indexes where LogicalFrag <= 5 or CountPages <= 10000
    

The resulting list can be split into two action strategies:

  • Reorganize Indexes that have a fragmentation rate > 5% and < 30%
  • Rebuild Indexes that have a fragmentation rate > 30%
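
A minimal sketch of both actions (the index and table names are placeholders; the fill factor value is only an example):

ALTER INDEX ix_mytable_mycolumn ON dbo.mytable REORGANIZE                        -- fragmentation between 5% and 30%

ALTER INDEX ix_mytable_mycolumn ON dbo.mytable REBUILD WITH (FILLFACTOR = 90)    -- fragmentation above 30%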

 
Check this also for more info on fill factor importance and Alter Index syntaxes:

Managing SQL Server Database Fragmentation
How to check Fragmentation on SQL Server 2005


II. Query Optimization

I have been asked so many times questions like: "Why is this procedure running so slowly?", "Why so many blockages?", "I have a new script and it takes ages until it completes!", and I can go on. Almost every time, to a bigger or smaller extent, a lack or a surplus of one or many indexes is involved.

How to know if you have unused indexes

A surplus of indexes causes SQL to perform as many index updates as there are indexes on the table that include the column modified by a single statement.
I have seen many cases of tables having tens of indexes. Some of them were duplicated and some of them were created just by following hints from the SQL Query Optimizer. There is more damage than good when you follow SQL's hints literally: first of all, the multiple updates on all those indexes slow down the query; secondly, the index disk usage can become substantial.

To understand which indexes are really unused and should be dropped at the next maintenance window, SQL Server must not have been restarted for some time (1 month - 6 months), so that it has seen all the recurrent user queries, recurrent stored procedures and application transactions that run on it. This is because SQL only holds these statistics until the next restart. In order to have valuable statistics, SQL should have been running for a sufficient period of time.
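
A quick way to check how long the instance has been up (and therefore how much history the usage statistics cover):

-- index usage statistics are reset at every instance restart
SELECT sqlserver_start_time FROM sys.dm_os_sys_info           -- SQL 2008 and later
SELECT create_date FROM sys.databases WHERE name = 'tempdb'   -- also works on SQL 2005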

  1. Look at existing Indexes in your database:

    The script will avoid Primary Keys, Unique keys and already Disabled indexes.

    use your_database
    go
    select i.name index_name, o.name [object_name], i.type_desc, o.type_desc , i.is_disabled, i.is_unique
    from sys.indexes i inner join sys.objects o on 
    i.[object_id] = o.[object_id] where o.type_desc = 'USER_TABLE' and i.name is not null 
    and i.is_primary_key = 0 and i.is_disabled = 0 and i.is_unique = 0
    order by o.name
    
  2. Check how many seeks/scans are done on the indexes extracted above. You can filter the query by specifying one or more object names and index names. This select will also display each table's row count.

    Example 1: you see 1867777 scans on a 2000000-row table and close to 0 seeks. Big problem. It means the index is not well designed. It is used, but not as it should be.
    Example 2: you see a number close to 0 in both the seeks and scans columns. Why? The index has never been used since SQL was restarted.

    In conclusion, the seeks column should be filled with numbers. If both seeks and scans have numbers, the indexes should be left there. User lookups are mostly issued against primary keys or clustered indexes, so it is unlikely you will see anything there.

    use your_database 
    go
    IF (object_id('tempdb..#temp') IS NOT Null)
       DROP TABLE #temp   
    
    CREATE TABLE #temp (
    table_name sysname ,
    row_count INT,
    reserved_size VARCHAR(50),
    data_size VARCHAR(50),
    index_size VARCHAR(50),
    unused_size VARCHAR(50))
    SET NOCOUNT ON
    INSERT #temp
    EXEC sp_msforeachtable 'sp_spaceused ''?'''
    
    
    SELECT   OBJECT_NAME(S.[OBJECT_ID]) AS [OBJECT NAME],S.[OBJECT_ID],
             I.[NAME] AS [INDEX NAME],
             USER_SEEKS,
             USER_SCANS,
             USER_LOOKUPS,
             USER_UPDATES, t1.row_count
    FROM     SYS.DM_DB_INDEX_USAGE_STATS AS S
             INNER JOIN SYS.INDEXES AS I
               ON I.[OBJECT_ID] = S.[OBJECT_ID]
                  AND I.INDEX_ID = S.INDEX_ID
                  inner join #temp t1 on S.object_id = object_id (t1.table_name)
                  
    WHERE    OBJECTPROPERTY(S.[OBJECT_ID],'IsUserTable') = 1  
    and is_disabled = 0 and is_unique = 0 and i.name is not null -- and user_seeks =0
    and  OBJECT_NAME(s.[OBJECT_ID]) in ('object1') 
    and I.[name] IN ('index1', 'index2','index3')
    order by  OBJECT_NAME(s.[OBJECT_ID]) ,i.name
    
  3. Check leaf updates, inserts and deletes on the indexes identified in the previous step with zero scans/seeks.
    This is the ultimate proof that an index which is not used for optimization is still being updated at every insert/update/delete operation --> bad performance.

    use your_database 
    go
    SELECT OBJECT_NAME(A.[OBJECT_ID]) AS [OBJECT NAME],
           I.[NAME] AS [INDEX NAME],
           A.LEAF_INSERT_COUNT,
           A.LEAF_UPDATE_COUNT,
           A.LEAF_DELETE_COUNT
    FROM   SYS.DM_DB_INDEX_OPERATIONAL_STATS (NULL,NULL,NULL,NULL ) A
           INNER JOIN SYS.INDEXES AS I
             ON I.[OBJECT_ID] = A.[OBJECT_ID]
                AND I.INDEX_ID = A.INDEX_ID
    WHERE  OBJECTPROPERTY(A.[OBJECT_ID],'IsUserTable') = 1 
    and OBJECT_NAME(A.[OBJECT_ID]) in ('object1') 
    
    order by  I.[NAME]
    
  4. At last, identify indexes that don't appear in the usage statistics at all; they only overcrowd the index list and use space. These indexes are completely useless and should be dropped (a hypothetical DROP statement is shown after the query).
     

    DECLARE @dbid INT
    SELECT @dbid = DB_ID(DB_NAME())
    SELECT OBJECTNAME = OBJECT_NAME(I.OBJECT_ID),
    INDEXNAME = I.NAME,
    I.INDEX_ID, i.object_id
    FROM SYS.INDEXES I
    JOIN SYS.OBJECTS O
    ON I.OBJECT_ID = O.OBJECT_ID
    WHERE OBJECTPROPERTY(O.OBJECT_ID,'IsUserTable') = 1
    AND I.INDEX_ID NOT IN (
    SELECT S.INDEX_ID
    FROM SYS.DM_DB_INDEX_USAGE_STATS S
    WHERE S.OBJECT_ID = I.OBJECT_ID
    AND I.INDEX_ID = S.INDEX_ID
    AND DATABASE_ID = @dbid) and I.NAME is not null 
    and is_unique =0 and is_disabled = 0
    ORDER BY OBJECTNAME,
    I.INDEX_ID,
    INDEXNAME ASC
    GO
    
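    Once confirmed, such an index can be dropped at the next maintenance window. A hypothetical example (script out the index definition first, so it can be recreated if ever needed):

    DROP INDEX index1 ON dbo.object1
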

Key SQL performance counters – identify bottlenecks

I was planning for a long time to write this post, but time is never enough and the timing is never right 🙂
This time I managed to gather all the info and present a simple-to-follow, brief and explanatory list of the most important performance counters used in SQL monitoring. I came up with it by studying a lot of similar documents from important SQL professionals and adding personal insight on each of them.

Let’s begin by saying that these counters are most helpful when they are collected throughout the whole day with a refresh interval of 1 minute. It can be done easily if you go to: Control Panel\Administrative Tools\Performance Monitor\Data Collector Sets\User Defined. Right click and open: New –> Data Collector Set –> check Create Manually –> check Performance counters –> start adding performance counters.

Analyze Disk
 
1. Logical Disk

  • Disk Transfers/sec
  • - slow, needs attention : fewer than 80 I/Os per second when
    "Physical Disk\Avg. Disk sec/Transfer" is greater than 25 ms.
    This may indicate too many virtual LUNs using the same physical disks on a SAN. 

2. Physical Disk

  • Avg. Disk sec/Read – average time, in seconds, of a read of data from the disk
  • – very good                  < 10 ms
    – okay                       10-20 ms
    – slow, needs attention      20-50 ms
    – serious I/O bottleneck     > 50 ms

  • Avg. Disk sec/Write – average time, in seconds, of a write of data to the disk
  • – very good                  < 10 ms
    – okay                       10-20 ms
    – slow, needs attention      20-50 ms
    – serious I/O bottleneck     > 50 ms
    
  • Avg. Disk sec/Transfer – disk latency
  • To determine the impact of excessive paging on disk activity,
    multiply the values of the "Physical Disk\Avg. Disk sec/Transfer"
    and "Memory\Pages/sec" counters. If the product of these counters
    exceeds 0.1, paging is taking more than 10 percent of disk access time,
    which indicates that you need more RAM. If this occurs over a long
    period, you probably need more memory.
    However, the "Page Life Expectancy" counter must be checked as well. 
    
  • Disk Bytes/sec – total number of bytes sent to the disk (write) and retrieved from the disk (read) over a period of one second.
    If the Perfmon capture interval is set for anything greater than one second, the average of the values captured is presented.
  • Disk Read Bytes/sec
  • Disk Write Bytes/sec
  • Disk Reads/sec
  • Disk Writes/sec

The above 5 are useful for analyzing ups and downs over a certain period. They will help you know the number of reads/writes under normal behavior, and if a problem occurs you can link several values and see if the difference is notable.

Analyze Memory
 
1. Memory

  • Available MBytes – amount of physical memory available to processes running on the computer, in Megabytes
  •  
    - low      – less than 10% available
    - very low – less than 5% available
    
  • Page file usage %
  • - good - < 70 % 
  • Pages input/sec - the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk.
  • - good - < 10
    - bad  - > 15 

  • Pages/sec - rate at which pages are read/written to disk
  • - slow on a slow disk subsystem  - 100 - 600 
    - slow on a powerful disk system - 600 - 2500 
    - very slow                      - 2500 - 5000
    - extremely slow                 - > 5000 
    

2. SQL Server: Memory Manager

  • Memory Grants Pending - number of processes waiting for a workspace memory grant
  • should stay close to 0 
  • SQL Cache Memory (KB) - how much physical RAM is devoted to SQL Server’s data cache
  • It should be = (Total amount of RAM in the server) - (RAM used by Windows Server)
    - (RAM used by SQL Server) - (any utilities you have running on the server).
    If it is much lower, then it may be that you aren't allowing SQL Server to dynamically
    allocate RAM, and have instead accidentally specified that SQL Server use less RAM
    than it should have for optimal performance. This is not a critical counter for a memory
    leak, but it's worth looking at. 

3. SQL Server : Buffer Manager

  • Buffer cache hit ratio - how often SQL Server can get data from the buffer rather than from disk
  • - good - > 90 % OLAP systems 
    - good - > 95% OLTP systems
  • Free list stalls/sec - the frequency with which requests for database buffer pages are suspended because no buffers are available
  • - good - < 2 
  • Free pages - total number of 8 KB data pages on all free lists
  • - good -  > 640 
  • Lazy writes/sec - number of times per second that the lazy writer moves dirty pages from the buffer to disk to free buffer space
  • - good - < 20 
  • Page lookups/sec - number of requests to find a page in the buffer. This counter should be analyzed in relation to the "SQL Statistics\Batch Requests/sec" counter.
  •  

  • (Page Lookups/sec) / (Batch Requests/sec) - reveals if query plan is efficient or inefficient
  • -good - < 100 
  • Page reads/sec - reveals index, memory problems
  • - good - < 90 
  • Page writes/sec
  • - good - < 90
  • Page Life Expectancy - time (seconds) that a data page set is kept into buffer memory.
  • - good - > 300 

    Hint: Less than that on a regular basis is a clear indication that your SQL Server needs more RAM allocated to function properly.
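
Most of the SQL Server-specific counters above can also be read directly from T-SQL, which is handy when Perfmon is not running; a minimal sketch:

-- note: the /sec counters in this DMV are cumulative since startup; sample twice and diff to get a rate
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
and counter_name IN ('Page life expectancy', 'Lazy writes/sec', 'Page reads/sec', 'Page writes/sec')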

Analyze CPU
 
1. System

  • Processor Queue Length
  • - good         - < 4 per cpu 
    - also good    - 4-6  
    - fair         - 7 - 12 
    - bad          - > 12 
  • Context Switches/sec - happens when a higher priority thread preempts a lower priority thread that is currently running or when a high priority thread blocks
  •  
    - high        - > 5000 
    - very high   - > 15000 
    

2. Processor

  • Processor time % (total & per process)
  • - Fair - <80% 
  • Privileged time % (total & per process)
  • - Fair - < 75% (higher bottleneck)
  • Interrupt Time % - indicates the percentage of time the processor spends receiving and servicing hardware interrupts
  • - bad         - > 30 % (hardware or driver problem) 
    - very bad    - > 50%
    

Analyze Query Performance
 
1. SQL Server : Access Methods IO Intensive Operations

  • Forwarded Records/sec - occurs when a row's variable-length (varchar) data expands beyond the initially allocated space and the row is moved. This happens only for tables without clustered indexes (heaps).
  • - good - < 10 per 100 batch requests/sec

    Hint: Evaluate creating clustered indexes for heap tables.

  • (Index searches /sec) / (Full scans/sec) - Monitors the ratio between Index searches (good) over Full Scans (bad) on base tables or indexes
  • - bad - > 1000 and CPU high 

    Hint: Queries with missing indexes or too many rows requested will have a large number of logical reads and an increased CPU time.
    For OLTP applications, optimize for more index searches and less scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations.

  • Page Splits/sec - occurs when an 8 KB page fills and must be split into a new 8 KB page
  • - fair - < 20 per 100 batch requests/sec 
    

    Hint: If higher, then you have a problem that can be solved by a clustered index and a good fill factor.
    A lower fill factor helps to reduce page splits because it leaves more room in data pages before they fill up and a page split has to occur.

2. SQL Statistics

  • Batch Requests/sec - shows you how busy your server is.
  • - busy server  - > 1000 
    - very busy    - > 3000 
    

    Hint: A typical 100 Mbs network card is only able to handle about 3000 batch requests per second. If you have a server that is this busy, you may need
    to have two or more network cards, or go to a 1 Gbs network card.

  • SQL Compilations/Sec - The number of times per second that SQL Server compilations have occurred.
  • - good - as low as possible.
    - bad  -  > 100 
    
  • SQL ReCompilations/Sec - The number of times per second that SQL Server recompilations have occurred.
  • - good - as low as possible.
    - bad  -  > 100 
    

    Hint: Evaluate the number of (re)compilations relative to the number of Batch Requests/sec, and check whether the resulting ratio is below 30% or higher. If higher, then you should start checking which stored procedures or ad-hoc queries are getting compiled/recompiled over and over again and causing CPU usage.
    I wrote about this in detail at How to find out which Queries have non-reusable Query Plans? and Top 5 Steps To Determine Bad Performance of Queries
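
    A rough way to check that ratio from T-SQL (the values in sys.dm_os_performance_counters are cumulative since startup, so this gives an average since the last restart rather than a live number):

    SELECT
        MAX(CASE WHEN counter_name = 'SQL Compilations/sec'    THEN cntr_value END) * 100.0
          / NULLIF(MAX(CASE WHEN counter_name = 'Batch Requests/sec' THEN cntr_value END), 0) AS compilations_pct,
        MAX(CASE WHEN counter_name = 'SQL Re-Compilations/sec' THEN cntr_value END) * 100.0
          / NULLIF(MAX(CASE WHEN counter_name = 'Batch Requests/sec' THEN cntr_value END), 0) AS recompilations_pct
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%SQL Statistics%'
      AND counter_name IN ('Batch Requests/sec', 'SQL Compilations/sec', 'SQL Re-Compilations/sec')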

So basically these counters should give you an overall view of whether your system is low on CPU, Memory or Disk resources, or whether you should re-design some of your queries, tables and indexes. The Perfmon counters are a very good indication of how well tuned your system is. However, they should be seen together with the SQL statistics that can be gathered from the multitude of dynamic management views available starting with the SQL 2005+ editions. In a next post I will write about those important SQL counters, as I prefer to call them. If you start collecting Windows counters and SQL counters, you should have no problem identifying a bottleneck that happened in a timeframe when you were not at the office, or when you missed the critical period and couldn't analyze everything in time.

Have a great time until then!

msdb.dbo.sp_readrequest;1 – long running process

While monitoring a newly installed SQL 2008 R2 SP1 instance, I noticed that Database Mail leaves a hung process for a few minutes. This doesn’t cause any blockages or other trouble for your SQL installation; even so, it is annoying to see this long running process over and over again. In order to get rid of it, just check the system parameter value for “DatabaseMailExeMinimumLifeTime” by running the following query:

Use msdb
GO
exec sysmail_help_configure_sp 'DatabaseMailExeMinimumLifeTime'
GO

The default setting is 600 seconds. You need to change it to a lower value.

Use msdb
GO
exec sysmail_configure_sp 'DatabaseMailExeMinimumLifeTime', 30
GO

That’s all!

 

Table variables or Temporary tables for big tables?

Since Microsoft first introduced the table variable in SQL Server, its usage has become more and more popular among developers due to its well known benefits: fewer locking and logging resources, limited transaction scope and reduced re-compilation when used inside stored procedures.

Considering Microsoft reference of FAQ about table variables, I centralized the main characteristics (advantages and disadvantages) of table variables:

  1. Table variables, like local variables, have a well defined scope at the end of which they are automatically cleared.
  2. Table variables result in fewer recompilations of a stored procedure as compared to temporary tables.
  3. Transactions that involve table variables last only for the duration of an update on the table variable. Therefore, table variables require less locking and logging resources.
  4. Because table variables have limited scope and are not part of the persistent database, transaction rollbacks do not affect them.
  5. Table variables DO NOT reside only in memory. They are created in the tempdb database, since the table variable could hold more data than could fit into memory. Both table variables and temporary tables will be created and processed in memory if enough memory is available. However, they can both be flushed to disk at anytime.
  6. Non-clustered indexes which are not Primary or Unique keys cannot be created on table variables in order to optimize the query execution plan. Only a Primary Key or Unique constraint can be added in the DECLARE ... TABLE statement (see the sketch after this list).
  7. Table variables do not maintain statistics like temporary tables can. Therefore, for complex queries on large tables, the lack of statistics may prevent the optimizer from determining the best plan for a query, thus affecting the performance of that query.
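
As an illustration of point 6 (names are made up): a table variable can only get an index through a PRIMARY KEY or UNIQUE constraint in its declaration, while a temporary table accepts a separate CREATE INDEX afterwards.

-- table variable: indexes only via PRIMARY KEY / UNIQUE in the declaration
DECLARE @orders TABLE
    (OrderId    int PRIMARY KEY,   -- allowed at declaration
     CustomerId int UNIQUE)        -- allowed at declaration
-- CREATE INDEX on @orders is not possible

-- temporary table: any index can be added after creation
CREATE TABLE #orders (OrderId int PRIMARY KEY, CustomerId int)
CREATE NONCLUSTERED INDEX ix_orders_customer ON #orders (CustomerId)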

There have been a lot of discussions over time about whether table variables are better than temporary tables. I am not here to argue any of those, because the reasons are very different for everyone. In my personal view both are very good if used in the right type of queries. Both can perform very well. The problem is that it is hard to know what the right type of query is.

Some time ago, I had to test an SQL query that was developed initially to use table variables. The query used 1 table variable that was populated with data from a join of different standard tables. The result of the insert was small (few rows) but two of the tables involved in the join had 2 and 60 million rows.

This is a part of the query :

declare @temp table 
([...]
)
 INSERT INTO @temp 
        ( [...]
        )
        SELECT  [...] -- columns
        FROM    DWtable2
 		INNER JOIN DWtable DT on [...]
                INNER JOIN SQLTableSent  on [...] 
                INNER JOIN SQLTableCampaign  [...]
                INNER JOIN SQLTableTemplate on [...]
                INNER JOIN SQLTableCulture on [...]
        WHERE   [...] -- apply filters
        GROUP BY [...] 

The duration was extremely long. Following a personal guess, I replaced the insert into a table variable (@temp) with an insert into a temporary table (#temp). The query performed 20 times faster, using a temporary table instead of a table variable. So, basically, with no other improvement, the single use of the temporary table did the trick.
I started to ask why this huge difference in duration.

At first impression:

– The lack of indexes and statistics on the table variable was immediately ruled out, as it was not the case here. An insert does not need any index or statistics on the table being populated in order to behave better.
– The select part of the insert was well optimized, because the result would be returned in less than 1 sec.

Next, I started to analyze more deeply. The things that I compared in both cases were:

  • The output from “set statistics io on” when executing the query:
  • I noticed that the insert into the table variable had many logical reads for the SQLTableSent object (1640612) and read-ahead reads (919353), compared to 0 for the same object when using the temporary table. That table has 60,000,000 rows. It was clear to me that the execution plans were very different, because the order in which the tables were processed and their values were different.

  • The output of “Include Actual Execution Plan” option when executing the query:
  • Comparing those two query execution plans, it hit me – the usage of a table variable in modification statements (insert, update, delete) prevents parallel query processing. An online search confirmed my suspicion. It is written right here in the Remarks section:

    “Queries that modify table variables do not generate parallel query execution plans. Performance can be affected when very large table variables, or table variables in complex queries, are modified. In these situations, consider using temporary tables instead. ”

    What does it mean? During query optimization, SQL Server looks for queries that might benefit from parallel execution on computers that have more than one microprocessor. This is because SQL Server can perform a query in parallel by using several operating system threads, so the operation can be completed quickly and efficiently.

    When a table variable is used, the query will be processed serially. A serial execution plan, used by a non-parallel query, uses only one thread for its execution and the performance is, IN THIS CASE, much worse than in a parallel one. From my experience, I have seen cases where limited or no parallelism can do wonders on other types of queries.

    In conclusion, it is important to test your query duration/execution plans for both scenarios (table variables and temporary tables), because in some cases non-parallel processing can be better than parallel processing. There are also cases when non-parallel processing behaves exactly the same as a parallel one. That is already a subject for another post and I will not get into it now, but it is good to keep in mind when working with table variables:

    the key factors that can change the performance:

    1. number of rows manipulated and table sizes
    2. number of tables joined
    3. complexity of filtering columns
    4. complexity of query: group by, having clauses used

    Please also check other useful links on the subject:

    http://blogs.msdn.com/b/sqlserverstorageengine/archive/2008/03/30/sql-server-table-variable-vs-local-temporary-table.aspx
    http://www.sqlservercentral.com/articles/Temporary+Tables/66720/

SQL Server Partitioning for large production tables

SQL Server introduced data partitioning starting with the SQL 2005 version. It is a feature meant to improve the performance of SQL queries, data management, index management and storage management. The greatest benefit of data partitioning is that you can have a highly available system while extracting, moving or removing information from tables with hundreds of millions of rows without creating blockages.

Due to a wrong estimation of the number of rows, or no estimation at all, when creating a table, the Database Administrator can be confronted at some point with a big problem: the table has too many rows and indexing no longer helps to retrieve the data in a time accepted by the whole IT structure. Querying very large tables starts to cause big performance issues as a direct effect of the big I/O required each time to extract the data. You can be in the situation that old data can't even be removed from that table without generating long locks on the whole table.

Also, If you migrated from SQL 2000 system to SQL2005, it is time to consider partitioning for your biggest tables.

So let’s get into the facts and add partitions to a new table that will replace the un-partitioned one. Of course, I don’t suggest that this can be done easily and with no downtime for the Production environment, but with good planning ahead it is possible and worth it. I had to deal with a table that had over 150 million rows, measuring 60 GB in data + index size. The actual downtime was 1 hour. Because the physical storage of a table is set at the CREATE TABLE statement, you cannot add partitioning without re-creating the table. The downtime I was speaking about refers to the duration of moving the data from the old un-partitioned table to the new partitioned table.

Prerequisites

For the demonstration purpose, the table in question will have the following structure:

1. Create the table that does not have any partitions

CREATE TABLE [dbo].[Table_without_partitions](
	[Id] [bigint] IDENTITY(1,1) NOT NULL,
	[Username] [varchar](255) NULL,
	[CreatedDate] [datetime] NOT NULL,
	 CONSTRAINT [PK_Table_without_partitions] PRIMARY KEY CLUSTERED 
([Id] ASC) 
)

2. Populate with values – Insert rows for 4 days

INSERT INTO [DB_Test].[dbo].[Table_without_partitions] ([Username],[CreatedDate])
VALUES      ('USER 1',GETDATE ())
GO 20 

INSERT INTO [DB_Test].[dbo].[Table_without_partitions] ([Username],[CreatedDate])
VALUES      ('USER 2',GETDATE ()+1)
GO 20 

INSERT INTO [DB_Test].[dbo].[Table_without_partitions] ([Username],[CreatedDate])
VALUES      ('USER 3',GETDATE ()+2)
GO 20

INSERT INTO [DB_Test].[dbo].[Table_without_partitions] ([Username],[CreatedDate])
VALUES      ('USER 4',GETDATE ()+3)
GO 20

You need to select a partitioning column that will serve as the limiting interval for a certain partition. This should be the column most used as a “filter” in your queries and the most logical way to separate various portions of data. In my scenario, the [CreatedDate] column will be the partitioning column. The partitioning column can also be of any other data type. For example, the price of a product can be a partitioning column. In that case you will have different partitions for products with prices from 1 to 1000, 1000 to 2000, 2000 to 3000, etc.

Create the partition function and the partition schema

Method 1:

First I have to create the partition Function. It defines the method SQL Server will use on HOW to split the data inside the table.
In this method I will create a Partition Function that will contain all 4 boundaries, meaning 4 partitions for the 4 days inside our table.

a) Find out what will be the function boundaries:

 select distinct convert(varchar, CreatedDate, 112) from table_without_partitions

------------------------------
20111129
20111130
20111201
20111202

b) Create the Partition Function that will contain all partition boundaries:


CREATE PARTITION FUNCTION [Table_with_partitions_PartFunc](bigint) AS RANGE LEFT FOR VALUES 
(20111129,20111130,20111201,20111202)
GO

c) Create a partition scheme that specifies a ‘NEXT USED’ filegroup:

The partition scheme is the second step in preparing the new table for partitioning, because it lets SQL Server know WHERE you want to place the partitioned data. You can select a filegroup different from PRIMARY. The big improvement will be in the SPEED of data access. To achieve this, you can add more filegroups to the database, containing secondary data files (.ndf) located on different – possibly much faster – disks than the primary data file (.mdf). Once the filegroups are created, they can be assigned to different partition boundaries, considering the most accessed partitions.
In my example, though, I will use the default filegroup for all boundaries: PRIMARY.
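
For completeness, a hypothetical sketch of adding such a filegroup with a secondary data file on another disk (the filegroup name, file name, path and size are made up):

ALTER DATABASE DB_Test ADD FILEGROUP FG_Partitions_2011
GO
ALTER DATABASE DB_Test ADD FILE
    (NAME = 'DB_Test_Partitions_2011',
     FILENAME = 'E:\SQLData\DB_Test_Partitions_2011.ndf',
     SIZE = 1024MB)
TO FILEGROUP FG_Partitions_2011
GO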

You will notice that there are 5 filegroups specified instead of 4 – the number of partitions. This is because I specified the next file group that will be used by the next partition added.

CREATE PARTITION SCHEME [Table_with_partitions_PartScheme] 
AS PARTITION [Table_with_partitions_PartFunc] TO ([PRIMARY], [PRIMARY], [PRIMARY],[PRIMARY],[PRIMARY])
GO

ALTER PARTITION SCHEME [Table_with_partitions_PartScheme] 
              NEXT USED [PRIMARY]

Method 2:

Create the partition function for a single boundary – the lowest – and then alter the partition function to add the next needed boundaries, using an SQL script that generates the partition values based on your table values. I used this method while doing the partitioning in the Production environment, as there were over 200 boundaries and it was not very fun to write down a list of boundaries for the partition function.

a) Find out the lowest boundary:

 

select min(convert(varchar, CreatedDate, 112)) from table_without_partitions

------------------------------
20111129
 

b) Create the partition function that will contain only the lowest boundary:

CREATE PARTITION FUNCTION [Table_with_partitions_PartFunc](bigint) AS RANGE LEFT FOR VALUES 
(20111129)
GO

c) Create the partition scheme that specifies a ‘NEXT USED’ filegroup:

CREATE PARTITION SCHEME [Table_with_partitions_PartScheme] 
AS PARTITION [Table_with_partitions_PartFunc] TO ([PRIMARY], [PRIMARY])
GO

ALTER PARTITION SCHEME [Table_with_partitions_PartScheme] 
              NEXT USED [PRIMARY]

d) Alter the Partition Function and add the next needed boundaries:

The following script will generate and execute the SQL statements for the next 3 partition boundaries.

Declare @StartDate datetime
Declare @EndDate datetime
Declare @Script nvarchar (4000)
DECLARE @PartValue BIGINT

Set @StartDate = (select min(CreatedDate)+1 from table_without_partitions)
Set @EndDate = (SELECT max(CreatedDate) from table_without_partitions)

WHILE @StartDate <= @EndDate 
Begin	
		SET @PartValue = Convert(bigint,Convert(varchar(8),@StartDate,112))				
		Set @Script = 'ALTER PARTITION FUNCTION [Table_with_partitions_PartFunc] () split RANGE ('+ Convert(VARCHAR,@PartValue) +')
		ALTER PARTITION SCHEME [Table_with_partitions_PartScheme]
					  NEXT USED [PRIMARY] '
		print @Script   		   
		exec sp_executesql @statement = @Script   
		Set @StartDate = @StartDate +1   
End

/*Results:
ALTER PARTITION FUNCTION [Table_with_partitions_PartFunc] () split RANGE (20111130)
ALTER PARTITION SCHEME [Table_with_partitions_PartScheme] NEXT USED [PRIMARY] 

ALTER PARTITION FUNCTION [Table_with_partitions_PartFunc] () split RANGE (20111201)
ALTER PARTITION SCHEME [Table_with_partitions_PartScheme] NEXT USED [PRIMARY] 

ALTER PARTITION FUNCTION [Table_with_partitions_PartFunc] () split RANGE (20111202)
ALTER PARTITION SCHEME [Table_with_partitions_PartScheme] NEXT USED [PRIMARY] 
*/

In both methods you will end up with one partition function for all 4 boundary values and one partition scheme created.

Partition a table

1. Create the new table that will replace the old one without partitions

The new table will be created ON the new partition schema, which will point to the partition function.
The table will have a new BIGINT computed column named [Partition_column] - based on the existing [CreatedDate].

CREATE TABLE [dbo].[Table_with_partitions](
	[Id] [bigint] IDENTITY(1,1) NOT NULL,
	[Username] [varchar](255) NULL,
	[CreatedDate] [datetime] NOT NULL,
	[Partition_column]  AS (CONVERT([bigint],CONVERT([varchar],[CreatedDate],(112)),0)) PERSISTED,
 CONSTRAINT [PK_Table_with_partitions] PRIMARY KEY CLUSTERED  ([Id] ASC, [Partition_column] ASC)
 ) ON Table_with_partitions_PartScheme ([Partition_column])

Go to table properties and check that the table is partitioned:

Partitioned table

Migrate rows from the un-partitioned table to the partitioned table

1. Check the distribution of rows per each partition from the [Table_with_partitions]

SELECT partition_number, sum(rows) FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('Table_with_partitions') group by partition_number order by partition_number

partition_number SUM_ROWS
---------------- --------------------
1                0
2                0
3                0
4                0
5                0

2. Insert all the rows from [Table_without_partitions] to [Table_with_partitions]

This can be done by various methods:

  • SQL Server Export/Import Wizard - you may need to check the "Enable Identity Insert" option from "Edit Mappings" window.
  • SQL statement: Insert into [...] Select from [...]
    
    SET IDENTITY_INSERT [Table_with_partitions] On
    GO
    insert into [Table_with_partitions] ([Id],[Username],[CreatedDate])
    select [Id], [Username], [CreatedDate]
    from [Table_without_partitions] 
    SET IDENTITY_INSERT [Table_with_partitions] Off
    
  • BCP utility to export and BULK INSERT to import. I noticed that BULK INSERT is faster than BCP utility in import operations.
    EXEC master..xp_cmdshell 'BCP "SELECT [Id], [Username], [CreatedDate], ''1111'' as [PARTITION_COLUMN] FROM [DATABASE].[dbo].[Table_without_partitions]" queryout D:\BCP_export_file.TXT  -w -t\t -r\n  -SSERVERNAME -T'
    
    BULK INSERT [DATABASE].[dbo].Table_with_partitions FROM 'D:\BCP_export_file.txt' WITH (KEEPIDENTITY, FIELDTERMINATOR = '\t', ROWTERMINATOR ='\n') 
    
    

You are free to select the one that suits you best considering the number of rows that need to be migrated. From my experience, the first two methods are efficient when handling up to 5-10 million rows. When we are speaking about tens or hundreds of millions, I strongly suggest you use the BCP method. It is the fastest and the safest for your SQL Server because it doesn't put any locks.

3. Check again the distribution of rows per each partition from the [Table_with_partitions]

SELECT partition_number, sum(rows) FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('Table_with_partitions') group by partition_number order by partition_number

partition_number SUM_ROWS
---------------- --------------------
1                20
2                20
3                20
4                20
5                0

Now all rows have been distributed to their partitions, and the 5th one is empty, waiting to be populated with the next partition's data.

Deploy in Production

1. Add the next partition

You should create a SQL Job that will run daily and add the partition for the next day, so that the rows inserted for 2011/12/03 will be assigned to the right partition. Of course you can create more partitions ahead and schedule the job to run weekly or monthly.

ALTER PARTITION FUNCTION [Table_with_partitions_PartFunc] () split RANGE (20111203)
ALTER PARTITION SCHEME [Table_with_partitions_PartScheme]  NEXT USED [PRIMARY]

2. Testing

Insert 10 rows for the next day into the new partitioned table [Table_with_partitions].

INSERT INTO [DB_Test].[dbo].[Table_with_partitions] ([Username],[CreatedDate])
VALUES      ('USER 5',GETDATE ()+4)
GO 10

3. Checking

The 10 newly inserted rows are assigned to the 5th partition.

SELECT partition_number, sum(rows) FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('Table_with_partitions') group by partition_number order by partition_number

partition_number sum_rows
---------------- --------------------
1                20
2                20
3                20
4                20
5                10
6                0

That's everything you have to do in order to get rid of the old un-partitioned table.

[UPDATE....]

A small update is in place here, because I missed one detail about creating clustered indexes on partitioned tables.

Create Indexes on partitioned table

Creating Primary Key/Unique Key on partitioned table

When speaking about clustered indexes, I am referring to either Primary Keys or Unique Keys. For a partitioned table the basic behavior is different from a non-partitioned table. Unlike the non-partitioned table, which can have a primary key composed of a single column that uniquely identifies a certain row, the partitioned table's Primary Key must also contain the partitioning column as a subset of the index key. The default behavior of clustered indexes built on partitioned tables is to align the index with the partitioned table on the same scheme.

That is the reason the Primary Key from my example above is composed of [Id] and [Partition_column]. It ensures that the Index will also be partitioned.

Whereas this applies to clustered indexes, non-clustered indexes can be either PARTITIONED or NON-PARTITIONED. You define this by specifying the ON clause when creating the index. If you don't specify anything, by default it will be created as a partitioned index on the partitioning scheme. To create a non-partitioned index you need to specify ON [PRIMARY] (or another single filegroup) instead.
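
A small sketch of both options on the demo table (the index names are made up):

-- aligned (partitioned) non-clustered index: created on the partition scheme
CREATE NONCLUSTERED INDEX ix_username_partitioned
    ON dbo.Table_with_partitions (Username)
    ON Table_with_partitions_PartScheme ([Partition_column])

-- non-partitioned non-clustered index: placed on a single filegroup
CREATE NONCLUSTERED INDEX ix_username_nonpartitioned
    ON dbo.Table_with_partitions (Username)
    ON [PRIMARY]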

I found a very good article written by Michelle Ufford, in which she explains very nicely the performance impact of having both partitioned and non-partitioned indexes.

Creating Foreign Key referencing a partitioned table

Since the Primary Key on a partitioned table also contains the partitioning column, when you try to create a foreign key that references a column from the partitioned table that is part of the Primary Key (other than the partitioning column), you will receive the error below:

There are no primary or candidate keys in the referenced table ' ' that match the referencing column list in the foreign key ' '.

To overcome this issue, you first need to create a Unique Non-Clustered Index on the referenced column of the partitioned table. A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint (or a unique index) in another table.
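
A hypothetical sketch with a child table referencing [Id] from the partitioned table (the unique index has to be non-aligned, hence ON [PRIMARY]):

-- unique non-clustered index on the referenced column
CREATE UNIQUE NONCLUSTERED INDEX ux_table_with_partitions_id
    ON dbo.Table_with_partitions ([Id])
    ON [PRIMARY]
GO

-- the foreign key can now reference [Id] through the unique index
CREATE TABLE dbo.Child_table
    (ChildId  bigint IDENTITY(1,1) PRIMARY KEY,
     ParentId bigint NOT NULL,
     CONSTRAINT FK_Child_table_Parent FOREIGN KEY (ParentId)
         REFERENCES dbo.Table_with_partitions ([Id]))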

How to force SQL to use ad-hoc created Indexes on #temp tables

Inside a stored procedure, one simple and common way of querying/modifying some production data while avoiding unnecessary locks on production tables is to use temporary tables. These tables can be queried/modified and joined with base tables without creating blockages; the resulting information can be manipulated more easily and faster.

There is a problem, though, when working with temporary tables. I am speaking about tables created with # in front, which are removed when the session closes, and only about those that are part of a stored procedure's code.

If you use a certain temporary table in a join operation with another table, and you have many rows involved, then you will probably see a performance issue, since you don't have an index covering both joining columns.
The same behavior applies to ORDER BY statements or filter clauses (WHERE involved).

You will see table scans on the #temporary table and the performance will be very poor. The next logical step is to create the proper indexes right after the table creation.

You are absolutely right to do so, but SQL will ignore this index and exclude it from the Execution Plan, because the Execution Plan was already created when the index didn’t exist. When a stored procedure is called, it is first compiled and Query Optimizer creates an Execution Plan for it. It does not take into consideration the indexes created on temporary tables.

You can force the index to be considered by following one of the methods below:

  • Create Primary/Unique keys when creating the temporary table. – this is only efficient if you have unique columns involved in order by or filtering options.
  • Specify the table hint: with (index(name_of_the_index)) right after mentioning the table in a select statement.
     

    Example: Select name from #table with (index (index_id)) where id>1000
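
    A slightly fuller sketch of the second method (names are made up):

    CREATE TABLE #orders (id int, name varchar(50))
    CREATE NONCLUSTERED INDEX index_id ON #orders (id)
    
    -- force the ad-hoc created index into the plan
    SELECT name
    FROM #orders WITH (INDEX (index_id))
    WHERE id > 1000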

RangeX-X Key-Range Locks in SERIALIZABLE isolation level (part 3)

I will continue in this part describing other types of Key – Range locks: RangeX-X combined with RangeS-U locks.

As a reference I will use the same table created in part 1 – range_lock.

RangeX-X Locks

RangeX-X locks are exclusive resource locks acquired on index key values from a specific range when the transaction modifies a key value from the index. These are exclusive locks, meaning no other type of transaction (Select/Update/Insert/Delete) can touch the modified range key values until the first transaction is completed/committed.

Start Serializable Transaction

Open a new SQL session and issue the query below without committing the transaction.
Until the end of this post this session will be known to readers as session 55.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE 
go
begin transaction
go

update range_lock set rname ='ana' where rname = 'anna' 


Analyze the Query Execution Plan

RangeX-X Execution Plan

You will see a Clustered Index Update, and an Index seek on ix_rname (rname column) – this defines the range of the key values to seek in.
At first look, the execution plan looks exactly as it did for RangeS-U locks – with a slight difference in the operator cost (%): 75% for RangeS-U and 86% for RangeX-X. Why? If I mouse over the Clustered Index Update operation I see 2 indexes listed for the Update operation: the Primary Key and the ix_rname index.

RangeX-X Execution plan - highlight Index Update operation

Both Indexes are planned for update because:

  • PK_range_locks – in this case, a clustered index that orders the data rows in the table based on its (Primary Key) key values. The update is made on a column other than the defined Primary Key column. The data rows of a table form the leaf level of the clustered index. The Primary Key values contain pointers to all data rows from the table. Any update on a column other than the primary key will result in an update at the leaf level of the clustered index.
  • ix_rname – Update is made on a key value from the Index, so the index should be updated with a different value.

Check Held Locks

Execute:

sp_lock 55

Here is what you would get:

RangeX-X for equality operation

I see that one exclusive lock (X) is held on IndId=1 (Primary Key), which is needed for the index update, and that a RangeX-X lock is acquired on IndId=2 (ix_rname) on one key value. A RangeS-U lock is also acquired on IndId=2 (ix_rname) on one key value. If I go back to part 1 and look at the table rows and their order, I see that after “anna” comes the “antony” row. I can take a small guess that the RangeS-U lock is on the next key value after “anna” – “antony”. How can I prove that?

Create Blockage

Open a second SQL session (here will be known as session 58) and execute:

begin transaction 

select rname from range_lock where 	
rname = 'antony' 

This statement will work fine. RangeS-U locks allows Select statements to be executed. Rollback this transaction.

Inside the same session execute:

begin transaction 

update range_lock set rname = 'antoni' where
rname = 'antony' 

You will see that this transaction will be blocked by the first one.

Session 58 is blocked by session 55

Check Locked resources on both transactions

To figure out why the update operation on rname column for the “antony” row is blocked by the first update on rname column for the “anna” rows check this out:

Execute:

sp_lock 55 
go
sp_lock 58
go
RangeX-X for equality operation

I see one WAIT type of lock for the second transaction on the same resource on which the first transaction holds a RangeS-U lock. It is clear that we are speaking about the same row: “antony”.

Microsoft explains WAIT locks as:

WAIT – Lock is blocked by another process.

Execute a Rollback in both sessions (55 and 58) to preserve the initial image of the table.
The conclusion I reach is: RangeX-X locks are only acquired on the key values that are actually modified from the range. Even though the range contains the “next” value from the table's list of rows, a RangeS-U lock is held on that value(s), in order to maintain the integrity of the data until the transaction is committed.

 

RangeS-U – Key-Range Locks in SERIALIZABLE isolation level – (part 2)

I will continue in this part describing other types of Key – Range locks.

As a reference I will use the same table created in part 1 – range_lock.

RangeS-U Locks

RangeS-U locks are locks acquired on Index key values from a specific range when the following conditions are met:

  • The transaction contains an UPDATE statement
  • A WHERE clause is used to specify a condition that limits the rows that are updated. The condition includes at least one column from the index; this defines the range of key value(s) from the index. In this case, locks are needed on the key value(s) that refer to the rows updated.
  • The UPDATE modifies a column(s) which is(are) not contained in the Index definition or:
  • The UPDATE modifies a column(s) which is(are) contained in the Index definition.

I hope the explanation will be more straightforward with the example below. I will use a larger range of values in this example, meaning I will use an inequality operation. An equality operation will take the same types of locks.

Start Serializable Transaction

Open a new SQL session and issue the query below without committing the transaction.
Until the end of this post this session will be known to readers as session 55.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE 
go
begin transaction
go

update range_lock set sname ='surname' where rname between 'anna' and 'arlen'

Analyze the Query Execution Plan

RangeS-U - Execution Plan

You will see a Clustered Index Update, and an Index seek on ix_rname (rname column) – this defines the range of the key values to seek in.

The update on PK_range_locks happens because, in this case, the PK is a clustered index that orders the data rows in the table based on its (Primary Key) key values. The update is made on a column other than the defined Primary Key column. The data rows of a table form the leaf level of the clustered index. The Primary Key values contain pointers to all data rows from the table. Any update on a column other than the primary key will result in an update at the leaf level of the clustered index.

Check Held Locks

Execute:

sp_lock 55

Here is what you would get:

RangeS-U for inequality operation

I see 3 exclusive locks (X) held on IndId=1 (Primary Key), which are needed for the clustered index update, and RangeS-U locks acquired on IndId=2 (ix_rname) on four key values. If I go back to part 1 and look at the table rows and their order, I see that between “anna” and “arlen” there are 3 rows, not 4. The fourth lock is on the next key value after “arlen” – “barry”.

As the name suggests Shared – Update locks mean that:

  • Other transactions that need to Select the same rows inside the range will be permitted.
  • Other transactions that perform an update on any of the key values from the range will be blocked by the first one until the first one is committed or rolled back.

Create Blockage

Open a second SQL session (here will be known as session 58) and execute:

begin transaction 

select rname from range_lock where 	
rname = 'barry' 

This statement will work fine. Rollback this transaction.

Inside the same session execute:

begin transaction 

update range_lock set rname = 'barri' where
rname = 'barry'  

You will see that this transaction will be blocked by the first one.

Session 58 is blocked by session 55

Check Locked resources on both transactions

To figure out why the update operation on rname column for the “barry” row is blocked by the first update on sname column for the rows between “anna” and “arlen” check this out:

Execute:

sp_lock 55 
go
sp_lock 58
go
RangeS-U for inequality operation

I see one WAIT type of lock for the second transaction on the same resource on which the first transaction holds a RangeS-U lock. It is clear that we are speaking about the same row: “barry”.

Microsoft explains WAIT locks as:

WAIT – Lock is blocked by another process.

Execute a Rollback in both sessions (55 and 58) to preserve the initial image of the table.
This is all about RangeS-U locks; further on, in part 3, I will detail RangeX-X locks and RangeX-X combined with RangeS-U locks.