Monthly Archives: January 2012

Table variables or Temporary tables for big tables?

Since Microsoft first introduced table variable in SQL Server, its usage became more and more popular among developers due to its well known benefits: less locking and logging ressources, limited transaction scope and reduced re-compilation when used inside stored procedures.

Considering Microsoft reference of FAQ about table variables, I centralized the main characteristics (advantages and disadvantages) of table variables:

  1. Table variables, such as local variables, have a well defined scope at the end of which they are automatically cleared.
  2. Table variables result in fewer recompilations of a stored procedure as compared to temporary tables.
  3. Transactions that involve table variables last only for the duration of an update on the table variable. Therefore, table variables require less locking and logging resources.
  4. Because table variables have limited scope and are not part of the persistent database, transaction rollbacks do not affect them.
  5. Table variables DO NOT reside only in memory. They are created in the tempdb database, since the table variable could hold more data than could fit into memory. Both table variables and temporary tables will be created and processed in memory if enough memory is available. However, they can both be flushed to disk at anytime.
  6. Non-Clustered indexes which are not Primary or Unique keys, cannot be created on table variables in order to optimize the query execution plan. Only Primary or Unique key can be added at the declare table statement.
  7. Table variables do not maintain statistics like temporary tables can. Therefore, for complex queries on large tables, the lack of statistics may prevent the optimizer to determine the best plan for a query, thus affecting the performance of that query.

There have been over time a lot of discussions whether table variables are better than temporary tables. I am not here to argue any of those, because the reasons are very different for everyone. In my personal view both are very good if used in the right type of queries. Both can perform very good. The problem is that it is hard to know what is the right type of query.

Some time ago, I had to test an SQL query that was developed initially to use table variables. The query used 1 table variable that was populated with data from a join of different standard tables. The result of the insert was small (few rows) but two of the tables involved in the join had 2 and 60 million rows.

This is a part of the query :

declare @temp table 
        ( [...]
        SELECT  [...] -- columns
        FROM    DWtable2
 		INNER JOIN DWtable DT on [...]
                INNER JOIN SQLTableSent  on [...] 
                INNER JOIN SQLTableCampaign  [...]
                INNER JOIN SQLTableTemplate on [...]
                INNER JOIN SQLTableCulture on [...]
        WHERE   [...] -- apply filters
        GROUP BY [...] 

The duration was extremely long. Following a personal guess, I changed the insert into a table variable (@temp) with an insert into a temporary table (#temp). The query performed 20 times faster, using temporary tables instead of table variables. So, basically with no other improvement, the single use of the temporary table did the trick.
I started to ask why this huge difference in duration.

At first impression:

– The lack of indexes and statistics on the table variable where immediatelly out of the question as it was not the case. An insert does not need any index or statistics on the table populated in order to behave better.
– The select part from the insert was well optimized, because the result would be returned in less than 1 sec.

Next, I started to analyze more deeply. The things that I compared in both cases where:

  • The output from “set statistics io on” when executing the query:
  • I noticed that the insert into table variable had many logical reads for the SQLtableSent object (1640612) and read-ahead reads (919353) compared to 0 for the same object when using temporary table. That table has 60,000,000 rows. It was clear for me that the execution plan was very different because the order of the tables processed and their values where different.

  • The output of “Include Actual Execution Plan” option when executing the query:
  • Comparing those two query execution plans, it hit me – the usage of table variable in modification statements (insert, update, delete) prevents the creation of parallel query processing. An online search confirmed my suspicions. It is written right here at Remarks section:

    “Queries that modify table variables do not generate parallel query execution plans. Performance can be affected when very large table variables, or table variables in complex queries, are modified. In these situations, consider using temporary tables instead. ”

    What it means? During query optimization, SQL Server looks for queries that might benefit from parallel execution on computers that have more than one microprocessor. This is because SQL Server can perform a query in parallel by using several operating system threads and the operation can be completed quickly and efficiently.

    When table variable is used, the query will be processed serially. A serial execution plan, used by a nonparallel query, uses only one thread for its execution and the performance is IN THIS CASE, much worse than in a parallel one. From my experience, I have seen cases when limited parallelism or none, can do wonders on other type of queries.

    In conclusion, it is important to test your query duration/execution plans for both scenarios (table variables and temporary tables), because in some cases non-parallel processing can be better than the parallel one. There are also cases when a non-parallel processing behaves exactly the same as a parallel one. It is a already a subject for another post and will not get into it now, but it is good to keep in mind when working with table variables:

    the key factors that can change the performance:

    1. number of rows manipulated and table sizes
    2. number of tables joined
    3. complexity of filtering columns
    4. complexity of query: group by, having clauses used

    Please check also other usefull links on the subjects: