jump to navigation

Index Fragmentation and dm_db_index_physical_stats December 12, 2007

Posted by furrukhbaig in DMV's, dm_db_index_physical_stats, Fragmentation, Index Fragmentation, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, SQL Server 2005, SQL Server 2005 features, Tips.
Tags: , , , , , ,
1 comment so far

Fragmentation can cause serious performance issues and its vital to understand how this can impact performance before I tell you how to fix it.

Fragmentation can cause slow running queries due to more disk IO then usual which itself cause by page spliting. Fragmentation not only cause query performance it can also slow down write operation such as INSERT, DELETE and UPDATE.

SQL Server write data on 8k data pages. This limit can be reached very quickly, for wide tables storing lots of varchar/nvarchar columns (nvarchar take 2 bytes to store 1 character), will cause request for more data pages. When SQL Sever uses whole page then it occupy another page to store more data. If table has got cluster index then data needs to be stored in sorted order as cluster index ensure physical order of data according to the cluster key and thats why you can only have one clustered index per table.

Now assume you have compound clustered index on CountryID and CustomerNo colums on a transactional table which recieve 10 transactions per minute for an online car rental portal. Clustered index will make sure that data is stored in sorted order on CountryID and CustomerNo. Having millions of rows in table consider what will happen when you recieve transaction row which needs to be inserted on page 1 (CountryId = 1, CustomerId = 100) . Above will require to make some room on 1st data page (depending on FILLFACTOR, default is 100%). If require data page has not got enough room for the row to fit then rows on the data pages needs to be move forward (called PageSplit) in order to make some room for new row to fit on page in sorted order. This can cause lot of page split and fragmentation as new pages required (caused by page splits on existing data pages)  can not be guarantee to be in sequence with other page. This can seriously hurt performance for data modification (due to pagesplit) and data read (due to out of order pages) operations.

dm_db_index_physical_stats can be used to check fragmentation.

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(N’YourDB’), OBJECT_ID(N’dbo.CarRental’), NULL, NULL , ‘DETAILED’);

The above SQL will return fragmentation information for given Database name (Parameter 1) and Table Name (Parameter 2). Last parameter suggest the mode of analysis.

It is good practice to create clustered indexes on incrementing value (e.g. IDENTITY) to avoid page splits and out of order pages.

Statement Level Recompile August 22, 2007

Posted by furrukhbaig in Performance, RECOMPILE hint, SQL Server 2005, stored procedure, TSQL.
Tags: , , ,
1 comment so far

In SQL Server 2000, the unit of compilation was the whole stored procedure. So even if you wanted just one particular query to be recompiled, you couldn’t request it. If you created the stored procedure with the RECOMPILE option, the whole procedure went through recompilation every time you invoked it.SQL Server 2005 supports statement-level recompile. Instead of having all queries in the stored procedure recompiled, SQL Server can now recompile individual statements. You’re provided with a new RECOMPILE query hint that allows you to explicitly request a recompilation of a particular query. This way, other queries can benefit from reusing previously cached execution plans if there’s no reason to recompile them every time the stored procedure is invoked.

Parameter Sniffing and OPTIMIZE FOR August 22, 2007

Posted by furrukhbaig in Optimize, Parameter Sniffing, Performance, Query hint, SQL Server 2005, stored procedure, TSQL.
Tags: , , , , ,

Parameter Sniffing refers to a process whereby SQL Server’s execution environment “sniffs” the parameter values during first invocation, and passes it along to the query optimizer so that they can be used to generate optimal query execution plans.

“First invocation” also refers to the first invocation after a plan was removed from cache for lack of reuse or for any other reason. The optimizer “knows” what the values of the input parameters are, and it generates an adequate plan for those inputs parameters. SQL Server internally maintains the statistics and distribution of the values in the columns used for filtering.

While parameter sniffing is certainly a powerful feature, it can cause problems when a procedure’s plan happens to have been kicked out of the procedure cache (or was never in it) just prior to the procedure being called with atypical parameter values. This can result in a plan that is skewed toward atypical use, one that is not optimal when called with typical values. Since, once cached, a query plan can be reused for parameter values that vary widely, the ideal situation is to have a plan in the cache that covers the typical usage of the procedure as much as possible. If a plan makes it into the cache that is oriented toward atypical parameter values, it can have a devastating effect on performance when executed with typical values.

An example would probably help here. Suppose we had a stored procedure that returns sales data by country. In our case, three-fourths of our sales is in the UK. The procedure takes a single parameter, @country, indicating the country for which to return sales info. It uses this parameter to filter a simple SELECT statement that returns the requested sales data.

(@Country Varchar(50))
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @Country 

 The optimizer would most likely to choose to do a clustered index scan when creating execution plan for this query because (given that “UK” would normally be passed in for @country) so much of the table would be traversed anyway that scanning it would require less I/O and be faster than repeated nonclustered index lookups. However, what happens if the plan happens to have been kicked out of the cache (let’s say due to an auto-statistics update) just prior to a user calling it with, say, “Spain”, where we have almost no sales? Assuming a suitable index exists, the optimizer may decide to use a nonclustered index seek in the new query plan. Subsequent executions of the procedure would reuse this plan, even if they passed in “UK” for @country. This could result in performance that is very slower than the scan-based plan.

As a workaround prior to SQL Server 2005, local variables can be used instead of stored procedure parameters. Please note SQL Server can not sniff the value of local variable. This will lead SQL Server to use statistics on filter column and create a plan which is best for average values in that column. This can also lead to serious performance when same procedure called with atypical value but will do best for typical values.

(@Country Varchar(50))
DECLARE @_Country Varchar(20)
SET @_Country = @Country 
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @_Country 

There’s a new query hint provided in SQL Server 2005 to tackle the problem—the OPTIMIZE FOR query hint. This hint allows you to provide SQL Server with a literal that reflects the selectivity of the variable, in case the input is typical. For example, if you know that the variable will typically end up with a highly selective value, you can provide the literal which reflects the typical value. for example.

CREATE PROCEDURE uspGetCountrySale (@Country Varchar(50))
DECLARE @_Country Varchar(20)
SET @_Country = @Country 
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @_Country
OPTION (OPTIMIZE FOR(@Country = ‘UK’)); 



Stored Procedures !! Some Facts you should know August 22, 2007

Posted by furrukhbaig in Performance, SQL Server 2005, TSQL.
Tags: , ,

Most of us are familliar with the stored procedures and use them on daily bases. But there are some facts which can cause some performance issues and here I just want to shed some light on them.

  • Always use qualified name when calling sprocs for example EXEC dbo.Your_Proc

This is very common mistake which cause an extra trip to procedure cache to get execution plan for execution. SQL Server compiles stored procedure on its first execution and store the execution plan in procedure cache to be reuse in subsequent call for same sproc. In order to get the execution plan it require qualified stored procedure name e.g. dbo.My_Proc (fully qualified name contain Server.Database.Owner.My_Proc). When owner name is not specified then initial cache lookup by object name fails as owner name was not specified. SQL Server then acquire exclusive compile lock on stored procedure and all the referenced objects including tables for recompilation. Next step SQL Server will resolve the object name to a objectID and before compilation it makes another trip to procedure cache by using object id that can result finding previous exection plan. But as you noticed this can cause blocking in certain situation where many SPIDs are calling same sproc frequently and while its is lock for compilation all the caller have to wait until sql server find the execution plan in cache or recompile the sproc.

So the rule of thumb is Always qualify your objects (sproc, tables, views, functions) with owner name.

This behaviour can be captured using profiler by capturing following events.

The SP:CacheMiss event occurs when the cache lookup by name fails. The subsequent SP:ExecContextHit indicates that a matching cached plan was ultimately found in cache once the ambiguous object name had been resolved to an object ID. Depending on the circumstances, SP:CacheHit may appear in place of SP:ExecContextHit.

for more detail see http://support.microsoft.com/kb/263889


This can cause extra network traffic and can have some serious impact on performance when sproc get called frequently.

  • Don’t use sp_ prefix in stored procedure name

As sp_ prefix is reserved for system stored procedure and any stored procedure which has sp_ prefix will cause an extra lookup in MASTER database. There is another point to note that if a stored procedure uses same name, in user database as system stored procedure in master database, the stored procedure in user database will never get executed as SQL Server will always look first in master database and will execute that one rather one in user database.

  • Avoid using temp tables and DDL statements

This can cause stored procedure recompile when temp table, created in stored procedure, get referenced first time within stored procedure. Due to non-existense of statistics optimizer will not be able to reuse or create execution plan for queries using temp table which is created within stored procedure. Same applies on DDL statement as they also force stored procedure to recompile.

When a batch is recompiled in SQL Server 2000, all of the statements in the batch are recompiled, not just the one that triggered the recompilation. SQL Server 2005 improves upon this behavior by compiling only the statement that caused the recompilation, not the entire batch. This “statement-level recompilation” feature will improve SQL Server 2005’s recompilation behavior when compared to that of SQL Server 2000. In particular, SQL Server 2005 spends less CPU time and memory during batch recompilations, and obtains fewer compile locks.


Posted by furrukhbaig in Indexes, Performance, SQL Server 2005.
Tags: , , ,

Many of us are familiar with the AUTO_UPDATE_STATISTICS database option. It provides the Optimizer with up-to-date index and column statistics to achieve optimum query plans queries execution.

Everything comes up with a price. There are some pros and cons of using that option and choice should be made very carefully otherwise it can seriously hurt performance. For example, the AUTO_UPDATE_STATISTICS option, when turned on (which it is by default for a new database), tells the Query Optimizer to automatically update the index and column statistics as data changes. It is vital because data changes can affect Optimizer decision in selecting optimum query plan. Statistics are automatically updated when the statistics used in a query execution plan are outdated. Statistics are considered outdated when 20% or more rows in a table have changed.

So what it means to me is when Optimizer consider statistics need to be updated it does it right there and then recompile the query to use updated statistics and create new execution plan and execute the query. Most of the time this process is not noticeable but in some scenarios it can hinder query performance and might cause query timeouts.

For large tables that have many indexes and statistics on columns it can take significant amount of time. Query has to wait during that time to be recompiled using new statistics and then time to execute that query plan. This can cause serious delay and can happen on periodic bases for example query usually takes 2 second but some time it takes more the 10 seconds. And believe me it will take you some time before realizing that what’s going on.

Another point to consider is it is unlikely that up-to-date statistics will change execution plan as queries should be written in a manner so they can make use of available indexes and cached for reuse. I would not expect my stored procedure to keep changing its execution plan because it can hinder performance and make no use of cached execution plan. But in some cases, with dynamic SQL, this will help improve performance. So use it with care.

The new feature in 2005 AUTO_UPDATE_STATISTICS_ASYNC can be used to avoid query wait for updated statistics. It enables optimizer to carry on using old statistics while statistics are updated asynchronously and made available for next execution. Keep in mind this can also make queries slower as it uses old statistics without any wait and there is a possibility that changes in data was significant and optimizer should have updated statistics to get optimal execution plan. But in most cases this should work fine. As I mentioned before that its unlikely that execution plan changes for regularly running queries unless columns in ORDER BY and WHERE clause keep changing.

So Recommendation is to use both options ON to be on safer side but always test this with workload in your scenario.

GUID vs IDENTITY ? August 18, 2007

Posted by furrukhbaig in Indexes, Performance, SQL Server 2005, TSQL.
Tags: , ,

Ahhhh… thats the hot debate among developers and database designers now a days. The system I am working on is using GUIDs on database tier and it is the backbone of middle tier framework to make every row unique (I didn’t design that).

The key issues with GUID is its very large (16 bytes). GUID’s are mainly used as PRIMARY KEY (usally CLUSTERED) to ensure uniqueness of rows.  As a rule of thumb clustered index key / primary key should be as narrow as possible (in case of primary key consider lookups for primary key column by foriegn key tables) and therefore GUID are not best suited for clustered index. Again it will cause very high fragmentation on leaf level and page split on data pages as well as index pages  (while sql 2005 has also introduced sequential guids but we are generating guids on client side). Clustered index on GUID also hurt performance for non-clustered index as NC index uses clustered key as page pointer on leaf-level pages.

There are some argument in favor of GUID and its mainly to maintain uniqueness in merge replication scenario or data warehousing scenario where data will be migrated from multiple servers and having single identity (INT) column will cause duplication. There could be several workaround for this issue. One is to create composite primary key on two INT column (one is for row id and other is for server id) to make sure uniqueness among multiple servers. other is to maintain identity ranges for multiple servers.

I just could not stress more about being carefull when using GUID column as candidate for clustered index when designing database as its night mare when it comes to performance.

The confusion starts when people with good designing skills in Middle tier start thinking for a database tier. Believe me database tier is fundamentally different from middle-tier. Every object on middle-tier does not necessarily corresponds to a table on database tier and object relationship does not corresponds to table relationship. And what about performance, there are many people who does not consider performance when desiging databases but i disagree with them and I think data-load should be primary factor, to be consider when designing dababase, among other key factors. After having a very good design with out performance and dataload consideration you will end-up redesigning your database after couple of days or month of going live with such system and hiring someone like me for performance tuning (by the way thats what i do) ;).

INDEX / Statistics Age August 17, 2007

Posted by furrukhbaig in DMV's, Indexes, Performance, SQL Server 2005, TSQL.

When I was working with a major cable provider in UK I had a task to tune performance of a billing engine that runs over night and produced millions of rows in tables and suddenly start taking more then double time to finish.But after struggling many hours to find out why index has not been selected by optimizer I realized that optimizer has old statistics for that index that does not represent the current state of data and therefore stats needs to be updated. Production DBAs has disabled the index rebuild job by mistake that runs after billing engine inserts millions of rows. I often heard from production DBA that they don’t have enough overnight window to rebuild indexes. Which makes database developer’s life misery in fixing performance without realizing that statistics are not up-to-date. I have experienced significant performance difference after updating statistics or rebuilding indexes.

Always check your index age before start looking into execution plan. Here is some very handy query to find out age of index / statistics.  Replace Table_Name with your table name.

            ‘Index Name’ = ind.name,
            ‘Statistics Date’ = STATS_DATE(ind.object_id, ind.index_id)
            SYS.INDEXES ind
            OBJECT_NAME(ind.object_id) = ‘Table_Name’


SEEK is better then SCAN August 17, 2007

Posted by furrukhbaig in DMV's, dm_db_index_physical_stats, dm_db_index_usage_stats, Execution Plan, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, Query hint, SEEK vs Scan, SQL Server 2005, SQL Server 2005 features, TSQL, XML Execution Plan.
add a comment

I have been involved in performance tuning for several databases and usually come across a situation where indexes has been created but due to poorly written SQL Code those indexes are not utilized or badly used by optimizer. To find out how optimizer using your index use query below. This will provide you the frequency optimizer is uses SEEK, SCAN and LOOKUP operation on indexes. 

As a rule of thumb SEEK is better then SCAN. But there are few scenarios where SCAN operator perform better then SEEK specially when large number of rows are expected to return by operation.

            Table_Name = OBJECT_NAME(usg.object_id),
            sys.dm_db_index_usage_stats usg
            INNER JOIN sys.indexes ind
                  ON usg.object_id = ind.object_id
                  AND usg.index_id = ind.index_id
            — Your table list goes here
            OBJECT_NAME(usg.object_id) IN ( ‘Table_Name1’, ‘Table_Name2’ )


Is this Index useful ? August 17, 2007

Posted by furrukhbaig in DMV's, dm_db_index_usage_stats, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, SQL Server 2005, SQL Server 2005 features, TSQL.
Tags: , , , , , , , , ,
1 comment so far

Creating indexes on tables always been a tricky question. Creating a index does not mean that optimizer will use that index to solve queries. SQL Server 2005 introduced a dynamic management view sys.dm_db_index_usage_stats to capture the use of indexes by SELECT, INSERT, UPDATE and DELETE. Thus any index or table which does not exists in this view is overhead to system and not useful.

Following query will return all the indexes and tables which never get used by SQL Server since the service started. The best way to check is to create index and run the data work load or regression test that should make use of all indexes and run following query.see full article on SQL Server Customer Advisory Team (CAT) How can SQL Server 2005 help me evaluate and manage indexes

            ObjectName = OBJECT_NAME(ind.object_id),
            IndexName = ind.name
            Sys.Indexes ind
            INNER JOIN Sys.Objects obj ON obj.object_id = ind.object_id
            OBJECTPROPERTY(obj.object_id,‘IsUserTable’) = 1
            AND NOT EXISTS
                        Sys.dm_db_index_usage_stats usg
                        usg.object_id = ind.object_id 
                        AND usg.index_id = ind.index_id
                        AND ind.index_id = usg.index_id

Worst Performing Queries August 17, 2007

Posted by furrukhbaig in DMV's, Indexes, Performance, SQL Server 2005, TSQL.
Tags: , , , , , ,

I have been looking for a easy way to find worst performing queries without runing Profiler and I come across very interesting DMV’s dm_exec_query_stats and dm_exec_sql_text.Microsoft has introduced Dynamic Management Views (sys.dm_exec_query_stats, sys.dm_exec_sql_text) in SQL Server 2005 which provide a way to fetch information related to resources used by SQL Server for executing SQL statements.

Following query will return TOP 100 worst performing SQL Statement with the name of object that contain those statements e.g. Stored Procedure, Trigger and Function. Current database context will be used so change the database before executing query. The result will be sorted by CPU time in descending order.

            [Object_Name] = object_name(st.objectid),
            total_cpu_time = total_worker_time / 1000,
            avg_cpu_time = (total_worker_time / execution_count) / 1000,
            min_cpu_time = min_worker_time / 1000,
            max_cpu_time = max_worker_time / 1000,
            last_cpu_time = last_worker_time / 1000,
            total_time_elapsed = total_elapsed_time / 1000 ,
            avg_time_elapsed = (total_elapsed_time / execution_count) / 1000,
            min_time_elapsed = min_elapsed_time / 1000,
            max_time_elapsed = max_elapsed_time / 1000,
            avg_physical_reads = total_physical_reads / execution_count,
            avg_logical_reads = total_logical_reads / execution_count,
            SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
                              CASE statement_end_offset
                                    WHEN 1 THEN DATALENGTH(st.text)
                                    ELSE qs.statement_end_offset
                        ) /2
                  ) + 1
            ) as statement_text
            sys.dm_exec_query_stats qs
            sys.dm_exec_sql_text(qs.sql_handle) st
            Object_Name(st.objectid) IS NOT NULL
            AND st.dbid = DB_ID()
            total_worker_time / execution_count  DESC