jump to navigation

Index Fragmentation and dm_db_index_physical_stats December 12, 2007

Posted by furrukhbaig in DMV's, dm_db_index_physical_stats, Fragmentation, Index Fragmentation, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, SQL Server 2005, SQL Server 2005 features, Tips.
Tags: , , , , , ,
1 comment so far

Fragmentation can cause serious performance issues and its vital to understand how this can impact performance before I tell you how to fix it.

Fragmentation can cause slow running queries due to more disk IO then usual which itself cause by page spliting. Fragmentation not only cause query performance it can also slow down write operation such as INSERT, DELETE and UPDATE.

SQL Server write data on 8k data pages. This limit can be reached very quickly, for wide tables storing lots of varchar/nvarchar columns (nvarchar take 2 bytes to store 1 character), will cause request for more data pages. When SQL Sever uses whole page then it occupy another page to store more data. If table has got cluster index then data needs to be stored in sorted order as cluster index ensure physical order of data according to the cluster key and thats why you can only have one clustered index per table.

Now assume you have compound clustered index on CountryID and CustomerNo colums on a transactional table which recieve 10 transactions per minute for an online car rental portal. Clustered index will make sure that data is stored in sorted order on CountryID and CustomerNo. Having millions of rows in table consider what will happen when you recieve transaction row which needs to be inserted on page 1 (CountryId = 1, CustomerId = 100) . Above will require to make some room on 1st data page (depending on FILLFACTOR, default is 100%). If require data page has not got enough room for the row to fit then rows on the data pages needs to be move forward (called PageSplit) in order to make some room for new row to fit on page in sorted order. This can cause lot of page split and fragmentation as new pages required (caused by page splits on existing data pages)  can not be guarantee to be in sequence with other page. This can seriously hurt performance for data modification (due to pagesplit) and data read (due to out of order pages) operations.

dm_db_index_physical_stats can be used to check fragmentation.

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(N’YourDB’), OBJECT_ID(N’dbo.CarRental’), NULL, NULL , ‘DETAILED’);

The above SQL will return fragmentation information for given Database name (Parameter 1) and Table Name (Parameter 2). Last parameter suggest the mode of analysis.

It is good practice to create clustered indexes on incrementing value (e.g. IDENTITY) to avoid page splits and out of order pages.

Advertisements

AUTO_UPDATE_STATISTICS and AUTO_UPDATE_STATISTICS_ASYNC August 22, 2007

Posted by furrukhbaig in Indexes, Performance, SQL Server 2005.
Tags: , , ,
2 comments

Many of us are familiar with the AUTO_UPDATE_STATISTICS database option. It provides the Optimizer with up-to-date index and column statistics to achieve optimum query plans queries execution.

Everything comes up with a price. There are some pros and cons of using that option and choice should be made very carefully otherwise it can seriously hurt performance. For example, the AUTO_UPDATE_STATISTICS option, when turned on (which it is by default for a new database), tells the Query Optimizer to automatically update the index and column statistics as data changes. It is vital because data changes can affect Optimizer decision in selecting optimum query plan. Statistics are automatically updated when the statistics used in a query execution plan are outdated. Statistics are considered outdated when 20% or more rows in a table have changed.

So what it means to me is when Optimizer consider statistics need to be updated it does it right there and then recompile the query to use updated statistics and create new execution plan and execute the query. Most of the time this process is not noticeable but in some scenarios it can hinder query performance and might cause query timeouts.

For large tables that have many indexes and statistics on columns it can take significant amount of time. Query has to wait during that time to be recompiled using new statistics and then time to execute that query plan. This can cause serious delay and can happen on periodic bases for example query usually takes 2 second but some time it takes more the 10 seconds. And believe me it will take you some time before realizing that what’s going on.

Another point to consider is it is unlikely that up-to-date statistics will change execution plan as queries should be written in a manner so they can make use of available indexes and cached for reuse. I would not expect my stored procedure to keep changing its execution plan because it can hinder performance and make no use of cached execution plan. But in some cases, with dynamic SQL, this will help improve performance. So use it with care.

The new feature in 2005 AUTO_UPDATE_STATISTICS_ASYNC can be used to avoid query wait for updated statistics. It enables optimizer to carry on using old statistics while statistics are updated asynchronously and made available for next execution. Keep in mind this can also make queries slower as it uses old statistics without any wait and there is a possibility that changes in data was significant and optimizer should have updated statistics to get optimal execution plan. But in most cases this should work fine. As I mentioned before that its unlikely that execution plan changes for regularly running queries unless columns in ORDER BY and WHERE clause keep changing.

So Recommendation is to use both options ON to be on safer side but always test this with workload in your scenario.

GUID vs IDENTITY ? August 18, 2007

Posted by furrukhbaig in Indexes, Performance, SQL Server 2005, TSQL.
Tags: , ,
5 comments

Ahhhh… thats the hot debate among developers and database designers now a days. The system I am working on is using GUIDs on database tier and it is the backbone of middle tier framework to make every row unique (I didn’t design that).

The key issues with GUID is its very large (16 bytes). GUID’s are mainly used as PRIMARY KEY (usally CLUSTERED) to ensure uniqueness of rows.  As a rule of thumb clustered index key / primary key should be as narrow as possible (in case of primary key consider lookups for primary key column by foriegn key tables) and therefore GUID are not best suited for clustered index. Again it will cause very high fragmentation on leaf level and page split on data pages as well as index pages  (while sql 2005 has also introduced sequential guids but we are generating guids on client side). Clustered index on GUID also hurt performance for non-clustered index as NC index uses clustered key as page pointer on leaf-level pages.

There are some argument in favor of GUID and its mainly to maintain uniqueness in merge replication scenario or data warehousing scenario where data will be migrated from multiple servers and having single identity (INT) column will cause duplication. There could be several workaround for this issue. One is to create composite primary key on two INT column (one is for row id and other is for server id) to make sure uniqueness among multiple servers. other is to maintain identity ranges for multiple servers.

I just could not stress more about being carefull when using GUID column as candidate for clustered index when designing database as its night mare when it comes to performance.

The confusion starts when people with good designing skills in Middle tier start thinking for a database tier. Believe me database tier is fundamentally different from middle-tier. Every object on middle-tier does not necessarily corresponds to a table on database tier and object relationship does not corresponds to table relationship. And what about performance, there are many people who does not consider performance when desiging databases but i disagree with them and I think data-load should be primary factor, to be consider when designing dababase, among other key factors. After having a very good design with out performance and dataload consideration you will end-up redesigning your database after couple of days or month of going live with such system and hiring someone like me for performance tuning (by the way thats what i do) ;).

INDEX / Statistics Age August 17, 2007

Posted by furrukhbaig in DMV's, Indexes, Performance, SQL Server 2005, TSQL.
2 comments

When I was working with a major cable provider in UK I had a task to tune performance of a billing engine that runs over night and produced millions of rows in tables and suddenly start taking more then double time to finish.But after struggling many hours to find out why index has not been selected by optimizer I realized that optimizer has old statistics for that index that does not represent the current state of data and therefore stats needs to be updated. Production DBAs has disabled the index rebuild job by mistake that runs after billing engine inserts millions of rows. I often heard from production DBA that they don’t have enough overnight window to rebuild indexes. Which makes database developer’s life misery in fixing performance without realizing that statistics are not up-to-date. I have experienced significant performance difference after updating statistics or rebuilding indexes.

Always check your index age before start looking into execution plan. Here is some very handy query to find out age of index / statistics.  Replace Table_Name with your table name.

SELECT 
            ‘Index Name’ = ind.name,
            ‘Statistics Date’ = STATS_DATE(ind.object_id, ind.index_id)
FROM
            SYS.INDEXES ind
WHERE
            OBJECT_NAME(ind.object_id) = ‘Table_Name’

  

SEEK is better then SCAN August 17, 2007

Posted by furrukhbaig in DMV's, dm_db_index_physical_stats, dm_db_index_usage_stats, Execution Plan, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, Query hint, SEEK vs Scan, SQL Server 2005, SQL Server 2005 features, TSQL, XML Execution Plan.
add a comment

I have been involved in performance tuning for several databases and usually come across a situation where indexes has been created but due to poorly written SQL Code those indexes are not utilized or badly used by optimizer. To find out how optimizer using your index use query below. This will provide you the frequency optimizer is uses SEEK, SCAN and LOOKUP operation on indexes. 

As a rule of thumb SEEK is better then SCAN. But there are few scenarios where SCAN operator perform better then SEEK specially when large number of rows are expected to return by operation.

SELECT 
            Table_Name = OBJECT_NAME(usg.object_id),
            ind.name,
            usg.user_seeks,
            user_scans,
            user_lookups,
            last_user_seek,
            last_user_scan,
            last_user_lookup
FROM
            sys.dm_db_index_usage_stats usg
            INNER JOIN sys.indexes ind
                  ON usg.object_id = ind.object_id
                  AND usg.index_id = ind.index_id
WHERE
            — Your table list goes here
            OBJECT_NAME(usg.object_id) IN ( ‘Table_Name1’, ‘Table_Name2’ )
ORDER BY
            Table_Name       

  

Is this Index useful ? August 17, 2007

Posted by furrukhbaig in DMV's, dm_db_index_usage_stats, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, SQL Server 2005, SQL Server 2005 features, TSQL.
Tags: , , , , , , , , ,
1 comment so far

Creating indexes on tables always been a tricky question. Creating a index does not mean that optimizer will use that index to solve queries. SQL Server 2005 introduced a dynamic management view sys.dm_db_index_usage_stats to capture the use of indexes by SELECT, INSERT, UPDATE and DELETE. Thus any index or table which does not exists in this view is overhead to system and not useful.

Following query will return all the indexes and tables which never get used by SQL Server since the service started. The best way to check is to create index and run the data work load or regression test that should make use of all indexes and run following query.see full article on SQL Server Customer Advisory Team (CAT) How can SQL Server 2005 help me evaluate and manage indexes

SELECT  
            ObjectName = OBJECT_NAME(ind.object_id),
            IndexName = ind.name
FROM
            Sys.Indexes ind
            INNER JOIN Sys.Objects obj ON obj.object_id = ind.object_id
WHERE
            OBJECTPROPERTY(obj.object_id,‘IsUserTable’) = 1
            AND NOT EXISTS
            (
                  SELECT
                        1
                  FROM
                        Sys.dm_db_index_usage_stats usg
                  WHERE
                        usg.object_id = ind.object_id 
                        AND usg.index_id = ind.index_id
                        AND ind.index_id = usg.index_id
            )
ORDER BY
            ObjectName,
            IndexName
 
 

Worst Performing Queries August 17, 2007

Posted by furrukhbaig in DMV's, Indexes, Performance, SQL Server 2005, TSQL.
Tags: , , , , , ,
8 comments

I have been looking for a easy way to find worst performing queries without runing Profiler and I come across very interesting DMV’s dm_exec_query_stats and dm_exec_sql_text.Microsoft has introduced Dynamic Management Views (sys.dm_exec_query_stats, sys.dm_exec_sql_text) in SQL Server 2005 which provide a way to fetch information related to resources used by SQL Server for executing SQL statements.

Following query will return TOP 100 worst performing SQL Statement with the name of object that contain those statements e.g. Stored Procedure, Trigger and Function. Current database context will be used so change the database before executing query. The result will be sorted by CPU time in descending order.

SELECT  TOP 100
            [Object_Name] = object_name(st.objectid),
            creation_time,
            last_execution_time,
            total_cpu_time = total_worker_time / 1000,
            avg_cpu_time = (total_worker_time / execution_count) / 1000,
            min_cpu_time = min_worker_time / 1000,
            max_cpu_time = max_worker_time / 1000,
            last_cpu_time = last_worker_time / 1000,
            total_time_elapsed = total_elapsed_time / 1000 ,
            avg_time_elapsed = (total_elapsed_time / execution_count) / 1000,
            min_time_elapsed = min_elapsed_time / 1000,
            max_time_elapsed = max_elapsed_time / 1000,
            avg_physical_reads = total_physical_reads / execution_count,
            avg_logical_reads = total_logical_reads / execution_count,
            execution_count,
            SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
                  (
                        (
                              CASE statement_end_offset
                                    WHEN 1 THEN DATALENGTH(st.text)
                                    ELSE qs.statement_end_offset
                              END
                              qs.statement_start_offset
                        ) /2
                  ) + 1
            ) as statement_text
FROM
            sys.dm_exec_query_stats qs
CROSS APPLY
            sys.dm_exec_sql_text(qs.sql_handle) st
WHERE
            Object_Name(st.objectid) IS NOT NULL
            AND st.dbid = DB_ID()
ORDER BY
            db_name(st.dbid),
            total_worker_time / execution_count  DESC