
Rowcount for Large Tables July 24, 2008

Posted by furrukhbaig in DMV's, Performance, SQL Server 2005, SQL Server 2005 features, Tips, TSQL.

Ever wondered why a simple statement like SELECT COUNT(*) FROM [Table_Name] takes forever to return the row count on a large table? It's because it uses a full table scan to count the rows. The instant way to get the row count for any table is to query the new Dynamic Management View (DMV) in SQL Server 2005, sys.dm_db_partition_stats. The DMV contains the row count and page counts for every table, covering all of its partitions. Note that even if you did not partition your table, it is still created on a single default partition, so the DMV covers all tables.

The following queries return the row count almost instantly. In SQL Server 2000, the sysindexes system table was used instead to get the number of rows for a large table without a full table scan.
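For reference, a minimal sketch of the SQL Server 2000 approach, using the same placeholder table name as below; note that the rows value in sysindexes is an approximation maintained by the engine, not an exact count.

-- SQL Server 2000: approximate row count from sysindexes
-- (indid < 2 restricts the result to the heap or clustered index entry)
SELECT [rows]
FROM sysindexes
WHERE id = OBJECT_ID('Your_Table_Name_Goes_Here')
      AND indid < 2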

Comments are welcome.

-- All the tables having ZERO rows
-- compatible with partitioned tables
SELECT
    Table_Name = OBJECT_NAME(st.object_id),
    Total_Rows = SUM(st.row_count)
FROM
    sys.dm_db_partition_stats st
WHERE
    st.index_id < 2 -- heap or clustered index only, so rows are not counted once per index
GROUP BY
    OBJECT_NAME(st.object_id)
HAVING
    SUM(st.row_count) = 0
ORDER BY
    OBJECT_NAME(st.object_id)

-- Row count without doing a full table scan
-- This returns the total number of rows across all partitions (if the table is partitioned)
SELECT
    Total_Rows = SUM(st.row_count)
FROM
    sys.dm_db_partition_stats st
WHERE
    OBJECT_NAME(st.object_id) = 'Your_Table_Name_Goes_Here'
    AND st.index_id < 2 -- only the heap or clustered index

Is your code really SET based? January 30, 2008

Posted by furrukhbaig in BEST PRACTICE, CROSS JOIN, Execution Plan, Performance, Performance Tuning, RBAR, SET BASED, SQL Server 2005, SQL Server 2005 features, Tips, Triangular Join, TSQL.

It's a well-known fact that the best practice is to write set-based code to get better performance. But there is no absolute definition of "set-based", and many people think that set-based code is anything except CURSORs and LOOPs. Believe me, that is not true.

I had been meaning to write about this topic for a while, and today I came across a very useful article that explains exactly what I was trying to say. It also covers Triangular and Cross Joins and a (for me at least) new term, "RBAR". It's interesting.

http://www.sqlservercentral.com/articles/T-SQL/61539/
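To make the idea concrete, here is my own sketch (not taken from the article), assuming a hypothetical dbo.Orders table with OrderID and Amount columns: the running-total query below contains no cursor and no loop, yet the inequality self-join is a classic triangular join, so for N rows the engine touches roughly N * N / 2 row combinations. RBAR in disguise.

-- Looks set-based, but the inequality self-join is a triangular join:
-- each row is joined to all rows "before" it, so work grows quadratically.
SELECT
    o1.OrderID,
    RunningTotal = SUM(o2.Amount)
FROM
    dbo.Orders o1
    INNER JOIN dbo.Orders o2
        ON o2.OrderID <= o1.OrderID
GROUP BY
    o1.OrderID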

Enjoy !!

Statement Level Recompile August 22, 2007

Posted by furrukhbaig in Performance, RECOMPILE hint, SQL Server 2005, stored procedure, TSQL.

In SQL Server 2000, the unit of compilation was the whole stored procedure. So even if you wanted just one particular query to be recompiled, you couldn't request it. If you created the stored procedure with the RECOMPILE option, the whole procedure went through recompilation every time you invoked it.

SQL Server 2005 supports statement-level recompile. Instead of having all queries in the stored procedure recompiled, SQL Server can now recompile individual statements. You're provided with a new RECOMPILE query hint that allows you to explicitly request a recompilation of a particular query. This way, other queries can benefit from reusing previously cached execution plans if there's no reason to recompile them every time the stored procedure is invoked.
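A minimal sketch of the hint (the dbo.Orders table and procedure name are made up for illustration): only the statement carrying OPTION (RECOMPILE) is recompiled on each call, while the first statement keeps reusing its cached plan.

CREATE PROCEDURE dbo.uspGetRecentOrders (@CustomerID INT, @MinAmount MONEY)
AS
-- This statement reuses its cached plan across calls.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = @CustomerID;

-- Only this statement is recompiled on every execution.
SELECT OrderID, Amount
FROM dbo.Orders
WHERE Amount > @MinAmount
OPTION (RECOMPILE);
GO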

Parameter Sniffing and OPTIMIZE FOR August 22, 2007

Posted by furrukhbaig in Optimize, Parameter Sniffing, Performance, Query hint, SQL Server 2005, stored procedure, TSQL.

Parameter sniffing refers to a process whereby SQL Server's execution environment "sniffs" the parameter values during the first invocation and passes them along to the query optimizer so that they can be used to generate an optimal query execution plan.

"First invocation" also covers the first invocation after a plan was removed from the cache, for lack of reuse or for any other reason. The optimizer "knows" what the values of the input parameters are, and it generates an adequate plan for those input parameters. SQL Server internally maintains statistics on the distribution of values in the columns used for filtering.

While parameter sniffing is certainly a powerful feature, it can cause problems when a procedure’s plan happens to have been kicked out of the procedure cache (or was never in it) just prior to the procedure being called with atypical parameter values. This can result in a plan that is skewed toward atypical use, one that is not optimal when called with typical values. Since, once cached, a query plan can be reused for parameter values that vary widely, the ideal situation is to have a plan in the cache that covers the typical usage of the procedure as much as possible. If a plan makes it into the cache that is oriented toward atypical parameter values, it can have a devastating effect on performance when executed with typical values.

An example would probably help here. Suppose we have a stored procedure that returns sales data by country. In our case, three-fourths of our sales are in the UK. The procedure takes a single parameter, @country, indicating the country for which to return sales info. It uses this parameter to filter a simple SELECT statement that returns the requested sales data.

CREATE PROCEDURE uspGetCountrySale
(@Country Varchar(50))
AS 
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @Country 
GO

The optimizer would most likely choose to do a clustered index scan when creating the execution plan for this query, because (given that "UK" would normally be passed in for @country) so much of the table would be traversed anyway that scanning it would require less I/O and be faster than repeated nonclustered index lookups. However, what happens if the plan happens to have been kicked out of the cache (let's say due to an auto-statistics update) just prior to a user calling it with, say, "Spain", where we have almost no sales? Assuming a suitable index exists, the optimizer may decide to use a nonclustered index seek in the new query plan. Subsequent executions of the procedure would reuse this plan, even if they passed in "UK" for @country. This could result in performance far slower than the scan-based plan.

As a workaround prior to SQL Server 2005, local variables can be used instead of stored procedure parameters. Note that SQL Server cannot sniff the value of a local variable. This leads SQL Server to rely on the statistics for the filter column and create a plan that is best for the average distribution of values in that column. Such a plan can still hurt performance when the procedure is called with an atypical value, but it will do well for typical values.

CREATE PROCEDURE uspGetCountrySale
(@Country Varchar(50))
AS 
DECLARE @_Country Varchar(50)
SET @_Country = @Country 
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @_Country 
GO
 
 

There's a new query hint provided in SQL Server 2005 to tackle the problem: the OPTIMIZE FOR query hint. This hint allows you to provide SQL Server with a literal that reflects the typical selectivity of the parameter. For example, if you know the procedure will typically be called with a particular value, you can provide that literal. Note that the hint refers to the parameter itself, so the local-variable workaround is no longer needed:

CREATE PROCEDURE uspGetCountrySale (@Country Varchar(50))
AS
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @Country
OPTION (OPTIMIZE FOR (@Country = 'UK'));
GO

Stored Procedures !! Some Facts you should know August 22, 2007

Posted by furrukhbaig in Performance, SQL Server 2005, TSQL.

Most of us are familiar with stored procedures and use them on a daily basis. But there are some facts that can cause performance issues, and here I just want to shed some light on them.

  • Always use a qualified name when calling sprocs, for example EXEC dbo.Your_Proc

This is a very common mistake that causes an extra trip to the procedure cache to find the execution plan. SQL Server compiles a stored procedure on its first execution and stores the execution plan in the procedure cache for reuse by subsequent calls to the same sproc. Finding that plan by name requires a qualified stored procedure name, e.g. dbo.My_Proc (a fully qualified name has the form Server.Database.Owner.My_Proc). When the owner name is not specified, the initial cache lookup by object name fails. SQL Server then acquires an exclusive compile lock on the stored procedure and all the referenced objects, including tables, in preparation for recompilation. As a next step, SQL Server resolves the object name to an object ID and makes another trip to the procedure cache using that ID, which may well find the previously cached execution plan. But as you may have noticed, this can cause blocking in situations where many SPIDs call the same sproc frequently: while the procedure is locked for compilation, all callers have to wait until SQL Server finds the execution plan in the cache or recompiles the sproc.

So the rule of thumb is: always qualify your objects (sprocs, tables, views, functions) with the owner name.

This behaviour can be observed in Profiler by capturing the following events.

The SP:CacheMiss event occurs when the cache lookup by name fails. The subsequent SP:ExecContextHit indicates that a matching cached plan was ultimately found in cache once the ambiguous object name had been resolved to an object ID. Depending on the circumstances, SP:CacheHit may appear in place of SP:ExecContextHit.

For more detail, see http://support.microsoft.com/kb/263889.

  • SET NOCOUNT ON

Without this setting, SQL Server sends a DONE_IN_PROC ("N rows affected") message to the client for every statement in the procedure. That extra network traffic can have a serious impact on performance when the sproc is called frequently.
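A minimal sketch (the table and procedure names are made up for illustration):

CREATE PROCEDURE dbo.uspUpdateOrderStatus (@OrderID INT)
AS
-- Suppress the per-statement "rows affected" messages.
SET NOCOUNT ON;

UPDATE dbo.Orders
SET Status = 'Processed'
WHERE OrderID = @OrderID;
GO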

  • Don’t use sp_ prefix in stored procedure name

The sp_ prefix is reserved for system stored procedures, and any stored procedure whose name carries it causes an extra lookup in the MASTER database. Another point to note: if a stored procedure in a user database has the same name as a system stored procedure in the master database, the one in the user database will never get executed, as SQL Server always looks in master first and executes that one instead.

  • Avoid using temp tables and DDL statements

Referencing a temp table created inside the stored procedure for the first time causes the procedure to recompile: no statistics for the temp table existed when the procedure was initially compiled, so the optimizer cannot keep using the existing execution plan for queries that touch it. The same applies to DDL statements, as they also force the stored procedure to recompile. A small sketch of the pattern follows.
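This sketch uses hypothetical names; the statements that reference #Recent are compiled only once the table and its statistics exist, which triggers the recompile described above.

CREATE PROCEDURE dbo.uspSummarizeRecentOrders
AS
SET NOCOUNT ON;

-- DDL inside the procedure: creating the temp table forces a recompile.
CREATE TABLE #Recent (OrderID INT, Amount MONEY);

-- First reference to #Recent: it did not exist at initial compile time,
-- so the remaining statements are recompiled at this point.
INSERT INTO #Recent (OrderID, Amount)
SELECT OrderID, Amount
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());

SELECT COUNT(*) AS RecentOrders, SUM(Amount) AS Total
FROM #Recent;
GO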

When a batch is recompiled in SQL Server 2000, all of the statements in the batch are recompiled, not just the one that triggered the recompilation. SQL Server 2005 improves upon this behavior by compiling only the statement that caused the recompilation, not the entire batch. This “statement-level recompilation” feature will improve SQL Server 2005’s recompilation behavior when compared to that of SQL Server 2000. In particular, SQL Server 2005 spends less CPU time and memory during batch recompilations, and obtains fewer compile locks.

GUID vs IDENTITY? August 18, 2007

Posted by furrukhbaig in Indexes, Performance, SQL Server 2005, TSQL.

Ahhhh… that's the hot debate among developers and database designers nowadays. The system I am working on uses GUIDs in the database tier, and they are the backbone of the middle-tier framework, making every row unique (I didn't design that).

The key issue with GUIDs is that they are very large (16 bytes). GUIDs are mainly used as a PRIMARY KEY (usually CLUSTERED) to ensure uniqueness of rows. As a rule of thumb, a clustered index key / primary key should be as narrow as possible (for a primary key, consider the lookups on that column by foreign-key tables), and therefore GUIDs are not well suited to a clustered index. Random GUIDs also cause very high fragmentation at the leaf level and page splits on data pages as well as index pages (SQL Server 2005 has introduced sequential GUIDs, but we generate GUIDs on the client side). A clustered index on a GUID also hurts the performance of nonclustered indexes, because an NC index uses the clustered key as its row locator in leaf-level pages.

There are some arguments in favor of GUIDs, mainly around maintaining uniqueness in merge replication or data warehousing scenarios, where data is merged from multiple servers and a single IDENTITY (INT) column would cause duplicates. There are several workarounds for this. One is to create a composite primary key on two INT columns (one for the row ID, the other for the server ID) to ensure uniqueness across multiple servers, as sketched below. Another is to maintain separate identity ranges for each server.
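A minimal sketch of the composite-key workaround (a hypothetical table; in practice the server ID would come from configuration or replication settings). The composite key is 8 bytes, still half the width of a GUID:

CREATE TABLE dbo.DistributedOrders
(
    ServerID  INT NOT NULL,                 -- identifies the originating server
    RowID     INT NOT NULL IDENTITY(1,1),   -- local sequence per server
    OrderDate DATETIME NOT NULL,
    CONSTRAINT PK_DistributedOrders
        PRIMARY KEY CLUSTERED (ServerID, RowID)
);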

I cannot stress enough how careful you should be about choosing a GUID column as the candidate for the clustered index when designing a database, as it is a nightmare when it comes to performance.

The confusion starts when people with good middle-tier design skills start thinking for the database tier. Believe me, the database tier is fundamentally different from the middle tier. Not every object in the middle tier corresponds to a table in the database tier, and object relationships do not correspond to table relationships. And what about performance? Many people do not consider performance when designing databases; I disagree with them and think data load should be a primary factor to consider when designing a database, among other key factors. With a very good design that ignores performance and data load, you will end up redesigning your database a couple of days or months after going live with such a system, and hiring someone like me for performance tuning (by the way, that's what I do) ;).

INDEX / Statistics Age August 17, 2007

Posted by furrukhbaig in DMV's, Indexes, Performance, SQL Server 2005, TSQL.

When I was working with a major cable provider in the UK, I had the task of tuning a billing engine that runs overnight, produces millions of rows, and had suddenly started taking more than double the time to finish. After struggling for many hours to work out why an index was not being selected by the optimizer, I realized that the optimizer had old statistics for that index, statistics that did not represent the current state of the data and therefore needed to be updated. The production DBAs had disabled, by mistake, the index rebuild job that runs after the billing engine inserts its millions of rows. I often hear from production DBAs that they don't have a big enough overnight window to rebuild indexes, which makes database developers' lives a misery: they chase performance problems without realizing that the statistics are not up to date. I have seen significant performance differences after updating statistics or rebuilding indexes.

Always check the age of your index statistics before you start digging into the execution plan. Here is a very handy query to find out the age of index statistics. Replace Table_Name with your table name.

SELECT
            'Index Name' = ind.name,
            'Statistics Date' = STATS_DATE(ind.object_id, ind.index_id)
FROM
            sys.indexes ind
WHERE
            OBJECT_NAME(ind.object_id) = 'Table_Name'
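If the dates look stale, a minimal sketch of the fix (Table_Name is a placeholder, as above):

-- Refresh statistics for the table with a full scan,
UPDATE STATISTICS dbo.Table_Name WITH FULLSCAN;

-- or rebuild the table's indexes, which also rebuilds their statistics.
ALTER INDEX ALL ON dbo.Table_Name REBUILD;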

  

SEEK is better than SCAN August 17, 2007

Posted by furrukhbaig in DMV's, dm_db_index_physical_stats, dm_db_index_usage_stats, Execution Plan, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, Query hint, SEEK vs Scan, SQL Server 2005, SQL Server 2005 features, TSQL, XML Execution Plan.

I have been involved in performance tuning for several databases and usually come across situations where indexes have been created but, due to poorly written SQL code, those indexes are either not utilized or badly used by the optimizer. To find out how the optimizer is using your indexes, use the query below. It shows how frequently the optimizer has used SEEK, SCAN, and LOOKUP operations on each index.

As a rule of thumb, SEEK is better than SCAN. But there are a few scenarios where the SCAN operator performs better than SEEK, especially when the operation is expected to return a large number of rows.

SELECT
            Table_Name = OBJECT_NAME(usg.object_id),
            ind.name,
            usg.user_seeks,
            usg.user_scans,
            usg.user_lookups,
            usg.last_user_seek,
            usg.last_user_scan,
            usg.last_user_lookup
FROM
            sys.dm_db_index_usage_stats usg
            INNER JOIN sys.indexes ind
                  ON usg.object_id = ind.object_id
                  AND usg.index_id = ind.index_id
WHERE
            -- Your table list goes here
            OBJECT_NAME(usg.object_id) IN ('Table_Name1', 'Table_Name2')
ORDER BY
            Table_Name

  

Is this Index useful? August 17, 2007

Posted by furrukhbaig in DMV's, dm_db_index_usage_stats, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, SQL Server 2005, SQL Server 2005 features, TSQL.

Creating indexes on tables has always been a tricky question. Creating an index does not mean the optimizer will use it to solve queries. SQL Server 2005 introduced a dynamic management view, sys.dm_db_index_usage_stats, to capture the use of indexes by SELECT, INSERT, UPDATE, and DELETE operations. Any index or table that does not appear in this view has never been used; it is pure overhead on the system.

The following query returns all the indexes and tables that have never been used by SQL Server since the service started. The best way to check is to create the index, run your data workload or a regression test that should make use of all indexes, and then run the query. See the full article by the SQL Server Customer Advisory Team (CAT): How can SQL Server 2005 help me evaluate and manage indexes.

SELECT
            ObjectName = OBJECT_NAME(ind.object_id),
            IndexName = ind.name
FROM
            sys.indexes ind
            INNER JOIN sys.objects obj ON obj.object_id = ind.object_id
WHERE
            OBJECTPROPERTY(obj.object_id, 'IsUserTable') = 1
            AND NOT EXISTS
            (
                  SELECT
                        1
                  FROM
                        sys.dm_db_index_usage_stats usg
                  WHERE
                        usg.object_id = ind.object_id
                        AND usg.index_id = ind.index_id
            )
ORDER BY
            ObjectName,
            IndexName

Worst Performing Queries August 17, 2007

Posted by furrukhbaig in DMV's, Indexes, Performance, SQL Server 2005, TSQL.

I had been looking for an easy way to find the worst-performing queries without running Profiler, and I came across two very interesting DMVs: dm_exec_query_stats and dm_exec_sql_text. Microsoft introduced these Dynamic Management Views (sys.dm_exec_query_stats, sys.dm_exec_sql_text) in SQL Server 2005; they provide a way to fetch information about the resources SQL Server has used to execute SQL statements.

The following query returns the TOP 100 worst-performing SQL statements along with the name of the object that contains each statement, e.g. a stored procedure, trigger, or function. The current database context is used, so change the database before executing the query. The results are sorted by average CPU time in descending order.

SELECT  TOP 100
            [Object_Name] = OBJECT_NAME(st.objectid),
            creation_time,
            last_execution_time,
            total_cpu_time = total_worker_time / 1000,
            avg_cpu_time = (total_worker_time / execution_count) / 1000,
            min_cpu_time = min_worker_time / 1000,
            max_cpu_time = max_worker_time / 1000,
            last_cpu_time = last_worker_time / 1000,
            total_time_elapsed = total_elapsed_time / 1000,
            avg_time_elapsed = (total_elapsed_time / execution_count) / 1000,
            min_time_elapsed = min_elapsed_time / 1000,
            max_time_elapsed = max_elapsed_time / 1000,
            avg_physical_reads = total_physical_reads / execution_count,
            avg_logical_reads = total_logical_reads / execution_count,
            execution_count,
            SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                  (
                        (
                              CASE qs.statement_end_offset
                                    WHEN -1 THEN DATALENGTH(st.text)
                                    ELSE qs.statement_end_offset
                              END
                              - qs.statement_start_offset
                        ) / 2
                  ) + 1
            ) AS statement_text
FROM
            sys.dm_exec_query_stats qs
CROSS APPLY
            sys.dm_exec_sql_text(qs.sql_handle) st
WHERE
            OBJECT_NAME(st.objectid) IS NOT NULL
            AND st.dbid = DB_ID()
ORDER BY
            DB_NAME(st.dbid),
            total_worker_time / execution_count DESC