
Rowcount for Large Tables July 24, 2008

Posted by furrukhbaig in DMV's, Performance, SQL Server 2005, SQL Server 2005 features, Tips, TSQL.

Ever wondered why a simple statement like SELECT COUNT(*) FROM [Table_Name] takes forever to return the row count on a large table? It's because it performs a full table scan to count the rows. An almost-instant way to get the row count for any table is to query the new Dynamic Management View (DMV) sys.dm_db_partition_stats in SQL Server 2005. This DMV holds the row count and page counts for every table, covering all of its partitions. Note that even if you never explicitly partitioned your table, it is still created on a single default partition, so the view covers every table.

Below are some useful queries that return the row count almost instantly. In SQL Server 2000 the sysindexes system table can be used instead to get the row count for a large table without a full table scan.

Comments are welcome.

-- All the tables having ZERO rows
-- compatible with partitioned tables
SELECT
    Table_Name = OBJECT_NAME(st.object_id),
    Total_Rows = SUM(st.row_count)
FROM
    sys.dm_db_partition_stats st
GROUP BY
    OBJECT_NAME(st.object_id)
HAVING
    SUM(st.row_count) = 0
ORDER BY
    OBJECT_NAME(st.object_id)

-- Row count without doing a full table scan
-- This will return the total number of rows across all partitions (if the table is partitioned)
SELECT
    Total_Rows = SUM(st.row_count)
FROM
    sys.dm_db_partition_stats st
WHERE
    OBJECT_NAME(st.object_id) = 'Your_Table_Name_Goes_Here'
    AND (st.index_id < 2) -- only the heap or clustered index
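For comparison, here is a minimal sketch of the SQL Server 2000 style mentioned above, reading the cached count from sysindexes (the table name is a placeholder; sysindexes still exists in 2005 as a compatibility view):

-- SQL Server 2000: read the stored row count instead of scanning the table.
-- indid 0 = heap, indid 1 = clustered index.
SELECT rowcnt
FROM sysindexes
WHERE id = OBJECT_ID('Your_Table_Name_Goes_Here')
  AND indid < 2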

SSMS slow startup March 7, 2008

Posted by furrukhbaig in Performance, slow start, SQL Server 2005, SSMS, Tips.

A friend of mine asked me why SSMS was taking up to a minute to start up on his office machine, and I was not able to figure out the problem until I found this post on Euan Garden's blog.

http://blogs.msdn.com/euanga/archive/2006/07/11/662053.aspx

The quick fix is to add the following lines to the hosts file (typically C:\Windows\System32\drivers\etc\hosts):

# entry to get around diabolical Microsoft certificate checks
# which slow down non internet connected computers
127.0.0.1 crl.microsoft.com

The delay comes from a certificate revocation check that stalls on machines without internet access; pointing crl.microsoft.com at the loopback address bypasses it. It fixed the problem on my friend's machine. Hope it helps others too.

Enjoy !!

Is your code really SET based? January 30, 2008

Posted by furrukhbaig in BEST PRACTICE, CROSS JOIN, Execution Plan, Performance, Performance Tuning, RBAR, SET BASED, SQL Server 2005, SQL Server 2005 features, Tips, Triangular Join, TSQL.

It's a well-known fact that the best practice is to write set-based code to get better performance. But there is no absolute definition of "set based", and many people think set-based code is anything that avoids CURSORs and LOOPs. Believe me, that is not true.

I have been meaning to write about this topic for a while, and just today I came across a very useful article that explains exactly what I was trying to say. It also covers triangular and cross joins and a term that was new to me at least, 'RBAR' (row-by-agonizing-row). It's an interesting read.

http://www.sqlservercentral.com/articles/T-SQL/61539/
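To illustrate the sort of thing the article calls a hidden loop, here is a minimal sketch (the table and columns are made up): a running total written as a triangular self-join. It contains no CURSOR or WHILE loop, yet every row is matched against all earlier rows, so the work grows roughly with N*N/2 rather than N.

-- Hypothetical table dbo.Orders(OrderID INT, Amount MONEY).
-- Looks set based, but the inequality join touches ~N*N/2 rows: RBAR in disguise.
SELECT  o1.OrderID,
        RunningTotal = SUM(o2.Amount)
FROM dbo.Orders AS o1
JOIN dbo.Orders AS o2
    ON o2.OrderID <= o1.OrderID
GROUP BY o1.OrderID;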

Enjoy !!

Find out what's running on SQL Server 2005 December 13, 2007

Posted by furrukhbaig in DMV's, Execution Plan, Optimize, Performance, Performance Tuning, Profiler, Queries running on the server, SQL Server 2005, SQL Server 2005 features, XML Execution Plan.

Everyone wants to know what's running on the box. Especially if your job is to stabilise a server, you're always concerned about what is killing it.

A friend of mine has published a very useful post with scripts to find out what's running on SQL Server, along with the execution plans.

http://www.proteanit.com/b/2007/01/22/a-useful-script-to-analyse-current-activity-on-your-box/

The same information can also be gathered by running Profiler and capturing the XML execution plan, but that is not always possible due to security restrictions and the overhead of Profiler itself.
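For reference, here is a minimal sketch of the DMV-based approach (not the exact script from the linked post): join sys.dm_exec_requests to the text and plan functions to see each running request with its statement and XML plan.

-- Currently executing requests with statement text and XML execution plan.
SELECT
    r.session_id,
    r.status,
    r.wait_type,
    t.text       AS statement_text,
    p.query_plan AS xml_plan
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) p
WHERE r.session_id <> @@SPID;  -- exclude this query itself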

Index Fragmentation and dm_db_index_physical_stats December 12, 2007

Posted by furrukhbaig in DMV's, dm_db_index_physical_stats, Fragmentation, Index Fragmentation, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, SQL Server 2005, SQL Server 2005 features, Tips.

Fragmentation can cause serious performance issues, and it's vital to understand how it impacts performance before I tell you how to detect and fix it.

Fragmentation causes slow-running queries because it leads to more disk IO than usual, which in turn is caused by page splits. Fragmentation does not only hurt read performance; it also slows down write operations such as INSERT, DELETE and UPDATE.

SQL Server stores data on 8 KB data pages. Wide tables with many varchar/nvarchar columns (nvarchar takes 2 bytes per character) fill a page very quickly and force requests for more data pages. When SQL Server fills a whole page, it allocates another page to store more data. If the table has a clustered index, the data must be stored in sorted order, because the clustered index defines the physical order of data according to the clustering key; that is also why you can only have one clustered index per table.

Now assume you have a compound clustered index on the CountryID and CustomerNo columns of a transactional table that receives 10 transactions per minute for an online car rental portal. The clustered index ensures that data is stored in sorted order on CountryID and CustomerNo. With millions of rows in the table, consider what happens when a transaction arrives that must be inserted on page 1 (CountryID = 1, CustomerNo = 100). SQL Server needs to make room on that first data page (depending on the FILLFACTOR; by default pages are filled completely). If the page does not have enough room for the new row, some of its rows must be moved to a newly allocated page (a page split) so that the new row can fit in sorted order. This can cause a lot of page splits and fragmentation, because the new pages allocated by splits cannot be guaranteed to be in sequence with the existing ones. This seriously hurts performance for both data modification (due to page splits) and data reads (due to out-of-order pages).
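As a side note, one way to soften this is to leave free space on the pages when the index is built. A minimal sketch, reusing the hypothetical dbo.CarRental table and columns from the example above and assuming the clustered index is being created (or rebuilt) at this point:

-- Leave 20% free space on each page (FILLFACTOR = 80) so inserts that land
-- in the middle of the sort order are less likely to cause page splits.
CREATE CLUSTERED INDEX CIX_CarRental_CountryID_CustomerNo
    ON dbo.CarRental (CountryID, CustomerNo)
    WITH (FILLFACTOR = 80);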

The sys.dm_db_index_physical_stats DMV can be used to check fragmentation:

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(N'YourDB'), OBJECT_ID(N'dbo.CarRental'), NULL, NULL, 'DETAILED');

The above SQL returns fragmentation information for the given database (parameter 1) and table (parameter 2). The last parameter specifies the scan mode ('LIMITED', 'SAMPLED' or 'DETAILED').

It is good practice to create clustered indexes on an ever-increasing value (e.g. an IDENTITY column) to avoid page splits and out-of-order pages.
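As for fixing fragmentation once you find it, here is a minimal sketch based on the commonly cited Books Online guideline (REORGANIZE between roughly 5% and 30% fragmentation, REBUILD above 30%); run it in the database you want to check.

-- List fragmented indexes in the current database with a suggested action.
SELECT
    OBJECT_NAME(ps.object_id)        AS table_name,
    i.name                           AS index_name,
    ps.avg_fragmentation_in_percent,
    CASE
        WHEN ps.avg_fragmentation_in_percent > 30 THEN 'ALTER INDEX ... REBUILD'
        WHEN ps.avg_fragmentation_in_percent > 5  THEN 'ALTER INDEX ... REORGANIZE'
        ELSE 'no action'
    END AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes i
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 5;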

Statement Level Recompile August 22, 2007

Posted by furrukhbaig in Performance, RECOMPILE hint, SQL Server 2005, stored procedure, TSQL.

In SQL Server 2000, the unit of compilation was the whole stored procedure. So even if you wanted just one particular query to be recompiled, you couldn't request it. If you created the stored procedure with the RECOMPILE option, the whole procedure went through recompilation every time you invoked it.

SQL Server 2005 supports statement-level recompile. Instead of having all queries in the stored procedure recompiled, SQL Server can now recompile individual statements. You're provided with a new RECOMPILE query hint that allows you to explicitly request a recompilation of a particular query. This way, other queries can benefit from reusing previously cached execution plans if there's no reason to recompile them every time the stored procedure is invoked.
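A minimal sketch of the hint (procedure and table names are made up): only the second query is recompiled on every execution, while the first keeps reusing its cached plan.

CREATE PROCEDURE dbo.uspGetCustomerOrders (@CustomerID INT)
AS
-- Plan for this statement is cached and reused as normal.
SELECT OrderID, OrderDate
FROM dbo.SaleOrders
WHERE CustomerID = @CustomerID;

-- Only this statement is recompiled on each execution.
SELECT OrderID, OrderDate
FROM dbo.SaleOrders
WHERE OrderDate >= DATEADD(DAY, -1, GETDATE())
OPTION (RECOMPILE);
GO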

Parameter Sniffing and OPTIMIZE FOR August 22, 2007

Posted by furrukhbaig in Optimize, Parameter Sniffing, Performance, Query hint, SQL Server 2005, stored procedure, TSQL.

Parameter Sniffing refers to the process whereby SQL Server's execution environment “sniffs” the parameter values during the first invocation and passes them along to the query optimizer so that they can be used to generate an optimal query execution plan.

“First invocation” also refers to the first invocation after a plan was removed from cache for lack of reuse or for any other reason. The optimizer “knows” what the values of the input parameters are, and it generates an adequate plan for those input parameters. SQL Server internally maintains the statistics and distribution of the values in the columns used for filtering.

While parameter sniffing is certainly a powerful feature, it can cause problems when a procedure’s plan happens to have been kicked out of the procedure cache (or was never in it) just prior to the procedure being called with atypical parameter values. This can result in a plan that is skewed toward atypical use, one that is not optimal when called with typical values. Since, once cached, a query plan can be reused for parameter values that vary widely, the ideal situation is to have a plan in the cache that covers the typical usage of the procedure as much as possible. If a plan makes it into the cache that is oriented toward atypical parameter values, it can have a devastating effect on performance when executed with typical values.

An example would probably help here. Suppose we had a stored procedure that returns sales data by country. In our case, three-fourths of our sales are in the UK. The procedure takes a single parameter, @country, indicating the country for which to return sales info. It uses this parameter to filter a simple SELECT statement that returns the requested sales data.

CREATE PROCEDURE uspGetCountrySale
(@Country Varchar(50))
AS 
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @Country 
GO

The optimizer would most likely choose to do a clustered index scan when creating the execution plan for this query because (given that “UK” would normally be passed in for @country) so much of the table would be traversed anyway that scanning it would require less I/O and be faster than repeated nonclustered index lookups. However, what happens if the plan happens to have been kicked out of the cache (let's say due to an auto-statistics update) just prior to a user calling the procedure with, say, “Spain”, where we have almost no sales? Assuming a suitable index exists, the optimizer may decide to use a nonclustered index seek in the new query plan. Subsequent executions of the procedure would reuse this plan, even if they passed in “UK” for @country. This could result in performance that is much slower than with the scan-based plan.

As a workaround prior to SQL Server 2005, local variables can be used instead of the stored procedure parameters. SQL Server cannot sniff the value of a local variable, so it falls back to the column statistics on the filter column and creates a plan that is best for an average value in that column. This can still lead to serious performance problems when the procedure is called with an atypical value, but it does well for typical values.

CREATE PROCEDURE uspGetCountrySale
(@Country Varchar(50))
AS 
DECLARE @_Country Varchar(50)
SET @_Country = @Country 
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @_Country 
GO
 
 

There's a new query hint in SQL Server 2005 to tackle the problem: the OPTIMIZE FOR query hint. This hint allows you to provide SQL Server with a literal that reflects the typical input. For example, if you know what value the variable will typically hold, you can supply that literal so the cached plan is optimized for the typical case:

 
CREATE PROCEDURE uspGetCountrySale (@Country Varchar(50))
AS
DECLARE @_Country Varchar(50)
SET @_Country = @Country
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @_Country
OPTION (OPTIMIZE FOR (@_Country = 'UK'));
GO

 

 

Stored Procedures !! Some Facts you should know August 22, 2007

Posted by furrukhbaig in Performance, SQL Server 2005, TSQL.

Most of us are familiar with stored procedures and use them on a daily basis. But there are some facts that can cause performance issues, and here I just want to shed some light on them.

  • Always use a qualified name when calling sprocs, for example EXEC dbo.Your_Proc

This is a very common mistake that causes an extra trip to the procedure cache to get the execution plan. SQL Server compiles a stored procedure on its first execution and stores the execution plan in the procedure cache so it can be reused by subsequent calls of the same sproc. To find that plan it needs the qualified stored procedure name, e.g. dbo.My_Proc (a fully qualified name contains Server.Database.Owner.My_Proc). When the owner name is not specified, the initial cache lookup by name fails because the name is ambiguous. SQL Server then acquires an exclusive compile lock on the stored procedure and all referenced objects, including tables, in preparation for recompilation. Next it resolves the object name to an object ID and, before compiling, makes another trip to the procedure cache using that object ID, which can locate the previously cached execution plan. But as you may have noticed, this can cause blocking when many SPIDs call the same sproc frequently: while it is locked for compilation, all the callers have to wait until SQL Server either finds the execution plan in cache or recompiles the sproc.

So the rule of thumb is: always qualify your objects (sprocs, tables, views, functions) with the owner name.

This behaviour can be observed in Profiler by capturing the following events.

The SP:CacheMiss event occurs when the cache lookup by name fails. The subsequent SP:ExecContextHit indicates that a matching cached plan was ultimately found in cache once the ambiguous object name had been resolved to an object ID. Depending on the circumstances, SP:CacheHit may appear in place of SP:ExecContextHit.

For more detail, see http://support.microsoft.com/kb/263889

  • SET NOCOUNT ON

Without SET NOCOUNT ON, every statement sends a "rows affected" message back to the client. That extra network traffic can have a serious impact on performance when the sproc is called frequently; a minimal example follows.
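A minimal sketch (the procedure and table names are made up): put SET NOCOUNT ON at the top of the procedure so the "N rows affected" messages are suppressed for every statement in it.

CREATE PROCEDURE dbo.uspArchiveOldOrders
AS
SET NOCOUNT ON;  -- suppress the "N rows affected" message after each statement

INSERT INTO dbo.SaleOrdersArchive (OrderID, CustomerID, EmployeeID, OrderDate)
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

DELETE FROM dbo.SaleOrders
WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());
GO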

  • Don't use the sp_ prefix in stored procedure names

The sp_ prefix is reserved for system stored procedures, and any stored procedure with an sp_ prefix causes an extra lookup in the master database. Another point to note: if a stored procedure in a user database has the same name as a system stored procedure in the master database, the one in the user database will never get executed, because SQL Server always looks in master first and executes that one rather than the one in the user database.

  • Avoid using temp tables and DDL statements

These can cause a stored procedure to recompile when a temp table created inside the procedure is referenced for the first time within it. At the initial compile the temp table and its statistics do not yet exist, so the optimizer cannot create or reuse an execution plan for the queries that use it and must recompile those statements. The same applies to DDL statements, which also force the stored procedure to recompile.

When a batch is recompiled in SQL Server 2000, all of the statements in the batch are recompiled, not just the one that triggered the recompilation. SQL Server 2005 improves upon this behavior by compiling only the statement that caused the recompilation, not the entire batch. This “statement-level recompilation” feature will improve SQL Server 2005’s recompilation behavior when compared to that of SQL Server 2000. In particular, SQL Server 2005 spends less CPU time and memory during batch recompilations, and obtains fewer compile locks.

AUTO_UPDATE_STATISTICS and AUTO_UPDATE_STATISTICS_ASYNC August 22, 2007

Posted by furrukhbaig in Indexes, Performance, SQL Server 2005.

Many of us are familiar with the AUTO_UPDATE_STATISTICS database option. It provides the optimizer with up-to-date index and column statistics so it can produce optimal query plans.

Everything comes at a price. There are pros and cons to this option, and the choice should be made carefully, otherwise it can seriously hurt performance. The AUTO_UPDATE_STATISTICS option, when turned on (which it is by default for a new database), tells the query optimizer to automatically update index and column statistics as data changes. That matters because data changes can affect the optimizer's choice of query plan. Statistics are updated automatically when the statistics used in a query execution plan are out of date; they are considered out of date when roughly 20% or more of the rows in a table have changed.

What this means is that when the optimizer decides the statistics need updating, it updates them right there and then, recompiles the query against the updated statistics, creates a new execution plan and only then executes the query. Most of the time this process is not noticeable, but in some scenarios it can hinder query performance and may even cause query timeouts.

For large tables with many indexes and column statistics, the update can take a significant amount of time. The query has to wait while statistics are refreshed and the plan is recompiled before it can run. This can cause serious delays, and it can happen on a periodic basis: a query that usually takes 2 seconds will occasionally take more than 10. And believe me, it will take you some time to work out what is going on.

Another point to consider is that up-to-date statistics will rarely change the execution plan, as queries should be written so they make use of the available indexes and their plans stay cached for reuse. I would not expect my stored procedure to keep changing its execution plan, because that defeats plan reuse and can itself hinder performance. In some cases, though, particularly with dynamic SQL, fresh statistics do help improve performance. So use the option with care.

The new SQL Server 2005 option AUTO_UPDATE_STATISTICS_ASYNC can be used to stop queries waiting for updated statistics. It lets the optimizer carry on using the old statistics while the statistics are updated asynchronously in the background and made available for the next execution. Keep in mind this can also make queries slower: the plan is built from old statistics without any wait, and the data changes may have been significant enough that the optimizer really should have waited for updated statistics to get an optimal plan. In most cases, though, this works fine. As I mentioned before, the execution plan of a regularly running query is unlikely to change unless the columns in its WHERE and ORDER BY clauses keep changing.

So my recommendation is to turn both options ON to be on the safe side, but always test with the workload in your own scenario.
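For reference, a minimal sketch of turning both options on (the database name is a placeholder):

-- Keep statistics automatically updated, but let queries run on the old
-- statistics while the update happens in the background.
ALTER DATABASE YourDB SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE YourDB SET AUTO_UPDATE_STATISTICS_ASYNC ON;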

GUID vs IDENTITY ? August 18, 2007

Posted by furrukhbaig in Indexes, Performance, SQL Server 2005, TSQL.

Ahhhh… that's the hot debate among developers and database designers nowadays. The system I am working on uses GUIDs on the database tier, and they are the backbone of the middle-tier framework for making every row unique (I didn't design that).

The key issue with GUIDs is that they are very large (16 bytes). GUIDs are mainly used as the PRIMARY KEY (usually CLUSTERED) to ensure uniqueness of rows. As a rule of thumb, the clustered index key / primary key should be as narrow as possible (for a primary key, also consider the lookups on that column from foreign-key tables), so GUIDs are not well suited to a clustered index. Worse, random GUIDs cause very high fragmentation at the leaf level and page splits on data pages as well as index pages (SQL Server 2005 has introduced sequential GUIDs, but we are generating our GUIDs on the client side). A clustered index on a GUID also hurts the performance of every non-clustered index, because NC indexes carry the clustering key as the row locator in their leaf-level pages.
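A minimal sketch of the two designs (the table names are made up): a narrow, ever-increasing IDENTITY clustering key versus a 16-byte GUID key. NEWSEQUENTIALID() (new in SQL Server 2005) at least generates increasing values, but only when the GUID is produced on the server as a column default.

-- 4-byte, ever-increasing clustering key; the GUID, if the middle tier still
-- needs one, is kept as an ordinary non-key column.
CREATE TABLE dbo.CarRental_Identity
(
    RentalID  INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    RentalRef UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID()
);

-- 16-byte clustering key: wider pages, wider non-clustered indexes, and random
-- insert points (page splits) unless NEWSEQUENTIALID() is used as the default.
CREATE TABLE dbo.CarRental_Guid
(
    RentalID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED
);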

There are some arguments in favour of GUIDs, mainly around maintaining uniqueness in merge replication or data warehousing scenarios where data is merged from multiple servers and a single IDENTITY (INT) column would cause duplicates. There are several workarounds for this. One is to create a composite primary key on two INT columns (one for the row id and one for the server id) to guarantee uniqueness across servers, as sketched below. Another is to assign each server its own identity range.
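A minimal sketch of that composite-key workaround (names are made up):

-- Uniqueness across servers comes from the (ServerID, RentalID) pair: two
-- narrow INT columns instead of a 16-byte GUID.
CREATE TABLE dbo.CarRental_Replicated
(
    ServerID   INT NOT NULL,                -- identifies the originating server
    RentalID   INT IDENTITY(1,1) NOT NULL,  -- row id, local to each server
    CustomerNo INT NOT NULL,
    CONSTRAINT PK_CarRental_Replicated PRIMARY KEY CLUSTERED (ServerID, RentalID)
);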

I cannot stress enough how careful you need to be when choosing a GUID column as the candidate for the clustered index while designing a database; it is a nightmare when it comes to performance.

The confusion starts when people with good design skills on the middle tier start designing the database tier. Believe me, the database tier is fundamentally different from the middle tier. Not every object in the middle tier corresponds to a table in the database tier, and object relationships do not map directly to table relationships. And as for performance: many people do not consider performance when designing databases, but I disagree with them. I think data load should be a primary factor, among the other key factors, when designing a database. With a very good design that ignores performance and data load, you will end up redesigning your database a couple of days or months after going live with such a system, and hiring someone like me for performance tuning (by the way, that's what I do) ;).