GUID vs IDENTITY ? August 18, 2007Posted by furrukhbaig in Indexes, Performance, SQL Server 2005, TSQL.
Tags: clustered on guid, guid vs identity, primary key on guid
Ahhhh… thats the hot debate among developers and database designers now a days. The system I am working on is using GUIDs on database tier and it is the backbone of middle tier framework to make every row unique (I didn’t design that).
The key issues with GUID is its very large (16 bytes). GUID’s are mainly used as PRIMARY KEY (usally CLUSTERED) to ensure uniqueness of rows. As a rule of thumb clustered index key / primary key should be as narrow as possible (in case of primary key consider lookups for primary key column by foriegn key tables) and therefore GUID are not best suited for clustered index. Again it will cause very high fragmentation on leaf level and page split on data pages as well as index pages (while sql 2005 has also introduced sequential guids but we are generating guids on client side). Clustered index on GUID also hurt performance for non-clustered index as NC index uses clustered key as page pointer on leaf-level pages.
There are some argument in favor of GUID and its mainly to maintain uniqueness in merge replication scenario or data warehousing scenario where data will be migrated from multiple servers and having single identity (INT) column will cause duplication. There could be several workaround for this issue. One is to create composite primary key on two INT column (one is for row id and other is for server id) to make sure uniqueness among multiple servers. other is to maintain identity ranges for multiple servers.
I just could not stress more about being carefull when using GUID column as candidate for clustered index when designing database as its night mare when it comes to performance.
The confusion starts when people with good designing skills in Middle tier start thinking for a database tier. Believe me database tier is fundamentally different from middle-tier. Every object on middle-tier does not necessarily corresponds to a table on database tier and object relationship does not corresponds to table relationship. And what about performance, there are many people who does not consider performance when desiging databases but i disagree with them and I think data-load should be primary factor, to be consider when designing dababase, among other key factors. After having a very good design with out performance and dataload consideration you will end-up redesigning your database after couple of days or month of going live with such system and hiring someone like me for performance tuning (by the way thats what i do) ;).