
Installing Oracle Oledb Provider for SQL Server 2008 R2 September 3, 2012

Posted by furrukhbaig in Oracle Oledb Provider, SQL Server 2008 R2, SSIS.

I have found this very useful post that solves the classic problem of getting data from Oracle using SSIS: BIDS needs the 32-bit driver at design time, while the runtime requires the 64-bit driver (on 64-bit machines).

http://sqlblog.com/blogs/jorg_klein/archive/2011/06/09/ssis-connect-to-oracle-on-a-64-bit-machine.aspx

I will try it myself, as I need to configure a SQL Server 2008 R2 machine to pull data from an Oracle database using SSIS. I am a bit excited, as I have had tons of issues with Oracle connectivity from SSIS in the past.

Processing poorly formatted large text file with SSIS February 28, 2012

Posted by furrukhbaig in SQL Server 2005, SSIS.

I have been working on a project where I process large comma-separated text files (2 GB, containing 20 million rows) that have commas within the value of the last column. A Flat File Source alone does not solve the problem, so I had to use a Script Component to handle the open-format value in the last column, which can contain commas within the value.
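For illustration, a line might look like this (hypothetical values; the first ten fields are fixed, and everything after the tenth comma belongs to the last column):

1001,2012-02-27,GB,42,7,3,550.00,desc one,desc two,desc three,free text, with commas, anywhere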

First, you need to define a Flat File connection, as in the following connection dialogue.

[Screenshot: Flat File Connection Manager, General page]

Then click Columns in the left pane. The trick here is to select anything other than CRLF as the column delimiter, go back to the first screen by clicking General, and then come back to this screen.

You will now see only one column containing the whole row. Note that you do need to move away from this screen after selecting another delimiter in order to see one column with the whole row.

[Screenshot: Columns page showing a single column containing the whole row]

Now click Advanced and name the column anything you like; I am using the name Line, as it contains the whole line from the text file. Make sure you select the string [DT_STR] or Unicode string [DT_WSTR] data type with the maximum possible row length.

[Screenshot: Advanced page with the single column named Line]

Now drag a Data Flow Task onto the Control Flow and double-click it.

Next, drag a Flat File Source component onto the Data Flow designer.

Select the connection we created in the steps above for the Flat File Source component.

The Flat File Source should show only one column, as defined in the connection manager.

Now drag a Script Component onto the Data Flow designer; a dialogue box will ask how this Script Component is going to be used. Select Transformation here.

Click Inputs and Outputs. The input should already have one column, as defined in the Flat File Source. Add columns to the output; I have added 11 columns and, for demonstration purposes, changed their names. Note the "Untranslatable" column that will hold the open-format value. Be careful with the column data types and allow enough length for the values expected.

[Screenshot: Inputs and Outputs page with the 11 output columns]

Now it's time to write some code in the Script Component.

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Split on commas, but return at most 11 substrings: the 11th
        // keeps the remainder of the line, including any embedded commas.
        char[] delimiter = ",".ToCharArray();

        // Assumes every line has at least 11 fields; a shorter line
        // would throw an IndexOutOfRangeException below.
        string[] tokenArray = Row.Line.Split(delimiter, 11);

        Row.Column1 = tokenArray[0];
        Row.Column2 = tokenArray[1];
        Row.Column3 = tokenArray[2];
        Row.Column4 = tokenArray[3];
        Row.Column5 = tokenArray[4];
        Row.Column6 = tokenArray[5];
        Row.Column7 = tokenArray[6];
        Row.Description1 = tokenArray[7];
        Row.Description2 = tokenArray[8];
        Row.Description3 = tokenArray[9];

        // Open-format value: everything after the tenth comma.
        Row.Untranslatable = tokenArray[10];
    }

Now, let me say that I am not an expert in C# or .NET, but I know how to get the work done ;)

The tricky part is the Split function, which does the magic. Note the second parameter: the maximum number of array elements expected (i.e. the number of columns). This is vital; otherwise Split will create a variable number of array elements, depending on how many times the delimiter occurs in the line.

Our target is 11 columns, and if the delimiter occurs more than 10 times in one row, the rest of the line is kept in the last column.

Now you can add the OLE DB Destination. Your data flow should look like the screen below.

[Screenshot: Data flow with Flat File Source, Script Component and OLE DB Destination]

I would appreciate any comments, but this code processed a 2 GB text file in less than a minute, loading 20 million rows into the table.

Thanks

Who is Active? October 25, 2010

Posted by furrukhbaig in DMV's, Execution Plan, Optimize, Queries running on the server, sp_who, sp_who2, sp_whoisactive, system monitoring, Tips, XML Execution Plan.

Adam Machanic has released a new build of his system monitoring stored procedure, sp_WhoIsActive. It's really useful, and I would like to congratulate him on his efforts.

Check out the link below for his original post and to download the code.

http://sqlblog.com/blogs/adam_machanic/archive/2010/10/21/who-is-active-v10-00-dmv-monitoring-made-easy.aspx
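Once installed, running it is as simple as the following (a minimal sketch; @get_plans is just one of its many optional parameters, so check Adam's documentation):

-- who is active right now?
EXEC dbo.sp_WhoIsActive;

-- include the execution plan of each active request
EXEC dbo.sp_WhoIsActive @get_plans = 1;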

Enjoy !!

GO – Run a batch multiple times March 26, 2009

Posted by furrukhbaig in SQL Server 2005.

Today I came across a post on Kalen Delaney's blog about a little-known feature of the GO statement (the batch terminator).

You can run a batch multiple times without re-executing it manually or using a WHILE loop.

The following code is taken from the original post; see the link below.

So, to run the INSERT 5 times, I would do this:

INSERT INTO details
    (SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID,
     UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate)
SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID,
       UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate
FROM Sales.SalesOrderDetail
GO 5

http://sqlblog.com/blogs/kalen_delaney/archive/2007/07/22/did-you-know-run-a-batch-multiple-times.aspx
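In its simplest form (a trivial example of my own):

PRINT 'Running...'
GO 5 -- runs the preceding batch five times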

Rowcount for Large Tables July 24, 2008

Posted by furrukhbaig in DMV's, Performance, SQL Server 2005, SQL Server 2005 features, Tips, TSQL.

Ever wondered why a simple statement like SELECT COUNT(*) FROM [Table_Name] takes forever to return a row count on large tables? That's because it uses a full table scan to count the rows. An instant way to get the row count for any table is to query the new Dynamic Management View (DMV) in SQL Server 2005, sys.dm_db_partition_stats. The DMV contains row and page counts for every table, covering all of its partitions. Note that even if you did not create partitions for your table, it is still created on a single default partition, so this covers every table.

Following are some useful queries that return the row count almost instantly. (In SQL Server 2000, the sysindexes system table is used instead to get the number of rows for a large table without a full table scan; see the sketch after the queries.)

Comments are welcome.

-- All the tables having ZERO rows
-- compatible with partitioned tables
SELECT
    Table_Name = object_name(object_id),
    Total_Rows = SUM(st.row_count)
FROM
    sys.dm_db_partition_stats st
GROUP BY
    object_name(object_id)
HAVING
    SUM(st.row_count) = 0
ORDER BY
    object_name(object_id)

-- Row count without doing a full table scan
-- This will return the total number of rows across all partitions (if the table is partitioned)
SELECT
    Total_Rows = SUM(st.row_count)
FROM
    sys.dm_db_partition_stats st
WHERE
    object_name(object_id) = 'Your_Table_Name_Goes_Here'
    AND (index_id < 2) -- only the heap or clustered index
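For SQL Server 2000, the sysindexes trick mentioned above looks like this (a sketch; indid 0 is the heap and 1 the clustered index, and the count the engine maintains there can be slightly out of date):

-- SQL Server 2000: row count without a full table scan
SELECT rows
FROM sysindexes
WHERE id = OBJECT_ID('Your_Table_Name_Goes_Here')
    AND indid < 2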

SSMS slow startup March 7, 2008

Posted by furrukhbaig in Performance, slow start, SQL Server 2005, SSMS, Tips.

A friend of mine asked me why SSMS was taking up to a minute to start up on his office machine, and I was not able to figure out the problem until I found this post on Euan Garden's blog.

http://blogs.msdn.com/euanga/archive/2006/07/11/662053.aspx

The quick fix is to add the following lines to the hosts file (C:\Windows\System32\drivers\etc\hosts):

# entry to get around diabolical Microsoft certificate checks
# which slow down non internet connected computers
127.0.0.1 crl.microsoft.com

This is to do with the certificate revocation check (hence the crl.microsoft.com entry). It fixed the problem on my friend's machine; hope it helps others too.

Enjoy !!

Is your code really SET based? January 30, 2008

Posted by furrukhbaig in BEST PRACTICE, CROSS JOIN, Execution Plan, Performance, Performance Tuning, RBAR, SET BASED, SQL Server 2005, SQL Server 2005 features, Tips, Triangular Join, TSQL.

It's a well-known fact that best practice is to write set-based code to get better performance. But there is no absolute definition of "set based", and many people think set-based code is anything except cursors and loops. Believe me, that is not true.

I have been meaning to write about this topic for a while, and just today I came across a very useful article that explains exactly what I was trying to say. It also covers triangular and cross joins, and a term that was new to me at least: "RBAR" (row by agonizing row). It's interesting.

http://www.sqlservercentral.com/articles/T-SQL/61539/ 
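To give a flavour of what the article means, here is a sketch of my own (hypothetical dbo.Sales table with SaleID and Amount columns, not code from the article): the running-total query below contains no cursor or loop, yet the inequality join makes it a triangular join, so for n rows it touches roughly n²/2 rows.

-- looks set based, behaves like RBAR
SELECT s1.SaleID,
       RunningTotal = SUM(s2.Amount)
FROM dbo.Sales s1
JOIN dbo.Sales s2
  ON s2.SaleID <= s1.SaleID
GROUP BY s1.SaleID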

Enjoy !!

Find out what's running on SQL Server 2005 December 13, 2007

Posted by furrukhbaig in DMV's, Execution Plan, Optimize, Performance, Performance Tuning, Profiler, Queries running on the server, SQL Server 2005, SQL Server 2005 features, XML Execution Plan.

Everyone wants to know what's running on the box. Especially if your job is to stabilise a server, you are always concerned about what is killing the box.

A friend of mine has published a very useful post with scripts to find out what's running on SQL Server, along with the execution plans.

http://www.proteanit.com/b/2007/01/22/a-useful-script-to-analyse-current-activity-on-your-box/

The same can also be achieved by running Profiler and capturing the XML execution plan, but that is not always possible due to security issues and the overhead of Profiler itself.
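For reference, the DMV-based approach boils down to something like this (a minimal sketch of my own, not the script from the post):

-- currently executing requests with their SQL text and XML plan
SELECT r.session_id,
       r.status,
       r.wait_type,
       t.text AS sql_text,
       p.query_plan AS xml_plan
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) p
WHERE r.session_id <> @@SPID -- exclude this query itself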

Index Fragmentation and dm_db_index_physical_stats December 12, 2007

Posted by furrukhbaig in DMV's, dm_db_index_physical_stats, Fragmentation, Index Fragmentation, Index tuning, Index usefulness, Indexes, Optimize, Performance, Performance Tuning, SQL Server 2005, SQL Server 2005 features, Tips.

Fragmentation can cause serious performance issues, and it's vital to understand how it impacts performance before I tell you how to fix it.

Fragmentation causes slow-running queries due to more disk IO than usual, which is itself caused by page splitting. Fragmentation not only hurts query performance; it can also slow down write operations such as INSERT, DELETE and UPDATE.

SQL Server writes data to 8 KB data pages. This limit can be reached very quickly for wide tables storing lots of varchar/nvarchar columns (nvarchar takes 2 bytes to store 1 character), causing requests for more data pages. When SQL Server fills a whole page, it occupies another page to store more data. If the table has a clustered index, the data must be stored in sorted order, as the clustered index enforces the physical order of data according to the clustering key; that is why you can have only one clustered index per table.

Now assume you have a compound clustered index on the CountryID and CustomerNo columns of a transactional table that receives 10 transactions per minute for an online car rental portal. The clustered index will make sure that data is stored sorted on CountryID and CustomerNo. With millions of rows in the table, consider what happens when a transaction row arrives that needs to be inserted on page 1 (CountryID = 1, CustomerNo = 100). This requires making room on the first data page (depending on the FILLFACTOR; the default is 100%). If that data page has not got enough room for the row, some rows on the page must be moved to a new page (called a page split) to make room for the new row in sorted order. This can cause a lot of page splits and fragmentation, as the new pages allocated by page splits cannot be guaranteed to be in sequence with the existing pages. This can seriously hurt performance for data modifications (due to page splits) and data reads (due to out-of-order pages).

sys.dm_db_index_physical_stats can be used to check fragmentation.

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(N'YourDB'), OBJECT_ID(N'dbo.CarRental'), NULL, NULL, 'DETAILED');

The above SQL returns fragmentation information for the given database name (parameter 1) and table name (parameter 2). The last parameter specifies the mode of analysis ('LIMITED', 'SAMPLED' or 'DETAILED'); look at avg_fragmentation_in_percent in the output.
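To actually fix the fragmentation, the usual pattern (a hedged sketch; the thresholds follow the common Books Online guidance of reorganizing at low fragmentation and rebuilding at high) is:

-- reorganize for low fragmentation (roughly 5-30%), light-weight and online
ALTER INDEX ALL ON dbo.CarRental REORGANIZE;

-- rebuild for heavy fragmentation (roughly above 30%)
ALTER INDEX ALL ON dbo.CarRental REBUILD;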

It is good practice to create clustered indexes on an ever-increasing value (e.g. an IDENTITY column) to avoid page splits and out-of-order pages.

Statement Level Recompile August 22, 2007

Posted by furrukhbaig in Performance, RECOMPILE hint, SQL Server 2005, stored procedure, TSQL.

In SQL Server 2000, the unit of compilation was the whole stored procedure. So even if you wanted just one particular query to be recompiled, you couldn't request it. If you created the stored procedure with the RECOMPILE option, the whole procedure went through recompilation every time you invoked it.

SQL Server 2005 supports statement-level recompile. Instead of having all queries in the stored procedure recompiled, SQL Server can now recompile individual statements. You're provided with a new RECOMPILE query hint that allows you to explicitly request a recompilation of a particular query. This way, other queries can benefit from reusing previously cached execution plans if there's no reason to recompile them every time the stored procedure is invoked.
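For example (a sketch with hypothetical table and parameter names):

CREATE PROCEDURE dbo.GetCustomerOrders
    @CustomerID INT,
    @MinAmount  MONEY
AS
BEGIN
    -- reuses the cached plan on every call
    SELECT OrderID, Amount
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;

    -- recompiled on every call, so the plan can adapt to @MinAmount
    SELECT OrderID, Amount
    FROM dbo.Orders
    WHERE Amount >= @MinAmount
    OPTION (RECOMPILE);
END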
