Friday, November 25, 2016

SSIS OLE DB Destination sends all records to Error Output

SSIS OLE DB Destination is one of the destinations we heavily used for loading data to databases. Generally we used Data access mode - fast load for loading data because it is the most efficient and fastest way of loading data. However with default setting, you may see some issues if you try to capture all error records while loading the destination unless you make some changes to some settings.

Let's make a simple data set and try to load to a table. I have following table created in tempdb.

USE tempdb

CREATE TABLE dbo.Product
(
 ProductId int IDENTITY(1,1) NOT NULL,
 ProductName varchar(200) NOT NULL,
 Class char(1) NOT NULL,
 Cost money NOT NULL,
 PRIMARY KEY CLUSTERED (ProductId ASC)
)
GO

ALTER TABLE dbo.Product  
ADD  CONSTRAINT CH_Product_Class 
CHECK  ((Class='A' OR Class='a' OR Class='B' OR Class='b'))
GO

ALTER TABLE dbo.Product  
ADD  CONSTRAINT CH_Product_Cost CHECK  ((Cost>(0)))
GO

I have a text file named Products.txt that contains 15 records as below;

ProductName,Class,Cost
P01,A,100
P02,B,150
P03,A,344
P04,A,555
P05,A,100
P06,B,150
P07,X,344
P08,A,555
P09,A,555
P10,A,100
P11,B,150
P12,A,344
P13,A,555
P14,A,344
P15,A,555

As you see, P07 record is not a valid record as it has X as the Class. Let's set up a package with a Data Flow Task and configure it to read data from this file and load into Product table as below.

I have used OLE DB Destination with its default settings.


And have configured the Error Output to Redirect Row on Error.


Then I have set a Record Destination just for capturing the error output. If you do, you can configure DataReader Destination as well. This is Data Flow looks like;


Now if I run the package, packages retrieves all records but it send all records to Recordset Destination even though there is only one record that violates a constraint.


What is the reason for this. The reason is the value we have set (or the default value) for Maximum insert commit size property. This property indicates the batch size that the OLE DB Destination tries to commit data during the operation. Since we have only 15 records and the batch size set is 2147483647, it tries to commit all at once. Since there is an invalid record, entire batch is getting failed and none of the record is getting inserted (or committed).

This is fine with a small dataset like this but this is not acceptable for a large data set. Hence we need to make sure that it does not send all records to error flow but an acceptable range.

For this, let's make it as 5 and see.


Here is the result of execution after changes;


This makes sure that at least certain batches are getting inserted without sending all records to error flow.


However, you need to decide the batch size without degrading the performance. If you set it for a small value, you may not get the expected performance form fast-load option.


No comments: