Thursday, April 4, 2013

Error logging with Bulk Insert

Error logging with Bulk Insert

The bulk insert is used to bulk load data into staging tables from .txt or .csv files. It is very fast compared to normal insert statements because by default, CHECK and FOREIGN KEY constraints are disabled. Although this behavior can be controlled by using CHECK_CONSTRAINTS, it is highly recommended for better performance not to use this argument. For more information on the Bulk Insert statement please refer to the Microsoft documentation: http://msdn.microsoft.com/en-us/library/ms188365.aspx
The Bulk Insert statement rejects all the records that do not match the column data type or size such as:
  1. We have a column defined as an Integer and a value present in that column is actually composed of characters.
  2. The size defined is varchar(10) and actually the length of the value is more than 10 characters.
Only the records that violates the above rules are rejected and not the entire file. Again this behavior can be controlled by using the MAXERRORS argument. This argument defines how many records can be rejected before rejecting the complete file.
Let's examine a couple scenarios to show how this works
Scenario 1: Suppose we have defined the value of MAXERRORS to be 10 and 5 records are rejected during the Bulk Insert process. The rest of the records will be loaded successfully, and only those 5 records that don't meet the criteria are not loaded.
Scenario 2: Suppose we have defined the value of MAXERRORS to be 10 and 11 records are rejected during the Bulk Insert process. Then the entire file will be rejected and no records will be loaded into staging table.
Now the question arises about the 5 records rejected during Scenario 1. We can save those records in a different file if we enable error logging features. Use the argument ERRORFILE and give the path and name of the file where you want to log such records that are rejected during BULK INSERT.
Remember if the entire file is rejected due to exceeding the value of MAXERRORS value then only MAXERRORS+1 records will be logged in the error log file. This is because the system checks the file only to the point where it gets to MAXERRORS + 1 and not beyond this record.
Along with the error log file, another file is created by default by the system which has the extension .txt.error. It will contain row number and offset of erroneous records. Note that there are some issues with this file (.txt.error) in SQL Server 2005 SP2, which are fixed in the SP4 patch. The issue is SQL Server creates this file two times during the execution of BULK INSERT. When it is created the second time, it gives the error that the file already exists, which leads to the failure of the BULK INSERT statement.

Conclusion

Using the argument ERRORFILE in BULK INSERT statement helps you in identifying the records which are rejected and to fix those records and reload into the system. The MAXERRORS parameter also allows you to control whether you want to allow a file to load that might have specified number of errors.

No comments:

Post a Comment