Monday, September 30, 2013

31 DAYS OF SSIS – ADDING SOME DELTA FORCE (27/31)

In a presentation that I did this week I was asked about how I handled updates with the SSIS package I was demoing.  This was the package that I discussed in the post One Package, Unlimited Databases.  I explained the process that I use in that webcast but thought it would also make a good topic.

DELTA FORCE REQUIREMENTS

Before we get into the details, I’d like to take the question and restate it as requirements.  By doing this, I’m hoping that by reading it in this way requirements you encounter might spark a connection.  Through that connection you’ll recall this post and be able to apply this concept.
The general need is a package that will receive data from one data source.  In this case, we are receiving data from a central database.  When the data is received, rows that do not exist will be insert.  Rows that already exist as records in the tables will be updated.
The updates to the records should only be applied to records where there is a change to at least one column.  Existing records should not be modified if there are no changes in any of the columns.

DELTA FORCE SOLUTION

The first part of the requirements was already implemented in the One Package, Unlimited Databasespost.  Pieces that were discussed as part of that solution won’t be detailed out here.  We’ll pick up the solution from the second half of the requirements.
One option we could choose would be to go cheap and easy.  To do this we could modify the OLE DB Command with a WHERE clause and then call this requirement done.  The problem with this option is that if there are 1,000 in the path of existing rows, then 1,000 UPDATE statements will be executed.  The trouble with this is that there might only be 2 updates that need to occur.  Due to this, I wouldn’t consider this option ideal.
Another option is to use the Script Component transformation to check the values of the columns.  We don’t want to execute SQL statements against the database that may or may not need to be executed.  Instead, the Script Component transformation can be used to filter out all of the rows that haven’t changed.  Instead of sending all of the existing rows to the OLE DB Command, it will instead only receive those rows that have changed.

PACKAGE CONTROL FLOW

imageSince most of the interesting things in the Control Flow were described in the earlier post, they will not be expanded on too much again.  Our focus will be on drilling into the Data Flow which we will just to shortly.  Since they aren’t related to this variation of the solution, I won’t be looking at that variable and connection configurations.
There are four tasks that comprise the Control Flow:
  • SQL_DestinationDatabases: This is a T-SQL that fetches a list of servers and databases that need the information.  The server and database names retrieved are populated to variables.
  • FELP_Databases: ForEach Loop container that iterates across each occurrence of server and database name.
  • SQL_ChangeData: Demo only task to delete and update data to make all of the data flow paths do something.
  • DFT_InsertUpdateStateProvince: Data Flow task to determine whether rows should be inserted or updated.  This will contain the bulk of our solution.

PACKAGE DATA FLOW

If you recall from the previous post, there were two data flows in the package.  Since the second one doesn’t impact the focus on the delta change, it has been removed.  In a full solution to cover inserts, updates, and deletes it would reaming.  With that said, we’ll dig into the data flow that is remaining.

DFT_INSERTUPDATESTATEPROVINCE

imageThis Data Flow task controls the INSERT and UPDATE operations on the table affected by this data flow.  The data flow is similar to before with the exception to a couple additional transformations.
  • DS_MSS_AdventureWorks: OLE DB Source to retrieve central data for distribution.
  • LKUP_Subscriber: Lookup transformation to determine if the record exists in the subscriber database.  There is a slight change in this transformation, described below, to assist in performing the delta checks.
  • DD_MSS_StateProvince: OLE DB Destination in the subscriber database that receives the data.  This is just a straight insert.
  • SCR_DeltaPower: Script Component transformation that checks each row that exists in the destination table to see if any of the columns in the row has changed compared to the destination table.
  • OLECMD_UpdateStateProvince: OLE DB Command to perform updates for each row that currently exists in that database.  This isn’t ideal, but I’m saving a more ideal solution for determining deltas for a future post.
  • RSAMP_Ignore: Row Sampling transformation that exist purely for viewing pleasure.  It allows us to see how many rows are ignored when the package executes.

LKUP_SUBSCRIBER

SNAGHTML10faef9As I mentioned above, there is a little variation into how this transformation is configured.  In the previous post, this transformation was purely used to determine if a row in the data flow path also existed in the table.  The lookup was matching on the StateProvinceID without returning any data.  This time we want to return some data.
Specifically, the columns StateProvinceCode, CountryRegionCode, IsOnlyStateProvinceFlag, Name, and TerritoryID need to be returned.  These are the columns that we want to check to see if any of the values have changed.
One important thing to note is that all of these columns already exist in the Data Flow path.  To prevent column naming irritations, append the value “_Current” to each of the columns.  This will make them stick out and avoid name irritations.
When this is complete, this transformation is done.

SCR_DELTAPOWER

The true effort of the delta check will be occurring in the SCR_DeltaPower script component transformation.  There are two main points of configuration for the transformation.  These will be the outputs and the script.
SNAGHTML1179909We will configure this component to provide multiple outputs along two paths.  The paths will be UpdateRows and IgnoreRows.  The names should indicate the state of the data that we plan to add to each of the paths.  Each path will contain all of the rows from the input and should be mapped to the input.  If you are unfamiliar with how to do that, please read Every Which Way Outputs.
Once the output is configured, it’s time to work on the guts of the script.  If you haven’t done column comparisons in SSIS packages before the code structure might seem obvious.  Unfortunately, with the possibilities of NULL values in columns, this can prove tricky to the novice.
Before explaining the script, open the script window and replace the contents of public class ScriptMain : UserComponent with the code below.
01public override void Input0_ProcessInputRow(Input0Buffer Row)
02{
03//Check state of StateProvinceCode
04if (!Row.StateProvinceCode_IsNull && !Row.StateProvinceCodeCurrent_IsNull)
05{
06if (Row.StateProvinceCode != Row.StateProvinceCodeCurrent)
07{
08Row.DirectRowToUpdateRow();
09return;
10}
11}
12else if ((Row.StateProvinceCode_IsNull && !Row.StateProvinceCodeCurrent_IsNull)
13|| (!Row.StateProvinceCode_IsNull && Row.StateProvinceCodeCurrent_IsNull))
14{
15Row.DirectRowToUpdateRow();
16return;
17}
18 
19//Check state of CountryRegionCode
20if (!Row.CountryRegionCode_IsNull && !Row.CountryRegionCodeCurrent_IsNull)
21{
22if (Row.CountryRegionCode != Row.CountryRegionCodeCurrent)
23{
24Row.DirectRowToUpdateRow();
25return;
26}
27}
28else if ((Row.CountryRegionCode_IsNull && !Row.CountryRegionCodeCurrent_IsNull)
29|| (!Row.CountryRegionCode_IsNull && Row.CountryRegionCodeCurrent_IsNull))
30{
31Row.DirectRowToUpdateRow();
32return;
33}
34 
35//Check state of IsOnlyStateProvinceFlag
36if (!Row.IsOnlyStateProvinceFlag_IsNull && !Row.IsOnlyStateProvinceFlagCurrent_IsNull)
37{
38if (Row.IsOnlyStateProvinceFlag != Row.IsOnlyStateProvinceFlagCurrent)
39{
40Row.DirectRowToUpdateRow();
41return;
42}
43}
44else if ((Row.IsOnlyStateProvinceFlag_IsNull && !Row.IsOnlyStateProvinceFlagCurrent_IsNull)
45|| (!Row.IsOnlyStateProvinceFlag_IsNull && Row.IsOnlyStateProvinceFlagCurrent_IsNull))
46{
47Row.DirectRowToUpdateRow();
48return;
49}
50 
51//Check state of TerritoryID
52if (!Row.TerritoryID_IsNull && !Row.TerritoryIDCurrent_IsNull)
53{
54if (Row.TerritoryID != Row.TerritoryIDCurrent)
55{
56Row.DirectRowToUpdateRow();
57return;
58}
59}
60else if ((Row.TerritoryID_IsNull && !Row.TerritoryIDCurrent_IsNull)
61|| (!Row.TerritoryID_IsNull && Row.TerritoryIDCurrent_IsNull))
62{
63Row.DirectRowToUpdateRow();
64return;
65}
66 
67//No Changes, ignore it
68Row.DirectRowToIgnoreRow();
There are a couple items in the code that I want to all out.  First, before I check to see if the values of the source and destination columns are different (pink highlighting), I check to see if both columns are not NULL (yellow highlighting).  This is the issue I mention above with NULL values.
image
Next, if row isn’t picked up above then we check to see if one column is NULL and the other is not NULL.  And vice versa.
image
This is completed for each column that needs to be checked.  Any rows that don’t get kicked out to the UpdateRow output get sent to the IgnoreRow output.

WRAP-UP

Yet another method that the Script Component transformation can be used to add functionality to an SSIS package.  Through the use of a solution such as the one above needless SQL statements can be prevented from being sent to your SQL Servers.
One of the things that you might be thinking is “Why the heck is StrateSQL using an OLE DB Command?”  To that I say, good point.  But is the transformation really that bad?
Commentary on the use of the OLE DB Commands transformation could fill an entire post all itself.  What I will say here about the transformation is that in cases where the volume of rows being processed is not substantial, it is acceptable to use this transformation.  You definitely need to understand the current and future number of rows that will be processed.

PACKAGE DOWNLOAD

No comments:

Post a Comment