Monday, September 30, 2013

31 DAYS OF SSIS – EVERY WHICH WAY OUTPUTS (15/31)

One of my favorite transformations to use within SSIS packages is the Script Component transformation.  Back in the day with DTS, we had to use vbscript, and our solutions were quite limited with their capabilities.  In this post, I want to take a look at one of the capabilities that would not be possible with vbscript that we are able to accomplish today.

MULTIPLE OUTPUTS REQUIREMENTS

Before we go into the solution, we need a problem.  A problem that can be mapped to this solution and in which it also makes sense to choose this solution.  For this problem let’s start with a common task – a data export.
This data export is going to export some data from a single table.  Depending on the data in the table, there are three possible text files that will be populated.  The routing of the three text files will be based on the ProductCode column.  The first file will receive all values less than 50 or starting with the letter “A”.  The second file will received all values 50 or greater or starting with the letter “B”.  The third file will receive all other product codes.
As contrived as these requirements may seem, they are nearly identical to a request that I received a few months back.  Those requirements matched others that I have encountered throughout the years.

MULTIPLE OUTPUTS SOLUTION

There are a few alternatives that can be looked at for these requirements.  We’ll review a couple options and then discuss implementing one of the solutions.  The solution that we implement will be the one I consider the most simple, extensible, and maintainable.
The first option, the one I expect most will default to, will be to to use a Conditional Split transformation.  For those that don’t know, this transformation performs a case statement across a number of expressions.  For each case statement, an output is created from the transformation.
Now doing this might prove to be a bit tricky since the expression for the ranges will be a bit tricky.  The expression will need to change the ProductCode value to an int to check the ranges or contain all of the numerical values in the evaluation.  With expressions this complex, it would probably be easier to troubleshoot the package if the values were populated to a new column in the Data Flow with a Derived Column transformation.  After the new column is added, then use a Conditional Split transformation to split out the rows.
The solution is a bit better, but not to the point where I would be satisfied.  The complexity of the expression will make adding new values to the ranges or maintaining the expression difficult.  Can you imagine having to update someone else’s package that has this kind of an expression in it?  After you curse out the former developer, you’ll spend the rest of the day making the change and then validating your changes.
You could improve the flexibility of the package by moving all of the logic from the Derived Column transformation to a Script Component transformation.  In this solution, you’d add in a new column on the Data Flow path and populate a value that the Conditional Split transformation can use, as was mentioned above.  By using .NET code you will be able to break apart the logic to split the rows into cleaner IF-THEN statements, which the next developer will find fairly easy to maintain and troubleshoot.  Also, since you are in a development environment, data types can be easily changed and ranges can be compared versus a list of values.
At this point, though, there is still one more change that can be made. This change is the real point of this entire post.  The previous solutions would determine the path to use and then use a Conditional Split transformation to split apart the rows.  That transformation really isn’t needed.  Instead the Script Component transformation can be configured to split the rows and redirect them within itself.  This is the solution that we will be going throughout the rest of this post.

imagePACKAGE CONTROL FLOW

The focus of this SSIS package will be in the details of the Script Component transformation.  To get to that transformation, add a Data Flow to the Control Flow.

PACKAGE VARIABLES

There are no variables for this solution.  Freaky… right?!

PACKAGE DATA FLOWS

There hasn’t been a lot to this package thus far, and the data flows will be no different.  There is just a single data flow for this package.

imageDFT_MULTIPEOUTPUTS

Within this data flow, data will be imported into the data flow.  A Script Component transformation  will split that data along three outputs.  These three outputs will move data into three Row Sampling transformations.
The transformations in the data flow are:
  • DS_MSS_Product: OLE DB Source that retrieves data from the AdventureWorks database.  The data source brings product data into the data flow.
  • SCR_MultiOutput: Script Component transformation with multiple outputs.  Product data is processed by the transformation to send the data to one of the three output options.
  • RSAMP_Less50: Row Sampling transformation that was included to allow a data flow path and data viewer for rows with ProductCode between 0 and 50 or ends with the letter “A”.
  • RSAMP_More50: Row Sampling transformation that was included to allow a data flow path and data viewer for rows with ProductCode that is 50 or more or ends with the letter “A”.
  • RSAMP_Alpha: Row Sampling transformation that was included to allow a data flow path and data viewer for rows with ProductCode that is non-numeric and doesn’t end in “A” or “B”.

SCR_MULTIOUTPUT CONFIGURATION

The main item to configure in this package is the SCR_MultiOutput Script Component transformation.  The other items are fairly simple and can just be copied from the package included below.
To begin configuring the script component transformation, open the transformation editor.  Select theInput Columns tab.  On this tab, select ProductCode from the Available Input Columns.
SNAGHTMLdc3dc42[4]
After that chance is made, choose the Inputs and Outputs tab.  In this tab create two new outputs and rename the existing output.  These should be ValueLessThan50ValueMoreThan50, andValueAlphabetical.
Once these are created they will need two additional changes made to them.  The first is to set theExclusionGroup property to 1.  This property dictates whether the output is a distinct group of output rows; for any row to appear in this output the row must be directed to it.  Second, change theSychronousInputID to the Input 0 for the transformation.  Doing this will give the output the same metadata as Input 0.
SNAGHTMLdcf5e1e
These steps will modify the Script Component transformation to provide three outputs.  If you close the transformation it can be connected to other transformations; such as the Row Sampling transformations mentioned above.
If you closed the transformation, open the editor for it again.  Select the Edit Script button on the Script tab.  A Visual Studio 2008 Tools for Applications design window window will open next.  There are a few functions in the window, the one we will be modifying is Input0_ProcessInputRow.
First let’s replace the code for Input0_ProcessInputRow with the code below.  After that, I’ll review what the code will accomplish.
01public override void Input0_ProcessInputRow(Input0Buffer Row)
02{
03int iValue;
04 
05if (int.TryParse(Row.ProductCode, out iValue))
06{
07if (iValue >= 0 &amp;amp;amp;amp;amp;amp;amp;amp;amp;&amp;amp;amp;amp;amp;amp;amp;amp;amp; iValue < 50)
08{
09Row.DirectRowToValueLessThan50();
10}
11else
12{
13Row.DirectRowToValueMoreThan50();
14}
15}
16else
17{
18if (Row.ProductCode.EndsWith("A"))
19{
20Row.DirectRowToValueLessThan50();
21}
22else if (Row.ProductCode.EndsWith("B"))
23{
24Row.DirectRowToValueMoreThan50();
25}
26else
27{
28Row.DirectRowToValueAlphabetical();
29}
30}
31}
First, an attempt is made to convert the value for ProductCode to an int.  This returns a true value if the ProductCode is an int and populates the value to iValue.  If iValue is less than 50 then the row is directed to the ValueLessThan50 output through the DirectRowToValueLessThan50 method. This method is created when the ExclusionGroup property was set for the output.  If iValue is 50 or above, then the row is directed to the ValueMoreThan50 output with the DirectRowToValueMoreThan50 method.
When the TryParse returns a false value, then the second code block is checked.  If the ProductCode value ends with “A” then the DirectRowToValueLessThan50 method is called.  When the ProductCode value ends with a “B” then the DirectRowToValueMoreThan50 method is executed.  Finally, the remaining rows are sent to the ValueAlphabetical output with the DirectRowToValueAlphabetical method.

MULTIPLE OUTPUTS WRAP-UP

At this point the Script Component transformation configuration is completed.  The transformation will now split the rows between the configured outputs.
But why choose this solution over the other solution options?  There are three reasons, which were previously named, for going with this route:
  • Simple: Instead of two or more transformations that together implement this logic there is only a single transformation.  At times the need to implement logic over multiple transformations can muddy the waters within the package.  Since this is all contained in a single transformation, outside dependency issues are avoided.
  • Extensible: When the rules change and the ranges need modification, the logic in the Script Component can be easily modified and new outputs can be added without needing to configure multiple transformations.
  • Maintainable: The SSIS expression language is easy to use, provided you know all of the “gotchas” and things that will bite you.  Stay tuned to tomorrow’s post for more on that.When you look at the staff in your environment, there are likely more people that can read and write C# code than can do the same with the SSIS expression language.  This solution lowers the bar on the people that can review, write, and troubleshoot the SSIS package.
I’ve said it before and will say it again, Script Component transformations are extremely useful and open up the possibilities for what can be done within SSIS.  Be sure to consider them in your solutions, especially when the logic includes many transformations.  There may be a way to consolidate the logic and reduce your headaches.

PACKAGE DOWNLOAD

No comments:

Post a Comment