Tuesday, September 24, 2013

The Script Transformation part 1 – a simple Transformation

http://www.bimonkey.com/2009/09/the-script-transformation-part-1-a-simple-transformation/

The Script Transformation part 1 – a simple Transformation

Fig 1: The Script Transformation
Fig 1: The Script Transformation
In this post I will be covering the Script Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Script Transformation do?

The question here should really be what can’t it do? The Script Transform fills in the gap when standard components don’t fit the bill by allowing you to create, consume or access rows and columns in the Data Flow and process them within VB.Net code (and also C# in 2008). This opens up a whole array of functionality to access as both VB.Net and C# are powerful and flexible languages. The Script Transformation can function as a Source (providing rows to the Data Flow), as a Destination (consuming rows from the Data Flow) or as a Transformation (changing or creating column values in the Data Flow).
Functioning as a Transformation, you aren’t restricted to just row in, row out operations either – you add new outputs, create multiple rows from single rows and create single rows from multiple rows. The capacity to do impressive tricks with your data is – well, impressive! Because of this array of options I will break each one out into a separate post. This first post will cover a simple one row in – one row out transformation.
Sadly there are some downsides to all this, two of which really stand out for me. The first of these is if you aren’t a programmer (I never moved much beyond VBA) then writing the code, debugging the code, or even knowing what can and can’t be done in the code can make working with this component a bit of a struggle. The second is you lose a lot of visibility over what is being done in the component – unlike with most other transforms there is no nice GUI to show what column is going where and what is being done to it. You need to be able to actually read the code to understand what is going on, and I must warn that the BI Monkey becomes one Angry Ape when code is insufficiently commented!

Configuring the Script Transformation

It is fairly easy to set up the Script Transformation, but you need to use a little more of the Advanced Editor type features than basic developers are probably used to. Key actions are selecting input columns, defining output columns, choosing input variables and connection managers being used within the component.
First up simply check the columns from the Input that you want to access in the script component. By “access” I mean read or alter the value of.
b
Fig 2: Selecting the Input Columns
Second, define the output columns. If you are adding new columns to the Data Flow as I do in the example, click the “Add Column” button which becomes enabled when you select the “Output Columns” folder. Then name it and select the data type. By default when using a Script Transformation as a Transformation a single output Output 0 is created for you to add columns to.
b
Fig 3: Configuring the Output Columns
Finally (in this case – I won’t be using any connection managers here and will cover those in a future post) enter the variables you want to be able to access in the script – there are two options – ReadOnlyVariables and ReadWriteVariables. Fairly self explanatory – if you want to change the value of the variable in the script enter it into the ReadWriteVariables line – and if you don’t want the value to change enter it into the ReadOnlyVariables line. Two quick notes of warning – if there’s a space in your list of variables in the ReadOnlyVariables line it will cause an error in 2005 Script tasks. Secondly, remember variable names are case sensitive.
b
Fig 4: Specifying the Variables
The last thing you need to do is click on the Design Script button that will open your code editor. There is a commented section where you can add your code.
Below is my sample code from the package – there’s nothing too fancy going on here. Note how the columns selected in the Inputs are available as a property of the Row - they simply pop up on Intellisense as you code. Similarly variables are accessible from Me.Variables.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
‘ Use a standard VB Function
Row.ShipYear = Row.ShipDate.Year
‘ Use a variable in the Me.Variables.<VariableName> format
Row.DoubleDue = Row.TotalDue * CType(Me.Variables.Multiplier, Decimal)
End Sub
I’ll dig into how code is structured in one of the future posts in this mini-series.

Where should you use the Script Transformation?

The Script transformation is there to be used when native SSIS component functionality doesn’t meet your needs – the most common use I have had for it is when the Derived Column editor doesn’t give me what I need – such as when I want to use Regular Expressions, or reuse a function across many columns.
MSDN Documentation for the Script Transformation can be found here for 2008 and here for 2005.
If you are still struggling, try these additional resources:
If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.

No comments:

Post a Comment