PACKAGE REQUIREMENTS
Sometimes, though, it isn’t necessary to populate customer data that doesn’t fit into the portion of the data model for custom values. Consider for a moment data that needs to be stored for a customer, but will only ever be used for troubleshooting or to be returned in some later process.
Would this data be better stored in the EAV architecture, or some other storage mechanism? As the name of the post infers, it might be better to store that data in XML. You can’t throw it away, but you need to store it where it won’t get in the way.
This is exactly the issue that was brought up to me a number of months ago. A new import would have millions of records, each of which might have dozens of custom values. Storing these values would drastically change the statistics of the existing EAV architecture that only contained a few million rows by adding millions of more rows.
SCRIPT TASK SOLUTION
For this SSIS package, we need a task that will be able to consume one or more columns. These columns should be added to an XML document that is created and returned by the transformation. The package will then be able to take the XML document created and move it to wherever it needs to be delivered.
To keep the solution flexible it should be portable to other SSIS packages. The process for creating the XML document should be easily configurable and require limited coding for each new implementation.
To accomplish these needs, we’ll implement some code within a Script Component transformation. The code will be dynamic with configuration relying on the columns passed into the transformation.
Before getting too far into explaining this solution, the idea for this solution arose after reading a blog post by John Welch (Blog | @John_Welch). In his post, John discussed how to create an XML Destination Script Component. Some of the code in this post is derived from his post. I highly recommend reading this and other posts on his blog – there is some great content on his site.
PACKAGE CONTROL FLOW
To demonstrate this solution, we’ll build a fairly simple SSIS package. The package only contains a couple Data Flow tasks. The first data flow task creates a file that will be used to demonstrate the functionality in the second data flow task. The second data flow task will consolidate a number of columns into an XML document that can be inserted or update into an XML column for reference.
PACKAGE VARIABLES
Only two tasks, so why not only two variables as well for the package? Well this wasn’t intentional, but it was all that I needed.
The variables from above were added for the following purposes:
- FileProduce: Name and location of the text file the source for the XML creation transformation.
- WorkFolder: This is my one variable to rule them all that sets the FileDestination and FileSource paths.
PACKAGE DATA FLOWS
As noted above there are two data flows. In this section, we’ll dissect them to explain what you will see when you download the SSIS package.
DFT_DEMO_FILE
Within this data flow task a text file will be created. The purpose of the text file is to provide data for the second data flow. This isn’t entirely necessary to demonstrate this functionality but is included to implement the demonstration with the data source that the original design utilized.
There are two transformations in this data flow:
- DS_MSS_Product: OLE DB Source that retrieves data from the AdventureWorks database. The data source exports sales information.
- DD_FF_Product: Flat file destination for the sales information.
DFT_XMLDOCUMENT
This data flow is where the real action is. The data flow uses the file created in the previous data flow as it’s source data. The data is transformed in a Script Component transformation. The last step is a Row Sampling transformation that provides a data flow path that we can hook a data viewer onto.
The transformations in this data flow are:
- DS_FF_Product: Flat file source containing the product information.
- SCR_Fill_XMLData: Script Component transformation that accepts columns from the data flow path and returns those columns as an XML document in a new column. The configuration of this component will be describe in detail below
- RSAMP_Data: As mentioned above, this is here to provide a place for a data viewer.
SCR_FILL_XMLDATA CONFIGURATION
To use a Script Component transformation for building XML there are a few configuration points that will be important to understand. We’ll go over setting this up, and how to change the contents of the XML document without changing any code.
First browse to the Input Columns tab. On this tab, select all of the columns that you want to include in the XML document that is created. This is the primary configuration point for this task.
The columns selected can be changed at anytime. The only restriction will be the need to recompile the code in order for the underlying collections to be updated. No edits will need to be made.
Next on the Inputs and Outputs tab add a new column to the Output 0 output. This column should be configured as a string data type with a maximum length of 4,000. This is shown below.
Now that these two pieces are configured, go back to the Script tab and select the Edit Script button. The script window will open and you see a basic transformation template.
Input0_ProcessInputRow Function
The first step in setting up the script will be to replace the Input0_ProcessInputRow function with the one below. Before we do that lets review what this script is doing to make certain that you understand the code before you use it.
The code will do the following:
- For each column that has been selected as an input for the Script Component transformation the function will loop across each column.
- If the column name is XMLData, don’t do anything. Since we will assign the XML document to the column later we don’t want to include it in the data being fed into the XML document.
- Get the name of the column. Spaces and periods are removed from the column name since the Script Component transformation will do this itself.
- Get the value of the column.
- Get the name of the NULL column created for the column. When columns are added to a Script Component transformation a NULL is created to allow tests for NULL.
- If the column is configured as ReadOnly then consider it for the XML document. This was added as a fail safe to the XMLData. It is redundant and could be removed.
- Check if the column value is NULL. If so, add an empty element to the XML document. Otherwise, add an element to the XML document with the column value.
Now that we’ve gone through the psuedo-code, here’s that function:
The code does what the pseudo-code outlined.
Formatting Functions
There are two formatting functions that came from the post mentioned above. These allow for converting values to XML element easier with less code. The one distinction with these functions to mention is the inclusion of CDATA. CDATA is used to force the encoding of values that can be in the source data to prevent them from causing the XML document to be malformed.
XML SCRIPT TASK WRAP-UP
With that you can execute the SSIS package. If you have a data viewer on the package you’ll see something like this:
Pull out one of the values, and you get the following:
Wait a minute, this doesn’t look like XML! But remember what I mentioned about he CDATA. Drop these values into a variable of the XML data type and select it back out. Try that and you will get the following:
This, of course, looks like what you were expecting. Now go forth and cut and paste this into all of the new SSIS packages where you need this functionality.
The one word of caution what I will add to using this is the limit to 4,000 characters. Right now, strings can’t exceed that within SSIS and that can be an issue. If you run into this, this solution will not completely fit for you.
Hopefully, with this and the previous SSIS package you are starting to see that there are a lot of creative things that can be done with Script Component transformations. For the most part, the sky is the limit with these.
No comments:
Post a Comment