The Unpivot transformation is a relatively easy transformation to setup and configure. The trouble is there are situations where you may need to unpivot data where the unpivot transformation is not ideal. In this post, I’ll describe one such situation and then review an alternative method to unpivoting data that I have used.
PACKAGE REQUIREMENTS
As a DBA, one of the more common activities that I need to use SSIS for is to load data from customer data files into client databases. This process is usually pretty easy mapping out people, orders, items, etc. to the their respective database objects.
Often times, though, the customers will have a data element or thirty that doesn’t match to the client database. This information can be stored in various methods, but the most common method that I run into is structures for storing name-value pairs. Another name for this is the entity-attribute-value model.
Unpivoting data for this type of an import is fairly easy for an average sized file with an average number of columns. This gets tricky with some of the situations that I’ve run into at times.
For instance, in one solution a file with a couple hundred thousand rows was unpivoted and the resulting output was in the 10s of millions. This happened due to the large number of name-value pairs that needed to be setup for the file. In this case, as each column was unpivoted and validated it forced all of the operations to be done serially.
What I needed was a method that would unpivot the data. It also, though, needed to be able to put the output data in a state that it could be processed in parallel in later stages of the import.
UNPIVOT ALTERNATIVE SOLUTION
The solution to using the Unpivot transformation will need to know what fields are in the file that need to be unpivoted from the customer file. Then for each of these field, the value in the customer file will need to be extracted.
As I mentioned above, after the unpivot each of the value should be in a state that can be processed in parallel. To do this, the values that are unpivoted will be exported into individual files.
PACKAGE CONTROL FLOW
To accomplish this we will need a few components added to the control flow.
First, add a Data Flow task that will be used to extract a sample file to simulate an import. This will be discussed in more detail below.
Next add an Execute SQL task. The data source for the Execute SQL task will be a SQL Server database with the AdventureWorks database. The query can be found in the download at the end of the post. It is important, though, to note that the task will be setup to with ResultSet set to Full result set.
Also, the Result Set will be mapped to the RSYearList variable.
After this add a ForEach Loop Container to the Control Flow. Set the Enumerator to Foreach ADO Enumerator. The ADO object source variable will be RSYearList. Leave the Enumeration mode to Rows in the first table.
Under Variable Mappings add a mapping for the variable Year to the Index of 0. This will tie the variable to the first column in the dataset that will be enumerated.
Within the ForEach Loop Container add a Data Flow task. This task will do the unpivoting work for the package. As with the other data flow task, more details will be provided below.
PACKAGE VARIABLES
As noted above, there will be a few variables that are part of this package.
The variables from above were added for the following purposes:
- FileDestination: Name and location of the text files will be the destination for the unpivoted data. The name of this file will change per column that needs to be unpivoted.
- FileSource: Name and location of the text file that is the source for the unpivot operation.
- RSYearList: Variable with an object data type. This will store a dataset containing information about the columns to be unpivoted.
- WorkFolder: This is my one variable to rule them all that sets the FileDestination and FileSource paths.
- Year: A variable to hold the year value to indicate which column to unpivot.
PACKAGE DATA FLOWS
As noted above, there are two Data Flows in this SSIS Package. The Data Flows is where most of the work will happen in the package.
DFT_DEMO_FILE
This Data Flow task creates a text file that will simulate a customer file that needs to be imported. This wouldn’t be part of the final SSIS solution since the files would be received from your customers.
- DS_MSS_Sales: OLE DB Source that retrieves data from the AdventureWorks database. The data source provides sales information.
- DD_FF_Sales: Flat file destination for the sales information.
There isn’t much to see here. Moving along then.
DFT_UNPIVOT
The DFT_Unpivot Data Flow task will be executed one time for each loop of FELC_Year_List. The FELC_Year_List ForEach Loop contained will loop once for every row in RSYearList.
During each execution, the customer file will be read and the column that is currently being unpivoted will be output to destination files.
The transformations in this data flow are:
- DS_FF_Sales: Flat file source containing the sales information.
- SCR_Unpivot: Script Component transformation that selects columns from the source file and outputs the information into a new column. Configuration of this transformation will be detailed more below.
- DD_FF_Sales: Flat file destination for each of the unpivoted columns from the sales information.
SCR_UNPIVOT CONFIGURATION
In order to use a Script Component transformation to unpivot the data from the source file there are a few things that need to be done to configure the transformation. This section will review the transformation and the steps required to set it up.
Starting with the first tab, set the ReadOnlyVariable to use the Year variable. If you are using SQL Server 2008 this can be selected from a list. With SQL Server 2005 you will need to type out the variable name.
Next go to the Input Columns tab. In this tab, select the columns 2001, 2002, 2003, and 2004. These columns are the one that will be unpivoted. Notice that they Output Alias for each column has the prefix Year appended to it. This is done because columns with all numeric name cannot be used within Script Transformations.
Move next to the Inputs and Outputs tab. On this tab add two Output Columns to the transformation. The Total Due column should have a decimal type. The Year should have an integer type.
Finally, the fun part will begin. What’s a script transformation without a script. Start by opening the script window for the transformation. For this demonstration the code will be in C#.
In the code window, replace the Input0_ProcessInputRow function with the following code:
This code will go through the data that is being streamed and based on the Year variable will change the values and columns that are assigned to the Year and Total Due columns. The last part of the code will throw an error if an unexpected value is encountered. This raises an error if there is an attempt to unpivot columns that do not exist or has not been coded.
UNPIVOT WRAP-UP
Once all of the above is done, the SSIS package can be executed. Doing so will generate four files which contain one column of the unpivoted values in them. The should look similar to the following:
This package may not fit your situation. It will hopefully, though, spark some ideas on other ways to utilize Script Transformations in your solutions. Script Transformations do offer a lot of flexibility to do some great things in SSIS packages. There will likely be a few more of these types of solutions this month.
One of the downsides to this solution is that it does read through the source file multiple times. This can be an issue at times, especially if there is a lot of data to unpivot. It can, though, balance out by the benefits of being able to process the pivoted data in parallel after this step. Also, the difference between pivoting one column and many columns is not that much effort and this can assist in having flexible solutions.
As always, let me know what you think. If anything is unclear, I will update to help clarify.
No comments:
Post a Comment