FILE IMPORT REQUIREMENTS
To demonstrate the Import Column transformation, I am going to expand on the scenario that I used previously. In the previous post, I needed to export execution plans for later analysis. The execution plans were exported to files and sent over to me. Now let's put those plans back into a database so that they can be investigated with XQuery using some of the concepts I discuss in the plan cache series.
FILE EXPORT SOLUTION
To accomplish this, we need an SSIS package that can look at a directory and import all of the files in that folder into a table in a database. We'll use the Import Column transformation to load the file contents into the table. The trouble is, there isn't a data source that returns directory contents. For this need, we'll use a Script Component transformation and configure it as a data source.
With those transformations, we should be able to list the contents of the folder and then use that information to populate a table with the contents of all of the files that are discovered.
PACKAGE CONTROL FLOW
In this package, there are two tasks in the Control Flow. These tasks provide the table for the data and the data flow to load the data.
The components are:
- SQL_CreateTable: Execute SQL task that creates a table in tempdb. This table will store the files and their contents that are imported.
- DFT_ExportPlans: Data Flow task that imports the contents of files into the table created in the previous task.
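The post doesn't show the statement behind SQL_CreateTable, but a minimal sketch might look like the following. The table name dbo.SQLPlans is an assumption; the FilePath and FileContents columns match the columns built later in the data flow, where FilePath is DT_STR(512) and FileContents is DT_NTEXT.

```sql
-- Hypothetical table name; adjust to suit your environment.
IF OBJECT_ID('tempdb.dbo.SQLPlans') IS NOT NULL
    DROP TABLE tempdb.dbo.SQLPlans;

CREATE TABLE tempdb.dbo.SQLPlans
(
    FilePath     varchar(512)  NOT NULL,  -- matches the DT_STR(512) output column
    FileContents nvarchar(max) NULL       -- DT_NTEXT maps to nvarchar(max)
);
```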
PACKAGE VARIABLES
To allow the most flexibility for adapting this package to your needs, the package has been set up to use a couple of package variables.
- FileExtension: Extension of the files that the SSIS package will be importing. Since we only want to import execution plans, the package will filter based on this extension.
- WorkFolder: Folder in which files are stored that are to be imported.
PACKAGE DATA FLOW
We only have a single data flow. This will list the files from a folder. The contents of these files will be imported into the table created in the Control Flow.
DFT_EXPORTPLANS
The Data Flow task here will be used to import the files from the directory stored in the variable above. These files will be read into the data flow and the contents will be pulled in as well. At completion, all of the data will be inserted into a table.
This is done with the following transformations:
- DS_SCR_SQLPlan: Script Component transformation that reads the contents of a directory and returns a data flow output with those files.
- IC_SQLPlan: Import Column transformation that will import the contents of the files named in the data flow path into a new column.
- DD_MSS_SQLPlans: OLE DB Destination for the data retrieved by the script component and input column transformations.
DS_SCR_SQLPLAN CONFIGURATION
Previous posts haven’t covered using this transformation in this manner, so let’s take a look at how to configure it in your SSIS package.
To start, the Script tab will need to be configured. The transformation will need to have the variables FileExtension and WorkFolder added as ReadOnlyVariables. This is shown in the image above.
Next, configure the Inputs and Outputs tab. On this tab, add a new column to the Output Columns folder. Name this column FilePath. The data type for the column should be DT_STR with a width of 512.
With these configured, open the Script window for the transformation. Within the script window, replace the contents of ScriptMain : UserComponent with the code below. Also, add a reference to System.IO at the beginning of the code block.
This code will use the WorkFolder variable to create a DirectoryInfo object. The DirectoryInfo GetFileSystemInfos() method is then called to return a collection populated with all of the files in that WorkFolder. While looping through all of the files, all of those that match the file extension in the FileExtension variable will be added to the data source output.
To add rows to the output of the Script Component, the AddRow() method of the output buffer is called. After the row is created, the values for the columns in the row are assigned. In this case, there is only a single column so just set the column FilePath to the FullName of the FileSystemInfo object.
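The original code listing didn't survive here, so the following is a sketch of what the ScriptMain body could look like. It assumes the Script Component's default output name of "Output 0" (which makes the generated buffer class Output0Buffer) and that the FileExtension variable stores the extension with its leading dot, matching what FileSystemInfo.Extension returns.

```csharp
using System;
using System.IO;

public override void CreateNewOutputRows()
{
    // Build a DirectoryInfo for the folder named in the WorkFolder variable.
    DirectoryInfo directory = new DirectoryInfo(this.Variables.WorkFolder);

    // GetFileSystemInfos() returns every file and folder in that directory.
    foreach (FileSystemInfo fsi in directory.GetFileSystemInfos())
    {
        // Only emit rows for files matching the FileExtension variable
        // (assumed here to include the leading dot, e.g. ".sqlplan").
        if (fsi.Extension == this.Variables.FileExtension)
        {
            // AddRow() creates a new row in the output buffer; then assign
            // the FilePath column from the file's full path.
            Output0Buffer.AddRow();
            Output0Buffer.FilePath = fsi.FullName;
        }
    }
}
```

Because ReadOnlyVariables were added on the Script tab, the variables surface as typed properties on this.Variables rather than requiring a call to lock and unlock the variable dispenser.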
If we run the package now and look at the data through a Data Viewer, it should look similar to the image below:
IC_SQLPLAN CONFIGURATION
With the Script Component transformation configured, we are left with configuring the Import Column transformation. Configuring this will be similar in a sense to configuring the Pivot transformation.
Start by opening the editor for the transformation. This will open the Advanced Editor for the transformation. Next, select the Input Columns tab in the editor. On this tab, you will need to add the FilePath column as an Input Column. This column will be used later as the source for the location of the file that needs to be imported.
Once this is complete, browse to the Input and Output Properties tab. There are a couple of items that need to be configured on this tab. To start with, add a column to the Import Column Output. For this example, the column is named FileContents, but it can have any name. Then change the DataType for the column to DT_NTEXT. Depending on the type of data you need to import, this could instead be DT_TEXT or DT_IMAGE. After completing this, make note of the LineageID; it will be needed in the next step.
Finally, to complete the transformation setup, select the properties for the FilePath column in the Import Column Input. This column will have a property named FileDataColumnID. Set this property to the value of the LineageID from the last step. Once this is complete, the Import Column transformation setup is done.
Through this, the FilePath column has been configured as the source path to the files for the transformation to consume. The FileContents column will then be used to store the contents of the file in the FilePath.
If we run the package now and look at the data through a Data Viewer, it should look similar to the image below:
WRAP-UP
There are a couple of things that are worth noting in this package. These are the capabilities of the Import Column transformation and the Script Component transformation.
The Import Column transformation can be used for more than just the above. Consider a situation where you have large images stored in a folder that can be previewed before downloading. You could create thumbnail images for each and store those in the SQL Server database to reduce bandwidth. The same approach works if you need to migrate files from a system that stores them outside the database to a new system that stores them within the database.
Maybe it isn’t even a migration or new functionality. Consider an admin who has to manually load a few dozen files a day into a database through a web application. How much time and energy would it save if you automated that process with an SSIS package? You might get a cup of coffee out of the deal.
Now, I must really sound like a broken record after a month of mentioning the Script Component transformation in various posts. But this transformation can really bring a lot of flexibility into your packages. Unlike the previous packages, in this package the transformation was used as a data source. If you consider non-traditional data sources, like directories, this opens up a lot more ways to get to the data that you need, in the way you want to get it.
What are your thoughts? How would you use the Import Column transformation? What data sources will you be looking to get at with the Script Component transformation?