Monday, September 30, 2013

31 DAYS OF SSIS – SSIS NAMING CONVENTIONS (31/31)

We’re at the thirty-first, and last, post for the 31 Days of SSIS. The series started with an introductory post. This series has been a collection of the things that I’ve picked up over the years in my work with SSIS. I figured I’d end this series with a best practice that I’ve not often been a fan of and am now fully committed to.

The best practice I am talking about is the use of naming conventions for tasks and transformation in SSIS packages. Through this practice, the names of tasks and transformations will always start with an acronym or abbreviation followed by a meaningful name for the item. In the place of spaces, underscores are used in the names.

For a list of the naming conventions that I typically use, I will point you to Jamie Thompson’s (Blog |@JamieT) post SSIS: Suggested Best Practices and naming conventions. About half way down the post, are the naming conventions. The rest of the post is worth reading as well, if you haven’t.

PRACTICE IMPACT

As I’ve worked with SSIS packages, I’ve found a few areas where this provides benefits. Whether developing new packages or maintaining and troubleshooting existing SSIS package, you will find an appreciation for this best practice.

Development Impact: From a development perspective there is marginal immediate benefit for using this best practice. The only impact is the naming of the task or transformation to specific convention. If you don’t typically name the objects in SSIS packages, start naming them and make it meaningful.
Maintenance Impact: When maintaining an SSIS package, the use of a naming convention will help two-fold. First, the prefix will provide an instant indication of the context of the type of work being done. Second, the meaningful name portion of the name provides context for the work itself. When adding in new functionality, this recognition will help reduce the time it takes to understand yours or someone else’s code that may have been written weeks or months ago.
Troubleshooting Impact: As I mentioned in the last bullet the naming convention provides an immediate level of understanding of the type of task and the context. When troubleshooting, not having to crack open code and investigate configuration settings for every object is critical to solving issues efficiently. True, you may still end up examining everything. With the use of a naming convention, when you are looking at Billy’s or Sally’s SSIS packages, you can get a leg up on troubleshooting by having context when they are not around.

CONVINCING ME, CONVINCING YOU

After reading through the impacts, are you convinced? Maybe not, and until a few months ago neither was I. To conclude this best practice, I want to go through a couple of my experiences with naming conventions. This should help convince you that this best practice is worth the effort, as these experiences did for me as well.

LOOKING AT MY PACKAGE

I wrote an SSIS package a while back. This was a fairly simple package. I had a Data Flow task that was named similar to “Insert Rows to SalesOrder”. A OLE DB destination within the data flow was named similar to “Insert Rows”.

In the context of the developing the SSIS package, these names seemed perfectly reasonable. The package was tested, signed off on, and deployed. Of course, these actions don’t always guarantee success when it hits production.

After a few weeks the package did what packages do when you forget about them. The package failed and I had to take a look at the log files. Opening up and digging into the SSIS package revealed that the error occurred on “Insert Rows to SalesOrder Insert Rows”.

Since I was no longer working on this package, I had to think some and hunt some to figure out where the error occurred. Was the error in the task “Insert Rows” or “Insert Rows to SalesOrder”? Was the transformation “SalesOrder Insert Rows” or “Insert Rows”? These names may seem simple to figure out when developing the package, but after some time you may forget how you designed the package.

The names of objects in your SSIS packages should help you to immediately identify where to look and start with your troubleshooting. Would you be able to better remember what you were doing if the data flow task was named “DFT_InsertRowsToSalesOrder” and the OLE DB destination was named “OLE_DST_MSS_InsertRows? Then the error message would say the error occurred at DFT_InsertRowsToSalesOrder OLE_DST_MSS_InsertRows.

LOOKING AT YOUR PACKAGE

You probably won’t always be working with your own SSIS packages. Sometimes you will have to troubleshoot or maintain a package that someone else developed. In these cases, using the naming convention best practice will be worth it’s weight in gold. Probably more since SSIS package aren’t that heavy.

A while back I was monitoring some SSIS packages that had been deployed to production. Most of these packages had been written by others. Since I was on the administration side of the world, it was under my duties to troubleshoot packages when they failed.

One day, a package did fail. Actually, a lot of packages failed. Some schema changes had occurred and they were causing havoc in the SSIS packages. Due to these failures, I needed to investigate the packages and determine what needed to be done to resolve the issues.

Opening one of the first log files revealed that there was an error at “Data Flow Task 1 1 Derived Column 1 2”. Where, oh where did this error go wrong? Of course, the log file does provide the reason for the error. But it doesn’t do much to tell me the context for the error. In fact, based on these names, I already expect a mess inside the package. The names provided weren’t meaningful, and the purpose could only be discerned by reverse engineering what the component does.

Had the error occurred at “DFT_ProcessHumans DC_AddIsZombieFlag” then I would have been better equipped to troubleshoot the package. I also, probably wouldn’t be calling you at 2 AM in the morning so you can tell me what this package is supposed to be doing.

NAMING CONVENTION WRAP-UP

Hopefully, you’ve taken some time to consider this best practice. I’m a convert for exactly the reasons I’ve stated above. You should buy into this as well. It’s not just about having meaningful names. It is also about leaving behind SSIS packages that others can troubleshoot and maintain.

31 DAYS OF SSIS – IMPORTING EXECUTION PLANS (30/31)

FILE IMPORT REQUIREMENTS

To demonstrate the Import Column, I am going to expand on the scenario that I used previously. In the previous post, I needed to export execution plans for later analysis. The execution plans exported in to files and sent over to me. Now let’s put those plans back in a database so that they can be investigated with XQuery using some of the concepts I discuss in the plan cache series.

FILE EXPORT SOLUTION

To accomplish this we need an SSIS package that can look at a directory and import all of the files in that folder into a table in a database. We’ll use the Import Column transformation to populate the information to the table. The trouble is, there isn’t a data source that returns directory contents. For this need, we’ll use a Script Component transformation and configure it as a data source.

With those transformations, we should be able to load the contents of the folder. Then use that information to populate a table with the contents of all of the files that are discovered.

PACKAGE CONTROL FLOW

In this package, there are two tasks in the Control Flow. These tasks provide the table for the data and the data flow to load the data.

The components are:

SQL_CreateTable: Execute SQL task that creates a table in tempdb. This table will store the files and their contents that are imported.
DFT_ExportPlans: Data Flow task that imports the contents of files in the the table created in the previous task.

PACKAGE VARIABLES

To allow the most flexibility for rewriting this package to your needs, the package has been setup to use a couple package variables.

The variable from above are:

FileExtension: Extension of the files that the SSIS package will be importing. Since we only want to import execution plans, the package will filter based on this extension.
WorkFolder: Folder in which files are stored that are to be imported.

PACKAGE DATA FLOW

We only have a single data flow. This will list the files from a folder. The contents of these files will be imported into the table created in the Control Flow.

DFT_EXPORTPLANS

The Data Flow task here will be used to import the files from the directory stored in the variable above. These files will be read into the data flow and the contents will be pulled in as well. At completion, all of the data will be inserted into a table.

This is done with the following transformations:

DS_SCR_SQLPlan: Script Component transformation that reads the contents of a directory and returns a data flow output with those files.
IC_SQLPlan: Import Columns transformation that will export the contents of the file names in the data flow path into a new column.
DD_MSS_SQLPlans: OLE DB Destination for the data retrieved by the script component and input column transformations.

DS_SCR_SQLPLAN CONFIGURATION

Previous posts haven’t covered using this transformation in this manner. Since this hasn’t been done before, let’s take a look a bit at how to configure this in your SSIS package.

To start, theScript tab will need to be configured. The transformation will need to have the variables FileExtension and WorkFolder added asReadOnlyVariables. This is shown in the image above.

Next, configure the Inputs and Outputs tab. On this tab, add a new column to the Output Columns folder. Name this column FilePath. The data type for the column should be DT_STR with a width of 512.

With these configured, open the Script window for the transformation. Within the script window replace the contents of ScriptMain : UserComponent with the code below. Also, add a reference toSystem.IO at the beginning of the code block.

01public override void CreateNewOutputRows()

02{

03string path = Variables.WorkFolder.ToString();

04string extension = Variables.FileExtension.ToString();

05 

06DirectoryInfo di = new DirectoryInfo(path);

07 

08//[] =  di.GetFileSystemInfos(filter);

09 

10foreach (FileSystemInfo fsi in di.GetFileSystemInfos())

11{

12if (fsi.Extension == "." + extension)

13{

14FileOutputBuffer.AddRow();

15FileOutputBuffer.FilePath = fsi.FullName;

16}

17}

18}

This code will use the WorkFolder variable to create a DirectoryInfo object. The DirectoryInfo GetFileSystemInfos() method is then called to return a collection populated with all of the files in that WorkFolder. While looping through all of the files, all of those that match the file extension in the FileExtension variable will be added to the data source output.

To add rows to the output of the Script Component, the AddRow() method of the output buffer is called. After the row is created, the values for the columns in the row are assigned. In this case, there is only a single column so just set the column FilePath to the FullName of the FileSystemInfo object.

If we run the package now and look at the data through a Data Viewer, it should look similar to the image below:

IC_SQLPLAN CONFIGURATION

With the Script Component transformation configured, we are left with configuring the Import Column transformation. Configuring this will be similar in a sense to configuring the Pivot transformation.

Start by opening the the editor for the transformation. This will open Advanced Editor for the transformation. Next, select the Input Columns tab in the editor. In this tab, you will need to add the FilePath column as an Input Column. This column will be used later as the source for where the file that needs to be imported is located.

Once this is complete, browse to the Input and Output Properties tab. There are a couple items that need to be configured on this tab. To start with, add a column to the Import Column Output. For this example, the column is named FileContents, but it can have any name. Then change the DataType for the column to DT_NTEXT. Depending on the type of data you need to import, this could be DT_TEXT or DT_IMAGE. After completing this, make note of the LineageID, that will be needed in the next step.

Finally, to complete the transformation setup select the properties for the FilePath column in the Import Column Input. This column will have a property name FileDataColumnID. Set this property to the value of the LineageID from the last step. Once this is complete, the Import Column transformation setup is done.

Through this, the FilePath column has been configured as the source path to the files for the transformation to consume. The FileContents column will then be used to store the contents of the file in the FilePath.

If we run the package now and look at the data through a Data Viewer, it should look similar to the image below:

WRAP-UP

There are a couple of things that are worth noting in this package. These are the capabilities of the Import Column transformation and the Script Component transformation.

The Import Column transformation can be used for more than just the above. Consider a situation where you have large images stored in a folder that can be previewed before downloading. You could create some thumbnail images for each and store those in the SQL Server database to reduce bandwidth. Or if you need to migrate some files from a system that stores files outside the database to a new system that stores them within the database.

Maybe it isn’t even a migration or new functionality. Consider an admin that has to manually load a few dozen files to a database a day through a web application. How much time and energy would it save if you automated that process with an SSIS package. You might get a cup of coffee out of the deal.

Now, I must really sound like a broken record after a month of mentioning the Script Component transformation in various posts. But this transformation can really bring a lot of flexibility into your packages. Unlike the other packages, in this package the transformation was used as a data source. If you consider non-traditional data sources, like directories, this open up a lot more ways to get to the data that your need and the way you want to get it.

What are your thoughts? How would you use the Import Column transformation? What data sources will you be looking at to get at with the Script Component transformation?

PACKAGE DOWNLOAD

30of31-ImportPlans.dtsx