Thursday, December 12, 2013

5 Tips for Developing SQL Server Integration Services Packages

http://sqlmag.com/sql-server-integration-services/5-tips-developing-sql-server-integration-services-packages

SQL Server Business Intelligence Development Studio (BIDS) provides a powerful environment for developing SQL Server Integration Services (SSIS) packages. Once you get the hang of how to implement tasks in your control flow and configure your data flow, most efforts are fairly straightforward. Even so, you can sometimes run into situations that slow down development or stop progress altogether. Often these situations are easy to resolve if you know how to proceed. Here are five tips to help you avoid a few frustrations that might arise when developing SSIS packages so that you can keep your development efforts moving forward.

Tip #1: Control Design Time Validation

When you open an SSIS package in BIDS, SSIS Designer tries to connect to the defined data sources and verify the metadata associated with those data sources. If the data sources are unavailable, the database objects are locked, or the connect times are slow, you might have to wait an inordinate amount of time for the validation to complete or to learn that the validation failed. This can be particularly annoying if you have to reopen your packages often.
You can also run into validation problems if you try to add a component that relies on a database object not yet created. For example, suppose a control flow includes an Execute SQL task that creates a table. If you add a Data Flow task that populates the table with data, you'll receive a validation warning because SSIS can't validate the table -- it doesn't yet exist. And if you try to run the package, you'll receive a validation error.
Fortunately, SSIS lets you override the default behavior by allowing you to work offline or to configure properties that affect validation. To work offline, you simply enable the Work Offline option on the SSIS menu. You can access this option while you're working on the package or before you open it. (To do the latter, a different BIDS solution must be open so that the SSIS menu is displayed.) When you're ready for your package to be back online, you just disable the Work Offline option, and SSIS Designer returns to its default behavior.
The second method you can use to override the default validation behavior is to set one of the two validation-related properties in your package. First, you can set the DelayValidation property to True for a specific task. For instance, Figure 1 shows the property set to True for a Data Flow task. After you've configured the property, you can work with the task and run your package without receiving those validation warnings or errors.
Setting the DelayValidation property on a Data Flow task
Figure 1: Setting the DelayValidation Property on a Data Flow Task

Note, however, that the DelayValidation property is available only in executables (e.g., Control Flow tasks, the package itself). The property isn't available in data flow components. If you want to change the behavior of a specific data flow component, you must set its ValidateExternalMetadata property to False. Figure 2 shows the property set to False for an OLE DB Source component in the data flow.
Setting the ValidateExternalMetadata property on an OLE DB Source component
Figure 2: Setting the ValidateExternalMetadata Property on an OLE DB Source Component

Setting the ValidateExternalMetadata property to False provides more granular control over your data flow than setting the DelayValidation property to True. However, when the ValidateExternalMetadata property is set to True (the default), it helps avoid locking issues when your package uses transactions. For this reason, you might want to stick with using DelayValidation at the task level, unless you have some overriding need to work at the component level.

Tip #2: Create a Package Template

For many SSIS developers, it's not uncommon to create multiple packages that contain many of the same components. Your packages might share variables, connection managers, event handlers, property settings, or tasks. Rather than implementing these same components each time you create a package, you can create a package template that includes all these components and base each new package on the template.
Although creating a template in BIDS is relatively simple, it's not particularly intuitive how you go about doing it. Basically, you create a package in SSIS Designer as you would any other package, although you should give it a name that makes it easily recognizable as a template, such as SsisPkgTemplate.
After you've added the components that you want to include in the template, save and close the package. Then go to the Windows Explorer folder where you store your SSIS project files and copy the template package you just created to the BIDS template folder. The location of the template folder can vary depending on the OS, whether you're working in a 32-bit or 64-bit environment, and whether you've installed BIDS or Visual Studio in its default location. On my system, I'm running 64-bit versions of Windows 7 and SQL Server 2008 R2, both installed on the C drive, with BIDS installed in its default location. As a result, the location of my template folder is C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\DataTransformationProject\DataTransformationItems.
After you copy your template package to the BIDS template folder, you can start using the template to create packages. In Solution Explorer, right-click the project name, point to Add, and click New Item. When the Add New Item dialog box appears, select the name of the template you just created, provide a name for the new package, and click Add.
After you create the new package, you need to generate a unique ID for it. To do so, locate the package's ID property, click the down arrow in the property value text box, then click Generate New ID. At this point, the new package will already include the components in the template. If needed, you can add more components.

Tip #3: Implement Package Configurations

When an SSIS package is moved to a new environment, the data source connections in the package often need to be updated. For example, if you develop an SSIS package using a test database as the data source, you need to update the data source connection so that it points to the production database when you deploy the package.
One way of performing the necessary updates is to manually open the package, modify the applicable property values (such as the SQL Server instance), and redeploy the package. Another method is to use package configurations that automatically supply those values to the package. Although the latter is the preferable approach, it's often not implemented.
Package configurations are easy to set up and modify, which makes supporting changing property values a lot simpler than modifying packages directly. A package configuration is a stored property value that can be updated outside of the package. If the package has been configured to reference package configurations, the package looks for the configurations at runtime, then applies the defined values to the applicable properties. You can modify the package configurations as often as you like, without having to open the package, and the package will use the most current values that have been defined.
Package configurations are useful not only for storing data source information but also for any situation in which property values might change. For instance, you might want to change the value of a variable that's used in an expression in a Script task or Execute SQL task.
To use package configurations, you must enable them on each package for which you want to implement them. With the package open, select the Package Configurations option from the SSIS menu. This launches the Package Configurations Organizer shown in Figure 3. To enable package configurations, select the Enable package configurationsoption.
Enabling package configurations in the Package Configurations Organizer
Figure 3: Enabling Package Configurations in the Package Configurations Organizer

Next, you need to add the package configurations. To do so, click the Add button to launch the Package Configuration Wizard. When the wizard appears, click Next to skip the Welcome page and go to the Select Configuration Type page shown in Figure 4.
Selecting the type of configuration in the Package Configuration Wizard
Figure 4: Selecting the Type of Configuration in the Package Configuration Wizard

On the Select Configuration Type page, select one of the following five types of package configurations from the Configuration type drop-down list:
  • XML configuration file. You can use an XML configuration file to store one or more property values for one or more packages. You can have the wizard generate a new XML configuration file for the current package's property values or add the package's property values to an existing XML configuration file. Either way, you need only select the properties you want to include. The wizard will then write the properties' current values to the XML configuration file.
  • Environment variable. You can use an environment variable to store the value of a package property. You can associate this type of configuration with only one property, so you need to create an environment variable for each property you want to update. (Unlike the XML configuration file, the environment variable isn't created for you by the wizard. You must create it yourself.)
  •  Registry entry. You can use a registry entry to store the value of a package property. The key must exist under the registry hive HKEY_CURRENT_USER. You can associate this type of configuration with only one property, so you must create a registry entry for each property you want to update. (The wizard doesn't create the registry entry for you. You must create it yourself.)
  • Parent package variable. Within a child package, you can use a variable that exists in the parent package to store the property value you want to pass to the child package. As with environment variables and registry entries, you must specify a variable for each property you want the parent package to update.
  • SQL Server. You can use a SQL Server table to store one or more property values for one or more packages. The table must include columns for the property value, the SSIS data type associated with that value, the path that points to the property, and a configuration filter that specifies which package configuration the row is associated with. The wizard can create the table for you, or you can create it before running the wizard. In either case, you must specify the connection manager that the package should use to connect to the database where the table will be located. As with the XML configuration file, you must also select the properties you want to include. The wizard will then write the properties' current values to the table.
After you select the package configuration type, follow the steps in the Package Configuration Wizard to finish creating the package configuration. You can then update the package configuration as necessary in order to pass in the correct property value at runtime.
Note that you're not limited to one configuration type. You can combine types or use more than one type. If you plan to update only one or two properties at runtime, you might find that an environment variable or registry entry is the easiest way to go. For multiple properties, using an XML configuration file or SQL Server table might be a better solution.
You also want to take into account security issues when determining the best solution for your environment. For instance, an XML configuration file might not provide the security you need. A SQL Server database might provide a more secure environment than a file share that's out of your control. However, you'd have to make sure that the SQL Server instance would always be available when needed.

Tip #4: Remove Leading and Trailing Spaces

When developing the data flow of an SSIS package, you might be unaware that the source data is padded with spaces until you try to run your package and problems appear. For instance, suppose your package retrieves string values from a comma-separated value (CSV) file, then uses a Lookup transformation to compare those values. You might not know that many of the string values have spaces before or after them until a lot of comparisons evaluate to False during runtime. Because the Lookup transformation is space-sensitive, "book" doesn't equal "book ", even though they appear to be the same value.
Extra spaces can also be a problem if you're trying to insert data into a column configured with a character data type and the data type's length is too small to accommodate the source value and its spaces. For instance, you can insert the value "one two three" into a VARCHAR(15) column but not the value "one two three ". As a result, when you're retrieving data that might include extra spaces, you might need to remove those spaces, no matter whether the data is coming from text files, columns configured with the CHAR data type, or another type of data source.
If you're retrieving data from a database, you can include the logic to remove spaces in your T-SQL code. For instance, the Production.Document table in the AdventureWorks2008R2 database includes the Revision column, which is defined with the NCHAR(5) data type. To remove the extra spaces, you can use a T-SQL statement similar to the following one in your OLE DB Source component:
SELECT Title, FileName, RTRIM(Revision) AS Revision
FROM Production.Document
This statement uses the RTRIM function to remove any trailing spaces from the Revision values. If you want to remove leading spaces as well, you can modify the statement to include the LTRIM function:
SELECT Title, FileName, LTRIM(RTRIM(Revision)) AS Revision
FROM Production.Document
Now when the Revision values are passed down the data paths, they won't include extra spaces.
You can take this approach only if your source is a database. If it's a text file or some other source that you have less control over, you can add a Derived Column transformation to the data flow and trim the extra spaces there. Figure 5 shows how to trim spaces from the Revision values in the Derived Column Transformation Editor. As the figure shows, you can create a simple expression that uses the TRIM function to trim both leading and trailing spaces. The rest of the data flow can then use the derived column in place of the original Revision column in order to work with data that contains no leading or trailing spaces.
Creating a derived column to trim spaces
Figure 5: Creating a Derived Column to Trim Spaces

Tip #5: Run Packages with the 32-Bit Runtime

If you're working with a 64-bit version of SSIS and BIDS, you might come across limitations in some of the features available in the development environment. For example, suppose you set a breakpoint within the C# code of a Script task. If you run the package in debugging mode, the package will pause at all the breakpoints you set outside of the Script task, but not at the breakpoint within the Script task. An examination of the Progress tab in SSIS Designer will reveal a warning message that indicates Script tasks can't be debugged while running under the 64-bit version of the SSIS runtime.
The problem is that some SSIS features don't work properly under the 64-bit version of the SSIS runtime. If you want to use those features, you must configure the project to load the 32-bit environment. To do so, right-click the package in Solution Explorer and click Properties to open the properties for that package. Then, in the Property Pages dialog box, navigate to the Debugging page and set the Run64BitRuntime property to False, as Figure 6 shows.
Setting the Run64BitRuntime property to False
Figure 6: Setting the Run64BitRuntime Property to False

By setting the Run64BitRuntime property to False, you're specifying that packages in the project should start in the 32-bit runtime rather than the 64-bit runtime. Note, however, this applies only to the BIDS environment. To run a package as 32-bit outside of BIDS, you have to take specific steps when you execute the package. For example, if you create a SQL Server Agent job to run the package, you should specify that the 32-bit runtime be used.

Keep Your Development Efforts Moving Forward

In addition to the five tips mentioned here, there are many other considerations to take into account when developing an SSIS package. Even so, knowing how to open packages without validation, create templates, use package configurations, handle leading and trailing spaces, and run your packages with the 32-bit runtime can, in the long run, save you time developing and troubleshooting your packages. That's not to say each of these tips carries the same weight or that they all apply to your circumstances, but at least some of them should prove useful in your development efforts. For any of the tips that do apply, be sure to check out SQL Server Books Online (BOL) for more information.

No comments:

Post a Comment