Monday, September 30, 2013

31 DAYS OF SSIS – ONE PACKAGE, UNLIMITED DATABASES (24/31)

Yesterday, I jumped back to looking at another use for theScript Component transformation.  Today, I want to make another jump and look at another use for Expressions and ForEach loops.  In this case, we will look at a package where one package will be used to make similar changes to multiple databases.

PACKAGE REQUIREMENTS

A number of years ago, I was working with a single tenant database application.  That database had a state/province table that was “kinda” similar in each database.  This “kinda”-bit had nothing to do with any specific business requirements.  It was this way because nobody ever got around to updating all of the databases whenever a change was made in one database or another.  As is probably obvious, new states don’t show up too often but the state of this data was terrible.
Management of the data in these tables fell down to me, as the DBA.  When they were out of synch it was my problem so I needed to do something to resolve the situation.  What I wanted was a method that I could manage the table from one location and have each of the databases updated in turn.  I also needed a solution that was light-weight and low tech.  The people that would manage the solution after it was pushed out were savvy with Access and a bit afraid of SQL Server.

ONE PACKAGE SOLUTION

To solve this issue, I opted for an SSIS solution.  SSIS has capabilities to bring the data across to the databases and determine what needs to be inserted, updated, or deleted.  Alternatively, I could have just used change management and DML scripts to manage the data.  The downside to that was that, at that point, that process was what caused the problem.  Also, I could have looked at using replication.  I opted against that since I wanted to keep the implementation low tech.  Replication does require a bit of know-how to keep it going.
Of course, there should be a package per database.  That would be impossible to manage and put the solution back with the existing process.  There also shouldn’t need to be one execution of the package per database.  This needed to be a process that would just run and work itself out with one or one hundred databases.
To accomplish these needs, the package will utilize a ADO record set that contains a list of all of the databases that need to be managed.  That record set will be used to iterate a loop that will make all the necessary changes to that databases that subscribe for the updates.
With the solution outline, we’ll begin to go through the package to detail how this is done and hopefully help you learn a couple new things today.

PACKAGE CONTROL FLOW

There are a few things that will be occurring in the Control Flow to take note of in this package.  First, the we need a list of all of the servers and databases that contain the tables that we want to update.  With that information the package will loop through each of the server and database combinations.  Finally, for each iteration of the loop, the package will insert, update, and delete data as is required to keep the data in the client database tables identical to the data in the central table.
imageThis logic will be implemented with the following Control Flow components:
  • SQL_DestinationDatabases: This is a T-SQL that fetches a list of servers and databases that need the information.  For the example, the information is just in a couple SELECT statements.  In a live environment, this would be tied into a table that lists that servers and databases.  The server and database names are populated to variables.
  • FELP_Databases: ForEach Loop container that iterates across each occurrence of server and database name.
  • SQL_DeleteRows: Demo only task to delete and update data to make all of the data flow paths do something.
  • DFT_InsertUpdateStateProvince: Data Flow task to determine whether rows should be inserted or updated.
  • DFT_DeleteStateProvince: Data Flow task to determine whether rows should be deleted.

PACKAGE VARIABLES

There are a number of variables in the package to help control how everything is processed.
image
The variables from above were added for the following purposes:
  • DatabaseName: Name of the database that needs to receive central data
  • RSDatabases: Record set that contains the list of servers and databases that need to receive central data.
  • ServerName: Name of the server that contains a database that needs to receive central data

CONNECTION MANAGER

Usually I don’t highlight the connections, but in this case it is necessary.  The SSIS package contains two connections:
  • AdventureWorks: The database with all of the central data.  Not truly a central database but it is playing the part for this demo.
  • SNAGHTML107c570bSubscriber: The database receiving the data from the central database.  The Connection String for this connection is controlled by an expression that is tied to the DatabaseName and ServerName variables.  The settings for these variables can be seen to the right.  By doing this, you are able to change the connection of your SSIS package not just at the beginning of run-time, BUT also in the midst of execution.  Connections to databases do not have to remain constant throughout execution.

PACKAGE DATA FLOWS

With the connection information configured, it’s time to look at the data flows of the SSIS package.  There are two data flows in this SSIS package

DFT_INSERTUPDATESTATEPROVINCE

imageThis first Data Flow task will control the INSERT and UPDATE operations on the table that is being managed through this package.  To accomplish these operations the following transformations are used:
  • DS_MSS_AdventureWorks: OLE DB Source to retrieve central data for distribution.
  • LKUP_Subscriber: Lookup transformation to determine if the record exists in the subscriber database.
  • DD_MSS_StateProvince: OLE DB Destination in the subscriber database that receives the data.  This is just a straight insert.
  • OLECMD_UpdateStateProvince: OLE DB Command to perform updates for each row that currently exists in that database.  This isn’t ideal, but I’m saving a more ideal solution for determining deltas for a future post.

DFT_DELETESTATEPROVINCE

imageThe second Data Flow task controls the DELETE operations on the table.  Since rows may be deleted on the central table, they should also be deleted on the subscribers.  To accomplish these operations the following transformations are used:
  • DS_MSS_Subscriber: OLE DB Source to retrieve subscriber data for validation against the central data.
  • LKUP_SourceTable: Lookup transformation to determine if the subscriber record still exists in the central database.
  • OLECMD_DeleteStateProvince: OLE DB Command to perform deletes for each row that no longer exists in the central database.
  • RCNT_Matched: Row count of rows that match the central database.  This is just for informational and troubleshooting needs.

LAST TIDBITS

Before you can run the package, you will need to create a couple of dummy databases.  The script for these is below:
01CREATE DATABASE [Client_Db1]
02GO
03 
04USE [Client_Db1]
05GO
06 
07CREATE TABLE [dbo].[StateProvince](
08[StateProvinceID] [intNULL,
09[StateProvinceCode] [nchar](3) NOT NULL,
10[CountryRegionCode] [nvarchar](3) NOT NULL,
11[IsOnlyStateProvinceFlag] [bitNOT NULL,
12[Name] [nvarchar](50) NOT NULL,
13[TerritoryID] [intNOT NULL,
14[rowguid] [uniqueidentifier] NOT NULL,
15[ModifiedDate] [datetime] NOT NULL
16ON [PRIMARY]
17GO
18 
19CREATE DATABASE [Client_Db2]
20GO
21 
22USE [Client_Db2]
23GO
24 
25CREATE TABLE [dbo].[StateProvince](
26[StateProvinceID] [intNULL,
27[StateProvinceCode] [nchar](3) NOT NULL,
28[CountryRegionCode] [nvarchar](3) NOT NULL,
29[IsOnlyStateProvinceFlag] [bitNOT NULL,
30[Name] [nvarchar](50) NOT NULL,
31[TerritoryID] [intNOT NULL,
32[rowguid] [uniqueidentifier] NOT NULL,
33[ModifiedDate] [datetime] NOT NULL
34ON [PRIMARY]
35GO

PACKAGE RESULTS

Go ahead and execute the packages a couple times.  The first time the results will be kind of boring since everything will be an insert.  The second execution, though, there will be some rows that flow into the update and delete paths.  This is due to the demo only T-SQL statement to mimic changes in the database.
The results for the second execution should look like the image below.  Of course, I’ve cropped both data flows over each other into a single image.
image

ONE PACKAGE WRAP-UP

One of the things that I really like about SSIS is it’s continued flexibility and ease of use.  With just a record set, a couple of variables and a loop, an SSIS package can change from updating one database to updating as many as you need.  Hopefully this provides some inspiration to your own processes where you can consolidate packages and processes.
I should mention that while I use the OLE DB Command in this package it isn’t necessarily overt endorsement of it’s use.  Considerations for using it are many and enough for another blog post.  For today, I’ll just say that you need to be careful of the fact that this transformation will execute one T-SQL statement per row that it processes in the data flow path.

PACKAGE DOWNLOAD

No comments:

Post a Comment