Tuesday, December 31, 2013

SSIS Interview Questions - Part 2

http://www.mssqltips.com/sqlservertip/2487/ssis-interview-questions-on-transactions-event-handling-and-validation/

Problem
When preparing for your next SSIS interview, be sure to understand what questions could be asked in the interview. In this tip series, I will try to cover as much as I can to help you prepare. This tip covers SQL Server Integration Services interview questions on transactions, event handling and validation.
Solution

What is the transaction support feature in SSIS?

  • When you execute a package, every task of the package executes in its own transaction. What if you want to execute two or more tasks in a single transaction? This is where the transaction support feature helps. You can group all your logically related tasks in a single container and then set the transaction property appropriately to enable a transaction, so that all the tasks of the group run in a single transaction. This way you can ensure that either all of the tasks complete successfully or, if any of them fails, the whole transaction gets rolled back.

What properties do you need to configure in order to use the transaction feature in SSIS?

  • Suppose you want to execute 5 tasks in a single transaction, in this case you can place all 5 tasks in a Sequence Container and set the TransactionOption and IsolationLevel properties appropriately.
    • The TransactionOption property expects one of these three values:
      • Supported - The container/task does not create a separate transaction, but if the parent object has already initiated a transaction then it participates in it
      • Required - The container/task creates a new transaction irrespective of any transaction initiated by the parent object
      • NotSupported - The container/task neither creates a transaction nor participates in any transaction initiated by the parent object
  • The IsolationLevel property dictates how two or more transactions maintain consistency and concurrency when they are running in parallel (see the short T-SQL illustration after this list). To learn more about transactions and isolation levels, refer to this tip.
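For reference, the IsolationLevel property exposes the values Unspecified, Chaos, ReadUncommitted, ReadCommitted, RepeatableRead, Serializable and Snapshot, which correspond to the standard database isolation levels. Here is a minimal T-SQL sketch of the same concept, using a hypothetical dbo.Orders table:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    -- Under SERIALIZABLE, the range of rows read here is protected from
    -- phantom inserts by other concurrent transactions until we commit.
    SELECT COUNT(*) AS OrderCount FROM dbo.Orders;
COMMIT TRANSACTION;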

When I enabled transactions in an SSIS package, it failed with this exception: "The Transaction Manager is not available. The DTC transaction failed to start."  What caused this exception and how can it be fixed?

  • SSIS uses the MS DTC (Microsoft Distributed Transaction Coordinator) Windows service for transaction support. As such, you need to ensure this service is running on the machine where you are actually executing the SSIS packages, or the package execution will fail with the exception message indicated in this question. A quick way to verify the service is responding is shown below.
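One quick, hedged check from Management Studio is to start a distributed transaction yourself; if the MSDTC service is stopped, the batch fails immediately with a similar error:

-- Fails with an MSDTC availability error if the Distributed Transaction
-- Coordinator service is not running.
BEGIN DISTRIBUTED TRANSACTION;
SELECT 1 AS DtcResponded;
COMMIT TRANSACTION;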

What is event handling in SSIS?

  • Like many other programming languages, SSIS and its components raise different events during the execution of the code. You can write an event handler to capture an event and handle it in a few different ways. For example, consider you have a data flow task and before execution of this data flow task you want to make some environmental changes, such as creating a table to write data into or deleting/truncating a table you want to write to. Along the same lines, after execution of the data flow task you want to clean up some staging tables. In this circumstance you can write an event handler for the OnPreExecute event of the data flow task, which gets executed before the actual execution of the data flow. Similarly, you can write an event handler for the OnPostExecute event of the data flow task, which gets executed after the execution of the actual data flow task. Please note, not all tasks raise the same events; there might be some events specific to one task that you can use with that object and not with others. A sketch of such pre/post-execute housekeeping follows this answer.
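As a sketch of the housekeeping described above (the table name dbo.StagingOrders is hypothetical), an Execute SQL Task attached to the OnPreExecute event handler might run:

-- OnPreExecute: make sure the staging table exists and is empty
IF OBJECT_ID(N'dbo.StagingOrders', N'U') IS NULL
    CREATE TABLE dbo.StagingOrders (OrderID int NOT NULL, Amount money NULL);
TRUNCATE TABLE dbo.StagingOrders;

while the matching OnPostExecute handler's task might simply run:

DROP TABLE dbo.StagingOrders;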

How do you write an event handler?

  • First, open your SSIS package in Business Intelligence Development Studio (BIDS) and click on the Event Handlers tab. Next, select the executable/task from the left side combo-box and then select the event you want to handle from the right side combo-box. Finally, click on the hyperlink to create the event handler. So far you have only created the event handler; you have not specified any sort of action. For that, simply drag the required task from the toolbox onto the event handler designer surface and configure it appropriately. To learn more about event handling, click here.

What is the DisableEventHandlers property used for?

  • Consider you have a task or package with several event handlers, but for some reason you do not want event handlers to be called. One simple solution is to delete all of the event handlers, but that would not be viable if you want to use them in the future. This is where you can use the DisableEventHandlers property.  You can set this property to TRUE and all event handlers will be disabled. Please note with this property you simply disable the event handlers and you are not actually removing them.  This means you can set this value to FALSE and the event handlers will once again be executed.

What is SSIS validation?

  • SSIS validates the package and all of its tasks to ensure they have been configured correctly, i.e. that with the given set of configurations and values all the tasks and the package will execute successfully. In other words, during the validation process SSIS checks that the source and destination locations are accessible and that the metadata about the source and destination tables stored with the package is correct, so that the tasks will not fail when executed. The validation process reports warnings and errors depending on the validation failure detected. For example, if the source/destination tables/columns have been changed or dropped, it will show an error; whereas if you are reading more columns from the source than you are writing to the destination, this will be flagged as a warning. To learn more about validation, click here.

Define design time validation versus run time validation.

  • Design time validation is performed when you are opening your package in BIDS whereas run time validation is performed when you are actually executing the package.

Define early validation (package level validation) versus late validation (component level validation).

  • When a package is executed, the package goes through the validation process. All of the components/tasks of the package are validated before actually starting the package execution. This is called early validation or package level validation. During execution of a package, SSIS validates each component/task again just before executing it. This is called late validation or component level validation.

What is DelayValidation and what is the significance?

  • As I said before, during early validation all of the components of the package are validated along with the package itself. If any of the components/tasks fails to validate, SSIS will not start the package execution. In most cases this is fine, but what if the second task is dependent on the first task? For example, say you are creating a table in the first task and referring to the same table in the second task. When early validation starts, it will not be able to validate the second task as the dependent table has not been created yet. Keep in mind that early validation is performed before the package execution starts. So what should we do in this case? How can we ensure the package executes successfully while the logical flow of the package stays correct? This is where you can use the DelayValidation property. In the above scenario you should set the DelayValidation property of the second task to TRUE, in which case early validation, i.e. package level validation, is skipped for that task, and the task is only validated during late validation, i.e. component level validation. Please note, with the DelayValidation property you can only skip early validation for that specific task; there is no way to skip late or component level validation.
Problem
When preparing for an SSIS interview it is important to understand what questions could be asked in the interview. In this tip series (tip 1, tip 2, tip 3), I will try to cover common SSIS interview questions to help you prepare for a future SSIS interview. In this tip we will cover interview questions related to the SSIS architecture and internals. Check it out.
Solution

What are the different components in the SSIS architecture?

  • The SSIS architecture consists of four main components:
    • The SSIS runtime engine manages the workflow of the package
    • The data flow pipeline engine manages the flow of data from source to destination and in-memory transformations
    • The SSIS object model is used for programmatically creating, managing and monitoring SSIS packages
    • The SSIS Windows service allows managing and monitoring packages
  • To learn more about the architecture click here.

How is the SSIS runtime engine different from the SSIS dataflow pipeline engine?

  • The SSIS Runtime Engine manages the workflow of the packages during runtime, which means its role is to execute the tasks in a defined sequence.  As you know, you can define the sequence using precedence constraints. This engine is also responsible for providing support for event logging, breakpoints in the BIDS designer, package configuration, transactions and connections. The SSIS Runtime engine has been designed to support concurrent/parallel execution of tasks in the package.
  • The Dataflow Pipeline Engine is responsible for executing the data flow tasks of the package. It creates a dataflow pipeline by allocating in-memory structures for storing data in transit. This means the engine pulls data from the source, stores it in memory, executes the required transformations on the data stored in memory and finally loads the data into the destination. Like the SSIS runtime engine, the dataflow pipeline engine has been designed to do its work in parallel, by creating multiple threads and enabling them to run multiple execution trees/units in parallel.

How is a synchronous (non-blocking) transformation different from an asynchronous (blocking) transformation in SQL Server Integration Services?

  • A transformation changes the data into the required format before loading it to the destination or passing it down the path. Transformations can be categorized as synchronous or asynchronous.
  • A transformation is called synchronous when it processes each incoming row in place (modifying the data into the required format so that the layout of the result-set remains the same) and passes it down the path. This means the output rows are synchronous with the input rows (a 1:1 relationship between input and output rows), so the transformation reuses the same allocated buffer set/memory and does not require additional memory. Please note, these kinds of transformations have lower memory requirements as they work on a row-by-row basis (and hence run quite fast) and do not block the data flow in the pipeline. Some examples are: Lookup, Derived Column, Data Conversion, Copy Column, Multicast, Row Count, etc.
  • A transformation is called asynchronous when it requires all incoming rows to be stored locally in memory before it can start producing output rows. For example, the Aggregate Transformation requires all the rows to be loaded and stored in memory before it can aggregate and produce the output rows. In this case the input rows are not in sync with the output rows, and more memory is required to store the whole set of data (no memory reuse) for both the input and the output. These kinds of transformations have higher memory requirements (and there is a high chance of buffers spooling to disk if insufficient memory is available) and generally run slower. Asynchronous transformations are also called "blocking transformations" because they block the output rows until all the input rows have been read into memory. To learn more about it click here.

What is the difference between a partially blocking transformation versus a fully blocking transformation in SQL Server Integration Services?

  • Asynchronous transformations, as discussed in the last question, can be further divided into two categories depending on their blocking behavior:
    • Partially Blocking Transformations do not block the output until a full read of the inputs occurs. However, they require new buffers/memory to be allocated to store the newly created result-set, because the output from these kinds of transformations differs from the input set. For example, the Merge Join transformation joins two sorted inputs and produces a merged output. Here the data flow pipeline engine creates two sets of input buffers, but the merged output from the transformation requires another set of output buffers, because the structure of the output rows differs from that of the input rows. This means the memory requirement for this type of transformation is higher than for synchronous transformations, where the transformation is completed in place.
    • Fully Blocking Transformations, apart from requiring an additional set of output buffers, also block the output completely until the whole input set has been read. For example, the Sort Transformation requires all input rows to be available before it can start sorting and pass the rows down the output path. These kinds of transformations are the most expensive and should be used only as needed. For example, if you can get sorted data from the source system, use that instead of using a Sort transformation to sort the data in transit/memory. To learn more about it click here.

What is an SSIS execution tree and how can I analyze the execution trees of a data flow task?

  • The work to be done in a data flow task is divided by the dataflow pipeline engine into multiple chunks, called execution units, each representing a group of transformations. An individual execution unit is called an execution tree, and each one can be executed on a separate thread in parallel with the other execution trees. The pipeline engine also creates the in-memory structures, called data buffers, which are scoped to each individual execution tree. An execution tree normally starts at either the source or an asynchronous transformation and ends at the next asynchronous transformation or a destination. During execution of an execution tree, the source reads the data into a buffer, the transformations execute against that buffer, and the buffer is handed to the next execution tree in the path by passing pointers to the buffers. To learn more about it click here.
  • To see how many execution trees are created and how many rows are stored in each buffer for an individual data flow task, you can enable logging of these data flow task events: PipelineExecutionTrees, PipelineComponentTime, PipelineInitialization, BufferSizeTuning, etc. To learn more about the events that can be logged click here.

How can an SSIS package be scheduled to execute at a defined time or at a defined interval per day?

  • You can configure a SQL Server Agent job with a job step type of SQL Server Integration Services Package; the job step invokes the dtexec command line utility internally to execute the package. You can run the job (and in turn the SSIS package) on demand, or you can create a schedule for a one time need or on a recurring basis; a T-SQL sketch follows. Refer to this tip to learn more about it.
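As a sketch (the job, schedule and file names below are hypothetical), the same job can be created with the msdb stored procedures; the SSIS subsystem step takes dtexec-style arguments:

EXEC msdb.dbo.sp_add_job @job_name = N'Run MyPackage';
EXEC msdb.dbo.sp_add_jobstep @job_name = N'Run MyPackage',
    @step_name = N'Execute package',
    @subsystem = N'SSIS',
    @command   = N'/FILE "C:\Packages\MyPackage.dtsx"';
EXEC msdb.dbo.sp_add_jobschedule @job_name = N'Run MyPackage',
    @name = N'Daily at 2 AM',
    @freq_type = 4,               -- daily
    @freq_interval = 1,           -- every day
    @active_start_time = 020000;  -- HHMMSS
EXEC msdb.dbo.sp_add_jobserver @job_name = N'Run MyPackage';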

What is an SSIS Proxy account and why would you create it?

  • When you try to execute an SSIS package from a SQL Server Agent job, it can fail with the message "Non-SysAdmins have been denied permission to run DTS Execution job steps without a proxy account". This error message is generated when the account under which the SQL Server Agent service is running and the job owner are not sysadmins on the instance, and the job step is not set to run under a proxy account associated with the SSIS subsystem; a sketch of creating such a proxy follows. Refer to this tip to learn more about it.
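A hedged sketch of setting one up (the account, login and proxy names are hypothetical): create a credential for the Windows account the packages should run as, create a proxy over it, grant the proxy to the SSIS subsystem, and allow the job owner's login to use it. The SSIS job step can then be set to run as that proxy instead of the Agent service account.

CREATE CREDENTIAL SSISExecCredential
    WITH IDENTITY = N'DOMAIN\SsisExecAccount', SECRET = N'<strong password>';
EXEC msdb.dbo.sp_add_proxy
    @proxy_name = N'SSISProxy', @credential_name = N'SSISExecCredential';
EXEC msdb.dbo.sp_grant_proxy_to_subsystem
    @proxy_name = N'SSISProxy', @subsystem_name = N'SSIS';
EXEC msdb.dbo.sp_grant_login_to_proxy
    @login_name = N'DOMAIN\JobOwner', @proxy_name = N'SSISProxy';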

How can you configure your SSIS package to run in 32-bit mode on 64-bit machine when using some data providers which are not available on the 64-bit platform?

  • In order to run an SSIS package in 32-bit mode the SSIS project property Run64BitRuntime needs to be set to "False".  The default configuration for this property is "True".  This configuration is an instruction to load the 32-bit runtime environment rather than 64-bit, and your packages will still run without any additional changes. The property can be found under SSIS Project Property Pages -> Configuration Properties -> Debugging.

SSIS Interview Questions - Part 1

http://www.mssqltips.com/sqlservertip/2485/ssis-interview-questions--part-1/

Problem
When preparing for a SQL Server interview, it is helpful to understand what questions may be asked related to SSIS.  In this tip series,  I will try to cover as much as I can to help you prepare.
Solution

What is SQL Server Integration Services (SSIS)?

  • SQL Server Integration Services (SSIS) is a component of SQL Server 2005 and later versions. SSIS is an enterprise scale ETL (Extraction, Transformation and Load) tool which allows you to develop data integration and workflow solutions. Apart from data integration, SSIS can be used to define workflows to automate updating multi-dimensional cubes and to automate maintenance tasks for SQL Server databases.

How does SSIS differ from DTS?

  • SSIS is a successor to DTS (Data Transformation Services) and has been completely re-written from scratch to overcome the limitations of DTS which was available in SQL Server 2000 and earlier versions. A significant improvement is the segregation of the control/work flow from the data flow and the ability to use a buffer/memory oriented architecture for data flows and transformations which improve performance.

What is the Control Flow?

  • When you start working with SSIS, you first create a package which is nothing but a collection of tasks or package components.  The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order.

What is the Data Flow Engine?

  • The Data Flow Engine, also called the SSIS pipeline engine, is responsible for managing the flow of data from the source to the destination and performing transformations (lookups, data cleansing etc.).  Data flow uses memory oriented architecture, called buffers, during the data flow and transformations which allows it to execute extremely fast. This means the SSIS pipeline engine pulls data from the source, stores it in buffers (in-memory), does the requested transformations in the buffers and writes to the destination. The benefit is that it provides the fastest transformation as it happens in memory and we don't need to stage the data for transformations in most cases.

What is a Transformation?

  • A transformation simply means bringing the data into a desired format. For example, you may be pulling data from the source and want to ensure only distinct records are written to the destination, so duplicates are removed. Another example: if you have master/reference data and want to pull only related data from the source, you need some sort of lookup. There are around 30 transformations available, and these can be extended further with custom built components if needed.

What is a Task?

  • A task is very much like a method in a programming language: it represents or carries out an individual unit of work. There are broadly two categories of tasks in SSIS, Control Flow tasks and Database Maintenance tasks. All Control Flow tasks are operational in nature except the Data Flow task. Although there are around 30 control flow tasks which you can use in your package, you can also develop your own custom tasks in your choice of .NET programming language.

What is a Precedence Constraint and what types of Precedence Constraint are there?

  • SSIS allows you to place as many tasks as you want in the control flow. You can connect all these tasks using connectors called Precedence Constraints. Precedence Constraints allow you to define the logical sequence of tasks in the order they should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.
  • These are the types of precedence constraints; the condition could be either a constraint, an expression or both:
    • Success (next task will be executed only when the last task completed successfully) or
    • Failure (next task will be executed only when the last task failed) or
    • Completion (next task will be executed no matter whether the last task completed or failed).

What is a container and how many types of containers are there?

  • A container is a logical grouping of tasks which allows you to manage the scope of the tasks together.
  • These are the types of containers in SSIS:
    • Sequence Container - Used for grouping logically related tasks together
    • For Loop Container - Used when you want to have repeating flow in package
    • For Each Loop Container - Used for enumerating each object in a collection; for example a record set or a list of files.
  • Apart from the above mentioned containers, there is one more container called the Task Host Container which is not visible from the IDE, but every task is contained in it (the default container for all the tasks).

What are variables and what is variable scope?

  • A variable is used to store values. There are basically two types of variables: System Variables (like ErrorCode, ErrorDescription, PackageName, etc.), whose values you can use but cannot change, and User Variables, which you create, assign values to and read as needed. A variable can hold a value of the data type you chose when you defined the variable.
  • Variables can have different scope depending on where they are defined. For example, you can have package level variables, which are accessible to all the tasks in the package, and container level variables, which are accessible only to those tasks that are within the container.
Problem
When you are preparing for an SSIS interview you need to understand what questions could be asked in the interview. In this tip series, I will try to cover as much as I can to help you prepare for an SSIS interview. In this tip we will cover SSIS basics and SSIS event logging.
Solution

What are SSIS Connection Managers?

  • A connection manager is a logical representation of a connection. It encapsulates the connection string and related properties in one place, and tasks, data flow adapters and log providers refer to it whenever they need to connect to a source or destination. Defining the connection once as a connection manager allows several components of the package to share it.

What is the RetainSameConnection property and what is its impact?

  • Whenever a task uses a connection manager to connect to a source or destination database, a connection is opened and closed with the execution of that task. Sometimes you might need to open a connection, execute multiple tasks against it and close it at the end of the execution. This is where the RetainSameConnection property of the connection manager can help. When you set this property to TRUE, the connection will be opened the first time it is used and will remain open until execution of the package completes; the classic use case is sketched below.
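A classic use case, as a sketch: a local temporary table lives only as long as the connection that created it, so sharing a #temp table between two Execute SQL Tasks works only when the connection manager has RetainSameConnection = TRUE.

-- Execute SQL Task 1 (creates the temp table):
CREATE TABLE #Staging (ID int NOT NULL, Payload varchar(100) NULL);

-- Execute SQL Task 2 (later in the control flow, same retained connection):
INSERT INTO #Staging (ID, Payload) VALUES (1, 'example row');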

What are source and destination adapters?

  • A source adapter basically represents a source in the Data Flow to pull data from. The source adapter uses a connection manager to connect to the source, and along with it you can also specify the query method and query used to pull data from the source.
  • Similar to a source adapter, the destination adapter represents a destination in the Data Flow to write data to. Again, like the source adapter, the destination adapter uses a connection manager to connect to the target system, and along with that you also specify the target table and the writing mode (i.e. write one row at a time or do a bulk insert) as well as several other properties.
  • Please note, the source and destination adapters can both use the same connection manager if you are reading and writing to the same database.

What is the Data Path and how is it different from a Precedence Constraint?

  • A Data Path is used in a Data Flow task to connect the different components of a Data Flow and show the transition of the data from one component to another. A data path contains the meta information of the data flowing through it, such as the columns, data types, sizes, etc. As for the differences between a data path and a precedence constraint: the data path is used in the data flow and shows the flow of data, whereas the precedence constraint is used in the control flow and shows the transition of control from one task to another.

What is a Data Viewer utility and what is it used for?

  • The data viewer utility is used in Business Intelligence Development Studio during development or when troubleshooting an SSIS Package. The data viewer utility is placed on a data path to see what data is flowing through that specific data path during execution. The data viewer utility displays rows from a single buffer at a time, so you can click on the next or previous icons to go forward and backward to display data. Check out the Data Viewer enhancements in SQL Server 2012.

What is an SSIS breakpoint? How do you configure it? How do you disable or delete it?

  • A breakpoint allows you to pause the execution of the package in Business Intelligence Development Studio during development or when troubleshooting an SSIS package. You can right click on the task in the control flow, click on the Edit Breakpoints menu and, from the Set Breakpoints window, specify when you want execution to be halted/paused, for example on the OnPreExecute, OnPostExecute or OnError events. To toggle a breakpoint, delete all breakpoints or disable all breakpoints, go to the Debug menu and click on the respective menu item. You can even specify different conditions for hitting the breakpoint as well. To learn more about breakpoints, refer to Breakpoints in SQL Server 2005 Integration Services SSIS.

What is SSIS event logging?

  • Like any other modern programming language, SSIS raises different events during the package execution life cycle. You can log these events to trace the execution of your SSIS package and its tasks, and you can also write your own custom messages as custom log entries. You can enable event logging at the package level as well as at the task level, and you can choose any specific event of a task or a package to be logged. This is essential when you are troubleshooting your package and trying to understand a performance problem or the root cause of a failure. Check out this tip about Custom Logging in SQL Server Integration Services SSIS.

What are the different SSIS log providers?

  • There are several places where you can log execution data generated by an SSIS event log:
    • SSIS log provider for Text files
    • SSIS log provider for Windows Event Log
    • SSIS log provider for XML files
    • SSIS log provider for SQL Profiler
    • SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version (a query sketch follows this list).
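A sketch of reading the SQL Server log provider's output after a run (use msdb.dbo.sysdtslog90 on SQL Server 2005):

SELECT TOP (50) event, source, starttime, endtime, message
FROM msdb.dbo.sysssislog
ORDER BY starttime DESC;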

How do you enable SSIS event logging?

  • SSIS provides a granular level of control in deciding what to log and where to log it. To enable event logging for an SSIS package, right click in the control flow area of the package and click on Logging. In the Configure SSIS Logs window you will notice all the tasks of the package are listed on the left side in a tree view; you can specifically choose for which tasks you want to enable logging. On the right side you will notice two tabs: on the Providers and Logs tab you specify where you want to write the logs (you can write to one or more log providers together), and on the Details tab you specify which events you want to log for the selected task.
  • Please note, enabling event logging is immensely helpful when you are troubleshooting a package, but it also incurs additional overhead on SSIS in order to log the events and information. Hence you should only enable event logging when needed, and only choose the events which you want to log. Avoid logging all the events unnecessarily.

What is the LoggingMode property?

  • SSIS packages and all of the associated tasks or components have a property called LoggingMode. This property accepts three possible values: Enabled - to enable logging of that component, Disabled - to disable logging of that component, and UseParentSetting - to use the parent's setting to decide whether or not to log the data.

Thursday, December 19, 2013

An Introduction to SSIS Balanced Data Distributor Transformation Component

http://www.codeproject.com/Articles/247883/An-Introduction-to-SSIS-Balanced-Data-Distributor

Table of Contents

  1. Introduction
  2. How it Works
  3. Adding Balanced Data Distributor (BDD) in the SSIS toolbox
  4. Let us look at an example
  5. When to use the BDD
  6. References
  7. Conclusion

Introduction

Microsoft released a new SSIS 2008 transformation component called the SSIS Balanced Data Distributor (BDD) on 6/7/2011. This component accepts a single input and almost evenly distributes the data stream across multiple destinations via multithreading. It can be downloaded from here.

How It Works

The documentation says:
BDD accepts a single input and evenly distributes the data stream across multiple destination via multithreading. The transform takes one pipeline buffer worth of rows at a time and moves it to the next output in a round robin fashion. It’s balanced and synchronous so if one of the downstream transforms or destinations is slower than the others, the rest of the pipeline will stall so this transform works best if all of the outputs have identical transforms and destinations. The intention of BDD is to improve performance via multi-threading. Several characteristics of the scenarios BDD applies to: 1) the destinations would be uniform, or at least be of the same type. 2) the input is faster than the output, for example, reading from flat file to OleDB.
N.B.~BDD works only on SSIS 2008 and SSIS 2008 R2 versions.

Adding Balanced Data Distributor (BDD) in the SSIS toolbox

After installing the BalancedDataDistributor-x86.exe file, we need to add it to the Data Flow Transformations section of the toolbox as under.
Step 1: Right click on the Data Flow Transformations section of the toolbox and select Choose Items.
Step 2: From the Choose Items dialog box that appears, select the SSIS Data Flow Items tab, choose Balanced Data Distributor and finally click OK.
We will find that the BDD has been added to our Data Flow Transformations section.

Let Us Look At An Example

BDD takes a single input and tries to distribute the data in almost equal proportion to its various outputs, e.g., if we have 5 outputs, then each output component will receive almost 1/5th of the total input data. The efficiency of BDD comes from the fact that it operates on data buffers and not on individual rows.
Let us have a text file, say input.txt, which has 10,00,000 (ten lakh, i.e. one million) rows. It's a simple text file which holds some employee information: EmpID, EmpName, EmpSex and EmpPhoneNumber.
Step 1: Open BIDS and drag and drop a Data Flow Task onto the Control Flow designer.
Step 2: In the Data Flow designer, first add a Flat File Source and specify the input.txt file as its source. A preview of the source shows employee rows such as 1,Name1,Female,12345679.
Step 3: Next, add a BDD whose source is obviously the Flat File Source.
N.B.~ Apart from its name, internal metadata validation and description, BDD does not offer any other customizations of any kind.
Step 4: Add 5 Flat File Destinations, each of whose source will be our BDD. Specify the destination files for the Flat File Destination components as Output1.txt, Output2.txt, Output3.txt, Output4.txt and Output5.txt respectively. Our design now has the Flat File Source feeding the BDD, which in turn feeds the five destinations.
Step 5: Now let us run the package and observe the row counts written to each destination.
As can be figured out, out of 10 lakh rows, BDD distributed 2,04,240 rows to the first output component while each of the others received 1,98,940 rows. If we divide 10 lakh by 5, every output component would be expected to receive exactly 2 lakh rows. But since whole buffers are handed out across multiple threads in a round robin fashion, there is no control over the data being divided precisely.
Now if we open the output files, we can find out the way BDD has distributed the data, e.g.:
File Name     | First Record                     | Last Record                          | Total # of Records
Output1.txt   | 1,Name1,Female,12345679          | 1000000,Name1000000,Male,13345678    | 2,04,240
Output2.txt   | 9948,Name9948,Male,12355626      | 964859,Name964859,Female,13310537    | 1,98,940
Output3.txt   | 19895,Name19895,Female,12365573  | 974806,Name974806,Male,13320484      | 1,98,940
Output4.txt   | 29842,Name29842,Male,12375520    | 984753,Name984753,Female,13330431    | 1,98,940
Output5.txt   | 39789,Name39789,Female,12385467  | 994700,Name994700,Male,13340378      | 1,98,940
BDD works on the principle of parallelism. It provides an easy way of creating independent segments by distributing the work over multiple threads.
N.B.~ As discussed earlier, this component moves one pipeline buffer of rows at a time; in this experiment each buffer held 9,947 rows. (This figure is not something BDD itself exposes for configuration; the rows per buffer in a data flow follow from the Data Flow Task's buffer settings, such as DefaultBufferSize, and the row width.) As a proof, instead of 10 lakh rows we will use only 9,947 (nine thousand nine hundred forty-seven) rows in our input file and observe the behavior. After running the package, we will find that all the rows are transferred to the first output component and the other components receive nothing.
Now let us increase the number of rows in our input file from 9,947 to 9,948 (nine thousand nine hundred forty-eight). After running the package, we find that the first output component received 9,947 rows while the second output component received 1 row.
So we can infer that the component distributes rows to its output components one whole buffer at a time, with the per-buffer row count determined internally.

When to Use the BDD

  1. If a huge volume of data is coming in and there is a need to read it faster.
  2. If there is no ordering dependency in the data, as BDD works on the parallelism principle and not sequentially.

Tuesday, December 17, 2013

ReturnValue, OutputValue, ResultSet and InputValue in single SSIS Execute SQL Task

The stored procedure:

USE [BGBL]
GO
/****** Object:  StoredProcedure [dbo].[testsp]    Script Date: 12/17/2013 5:03:13 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- Returns the matching dealers as a result set, the row count through the
-- @p_output OUTPUT parameter, and the count of distinct cores as the return value.
ALTER PROCEDURE [dbo].[testsp] (@p_Input varchar(50), @p_output int = null output)
AS
BEGIN

/* Test harness:
declare @p_Output1 int
declare @p_ReturnValue int
exec @p_ReturnValue = testsp 'A', @p_Output1 out
select @p_Output1
select @p_ReturnValue */

    -- Result set: all dealers in the requested group
    select * from Dealer_Master where dealer_group = @p_Input

    -- Output parameter: number of rows returned by the SELECT above
    set @p_output = @@rowcount

    -- Return value: count of distinct cores in the group
    return (select count(distinct core) as cntreturn from Dealer_Master where dealer_group = @p_Input)
END


---

SSIS

1) Place an Execute SQL Task in the control flow.
2) Create an OLE DB connection (not a .NET connection) for it; the statement, parameter mapping and result set configuration are sketched below.
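As a rough sketch of the remaining configuration (the variable names here are hypothetical, not from the original post): with an OLE DB connection, the Execute SQL Task uses positional ? parameter markers, so the SQLStatement (SQLSourceType = Direct Input) would be:

EXEC ? = [dbo].[testsp] ?, ? OUTPUT

-- Parameter Mapping tab (OLE DB parameter names are ordinal):
--   Name 0 -> Direction ReturnValue -> User::ReturnValue (Long)
--   Name 1 -> Direction Input       -> User::InputValue  (VarChar)
--   Name 2 -> Direction Output      -> User::OutputValue (Long)
-- Result Set tab: set ResultSet = Full result set and map Result Name 0
-- to an Object-typed variable that will receive the rows from the SELECT.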



SSIS Execute SQL Tasks - Details

http://www.sqlis.com/sqlis/post/The-Execute-SQL-Task.aspx

In this article we are going to take you through the Execute SQL Task in SQL Server Integration Services for SQL Server 2005 (although it applies just as well to SQL Server 2008). We will be covering all the essentials that you will need to know to effectively use this task and make it as flexible as possible. The things we will be looking at are as follows:
  • A tour of the Task.
  • The properties of the Task.
After looking at these introductory topics we will then get into some examples. The examples will show different types of usage for the task:
  • Returning a single value from a SQL query with two input parameters.
  • Returning a rowset from a SQL query.
  • Executing a stored procedure and retrieving a rowset, a return value and an output parameter value, and passing in an input parameter.
  • Passing in the SQL Statement from a variable.
  • Passing in the SQL Statement from a file.

Tour Of The Task

Before we can start to use the Execute SQL Task in our packages we are going to need to locate it in the toolbox. Let's do that now. Whilst in the Control Flow section of the package, expand your toolbox and locate the Execute SQL Task.
Now drag the task onto the designer. A validation error appears telling us that no connection manager has been assigned to the task. This can be easily remedied by creating a connection manager. Only certain types of connection manager are compatible with this task, so we cannot just create any connection manager; these types are detailed a little later on.
Double click on the task itself to take a look at the custom user interface provided for this task. The task opens on the General tab. Take a bit of time to have a look around here, as throughout this article we will be revisiting this page many times.
Whilst on the general tab, drop down the combobox next to the ConnectionType property. In here you will see the types of connection manager which this task will accept.
As with SQL Server 2000 DTS, SSIS allows you to output values from this task in a number of formats. Have a look at the combobox next to the ResultSet property. The major difference here is the ability to output into XML.
If you drop down the combobox next to the SQLSourceType property you will see the ways in which you can pass a SQL Statement into the task itself. We will have examples of each of these later on but certainly when we saw these for the first time we were very excited.
Next to the SQLStatement property, if you click in the empty box you will see an ellipsis appear. Click on it and you will see the very basic query editor that becomes available to you.
Alternatively after you have specified a connection manager for the task you can click on the Build Query button to bring up a completely different query editor. This is slightly inconsistent.
Once you've finished looking around the General tab, move on to the next tab, which is the Parameter Mapping tab. We shall, again, be visiting this tab throughout the article, but to give you an initial heads up, this is where you define the input, output and return values for your task. Note this is not where you specify the result set.
If however you now move on to the Result Set tab, this is where you define which variable will receive the output from your SQL statement, in whatever form that is.
Property Expressions are one of the most amazing things to happen in SSIS and they will not be covered here as they deserve a whole article to themselves. Watch out for them as their usefulness will astound you.
For a more detailed discussion of what should be the parameter markers in the SQL Statements on the General tab and how to map them to variables on the Parameter Mapping tab see Working with Parameters and Return Codes in the Execute SQL Task.

Task Properties

There are two places where you can specify the properties for your task. One is in the task UI itself and the other is in the properties pane which will appear if you right click on your task and select Properties from the context menu. We will be doing plenty of property setting in the UI later, so let's take a moment to have a look at the properties pane.
Now we shall take you through all the properties and tell you exactly what they mean. A lot of these properties you will see across all tasks, as well as the package itself, because they come from everything's base structure, the Container.
BypassPrepare
Should the statement be prepared before sending to the connection manager destination (True/False)
Connection
This is simply the name of the connection manager that the task will use. We can get this from the connection manager tray at the bottom of the package.
DelayValidation
A really interesting property: it tells the task not to validate until it actually executes. A usage for this may be that you are operating on a table yet to be created, but at runtime you know the table will be there.
Description
Very simply the description of your Task.
Disable
Should the task be enabled or not? You can also set this through a context menu by right clicking on the task itself.
DisableEventHandlers
As a result of events that happen in the task, should the event handlers for the container fire?
ExecValueVariable
The variable assigned here will get or set the execution value of the task.
Expressions
Expressions, as we mentioned earlier, are a really powerful tool in SSIS. You select a property on the left and assign an expression to the value of that property on the right, causing the value to be changed dynamically at runtime.
One of the most obvious uses of this is that the property value can be built dynamically from within the package allowing you a great deal of flexibility
FailPackageOnFailure
If this task fails does the package?
FailParentOnFailure
If this task fails does the parent container? A task can be hosted inside another container, i.e. the For Each Loop Container, and this would then be the parent.
ForcedExecutionValue
This property allows you to hard code an execution value for the task.
ForcedExecutionValueType
What is the datatype of the ForcedExecutionValue?
ForceExecutionResult
Force the task to return a certain execution result. This could then be used by the workflow constraints. Possible values are None, Success, Failure and Completion.
ForceExecutionValue
Should we force the execution result?
IsolationLevel
This is the transaction isolation level of the task.
IsStoredProcedure
Certain optimisations are made by the task if it knows that the query is a Stored Procedure invocation. The docs say this will always be false unless the connection is an ADO connection.
LocaleID
Gets or sets the LocaleID of the container.
LoggingMode
Should we log for this container and what settings should we use? The value choices are UseParentSetting, Enabled and Disabled.
MaximumErrorCount
How many times can the task fail before we call it a day?
Name
Very simply the name of the task.
ResultSetType
How do you want the results of your query returned? The choices are ResultSetType_None, ResultSetType_SingleRow, ResultSetType_Rowset and ResultSetType_XML.
SqlStatementSource
Your Query/SQL Statement.
SqlStatementSourceType
The method of specifying the query. Your choices here are DirectInput, FileConnection and Variable.
TimeOut
How long should the task wait to receive results?
TransactionOption
How should the task handle being asked to join a transaction?

Usage Examples

As we move through the examples we will only cover in them what we think you must know and what we think you should see. This means that some of the more elementary steps, like setting up variables, will be covered in the early examples but skipped and simply referred to in later ones. All these examples use the AdventureWorks database that comes with SQL Server 2005.

Returning a Single Value, Passing in Two Input Parameters

So the first thing we are going to do is add some variables to our package. Here the CountOfEmployees variable will be used as the output from the query, and EndDate and StartDate will be used as input parameters. All these variables have been scoped to the package. Scoping allows us to have domains for variables. Each container has a scope, and remember a package is a container as well. Variable values of the parent container can be seen in child containers, but cannot be passed back up to the parent from a child.
Next we have created and assigned an OLEDB connection manager to this task, named "ExecuteSQL Task Connection". We have made sure that the SQLSourceType property is set to Direct Input, as we will be writing in our statement ourselves, and we have also specified that only a single row will be returned from this query. The statement we typed in was:
SELECT COUNT(*) AS CountOfEmployees FROM HumanResources.Employee WHERE (HireDate BETWEEN ? AND ?)

Moving on now to the Parameter Mapping tab, this is where we are going to tell the task about our input parameters. We add them to the window, specifying their direction and datatype. A quick word here about the structure of the variable name. As you can see, SSIS has preceded the variable with the word User. This is a default namespace for variables, but you can create your own. When defining your variables, if you look at the variables window title bar you will see some icons; if you hover over the last one on the right you will see it says "Choose Variable Columns". If you click the button you will see a list of checkbox options, and one of them is Namespace. After checking this you will see where you can define your own namespace.
The next tab, Result Set, is where we need to get back the value(s) returned from our statement and assign them to a variable, in our case CountOfEmployees, so we can use it later perhaps. Because we are only returning a single value, if you remember from earlier, we are allowed to assign a name to the result set, but it must be the name of the column (or alias) from the query.
A really cool feature of Business Intelligence Development Studio being hosted by Visual Studio is that we get breakpoint support for free. In our package we set a breakpoint so we can pause the package and look in a watch window at the variable values as they appear to our task, and at the variable value of our result set after the task has done the assignment.
As you can see, the count of employees that matched the date range was 2.

Returning a Rowset

In this example we are going to return a result set back to a variable after the task has executed, not just a single row, single value. There are no input parameters required, so the variables window is nice and straightforward: one variable of type Object.
Here is the statement that will form the source for our result set:
SELECT p.ProductNumber, p.Name, pc.Name AS ProductCategoryName
FROM Production.ProductCategory pc
JOIN Production.ProductSubcategory psc
    ON pc.ProductCategoryID = psc.ProductCategoryID
JOIN Production.Product p
    ON psc.ProductSubcategoryID = p.ProductSubcategoryID
We need to make sure that we have selected Full result set as the ResultSet on the task's General tab.
Because there are no input parameters we can skip the Parameter Mapping tab and move straight to the Result Set tab. Here we need to add our variable defined earlier and map it to the result name of 0 (remember we covered this earlier).
Once we run the task we can again set a breakpoint and have a look at the values coming back from the task. The result set is returned to us as a COM object. We can do some pretty interesting things with this COM object, and in later articles that is exactly what we shall be doing.

Return Values, Input/Output Parameters and Returning a Rowset from a Stored Procedure

This example is pretty much going to give us a taste of everything. We have already covered in the previous example how to specify the ResultSet to be a Full result set, so we will not cover it again here. For this example we are going to need 4 variables: one for the return value, one for the input parameter, one for the output parameter and one for the result set. The statement we want to execute calls the stored procedure with ? markers for the return value and for each parameter. Note how much cleaner it is than if you wanted to do it using the current version of DTS.
In the Parameter Mapping tab we add our variables and specify their direction and datatypes.
In the Result Set tab we can now map our final variable to the rowset returned from the stored procedure.
It really is as simple as that and we were amazed at how much easier it is than in DTS 2000.

Passing in the SQL Statement from a Variable

SSIS, as we have mentioned, is hugely more flexible than its predecessor, and one of the things you will notice when moving around the tasks and the adapters is that a lot of them accept a variable as an input for something they need. The Execute SQL Task is no different. It will allow us to pass in a string variable as the SQL statement. This variable value could have been set earlier on from inside the package, or it could have been populated from outside using a configuration. The ResultSet property is set to single row, and we'll show you why in a second when we look at the variables. Note also the SQLSourceType property, which is set to Variable.
Looking at the variables we have in this package, you can see we have only two: one for the return value from the statement and one which is obviously for the statement itself.
Again we need to map the Result Name to our variable, and this time it can be a named Result Name (the column name or alias returned by the query) rather than 0.
The expected result in our variable should be the number of rows in the Person.Contact table, and if we look in the watch window we see that it is.

Passing in the SQL Statement from a File

The final example we are going to show is a really interesting one. We are going to pass in the SQL statement to the task by using a file connection manager; the file itself contains the statement to run. The first thing we need to do is create our file connection manager to point to our file. Right click in the connections tray at the bottom of the designer and choose "New File Connection".
We have chosen to use an existing file and have passed in the name as well. Have a look around at the other "Usage Type" values available whilst you are here.
Having set that up, we can now see in the connection manager tray our file connection manager sitting alongside the OLE DB connection we have been using for the rest of these examples.
Now we can go back to the familiar General tab to set up how the task will accept our file connection as the source.
All the other properties in this task are set up exactly as we have been doing for other examples depending on the options chosen so we will not cover them again here.

We hope you will agree that the Execute SQL Task has changed considerably in this release from its DTS predecessor. It has a lot of options available but once you have configured it a few times you get to learn what needs to go where. We hope you have found this article useful.