Tuesday, June 25, 2013

Change Tracking - 2008

http://www.sqlservercentral.com/articles/Change+Tracking/90251/

Change Tracking - 2008

By Nitesh Dedhia, 2012/06/18
Some applications are designed to pull data from a central database into a local cache, and they need a way to know which data has changed. Before SQL Server 2008, application developers had to implement custom tracking solutions using triggers or timestamp columns, and create additional tables to track data changes. As we know, triggers are expensive: each table involved in a DML operation is checked recursively via an internal function for the presence of a trigger. In addition, because triggers execute as part of the transactions that invoke them, those transactions take longer to commit and can introduce complicated blocking issues.
SQL Server 2008 provides a new feature, Change Tracking. Change Tracking is great for building one-way or two-way synchronization applications and is designed to work with Sync Services for ADO.NET. It is a lightweight feature that records the net DML (INSERT, UPDATE and DELETE) changes that occur on a table, so that an application (such as a caching application) can refresh itself with just the changed dataset.
Today, let's discuss in detail how to configure and enable Change Tracking and how it works. We will create a new table, enable Change Tracking on the database and on this newly created table, and then, after performing some DML operations, see how Change Tracking gives us information about the changes that were made.
We begin by creating a table named “tblEmployee”.
Step 1: Create table “tblEmployee”
Here we will create a table named “tblEmployee” in database “Perspective”. This table has five columns, with EmployeeID as the primary key and the other columns holding the employees' details.
CREATE TABLE [dbo].[tblEmployee]
(
      [EmployeeID] [int] NOT NULL,
      [ManagerID] [int] NULL,
      [FirstName] [varchar](20) NOT NULL,
      [LastName] [varchar](20) NOT NULL,
      [Gender] [bit] NULL,
  CONSTRAINT [PK_tblEmployee] PRIMARY KEY CLUSTERED
  (
      [EmployeeID] ASC
  )
 WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
 IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
Step 2: Enable Change Tracking on Database
Now before we enable Change Tracking on a table we need to enable it at the database level. The SQL code below does this. Here we set the “Auto Cleanup” option ON, and changes will be retained for 3 days only. After 3 days the tracked changes will no longer be available.
Depending on the project requirements you should plan a retention period, but make sure not to set it too long, as it will consume space on the database server. Too short a time frame will not let you capture changes effectively. Generally speaking, 2 to 10 days of retention is preferred.
ALTER DATABASE Perspective
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON)
AUTO_CLEANUP: With this option we can switch the automatic tracking-table cleanup process ON or OFF.
CHANGE_RETENTION: With this option, you can specify the time frame for which tracked information will be retained.

Step 3: Enable Change Tracking at Table Level
Now we enable Change Tracking for the table “tblEmployee” as well. Note that Change Tracking always tracks entire rows; the TRACK_COLUMNS_UPDATED option additionally records which columns were changed in each update, information we will query later with CHANGE_TRACKING_IS_COLUMN_IN_MASK (see Step 9).
ALTER TABLE tblEmployee
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON)
Step 4: Check if Change Tracking is enabled correctly
Let’s check if Change Tracking has been enabled on the database and the table where we want it to be. This is basically a verification step, to make sure the queries above have performed their task as expected.
SELECT DB_NAME(database_id) AS NAME, 
       retention_period_units, 
       retention_period_units_desc 
FROM sys.change_tracking_databases

SELECT t.name As TableName
FROM sys.change_tracking_tables ctt
      JOIN sys.tables t ON ctt.object_id = t.object_id
The results are shown below:

The table "sys.change_tracking_databases" shows a row for each database if Change Tracking is enabled for it.
The table "sys.change_tracking_tables" shows a row for each table that has Change Tracking enabled.

SQL Server creates an internal tracking table, the details of which are stored in “sys.internal_tables”. The naming convention for these internal tables is “change_tracking_<Table Object_ID>”
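To peek at these internal tables, a query along the following lines can be used (the filter value follows the sys.internal_tables documentation):

SELECT name, internal_type_desc, parent_object_id
FROM sys.internal_tables
WHERE internal_type_desc = 'CHANGE_TRACKING'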

We cannot query the Change Tracking internal table directly, but we are provided with a set of functions that help in getting the changes that are tracked.

Step 5: Retrieve Changed Information
To retrieve changed information we can use the following system functions:
SELECT CHANGE_TRACKING_CURRENT_VERSION ()
SELECT CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('tblEmployee'))
SELECT * 
 FROM CHANGETABLE (CHANGES tblEmployee,0) as CT
 ORDER BY SYS_CHANGE_VERSION
CHANGE_TRACKING_CURRENT_VERSION: This function returns the current version number at the database level, which serves as the upper boundary for retained change information.

CHANGE_TRACKING_MIN_VALID_VERSION: This function returns the minimum version for which change information is still retained for a table, i.e. the lower boundary for that table's change information.

CHANGETABLE: This function is used to retrieve change information after version 0. Since we have not performed any DML operations yet after enabling Change Tracking, this function returns no records.

Step 6: Perform some DML operations
Let's insert some data to test Change Tracking.
INSERT INTO tblEmployee VALUES (1, NULL, 'Mike', 'Fields', 1)
INSERT INTO tblEmployee VALUES (2, 1, 'John', 'Hopkins', 1)
INSERT INTO tblEmployee VALUES (3, 1, 'Henry', 'III', 1)
DELETE tblEmployee WHERE EmployeeID = 2
UPDATE tblEmployee SET FirstName = 'Suresh', LastName = 'Sankar' WHERE EmployeeID = 3
Here we first insert 3 records, then delete record number 2 and update the 3rd record in our tblEmployee table.
Step 7: Retrieve Changed Information
Let us now retrieve the tracked information.
SELECT CHANGE_TRACKING_CURRENT_VERSION ()
SELECT CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('tblEmployee'))
SELECT * 
 FROM CHANGETABLE (CHANGES tblEmployee,0) as CT 
 ORDER BY SYS_CHANGE_VERSION
The results are shown below

CHANGE_TRACKING_CURRENT_VERSION: This function returns the current version number at the database level, the upper boundary for retained change information. It now returns 5, because 5 DML operations have been performed since Change Tracking was enabled.

CHANGE_TRACKING_MIN_VALID_VERSION: This function returns the minimum version for which change information is retained, the lower boundary for a table's change information. It still returns 0, indicating that change information has been retained from version 0 onward for table tblEmployee.

By passing the minimum valid version to the CHANGETABLE function we get all the change information after that specified version. We can pass any version number between the minimum valid version and the current version, and the function will return whatever changes lie between those boundaries.
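Before querying, it is good practice to confirm that the version you are about to pass is still within the retained range; otherwise the result may be incomplete. A minimal sketch of that guard (the variable name and error text here are illustrative):

DECLARE @PreviousVersion bigint = 2  -- version saved from the last refresh

IF @PreviousVersion < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('tblEmployee'))
    RAISERROR('Saved version is older than the retention window; reinitialize the cache.', 16, 1)
ELSE
    SELECT * FROM CHANGETABLE(CHANGES tblEmployee, @PreviousVersion) AS CT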

Step 8: Retrieve all DML changes
With the SQL code below we retrieve all the changes that have been performed on the tracked table through DML operations.
DECLARE @PreviousVersion bigint = 0

SELECT CTTable.EmployeeID, 
       CTTable.SYS_CHANGE_OPERATION, 
       Emp.FirstName, Emp.LastName, Emp.Gender, 
       CTTable.SYS_CHANGE_VERSION, 
       CTTable.SYS_CHANGE_COLUMNS, 
       CTTable.SYS_CHANGE_CONTEXT
 FROM CHANGETABLE (CHANGES tblEmployee, @PreviousVersion) AS CTTable
   LEFT OUTER JOIN tblEmployee AS Emp 
     ON emp.EmployeeID = CTTable.EmployeeID
The results are shown below:

Step 9: Get impacted Column information only
Once we know which records were modified or inserted, we might be interested in knowing which columns were impacted. Change Tracking gives us the ability to track this at the level of individual columns as well. The code below identifies whether a particular column was changed: a “1” means the column was changed and a “0” means it was not. For insert operations, since all columns are impacted, it displays “1” for all columns.
DECLARE @PreviousVersion bigint = 0
SELECT 
   CTTable.EmployeeID, 
   CTTable.SYS_CHANGE_OPERATION,
   Emp.FirstName, 
   Emp.LastName, 
   Emp.Gender,
   [FirstNameChanged?] = CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(
                           OBJECT_ID('tblEmployee'), 'FirstName', 'ColumnId'),
                           SYS_CHANGE_COLUMNS),
   [LastNameChanged?]  = CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(
                           OBJECT_ID('tblEmployee'), 'LastName', 'ColumnId'),
                           SYS_CHANGE_COLUMNS),
   [GenderChanged?]    = CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(
                           OBJECT_ID('tblEmployee'), 'Gender', 'ColumnId'),
                           SYS_CHANGE_COLUMNS)
 FROM CHANGETABLE (CHANGES tblEmployee, @PreviousVersion) AS CTTable 
  LEFT OUTER JOIN tblEmployee AS Emp 
    ON emp.EmployeeID = CTTable.EmployeeID
The results can be seen here:

Depending on the point from which you want to track changes, the version number comes in handy. In the SQL code above, if we change the value of @PreviousVersion from “0” to “2” you will see a different result (shown here):

This lets you see the changes made to a particular record from any chosen version up to its current state. Change @PreviousVersion to values from 0 to 5 and observe the differences; I will leave this as an exercise for the reader.

Note: Change tracking does not provide support for recovering from the loss of data. However, there are two options for detecting these types of synchronization issues:
  • Store a database version ID on the server, and update this value whenever a database is recovered or otherwise loses data. Each client application would store the ID, and each client would have to validate this ID when it synchronizes data. If data loss occurs, the IDs will not match and the clients would reinitialize.
  • When a client queries for changes, record the last synchronization version number for each client on the server. If there is a problem with the data, the last synchronized version numbers will not match, indicating that a re-initialization is required.
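A minimal sketch of the second option, assuming a hypothetical bookkeeping table dbo.tblClientSyncState (ClientID, LastSyncVersion) on the server:

DECLARE @LastSyncVersion bigint, @CurrentVersion bigint = CHANGE_TRACKING_CURRENT_VERSION()

SELECT @LastSyncVersion = LastSyncVersion
FROM dbo.tblClientSyncState
WHERE ClientID = 1

IF @LastSyncVersion IS NULL
   OR @LastSyncVersion < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('tblEmployee'))
BEGIN
    -- The stored version is missing or too old: the client must reinitialize (full reload)
    SELECT * FROM tblEmployee
END
ELSE
BEGIN
    -- Incremental refresh: only rows changed since the last synchronization
    SELECT CT.EmployeeID, CT.SYS_CHANGE_OPERATION, Emp.FirstName, Emp.LastName
    FROM CHANGETABLE(CHANGES tblEmployee, @LastSyncVersion) AS CT
    LEFT JOIN tblEmployee AS Emp ON Emp.EmployeeID = CT.EmployeeID
END

-- Assumes a row for this client already exists
UPDATE dbo.tblClientSyncState SET LastSyncVersion = @CurrentVersion WHERE ClientID = 1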
Change tracking in SQL Server 2008 enables applications to obtain only changes that have been made to the user tables, along with the information about those changes. With change tracking integrated into SQL Server, complicated custom change tracking solutions no longer have to be developed. 
Change tracking is an important building block for applications that synchronize and replicate data in scenarios in which end-to-end replication solutions do not work and a custom solution is required. 

Change Data Capture - SQL Server 2008

http://www.sqlservercentral.com/articles/SQL+Server+2008/91359/

Change Data Capture - SQL Server 2008

By Nitesh Dedhia, 2013/06/24
Often applications need to keep track of all the changes that have been made to data. This helps in maintaining record history for auditing purposes or for data security, and at times it can inform other business decisions as well. Whenever we try to capture data changes from a transactional database for auditing, or to save the full history of data changes for data security, we have one of the following options:
  • Compare the already extracted data to the data present in the database. The drawback is that this takes time and impacts the performance of the transactional database.
  • Write triggers, so that when a transaction changes data, the change is reported to an audit table through the triggers. The drawback is that while this provides auditing in real time, it impacts the performance of the live application.
To overcome these issues we have a wonderful feature in SQL Server 2008: Change Data Capture (CDC). CDC allows us to maintain DML change history very efficiently and without complicated programming.
Here we will discuss how to configure CDC and show how it works. We will create a new table, enable CDC on the database and on this newly created table, perform some DML operations, and finally see how CDC captures the data changes.
Step 1: Create table “tblEmployee”
Here we will create a table named “tblEmployee” in database “HRDatabase”. This table has five columns, with EmployeeID as the primary key; the other columns hold the details of the employees.
CREATE TABLE [dbo].[tblEmployee](
      [EmployeeID] [int] NOT NULL,
      [ManagerID] [int] NULL,
      [FirstName] [varchar](20) NOT NULL,
      [LastName] [varchar](20) NOT NULL,
      [Gender] [bit] NULL,
 CONSTRAINT [PK_tblEmployee] PRIMARY KEY CLUSTERED(
      [EmployeeID] ASC)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
       ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]) ON [PRIMARY]
Step 2: Enable Change Data Capture (CDC) on Database
Before we enable CDC on a table we need to enable it at the database level. The code below does this.
USE HRDatabase
GO
EXEC sys.sp_cdc_enable_db
GO
Step 3: Enable CDC at Table Level
Now we enable CDC for the table “tblEmployee” as well. Here we can mention specific columns that we want to monitor for tracking changes. If no column is mentioned, by default all the columns within the table are tracked.
EXEC sys.sp_cdc_enable_table
 @Source_schema = N'dbo',
 @source_name = N'tblEmployee',
 @filegroup_name = N'PRIMARY',
 @supports_net_changes = 1
GO
Once the above script is run, it creates the following SQL Server Agent jobs:
1] cdc.HRDatabase_capture – This job executes the system stored procedure “sys.sp_MScdc_capture_job”, which internally calls another system stored procedure named “sys.sp_cdc_scan”. Together they scan the transaction log and write the captured changes to the change tables; note that SQL Server Agent must be running for CDC to capture changes.
2] cdc.HRDatabase_cleanup – This job executes the system stored procedure “sys.sp_MScdc_cleanup_job”. This is very useful, as it prevents the change tables from growing uncontrollably by cleaning them up periodically.
Both these jobs are enabled automatically once they are created.
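To review the configuration of both jobs (retention, polling interval, and so on), the system procedure sys.sp_cdc_help_jobs can be run; it reads the job definitions that CDC stores in msdb:

EXEC sys.sp_cdc_help_jobs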
Step 4: Check if CDC is enabled correctly
Let’s check if CDC has been enabled on the database and the table where we want it to be. This is basically a verification step, to make sure the queries above have performed their task as expected. This code does that.
SELECT name AS DatabaseName, is_cdc_enabled FROM sys.databases WHERE is_cdc_enabled = 1
SELECT name AS TableName, is_tracked_by_cdc FROM sys.tables WHERE is_tracked_by_cdc = 1
 The results:

Once this is verified, we can check the system tables and find a new table named “cdc.dbo_tblEmployee_CT”. This table will henceforth contain all the DML changes performed on the dbo.tblEmployee table. If we expand the “Columns” node of this table, we find all the columns from the “tblEmployee” table plus five extra columns.

These columns are:
  1. __$start_lsn: Start Log Sequence Number.
  2. __$end_lsn: End Log Sequence Number.
  3. __$seqval: Sequence value used to order row changes within a transaction.
  4. __$operation: Contains a value corresponding to the DML operation:
                                 1 = Delete
                                 2 = Insert
                                 3 = Before Update
                                 4 = After Update
  5. __$update_mask: Provides details about the column(s) updated in the DML operation. This is a bitmap field.
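As a preview of how the bitmap is used, the __$update_mask column can be decoded with the system functions sys.fn_cdc_get_column_ordinal and sys.fn_cdc_is_bit_set. A small sketch, assuming the default capture instance name dbo_tblEmployee (it will only return rows once updates have been captured, as in Step 5):

SELECT __$operation,
       EmployeeID,
       sys.fn_cdc_is_bit_set(
           sys.fn_cdc_get_column_ordinal('dbo_tblEmployee', 'FirstName'),
           __$update_mask) AS FirstNameChanged
FROM cdc.dbo_tblEmployee_CT
WHERE __$operation = 4  -- rows as they look after an update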
If we query both tables, we find they are both empty right now.
SELECT * FROM tblEmployee
SELECT * FROM cdc.dbo_tblEmployee_CT

Step 5: Perform DML operation and check Change Data Capture
First we insert a few records into the tblEmployee table.
INSERT INTO tblEmployee VALUES (1, NULL, 'Mike', 'Fields', 1)
INSERT INTO tblEmployee VALUES (2, 1, 'John', 'Hopkins', 1)
INSERT INTO tblEmployee VALUES (3, 1, 'Henry', 'III', 1)
Let’s check the impact of the above INSERT statements on our tables (tblEmployee and cdc.dbo_tblEmployee_CT):

We did not enter any records into the cdc.dbo_tblEmployee_CT table ourselves, yet thanks to CDC the complete insert operation was captured and recorded in this table. Let’s see how this behaves for update and delete operations.
DELETE tblEmployee WHERE EmployeeID = 2
UPDATE tblEmployee SET FirstName = 'Suresh', LastName = 'Sankar' WHERE EmployeeID = 3
Now let’s check the changes in our two tables (tblEmployee and cdc.dbo_tblEmployee_CT):

Step 6: Retrieve Changed Information through CDC
Now that we have verified that CDC is working and tracking all DML changes on the table as expected, let’s prepare a strategy to retrieve this information for business use. We generally need to track changes over a particular timeframe. If we look at our CDC tracking table “cdc.dbo_tblEmployee_CT” there is no column with a date/time value; however, there is a field “__$start_lsn” which can help us retrieve the required information. LSN stands for Log Sequence Number. Every record in the transaction log is uniquely identified by an LSN, and LSNs are ever-increasing numbers. LSNs are also associated with times, and the mapping can be found in the table “cdc.lsn_time_mapping”. In addition, two functions are provided to help us achieve our goal.
These functions are created when table level CDC is enabled:
    1. cdc.fn_cdc_get_all_changes_dbo_tblEmployee: Returns ALL changes that occurred during a particular LSN range
    2. cdc.fn_cdc_get_net_changes_dbo_tblEmployee: Returns the NET changes that occurred during a particular LSN range

Now let’s get all the changes captured yesterday (the code below maps yesterday’s transaction times to an LSN range).
DECLARE @BEGIN_LSN BINARY(10), @END_LSN BINARY(10);

SET @BEGIN_LSN = (SELECT MIN(start_lsn) FROM cdc.lsn_time_mapping 
                 WHERE CAST(tran_begin_time AS DATE) = CAST(GETDATE()-1 AS DATE) AND tran_id != 0)
SET @END_LSN  = (SELECT MAX(start_lsn) FROM cdc.lsn_time_mapping 
                WHERE CAST(tran_begin_time AS DATE) = CAST(GETDATE()-1 AS DATE) AND tran_id != 0)

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_tblEmployee(@BEGIN_LSN, @END_LSN, 'all')

This provides the details of all the DML operations performed on the table during the specified time. If we need only the net changes to the data resulting from the various DML operations, we can use the following:
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_tblEmployee(@BEGIN_LSN, @END_LSN, 'all')
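As an alternative to querying cdc.lsn_time_mapping directly, the system function sys.fn_cdc_map_time_to_lsn can translate times into LSNs. A minimal sketch of the same query using it (the operator strings are taken from the function's documentation):

DECLARE @FROM_LSN BINARY(10), @TO_LSN BINARY(10)

SET @FROM_LSN = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', CAST(GETDATE()-1 AS DATE))
SET @TO_LSN   = sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE())

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_tblEmployee(@FROM_LSN, @TO_LSN, 'all')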

CDC Cleanup Process

Once we set up CDC, all the changes occurring on a table get tracked and stored in CDC's respective tables. If CDC is enabled on all tables in a database, there is a very good chance that we will fall short of available space on the hard drive, which creates maintenance issues. To address this, CDC provides an automatic cleanup process, which runs at regular intervals and by default is configured to remove change rows older than three days. This period is configurable: the retention period is changed through the system stored procedure "sys.sp_cdc_change_job" (shown below), while "sys.sp_cdc_cleanup_change_table", created once CDC is enabled on a database, comes in handy for cleaning up change tables manually.
The retention period value specified for the CDC cleanup process is in minutes and should be a positive value greater than 0 and <= 52594800, i.e. about 100 years. For example, a three-day retention is 4320 minutes.
EXEC SYS.SP_CDC_CHANGE_JOB @JOB_TYPE = 'Cleanup', @RETENTION = 500
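For a one-off manual cleanup, sys.sp_cdc_cleanup_change_table can be called directly. A sketch, assuming we want to discard change rows older than three days (the threshold value here is illustrative):

DECLARE @NewLowWaterMark BINARY(10)
SET @NewLowWaterMark = sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE()-3)

EXEC sys.sp_cdc_cleanup_change_table
     @capture_instance = 'dbo_tblEmployee',
     @low_water_mark = @NewLowWaterMark,
     @threshold = 5000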

Summary

With CDC we have a robust way in SQL Server 2008 to record all the DML operations that are performed on a database. This feature is a big help for auditing databases and also for debugging scenarios where we need to know the exact sequence of changes happening on the database.
For further reading on performance tuning of Change Data Capture, visit http://msdn.microsoft.com/en-us/library/dd266396(v=sql.100).aspx

Temporary Stored Procedures

http://www.sqlservercentral.com/articles/Temporary+Stored+Procedures./99822/

Temporary Stored Procedures

By Harsha Majety, 2013/06/25
Temporary stored procedures are a little-known feature of SQL Server. This article is an attempt to shed some light on this interesting feature.
Temporary stored procedures are like normal stored procedures but, as their name suggests, have a fleeting existence. There are two kinds of temporary stored procedures: local and global. Temporary stored procedures are created just like any other SPs; however, the name must begin with a hash (#) for a local temporary SP and two hashes (##) for a global temporary stored procedure.
A local temporary stored procedure is available only in the current session and is dropped when the session is closed. A global temporary stored procedure is visible to all sessions and is dropped when the session of the user that created it is closed. If there are any executing versions of the global stored procedure when the creator session is closed, those are allowed to complete, but once they are done and the creator session is closed, no further execution of the global temporary stored procedure is allowed.
Aside from local and global temporary stored procedures, you can also create regular stored procs in tempdb. These only exist until SQL Server is restarted, because tempdb is recreated each time the server restarts.
The following exercise illustrates these points. First, open a new query window and run the following script and keep the query window open.
-- Local Temporary Stored Procedure
CREATE PROC #HarshaTest
AS
DECLARE @Table TABLE ( col1 INT );

INSERT INTO @Table
 ( col1 )
VALUES ( 1),(2),(3),(4);

SELECT * FROM @Table

RETURN 0
GO

-- Global Temporary Stored Procedure
CREATE PROC ##HarshaTest
AS
DECLARE @Table TABLE ( col1 INT );

INSERT INTO @Table
 ( col1 )
 VALUES ( 1),(2),(3),(4);

SELECT * FROM @Table

RETURN 0

GO
USE tempdb
GO
-- Stored procedure in tempdb
CREATE PROC dbo.HarshaTest
AS
DECLARE @Table TABLE ( col1 INT );

INSERT INTO @Table
 ( col1 )
VALUES ( 1),(2),(3),(4);

SELECT * FROM @Table

RETURN 0

GO

 You should see this in Management Studio:

The picture shows the stored procedures in tempdb while the session is active. Notice the long suffix that SQL Server automatically appends to the local temporary stored procedure. SQL Server appends this suffix to local temporary objects to distinguish objects of the same name created from different sessions.
Now, in the same window, execute the local and global temporary stored procedures that you created. Both will work and return results.
EXECUTE #HarshaTest;

EXECUTE ##HarshaTest;

The results should look like those shown below:

Keeping this window open, open a new query window and execute the procs again. You will notice that the local temp stored proc fails with an ‘object not found’ error, but the global temporary stored proc works, as shown below.


In the messages tab, you should see:
Msg 2812, Level 16, State 62, Line 1
Could not find stored procedure '#HarshaTest'.
Now close the window in which you ran the CREATE PROC scripts and execute the procs in another window. You will notice that both fail with an ‘object not found’ error, since the session is now closed.


Checking Management Studio 


The picture shows the procedures in tempdb after the session is closed. Note that the local and global procs that were created don’t exist anymore.
An interesting observation about temporary stored procedures: since they are created in tempdb, they can be called from any database. SQL Server basically ignores the database name and schema name in the case of temporary stored procedures.
For example, the following two commands work and return the same results. Note that [blah] is a nonexistent database.
EXECUTE ##HarshaTest;

EXECUTE [blah].dbo.##HarshaTest;
If you are wondering whether it is possible to create temporary functions or temporary views: unfortunately, SQL Server supports neither.
Example: The following SQL will throw an error.
CREATE FUNCTION #HarshaTempFunction()
RETURNS VARCHAR(100)

AS
BEGIN

RETURN('This is a Temporary Function');

END
The results are:
Msg 1074, Level 15, State 1, Line 1
Creation of temporary functions is not allowed.

Limitations:
  • Heavy use of temporary stored procedures can create contention on system tables in tempdb. You can use sp_executesql instead of temporary procs; sp_executesql does not save information to system tables and hence avoids this contention (see the sketch after this list).
  • You cannot create CLR stored procedures/functions as temporary objects.
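To illustrate the sp_executesql alternative, a parameterized statement gets a reusable cached plan without registering any object in tempdb's system tables (the statement text below is purely illustrative):

DECLARE @sql nvarchar(max) = N'SELECT col1 FROM (VALUES (1),(2),(3),(4)) AS t(col1) WHERE col1 > @min;'

-- Each call with the same statement text reuses the cached plan
EXEC sp_executesql @sql, N'@min int', @min = 1
EXEC sp_executesql @sql, N'@min int', @min = 2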

Uses of Temporary Stored Procedures:

One use for temporary stored procedures is code reuse and improved error handling when you don’t have CREATE permissions in any of the databases to create regular stored procedures.
Example: In SSIS package SQL scripts, if users don’t have object-creation permission but want to take advantage of the code reuse and execution-plan caching features of stored procedures, they can create temporary stored procedures.
You can also use temporary procedures to test a stored procedure before actually creating it. Instead of creating and dropping a permanent stored proc, you can create a temporary stored proc and make it permanent once it performs satisfactorily.
During deployment, if you have a script that uses repetitive code, you can create a temporary stored proc at the beginning of the script and reuse it throughout the script. The temporary stored proc is dropped when the connection is closed so you will not have code lying around after deployment.
For example:
CREATE PROC #tempDeploymentRowsInserted
@stepId INT,
@rowCount INT
AS
BEGIN 
INSERT INTO RowsInsertedDuringDeploymentSteps(StepID,RowsCount) VALUES (@stepId, @rowCount);

END

GO
DECLARE @rowCount INT;

--{Code}

SET @rowCount = @@ROWCOUNT;
EXEC #tempDeploymentRowsInserted 1,@rowCount;

--{Code}

SET @rowCount = @@ROWCOUNT;
EXEC #tempDeploymentRowsInserted 2,@rowCount;

--{Code}

SET @rowCount = @@ROWCOUNT;
EXEC #tempDeploymentRowsInserted 3,@rowCount;

--Code

Temporary stored procedures are useful when connecting to earlier versions of SQL Server that do not support reuse of execution plans for T-SQL statements or batches (earlier than SQL Server 7.0). You can take advantage of execution plans by using temporary stored procedures. ODBC drivers, for instance, make use of temporary stored procedures.
One use of global temporary stored procedures is code reuse during coordinated, collaborative work. When a team is working on something (maintenance work, etc.) and there is a code segment that everyone uses for the duration of the work, it can be put in a global temporary stored procedure. After the work is done, the user that created the global temporary stored procedure can close the session, and there will not be any vestigial code left on the server.
In summary, temporary stored procedures are analogous to temporary tables. They can be used to store code for reuse throughout the life of your connection.

Friday, June 21, 2013

Full vs Differential vs Transaction Log Backup


Full vs Differential vs Transaction Log Backup
SQL Server provides three backup options: full backup, differential backup, and transaction log backup. Today, we will review the three backup strategies and the key differences between them.

Full Backup
A full backup is just that: a full backup of your database at a point in time. You can restore a full backup on the same or a different SQL Server.

Differential Backup
A differential backup backs up only the changes since the last full backup. The benefit of a differential backup is that it is very fast and takes less space, since you are only backing up the changes.

A differential backup chain starts from the last full backup; all differential backups are relative to the previous full backup. It is possible to take another full backup without breaking the differential backup chain, i.e. to have it continue from the previous full backup: take the full backup with the COPY_ONLY option. Without the COPY_ONLY option, the previous differential backup chain is broken and a new chain starts from the most recent full backup.
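As a small illustration of the COPY_ONLY behavior (the database name and path here are arbitrary examples):

-- Ad hoc full backup that does NOT reset the differential base,
-- so existing and future differential backups still chain to the original full backup
BACKUP DATABASE [test]
TO DISK = N'D:\HDE\TestBackup\copyonly_full.bak'
WITH COPY_ONLY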

You can restore to a point in time by restoring the full backup and then applying the most recent differential backup.

Transaction Logs
Transaction log backups contain the changes since the last transaction log backup. I have seen some confusion about whether transaction log backups are from the last full backup or from the last transaction log backup. If you are taking a full database backup for the very first time, your transaction log backup chain will start after that full backup. Any subsequent full or differential backups will not break the log chain, and the next transaction log backup will be from the last transaction log backup, not the last full backup.

Transaction log backups only work in the Full and Bulk-Logged recovery models, and the only ways to break the log chain are to switch the recovery model to Simple or to overwrite the existing backup set when creating a full backup media set.

If your database is set to the Full or Bulk-Logged recovery model, you must take frequent log backups; otherwise your log file won't truncate and will fill up your hard drive.

It is good to use all three backup types in your database environment to keep your restore media sets or files as current as possible, so you can restore to a point in time and minimize data loss.

**************************************************************
way to restore full/diff/log backup

http://blog.sqlauthority.com/2010/03/21/sql-server-fix-error-3117-the-log-or-differential-backup-cannot-be-restored-because-no-files-are-ready-to-rollforward/

I received the following email from one of my readers.
Dear Pinal,
I am new to SQL Server and our regular DBA is on vacation. Our production database had some problem and I have just restored the full database backup to the production server. When I try to apply the log backup I am getting the following error. I am sure this is a valid log backup file. Screenshot is attached.
[Few other details regarding server/ip address removed]
Msg 3117, Level 16, State 1, Line 1
The log or differential backup cannot be restored because no files are ready to roll forward.
Msg 3013, Level 16, State 1, Line 1
RESTORE LOG is terminating abnormally.
Screenshot attached. [Removed as it contained live IP address]
Please help immediately.
Well I have answered this question in my earlier post, 2 years ago, over here SQL SERVER – Fix : Error : Msg 3117, Level 16, State 4 The log or differential backup cannot be restored because no files are ready to rollforward. However, I will try to explain it a little more this time.
For a SQL Server database to be used, it should be in the ONLINE state. A SQL Server database can be in one of multiple states:
  • ONLINE (Available – online for data)
  • OFFLINE
  • RESTORING
  • RECOVERING
  • RECOVERY PENDING
  • SUSPECT
  • EMERGENCY (Limited Availability)
If the database is online, it means it is active and in operational mode. It would not make sense to apply further logs from backup if operations have continued on this database. The common practice during the backup restore process is to specify the keyword RECOVERY when the database is restored. When the RECOVERY keyword is specified, SQL Server brings the database online and will not accept any further log backups.
However, if you want to restore more than one backup file, i.e. after restoring the full backup you want to apply a differential or log backup, you cannot do that while the database is online and active. You need to have the database in a state where it can accept further backup data rather than online data requests. If the database were online and also accepting backup files, there could be data inconsistency. This is the reason that when there is more than one backup file to be restored, one has to restore the database with the NORECOVERY keyword in the RESTORE operation.
I suggest you all read one more post written by me earlier. In that post, I explained the timeline with images and graphics: SQL SERVER – Backup Timeline and Understanding of Database Restore Process in Full Recovery Model.
Sample Code for reference:
RESTORE DATABASE AdventureWorks FROM DISK = 'C:\AdventureWorksFull.bak' WITH NORECOVERY;
RESTORE DATABASE AdventureWorks FROM DISK = 'C:\AdventureWorksDiff.bak' WITH RECOVERY;
In this post, I am not trying to cover complete backup and recovery. I am just attempting to address one type of error and its resolution. Please test these scenarios on a development server. Working with live database backup and recovery is always critical and needs to be properly planned. Leave a comment here if you need help with this subject.

--Full Backup
BACKUP DATABASE [test] TO
DISK = N'D:\HDE\TestBackup\a1' WITH NOFORMAT
GO

--Transaction Log Backup
BACKUP LOG [test] TO
DISK = N'D:\HDE\TestBackup\a3' WITH NOFORMAT, NOINIT,
NAME = N'test-Transaction Log  Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10

--Differential Backup
BACKUP DATABASE [test] TO  DISK = N'D:\HDE\asdf'
WITH  DIFFERENTIAL , NOFORMAT, NOINIT,
NAME = N'test-Differential Database Backup',
SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

--restore full backup

RESTORE DATABASE test FROM
DISK = N'D:\HDE\TestBackup\a1' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10

--restore differential backup
RESTORE DATABASE test FROM DISK = N'D:\HDE\asdf' WITH FILE = 1, RECOVERY, NOUNLOAD, STATS = 10

--restore transaction log backup (log backups use RESTORE LOG, not RESTORE DATABASE)
RESTORE LOG test FROM DISK = N'D:\HDE\TestBackup\a3' WITH FILE = 1, RECOVERY, NOUNLOAD, STATS = 10


Let's take an example to understand this better: transaction log backup.
[A transaction log backup is basically an incremental backup, while a differential backup is a cumulative backup of everything since the last full backup.]

The database test has a table test containing 1 record.
1) Take full backup - a1
2) insert one more record in test table
3) take transaction log backup - a2
4) insert one more record in test table
5) take transaction log backup again - a3

Now restore a1 with NORECOVERY and then try to restore a3: it will not be allowed. a1, a2 and a3 need to be restored in sequence.

Let's take an example to understand this better: differential backup.
[A transaction log backup is basically an incremental backup, while a differential backup is a cumulative backup of everything since the last full backup.]

The database test has a table test containing 1 record.
1) Take full backup - a1
2) insert one more record in test table
3) take differential backup - a2
4) insert one more record in test table
5) take differential backup again - a3

Now restore a1 with NORECOVERY and then try to restore a3: it will be allowed. After a1, either a2 or a3 can be restored, because each differential backup contains all changes since the full backup.

Thursday, June 13, 2013

Scripting SSIS Package Deployments

http://www.sqlservercentral.com/blogs/chadmiller/2012/11/06/scripting-ssis-package-deployments/#comments

Chad Miller

Chad Miller is a Senior Manager of Database Administration at Raymond James Financial. Chad has worked with Microsoft SQL Server since 1999 and has been automating administration tasks using Windows Powershell since 2007. Chad is the Project Coordinator/Developer of the Powershell-based Codeplex project SQL Server PowerShell Extensions (SQLPSX). Chad leads the Tampa Powershell User Group and is a frequent speaker at users groups, SQL Saturdays and Code Camps.

Scripting SSIS Package Deployments

Before I delve into the subject of scripting SSIS package deployments, I’m going to take a slight detour and explain why the general subject of automating deployments is important.

Automating Deployments

One of the key areas where you should be looking at automation, possibly through scripting, is deployments. If you're doing deployments and you're not using a scripted, repeatable process, free of GUIs and typing, then you're doing it wrong. I've seen many times in my career that deployments too often rely on complex instructions rather than a tested, script-based approach. It is inevitable: relying on even the most detailed step-by-step manual instructions will lead to deployment errors because of the human-operator factor. When it actually comes time to deploy changes there should be zero typing or clicking. And if it's not a fully automated deployment, then any manual steps should be made as simple as possible, such as "run this script with these parameters through copy and paste and report back results." End Detour.

 SSIS Package Deployments

My SSIS package deployment requirements:
  1. The solution must support 2005, 2008, 2008 R2 and 2012 because I have a mixed environment
  2. The solution must support deploying to the SQL Server data store in msdb from a dtsx file
  3. The solution must include verification of package installation
  4. The solution must be able to create any needed folder structures automatically
  5. The solution must include error handling and detailed output on operations performed
  6. The solution must support constrained parameters based on using the SQL Server data store: a ServerInstance, the dtsx file and the full destination path on the SSIS server
When automating any task I'll first see if there's already a solution, either from Microsoft or from third parties. I couldn't find anything out-of-the-box that meets all my requirements, but I did find two approaches which provide partial solutions.
The first is writing Powershell code directly against Microsoft.SqlServer.ManagedDTS, like I've done in the SSIS Powershell module I created for SQL Server Powershell Extensions. There is a function in the SSIS module called Copy-ISItemFileToSQL; however, it provides only part of the solution, and there's a bigger problem of incompatibilities between versions to handle. The assembly for SSIS changes between 2005, 2008/2008 R2 and 2012, which makes crafting a complete solution difficult. I've given up on going down this path because it quickly becomes complex.
The second option, and the one I went with, is to use the command-line utility dtutil.exe. The nice thing about dtutil: it's included with SQL Server 2005 and higher, well documented, and it removes some of the complexity of coding against the SSIS classes directly. Although dtutil.exe only meets requirements 1 through 3 above, I can fill in the rest with a bit of Powershell code. I present my Powershell script solution, install-ispackage.ps1.

Using Install-ISpackage

To use install-ispackage, simply download the script from PoshCode and run it by providing three parameters. Here's an example of installing a dtsx file to my SSIS server:
./install-ispackage.ps1 -DtsxFullName "C:\Users\Public\bin\SSIS\sqlpsx1.dtsx" -ServerInstance "Z001\SQL1" -PackageFullName "SQLPSX\sqlpsx1"

Install-ISPackage Explained

The install-ISPackage script provides an example of how you can approach calling native console applications (exes) from Powershell. Error handling and output handling differ greatly when calling an exe versus using cmdlets or .NET code. The former does not trigger errors and instead relies on exit codes defined by the console application's developer. You have to check $lastexitcode and read whatever documentation is provided with the console application to determine what the exit codes mean.
I’ll step through a few things to explain:
When I’m dealing with scripts that make changes I like to set $ErrorActionPreference to Stop instead of the default of Continue. This way I can wrap some error handling and logging around any errors and be assured the script won’t proceed to the next step should an error occur.
I also like to make the exit code more user friendly. I do this by reading the documentation for the command-line utility. On the msdn page for dtutil there is a nice table under "dtutil Exit Codes", which I then create as a hashtable at the top of the script:
$exitCode = @{
0="The utility executed successfully."
1="The utility failed."
4="The utility cannot locate the requested package."
5="The utility cannot load the requested package."
6="The utility cannot resolve the command line because it contains either syntactic or semantic errors"}
I can then return a more useful error message by using the hashtable with the built-in variable $lastexitcode:
throw $exitcode[$lastexitcode]
You’ll notice in the Get-SqlVersion function I’m just using the classic sqlcmd.exe console application to run a query to get the SQL Server version number:
$SqlVersion = sqlcmd -S "$ServerInstance" -d "master" -Q "SET NOCOUNT ON; SELECT SERVERPROPERTY('ProductVersion')" -h -1 -W
I chose to use sqlcmd.exe instead of the Invoke-Sqlcmd Powershell cmdlet because it's installed on every SQL 2005 machine and it's easier to use when I just want to return a single string:
C:\Users\Public\bin>Get-SqlVersion -ServerInstance Z001\sql1
10.50.2550.0
The Set-DtutilPath function tries to find the "right" dtutil.exe based on the SQL version being deployed to. You see, although the parameters for dtutil.exe are identical between versions, the utility isn't backwards or forwards compatible. You have to use the 9.0 version for 2005, the 10.0 version for both 2008 and 2008 R2, and the 11.0 version for 2012.
The rest of the functions follow a basic pattern:
Run dtutil.exe and save the output to the $result variable.
$result will be an array of strings, so create a single string separated by newlines:
$result = $result -join "`n"
Rather than returning an error on failure or nothing on success, instead return an object with details of what was run:
new-object psobject -property @{
ExitCode = $lastexitcode
ExitDescription = "$($exitcode[$lastexitcode])"
Command = "$Script:dtutil /File `"$DtsxFullName`" /DestServer `"$ServerInstance`" /Copy SQL;`"$PackageFullName`" /Quiet"
Result = $result
Success = ($lastexitcode -eq 0)}
I really like using this technique: if there are failures, then as part of troubleshooting you can just run the Command property and you get the other relevant details. The key here is that you can always get back to the base utility, so if something doesn't work in the script you can prove it's not the script when you get the same error from the utility alone. Note: I have seen errors a few times, usually because a developer created an SSIS package in a later version than the server being deployed to.
Check $lastexitcode after calling the utility and return an object with details:
if ($lastexitcode -ne 0) {
throw $exitcode[$lastexitcode]
}
Here I use the hashtable defined at the top of the script to return a friendlier error message. Between the friendly error returned and the detailed object, I can troubleshoot any issues.
The rest of the functions follow a similar pattern. I will point out that a non-zero exit code doesn't necessarily mean an error. Some console application developers use an exit code of 1 or other numbers to mean something other than an error, as is the case when testing whether a folder path exists in dtutil: if it doesn't exist, an exit code of 1 is returned. Sometimes it's hard to determine when a non-zero exit code means something other than an error except by using the utility. Fortunately, Powershell cmdlets don't use weird exit codes to return status; they generally return an object or an error object. But if you're going to write Powershell scripts against command-line utilities, you'll need to be aware of exit codes and the specific exit-code meanings for the utility you're using.
The other thing I'll point out is the logic to create nested folder paths in the Get-FolderList and New-Folder functions. These functions are in place to satisfy my fourth requirement, to automatically create folders if they don't exist.
The main section executes the series of functions in order, wrapped in a try/catch block. Since I set $ErrorActionPreference to Stop and check $lastexitcode, throwing an error in each function, the script stops should an error occur.

Batch ETL of Multiple Data Files Using an SSIS Foreach Loop Container 3

8. Add a script task for moving the data files to backup

Drag-and-drop a script task from the toolbox into the Foreach Loop Container.

Add a precedence constraint from the data flow task to the script task.

Double-click on the script task to bring up the Script Task Editor. Add the read-only variable FilePath and click the OK button.

Click on the Edit Script button to bring up the script task code editor and enter the following Visual Basic code.
Option Strict On
Imports System
Imports System.IO
Imports System.Data
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    Enum ScriptResults
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    End Enum

    Public Sub Main()

        Dim DestinationDirectory As String = "C:\BatchETL\backup\"
        Dim file_name As String() = Split(Dts.Variables("FilePath").Value.ToString, "\")
        Dim input_file As New FileInfo(Dts.Variables("FilePath").Value.ToString)

        Try
            input_file.MoveTo(DestinationDirectory & file_name(2))
        Catch ex As Exception
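            ' Note: any exception from the file move is silently swallowed here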
        End Try

    End Sub

End Class
After entering the code, click the save button to save it, then close the window.

9. Execute the finished package

Click on the Execute button to run the package and process all the files in the C:\BatchETL\ folder.

Open the customers table to confirm that the 30,000 records in the six data files have been inserted.

Open the C:\BatchETL\ folder to confirm that the data files have been removed from the source folder.
Open the C:\BatchETL\backup\ folder to confirm that the data files were moved to the backup folder.

Summary

To configure a data flow task created by the SSIS Import/Export wizard to process multiple data files:
  1. Add a Foreach Loop Container to the Control Flow window of the SSIS package
  2. Move the data flow task into the Foreach Loop Container
  3. Add a global string variable named FilePath to the package
  4. Configure the Foreach Loop Container Collection-Folder text box to point to the folder where the data files will reside
  5. Configure the Foreach Loop Container Collection-Files text box to contain the appropriate root name of the data files
  6. Configure the Foreach Loop Container Variable Mappings-Variable to the FilePath variable and give it an index of zero
  7. Configure the ConnectionString property of the SourceConnectionFlatFile connection manager to the FilePath variable
  8. Add a script task to move the data files to a backup folder so that the same files will not be extracted multiple times

Resources:

BatchETL.zip