Monday, July 1, 2013

SSIS - delete files from a Network or Local path based on date

http://www.mssqltips.com/sqlservertip/2930/sql-server-integration-services-package-to-delete-files-from-a-network-or-local-path-based-on-date/

Problem

We have a requirement to delete a group of files that are older than the specified number of days from the company file share. This file share stores sensitive clients extracts (produced by DBAs as part of client’s SQL Server Agent extract jobs), database backups, scanned emails, etc. Moreover, due to the complex folder hierarchy and delicate nature of the data stored in these files, this task has to be originated from SQL Server. However, due to company security policy, and based on SQL Server security best practices, we blocked access to OLE Automation stored procedures, CLR features, and xp_cmdshell.  Is there any way to accomplish this task without using these features?  Check out this tip to learn more.

Solution

Well, this requirement can be fulfilled quiet easily by using the following SSIS toolbox components: Script Task, Foreach Loop container, and the File System Task. The following are the steps, which you can follow to accomplish this task using SSIS.

Steps to Build the SSIS Package for File Deletion

  • Launch the SQL Server Data Tools, then from the File menu, choose New, and click Project. This will launch the New Project dialog box.
  • In the New Project dialog box, select Integration Services Project from the templates pane. In the Name box, change the default name to DeleteRedundantFiles. Also, specify the project location, and once properly configured, click OK to create this new SSIS project.
  • By default, SQL Server Integration Service Project creates an empty package, named Package.dtsx, which is added to your project.  In the Solution Explorer, right-click on this package and rename it to DeleteRedundantFilesFromNetwork.dtsx.
using the following SSIS toolbox components

1: Defining Package Variables in SSIS

Next, add the following five variables to your SSIS package:
Variable Name: varFileAgeLimit
Variable Data Type: Int32
Description: This variable stores the age limit variable for the file. This variable is used within the Script Task, and accepts both positive and negative values. When a negative value is assigned to this variable, the code within the Script Task searches for files that are older than a specified number of days.  When the value is positive, the code within the Script Task searches the files that are newer than specified number of days.
Variable Name: varFileFQN
Variable Data Type: String
Description: This variable stores the file name with the path. For example: C:\DataFolder\Text.txt or \\MyServer\MyDataFolder\Test.txt.
Variable Name: varFilePattern
Variable Data Type: String
Description: The purpose of this variable is to store file name pattern. For example: *.* or *.bak or DB1_Extract.dat
Variable Name: varNetworkPath
Variable Data Type: String
Description: Stores either the Network location or the Local path, which is the code within the Script Task that will be used to search files in that location and within child directories of this location. For example: C:\MyLocation or \\MyServer\MyLocation.
Variable Name: varFileList
Variable Data Type: Object
Description: Stores the list of files that will be deleted.

2: Defining Package Tasks

Next, add and configure the following SSIS package tasks:

2.1: Configuring “Get File List – ST” Script Task

The first task you need for this solution is the Script Task. To add it to your package, simply drag it from SSIS Toolbox to design surface of the Control Flow tab. Now, on the Control Flow design surface, right-click on the newly added Script Task, then click Rename, and change the name from default name to Get File List – ST.
Next, double-click the Script Task, to open the Script Task Editor, and then configure the following properties:
  • Set the ScriptLanguage property to Microsoft Visual Studio C# 2010.
  • Add the varFileAgeLimit, varFilePattern and varNetworkPath package variables in the Script task’s ReadOnlyVariables property.
  • Add the varFileList package variable in the Script task’s ReadWriteVariables property.
  • Next, click the Edit Script button on the Script Task Editor page. This will open the VSTA integrated development environment (IDE) window. Copy the code below because this will search and return the list of files that are older than a specified number of days in a specified location, and within the child directories of this specified location. Once done, click the OK button on the Script Task Editor to save changes to the Script Task.
 double-click Script Task, to open Script Task Editor


Microsoft Visual Studio C# 2010 Code to Delete Files in the File System

#region Help:  Introduction to the script task
/* The Script Task search files in the specified location and within child directories of the specified location */
#endregion
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO; // Import for Directory class
using System.Collections; // Import for ArrayList class
#endregion
namespace ST_e10d8186ebb34debbc31bb734b0e29a5
{
[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : 
 Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
   private string NETWORK_PATH;
   private string FILE_PATTREN;
   private bool isCheckForNewer = true;
   int fileAgeLimit;
   private ArrayList listForEnumerator = new ArrayList();
   private void GetFilesInFolder(string folderPath)
   {
      string[] AllFiles;
      DateTime fileChangeDate;
      TimeSpan fileAge;
      int fileAgeInDays;
      try
      {
         AllFiles = Directory.GetFiles(folderPath, FILE_PATTREN);
         foreach (string fileName in AllFiles)
            {
               fileChangeDate = File.GetLastWriteTime(fileName);
               fileAge = DateTime.Now.Subtract(fileChangeDate);
               fileAgeInDays = fileAge.Days;
               CheckAgeOfFile(fileName, fileAgeInDays); 
            }
         if (Directory.GetDirectories(folderPath).Length > 0)
            {
              foreach (string childFolder in Directory.GetDirectories(folderPath))
               {
                  GetFilesInFolder(childFolder);
               }
            }
      }
      catch (Exception e)
      {
        System.Windows.Forms.MessageBox.Show("Exception caught: " + e.ToString(), "Results",
            MessageBoxButtons.OK, MessageBoxIcon.Error);
      }
   }
   private void CheckAgeOfFile(string fileName, int fileAgeInDays)
   {
     if (isCheckForNewer)
      {
        if (fileAgeInDays <= fileAgeLimit)
          {
             listForEnumerator.Add(fileName);
          }
      }
      else
      {
        if (fileAgeInDays > fileAgeLimit)
          {
              listForEnumerator.Add(fileName);
          }
      }
   }
   public void Main()
   {
     // Initializing class variables with package variables
     fileAgeLimit = (int)(Dts.Variables["User::varFileAgeLimit"].Value);
     NETWORK_PATH = (string)(Dts.Variables["User::varNetworkPath"].Value);
     FILE_PATTREN = (string)(Dts.Variables["User::varFilePattern"].Value); ;
     if (fileAgeLimit < 0)
     {
       isCheckForNewer = false;
     }
     fileAgeLimit = Math.Abs(fileAgeLimit);
     GetFilesInFolder(NETWORK_PATH);
     // Return the list of files to the variable
     // for later use by the Foreach from Variable enumerator.
     Dts.Variables["User::varFileList"].Value = listForEnumerator;
     Dts.TaskResult = (int)ScriptResults.Success;
   }
   #region ScriptResults declaration
   enum ScriptResults
    {
      Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
      Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion
  }
}

2.2: Configuring “Get Individual File FQN - FLC” Foreach Loop container

The next task you need is the Foreach Loop container. To add it to your package, drag it from the SSIS Toolbox to design the surface of the Control Flow tab. Now, on the Control Flow design surface, right-click the Foreach Loop container, then click Rename, and change the name from default name to Get Individual File FQN - FLC.
Next, double-click the Foreach Loop container, to open the Foreach Loop Editor. On left side of Foreach Loop Editor, click on the Collection, and then configure its properties:
  • Change Enumerator property to Foreach From Variable Enumerator.
  • Specify varFileList package variable as Enumerator configuration variable
Configuring “Get Individual File FQN - FLC” Foreach Loop container
Now, click on Variable Mappings and select varFileFQN package variable to map to the collection value.
click on Variable Mappings and select varFileFQN package variable
Once done, connect the Script task (Get File List – ST) with the Foreach Loop container (Get Individual File FQN – FLC).

2.3: Configuring “Delete Files on Remote Directory – FST” File System Task

Finally, add the File System Task to the Foreach Loop container (Get Individual File FQN – FLC). To do that, simply drag it from the SSIS Toolbox to the design surface of the Loop container (Get Individual File FQN – FLC). Then, right-click on the newly added File System Task, then click Rename, and change the name from default name to Delete Files on Remote Directory – FST.
Next, double-click on the File System Task, to open File System Task Editor, and then configure the following properties on General page:
  • Set Operation property to Delete file.
  • Set IsSourcePathVariable property to True.
  • Specify varFileFQN package variable as SourceVariable.
double-click File System Task, to open File System Task Editor
All done, our package is successfully configured, and it should look similar to figure below:
 our package is successfully configured

Testing

To test the package, simply assign values to the package variables, and then execute the package. For example, I specified the following values to package variables, to delete all files from \\JW02410\Temp\ shared folder that are older than 2 days.
assign values to package variables, and then execute the package


I specified the following values to package variables, to delete all files from \\JW02410\Temp\ shared folder that are older than 2 days
When I executed the package, it deleted all files from this location and within child directory of this location that are older than 2 days.
it deleted all files from this location

No comments:

Post a Comment