Tuesday, April 30, 2013

SQL SERVER – Better Performance – LEFT JOIN or NOT IN?

http://blog.sqlauthority.com/2008/04/22/sql-server-better-performance-left-join-or-not-in/

First, answer this question: which T-SQL construct performs better, LEFT JOIN or NOT IN? The answer is: it depends! It all depends on the data, the query, and many other factors. Just for fun, guess one of the two options before reading on. The following two queries demonstrate the two approaches.
USE AdventureWorks;
GO
SELECT ProductID
FROM Production.Product
WHERE ProductID NOT IN (
    SELECT ProductID
    FROM Production.WorkOrder);
GO
SELECT p.ProductID
FROM Production.Product p
LEFT JOIN Production.WorkOrder w ON p.ProductID = w.ProductID
WHERE w.ProductID IS NULL;
GO
Now let us examine the actual execution plans of both queries.
You can clearly observe that the first query, with NOT IN, takes 20% of the batch cost while the LEFT JOIN takes 80%. In this particular example it is better to use the NOT IN clause over the LEFT JOIN. Please note that this is not a generic conclusion; it applies to this example only, and your results may vary based on many factors. Let me know in the comments whether you guessed correctly.
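One caveat worth adding (mine, not from the original post): the two forms are interchangeable here only because Production.WorkOrder.ProductID cannot be NULL. Against a nullable column, NOT IN returns no rows at all once a single NULL shows up, while the LEFT JOIN form keeps working. A minimal illustration:
SELECT 1 AS Found WHERE 1 NOT IN (SELECT 2)                        -- returns one row
SELECT 1 AS Found WHERE 1 NOT IN (SELECT 2 UNION ALL SELECT NULL)  -- returns nothing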

SQL SERVER – Query to Retrieve the Nth Maximum Value

http://blog.sqlauthority.com/2007/04/27/sql-server-query-to-retrieve-the-nth-maximum-value/

Replace Employee with your table name and Salary with your column name, where N is the rank of the salary to be determined.
SELECT *
FROM Employee E1
WHERE (N-1) = (SELECT COUNT(DISTINCT(E2.Salary))
               FROM Employee E2
               WHERE E2.Salary > E1.Salary)
In the above example, the inner query uses a value from the outer query in its filter condition, meaning the inner query cannot be evaluated before the outer query. Each row of the outer query is evaluated first, and the inner query is run for that row.
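As a concrete, self-contained sketch (table and values are mine, not from the original), here is the query with N = 2 finding the second-highest salary; the multi-row VALUES syntax assumes SQL Server 2008 or later.
CREATE TABLE Employee (Name VARCHAR(50), Salary INT)
INSERT INTO Employee VALUES ('A', 100), ('B', 200), ('C', 300), ('D', 200)
SELECT *
FROM Employee E1
WHERE (2-1) = (SELECT COUNT(DISTINCT(E2.Salary))
               FROM Employee E2
               WHERE E2.Salary > E1.Salary)
-- Returns B and D (Salary 200): exactly one distinct salary (300) is higher
DROP TABLE Employee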

SQL SERVER – Difference Between Unique Index vs Unique Constraint

http://blog.sqlauthority.com/2007/04/26/sql-server-difference-between-unique-index-vs-unique-constraint/

Unique Index and Unique Constraint are the same. They achieve the same goal, and SQL performance is the same for both.
Add Unique Constraint
ALTER TABLE dbo.<tablename>
ADD CONSTRAINT <namingconventionconstraint> UNIQUE NONCLUSTERED
(
    <columnname>
)
ON [PRIMARY]
Add Unique Index
CREATE UNIQUE NONCLUSTERED INDEX <namingconventionconstraint>
ON dbo.<tablename>
(
    <columnname>
)
ON [PRIMARY]

There is no difference between a Unique Index and a Unique Constraint. Even though the syntax is different, the effect is the same: a Unique Constraint creates a Unique Index under the covers to prevent duplicate keys. Unique Indexes and Primary Key Indexes are physical structures that maintain uniqueness over some combination of columns across all rows of a table, and a Unique Index is simply a convenient way for SQL Server to enforce a Unique Constraint.
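Filling the placeholders with a hypothetical table makes the equivalence concrete (table and names are mine):
CREATE TABLE dbo.DemoUsers (UserID INT NOT NULL, Email VARCHAR(255) NOT NULL)
-- As a constraint:
ALTER TABLE dbo.DemoUsers
ADD CONSTRAINT UQ_DemoUsers_Email UNIQUE NONCLUSTERED (Email)
-- Or, equivalently, as an index (use one or the other, not both):
-- CREATE UNIQUE NONCLUSTERED INDEX UQ_DemoUsers_Email ON dbo.DemoUsers (Email)
Either way a unique nonclustered index named UQ_DemoUsers_Email ends up on the table; about the only visible difference is the error number raised on a duplicate insert (2627 for the constraint, 2601 for the index).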

Disable/Enable All the Foreign Key in Database

http://blog.sqlauthority.com/2013/04/29/sql-server-disable-all-the-foreign-key-constraint-in-database-enable-all-the-foreign-key-constraint-in-database/
“Is there any way I can disable all the constraints temporarily, load some random data, test my system, and later delete all the inserted data and enable the constraints back?”
This is indeed a great question; I come across it again and again. Here is a quick script I wrote early in my career which I still use when I need to do something similar.
-- Disable all the constraints in the database
EXEC sp_msforeachtable "ALTER TABLE ? NOCHECK CONSTRAINT all"

-- Enable all the constraints in the database
EXEC sp_msforeachtable "ALTER TABLE ? WITH CHECK CHECK CONSTRAINT all"
Remember that the above script enables or disables the constraints as soon as it is executed, so be extremely careful when running it on a production server.
One more thing: when you have disabled the constraints you can delete data from the table, but if you attempt to truncate the table it will still give you an error. If you need to truncate the table you will have to actually drop all the constraints. Do you use a similar script in your environment? If yes, please leave a comment along with the script and I will post it on the blog with due credit.
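As a small addition of my own, you can confirm what state the foreign keys are in after running either command:
-- Lists every foreign key with its current disabled/untrusted flags
SELECT name, is_disabled, is_not_trusted
FROM sys.foreign_keys
ORDER BY name
Re-enabling WITH CHECK, as the script above does, revalidates existing rows so is_not_trusted returns to 0; re-enabling without it would leave the constraints marked untrusted.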

SQL SERVER – Query to Find Seed Values, Increment Values and Current Identity Column value of the table

http://blog.sqlauthority.com/2007/04/23/sql-server-query-to-find-seed-values-increment-values-and-current-identity-column-value-of-the-table/

The following script returns all the tables that have an identity column. It also returns the seed value, increment value, and current identity value for each table.
SELECT IDENT_SEED(TABLE_NAME) AS Seed,
    IDENT_INCR(TABLE_NAME) AS Increment,
    IDENT_CURRENT(TABLE_NAME) AS Current_Identity,
    TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE OBJECTPROPERTY(OBJECT_ID(TABLE_NAME), 'TableHasIdentity') = 1
    AND TABLE_TYPE = 'BASE TABLE'

SQL SERVER – @@DATEFIRST and SET DATEFIRST Relations and Usage

http://blog.sqlauthority.com/2007/04/22/sql-server-datefirst-and-set-datefirst-relations-and-usage/

The master database's syslanguages table has a DateFirst column that defines the first day of the week for a particular language. On a SQL Server instance with US English as the default language, DATEFIRST is set to 7 (Sunday) by default. We can make any day the first day of the week using
SET DATEFIRST 5
This will set Friday as the first day of the week.
@@DATEFIRST returns the current value, for the session, of SET DATEFIRST.
SET LANGUAGE italian
GO
SELECT @@DATEFIRST
GO
---- This will return 1 (Monday)
SET LANGUAGE us_english
GO
SELECT @@DATEFIRST
GO
---- This will return 7 (Sunday)

In this way @@DATEFIRST and SET DATEFIRST are related.
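One place the setting visibly matters is DATEPART(dw, ...). Here is a small sketch (my addition) using the date of this post, which falls on a Tuesday:
SET DATEFIRST 7  -- Sunday is day 1
SELECT DATEPART(dw, '20130430') AS DayOfWeek  -- Tuesday, returns 3
SET DATEFIRST 1  -- Monday is day 1
SELECT DATEPART(dw, '20130430') AS DayOfWeek  -- Tuesday, returns 2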

Thursday, April 25, 2013

Find Currently Running Scheduled Job

http://www.sqlservercentral.com/articles/Automation/97447/

We have a job that frequently hangs. It tries to import data from Oracle through a linked server and, for some reason, tends to wait forever instead of timing out or failing. We know how long the job should take, so we decided to stop the job if it took twice as long as it should and restart it. Since the job truncates all the tables it imports into as it starts, there's no reason to avoid simply restarting the job from the beginning.
After doing this manually for a while, we decided to automate the process. The system stored procedure msdb.dbo.sp_stop_job works just fine if you know the name of the job you're trying to stop, and the same goes for sp_start_job. Automating those two aspects is no problem. The tricky part is programmatically determining whether the job is still running. You can use sp_help_job, but since it returns a table, you need to insert that data into a table to find the value you want; sp_help_job is mainly for visual, manual use. I found details on sp_help_job here: Detecting The State of a SQL Server Agent Job.
Since sp_help_job does what I need, clearly the next step is to look inside sp_help_job and see what it does and whether I can automate the process. The stored procedure (sp) sp_help_job is located in the msdb database, which can be found under System Databases. Expand Stored Procedures and then System Stored Procedures and marvel at the number of stored procedures located there. This is a great place to explore when you have time; there are tons of helpful stored procedures here.
I opened sp_help_job and checked for what would tell me whether the job was currently running. (To open it, right-click on the sp name and click Modify. Be careful you don't change anything or hit execute while the sp is open. If you're worried, comment out the 'ALTER PROCEDURE' line when you first open it.) There's a variable called @execution_status that looks like a good candidate; let's see where that's used. Scroll down through the sp and you'll see @execution_status is fed to sp_get_composite_job_info.
If we open sp_get_composite_job_info, we'll see the column 'running' in the last declared table; that's what we're looking for. A little further down is how this table is populated. The first step in populating the table is a statement that determines whether you're running SQL Server version 8 (SQL Server 2000) or newer. I am, so the step I want is
EXECUTE master.dbo.xp_sqlagent_enum_jobs @can_see_all_running_jobs, @job_owner, @job_id
The only difference is that in version 8 and later, @job_id is a parameter; before version 8, there is no @job_id parameter.
Since we're going as deep as we can to get just the parts we need, let's look at xp_sqlagent_enum_jobs. Note that this starts with xp_, not sp_, and it is in a different database. Move over to master, expand Extended Stored Procedures and then System Extended Stored Procedures, and here you'll find sys.xp_sqlagent_enum_jobs. Why this is in the schema 'sys' instead of the schema 'dbo' as listed in the EXECUTE, I don't know; I can't find it in any folders with the dbo schema. There's also no way to look into this code, at least no way I know of: right-clicking on it just brings a prompt to start PowerShell, and checking Properties shows it is a DLL, so I'd likely need a decompiler to take a look.
This is low enough to be a good stopping point, though. It gets me the running flag in a format where I can use the value immediately, and it doesn't return a ton of things I don't need at the moment. So how do we use this?
There's no need to reinvent the wheel, so just copy the parts we'll need from sp_get_composite_job_info. That's the declarations
DECLARE @can_see_all_running_jobs INT
DECLARE @job_owner sysname
DECLARE @job_id uniqueidentifier
And the table declaration
DECLARE @xp_results TABLE 
(
job_id UNIQUEIDENTIFIER NOT NULL,
last_run_date INT NOT NULL,
last_run_time INT NOT NULL,
next_run_date INT NOT NULL,
next_run_time INT NOT NULL,
next_run_schedule_id INT NOT NULL,
requested_to_run INT NOT NULL, -- BOOL
request_source INT NOT NULL,
request_source_id sysname COLLATE database_default NULL,
running INT NOT NULL, -- BOOL
current_step INT NOT NULL,
current_retry_attempt INT NOT NULL,
job_state INT NOT NULL)
We'll also copy the parts that populate the variables
SELECT @can_see_all_running_jobs = ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0)
SELECT @job_owner = SUSER_SNAME()
SELECT @job_id = job_id
FROM msdb.dbo.sysjobs_view
WHERE name = 'RunOften'
And the part that populates the table we need
INSERT INTO @xp_results
EXECUTE master.dbo.xp_sqlagent_enum_jobs @can_see_all_running_jobs, @job_owner, @job_id
Normally I don't like to populate a table without specifying the columns, but as I can't declare the columns in a following SELECT statement and this code is directly from a system stored procedure, I'm letting it go.
Now that we have all the data, we have to do something with it. We'll declare a variable to store our result
DECLARE @IsRunning bit = 0
SELECT @IsRunning = xpr.running
FROM @xp_results xpr
Pass in the name of the job you want to check, then just check the running flag. You don't need to join it to any other tables; just see if @IsRunning = 1 and, if it is, stop the job with sp_stop_job.
DECLARE @JobName sysname = 'RunOften' -- assumption: the same job name used above
IF @IsRunning = 1
BEGIN
    EXEC msdb.dbo.sp_stop_job @JobName
    WAITFOR DELAY '00:01:00'
    EXEC msdb.dbo.sp_start_job @JobName
END
If this is a job you want to start again automatically, you can start it again with sp_start_job, but I recommend including a 1 minute wait before you do to make sure the job has stopped successfully before trying to start it again.
If you have multiple schedules and want to find out which one is running, the process is slightly different. First we declare a variable of our own
DECLARE @Phase tinyint = 0
And we'll assign a value to it based on the information we've retrieved from the xp_ procedure
SELECT @Phase = CASE S.name
        WHEN 'First Run' THEN 1
        WHEN 'Second Run' THEN 2
        WHEN 'Third Run' THEN 3
        WHEN 'Fourth Run' THEN 4
    END
FROM @xp_results xpr
LEFT JOIN msdb.dbo.sysjobschedules JS
ON xpr.job_id = JS.job_id
AND xpr.next_run_schedule_id = JS.schedule_id
LEFT JOIN msdb.dbo.sysschedules S
ON JS.schedule_id = S.schedule_id
WHERE xpr.running = 1
In my job I have four distinct scheduled run times, listed as 'First Run', 'Second Run', 'Third Run' and 'Fourth Run'. We can relate these named schedules back to the data we've retrieved through the system tables msdb.dbo.sysjobschedules and msdb.dbo.sysschedules. The schedules' names are listed in sysschedules; we use them in the CASE expression to assign a number from 1 through 4. We just have to check the table we've brought back to see which of the schedules is currently running: the 'running' value for that job will be 1. (When you have a chance, check out sysjobschedules and sysschedules; you may find other useful information there, and it is good to understand the tables you're referencing.)
Finally, we return that @Phase value to the process calling this stored procedure
RETURN @Phase
We can perform whatever actions we need based on that. For example, I truncate the day's temporary tables at the beginning of the job, but only when the returned phase is 1.
One other handy piece of information: when you're looking at the Job Activity Monitor, you'll notice two columns, Last Run and Next Run. Last Run refers to the last completed run of the job. Next Run refers to the next time the job runs, but it doesn't change until that run has completed. So if you have a job that runs at 5, 6, 7 and 8, and you look at the Job Activity Monitor at 7:15 and Next Run says 7, that just means the 7:00 run is still running. If it says 6:00, your 6:00 run is running over time; it does not mean your 5:00 run is running long. I find this useful in determining whether a job has gone badly enough to stop it and investigate, or whether it is just a little slow. The terms can be confusing, though, so remember: Last and Next refer to job completion, not starts.
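To put the pieces above together, here is a consolidated sketch of the check-and-restart logic (assumptions: the job is named 'RunOften' as in the article, and SQL Server 2008+ inline DECLARE initialization):
DECLARE @can_see_all_running_jobs INT = ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0)
DECLARE @job_owner sysname = SUSER_SNAME()
DECLARE @job_id uniqueidentifier
SELECT @job_id = job_id FROM msdb.dbo.sysjobs_view WHERE name = 'RunOften'

DECLARE @xp_results TABLE
(
 job_id UNIQUEIDENTIFIER NOT NULL,
 last_run_date INT NOT NULL,
 last_run_time INT NOT NULL,
 next_run_date INT NOT NULL,
 next_run_time INT NOT NULL,
 next_run_schedule_id INT NOT NULL,
 requested_to_run INT NOT NULL, -- BOOL
 request_source INT NOT NULL,
 request_source_id sysname COLLATE database_default NULL,
 running INT NOT NULL, -- BOOL
 current_step INT NOT NULL,
 current_retry_attempt INT NOT NULL,
 job_state INT NOT NULL
)

INSERT INTO @xp_results
EXECUTE master.dbo.xp_sqlagent_enum_jobs @can_see_all_running_jobs, @job_owner, @job_id

DECLARE @IsRunning bit = 0
SELECT @IsRunning = xpr.running FROM @xp_results xpr

IF @IsRunning = 1
BEGIN
 EXEC msdb.dbo.sp_stop_job @job_name = 'RunOften'
 WAITFOR DELAY '00:01:00' -- give the job time to stop before restarting
 EXEC msdb.dbo.sp_start_job @job_name = 'RunOften'
END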

How do I return a few of my resultset rows at the bottom of the entire resultset

Let us see the scenario one more time. In the resultset there are two rows, with ProductID 715 and 712, and the entire resultset is ordered by the ProductID column. Our requirement is that row 715 become the second-to-last row in the resultset and 712 the very last row. From the outside the whole thing looks very simple; in reality it is not as simple as it looks.
See if you can come up with the solution to this problem before reading on.
Here is the script on the AdventureWorks database which generates the base resultset.
USE AdventureWorks2012
GO
SELECT [ProductID], COUNT(*) CountofProductID
FROM [Sales].[SalesOrderDetail]
WHERE ProductID BETWEEN 707 AND 716
GROUP BY [ProductID]
GO
Now, instead of writing a CASE expression in the ORDER BY clause, we will use UNION ALL. In this case there are two different values which we want at the bottom of the resultset.
USE AdventureWorks2012
GO
SELECT [ProductID], COUNT(*) CountofProductID
FROM [Sales].[SalesOrderDetail]
WHERE (ProductID BETWEEN 707 AND 716)
    AND ProductID <> 715 AND ProductID <> 712
GROUP BY [ProductID]
UNION ALL
SELECT [ProductID], COUNT(*) CountofProductID
FROM [Sales].[SalesOrderDetail]
WHERE ProductID = 715
GROUP BY [ProductID]
UNION ALL
SELECT [ProductID], COUNT(*) CountofProductID
FROM [Sales].[SalesOrderDetail]
WHERE ProductID = 712
GROUP BY [ProductID]
GO
Here is the resultset we expected. (Strictly speaking, a UNION ALL result has no guaranteed order without an ORDER BY, though it typically comes back in this sequence.) In our case, if we had more than two special cases like these, we would need to keep adding more and more UNION ALL clauses, and that would make the script more confusing and hard to read.
Now let us compare the performance of the two different queries, one from the earlier blog post and one from the current blog post. Execute the following two queries together and check their execution plans. The actual execution plan can be enabled with Ctrl+M.
-- Method 1 - CASE and ORDER BY
USE AdventureWorks2012
GO
SELECT [ProductID], COUNT(*) CountofProductID
FROM [Sales].[SalesOrderDetail]
WHERE ProductID BETWEEN 707 AND 716
GROUP BY [ProductID]
ORDER BY CASE WHEN [ProductID] = 715 THEN 1
    WHEN [ProductID] = 712 THEN 2 ELSE 0 END
GO
-- Method 2 - UNION ALL
USE AdventureWorks2012
GO
SELECT [ProductID], COUNT(*) CountofProductID
FROM [Sales].[SalesOrderDetail]
WHERE (ProductID BETWEEN 707 AND 716)
    AND ProductID <> 715 AND ProductID <> 712
GROUP BY [ProductID]
UNION ALL
SELECT [ProductID], COUNT(*) CountofProductID
FROM [Sales].[SalesOrderDetail]
WHERE ProductID = 715
GROUP BY [ProductID]
UNION ALL
SELECT [ProductID], COUNT(*) CountofProductID
FROM [Sales].[SalesOrderDetail]
WHERE ProductID = 712
GROUP BY [ProductID]
GO
You will clearly notice that the solution with CASE and ORDER BY is much better than the UNION ALL approach: the UNION ALL version reads SalesOrderDetail three times, once per branch, while the CASE version needs only a single scan.
So far we have seen two examples: 1) the CASE and ORDER BY clause and 2) the UNION ALL clause. If you know any other trick to get a similar result, please leave a comment and I will post it on my blog with due credit.

Wednesday, April 24, 2013

Delete duplicate records

create table xyz (id int, name varchar(500))

insert into xyz select '1', 'a'
insert into xyz select '1', 'b'
insert into xyz select '2', 'c'
insert into xyz select '2', 'd'

with xyz1 as (
    select *, rn = ROW_NUMBER() over (partition by id order by id) from xyz
)
delete from xyz1 where rn > 1
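Although the DELETE targets the CTE xyz1, it removes rows from the underlying xyz table: every row after the first within each id partition goes away. Since the ROW_NUMBER here orders by the same column it partitions by, which duplicate survives is arbitrary. A quick check (my addition):
select * from xyz -- expect one row left for id 1 and one for id 2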

Thursday, April 4, 2013

SQL SERVER – Group by Rows and Columns using XML PATH – Efficient Concating Trick

I rarely have a hard time coming up with the title of a blog post, but for this one, even though it is simple, I believe I have not come up with an appropriate title. Anyway, here is the question I received.
"I have a table of students and the courses they are enrolled with the name of the professor besides it. I would like to group the result with course and instructor name.
For example here is my table:
How can I generate result as following?
"
Now you can see how easy the question is, and yet how hard it was to come up with either the solution or the title of this blog post. We can use XML PATH to combine two or more columns together and display the desired result.
Here is the quick script which does the task asked. I have used temporary tables, so you can just take this script, quickly run it on your machine, and see how it returns results.
Let me know if there are any better ways to do the same.
-- Create table
CREATE TABLE #TestTable (StudentName VARCHAR(100), Course VARCHAR(100), Instructor VARCHAR(100), RoomNo VARCHAR(100))
GO
-- Populate table
INSERT INTO #TestTable (StudentName, Course, Instructor, RoomNo)
SELECT 'Mark', 'Algebra', 'Dr. James', '101'
UNION ALL
SELECT 'Mark', 'Maths', 'Dr. Jones', '201'
UNION ALL
SELECT 'Joe', 'Algebra', 'Dr. James', '101'
UNION ALL
SELECT 'Joe', 'Science', 'Dr. Ross', '301'
UNION ALL
SELECT 'Joe', 'Geography', 'Dr. Lisa', '401'
UNION ALL
SELECT 'Jenny', 'Algebra', 'Dr. James', '101'
GO
-- Check original data
SELECT *
FROM #TestTable
GO
-- Group data by column using XML PATH
SELECT StudentName,
    STUFF((SELECT ', ' + Course + ' by ' + CAST(Instructor AS VARCHAR(MAX)) + ' in Room No ' + CAST(RoomNo AS VARCHAR(MAX))
           FROM #TestTable
           WHERE (StudentName = StudentCourses.StudentName)
           FOR XML PATH ('')), 1, 2, '') AS NameValues
FROM #TestTable StudentCourses
GROUP BY StudentName
GO
-- Clean up
DROP TABLE #TestTable
GO
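For reference, the grouped output of the script above should look something like this (the inner SELECT has no ORDER BY, so the order of courses within each row is typically insertion order but is not guaranteed):
StudentName  NameValues
Jenny        Algebra by Dr. James in Room No 101
Joe          Algebra by Dr. James in Room No 101, Science by Dr. Ross in Room No 301, Geography by Dr. Lisa in Room No 401
Mark         Algebra by Dr. James in Room No 101, Maths by Dr. Jones in Room No 201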

Create Indexed Views

http://msdn.microsoft.com/en-us/library/ms191432.aspx

SQL Server 2008 R2

A view must meet the following requirements before you can create a clustered index on it:
  • The ANSI_NULLS and QUOTED_IDENTIFIER options must have been set to ON when the CREATE VIEW statement was executed. The OBJECTPROPERTY function reports this for views through the ExecIsAnsiNullsOn or ExecIsQuotedIdentOn properties.
  • The ANSI_NULLS option must have been set to ON for the execution of all CREATE TABLE statements that create tables referenced by the view.
  • The view must not reference any other views, only base tables.
  • All base tables referenced by the view must be in the same database as the view and have the same owner as the view.
  • The view must be created with the SCHEMABINDING option. Schema binding binds the view to the schema of the underlying base tables.
  • User-defined functions referenced in the view must have been created with the SCHEMABINDING option.
  • Tables and user-defined functions must be referenced by two-part names in the view. One-part, three-part, and four-part names are not allowed.
  • All functions referenced by expressions in the view must be deterministic. The IsDeterministic property of the OBJECTPROPERTY function reports whether a user-defined function is deterministic. For more information, see Deterministic and Nondeterministic Functions.
    Note:
    When you refer to datetime and smalldatetime string literals in indexed views in SQL Server 2008, we recommend that you explicitly convert the literal to the date type you want by using a deterministic date format style. For a list of the date format styles that are deterministic, see CAST and CONVERT (Transact-SQL). Expressions that involve implicit conversion of character strings to datetime or smalldatetime are considered nondeterministic, unless the compatibility level is set to 80 or earlier. This is because the results depend on the LANGUAGE and DATEFORMAT settings of the server session. For example, the results of the expression CONVERT (datetime, '30 listopad 1996', 113) depend on the LANGUAGE setting because the string 'listopad' means different months in different languages. Similarly, in the expression DATEADD(mm,3,'2000-12-01'), SQL Server interprets the string '2000-12-01' based on the DATEFORMAT setting.
    Implicit conversion of non-Unicode character data between collations is also considered nondeterministic, unless the compatibility level is set to 80 or earlier.
    Creating indexes on views that contain these expressions is not allowed in 90 compatibility mode. However, existing views that contain these expressions from an upgraded database are maintainable. If you use indexed views that contain implicit string to date conversions, be certain that the LANGUAGE and DATEFORMAT settings are consistent in your databases and applications to avoid possible indexed view corruption.
  • If the view definition uses an aggregate function, the SELECT list must also include COUNT_BIG (*).
  • The data access property of a user-defined function must be NO SQL, and external access property must be NO.
  • Common language runtime (CLR) functions can appear in the select list of the view, but cannot be part of the definition of the clustered index key. CLR functions cannot appear in the WHERE clause of the view or the ON clause of a JOIN operation in the view.
  • CLR functions and methods of CLR user-defined types used in the view definition must have the properties set as shown in the following table.
    Property                  Note
    DETERMINISTIC = TRUE      Must be declared explicitly as an attribute of the Microsoft .NET Framework method.
    PRECISE = TRUE            Must be declared explicitly as an attribute of the .NET Framework method.
    DATA ACCESS = NO SQL      Determined by setting the DataAccess attribute to DataAccessKind.None and the SystemDataAccess attribute to SystemDataAccessKind.None.
    EXTERNAL ACCESS = NO      This property defaults to NO for CLR routines.
    For more information about how to set attributes of CLR routine methods, see Custom Attributes for CLR Routines.
    Caution:
    We do not recommend setting the properties of CLR routine methods in contradiction to the functionality of the method. Doing this could lead to data corruption.
  • The SELECT statement in the view cannot contain the following Transact-SQL syntax elements:
    • The * or table_name.* syntax to specify columns. Column names must be explicitly stated.
    • A table column name used as a simple expression cannot be specified in more than one view column. A column can be referenced multiple times provided all, or all but one, reference to the column is part of a complex expression or a parameter to a function. For example, the following select list is not valid:
      SELECT ColumnA, ColumnB, ColumnA
      
      This select list is valid:
      SELECT SUM(ColumnA) AS SumColA, ColumnA % ColumnB AS ModuloColAColB, COUNT_BIG(*) AS cBig FROM dbo.T1 GROUP BY ModuloColAColB
      
    • An expression on a column used in the GROUP BY clause, or an expression on the results of an aggregate.
    • A derived table.
    • A common table expression (CTE).
    • Rowset functions.
    • UNION, EXCEPT or INTERSECT operators.
    • Subqueries.
    • Outer or self joins.
    • TOP clause.
    • ORDER BY clause.
    • DISTINCT keyword.
    • COUNT (COUNT_BIG(*) is allowed.)
    • The AVG, MAX, MIN, STDEV, STDEVP, VAR, or VARP aggregate functions. If AVG(expression) is specified in queries referencing the indexed view, the optimizer can frequently calculate the needed result if the view select list contains SUM(expression) and COUNT_BIG(expression). For example, an indexed view SELECT list cannot contain the expression AVG(column1). If the view SELECT list contains the expressions SUM(column1) and COUNT_BIG(column1), SQL Server can calculate the average for a query that references the view and specifies AVG(column1).
    • A SUM function that references a nullable expression.
    • The OVER clause, which includes ranking or aggregate window functions.
    • A CLR user-defined aggregate function.
    • The full-text predicates CONTAINS or FREETEXT.
    • COMPUTE or COMPUTE BY clause.
    • The CROSS APPLY or OUTER APPLY operators.
    • The PIVOT or UNPIVOT operators.
    • Table hints (applies to compatibility level of 90 or higher only).
    • Join hints.
  • Direct references to XQuery expressions. Indirect references, such as XQuery expressions inside a schema-bound user-defined function, are acceptable.
  • If GROUP BY is specified, the view select list must contain a COUNT_BIG(*) expression, and the view definition cannot specify HAVING, ROLLUP, CUBE, or GROUPING SETS.
The first index created on a view must be a unique clustered index. After the unique clustered index has been created, you can create additional nonclustered indexes. The naming conventions for indexes on views are the same as for indexes on tables. The only difference is that the table name is replaced with a view name. For more information, see CREATE INDEX (Transact-SQL).
The CREATE INDEX statement must meet the following requirements as well as the regular CREATE INDEX requirements:
  • The user that executes the CREATE INDEX statement must be the view owner.
  • The following SET options must be set to ON when the CREATE INDEX statement is executed:
    • ANSI_NULLS
    • ANSI_PADDING
    • ANSI_WARNINGS
    • CONCAT_NULL_YIELDS_NULL
    • QUOTED_IDENTIFIER
  • The NUMERIC_ROUNDABORT option must be set to OFF. This is the default setting.
  • If the database is running in 80 compatibility mode or earlier, the ARITHABORT option must be set to ON.
  • When you create a clustered or nonclustered index, the IGNORE_DUP_KEY option must be set to OFF (the default setting).
  • The view cannot include text, ntext, or image columns, even if they are not referenced in the CREATE INDEX statement.
  • If the SELECT statement in the view definition specifies a GROUP BY clause, the key of the unique clustered index can reference only columns specified in the GROUP BY clause.
  • An imprecise expression that forms the value of an index key column must reference a stored column in a base table underlying the view. This column may be a regular stored column or a persisted computed column. No other imprecise expressions can be part of the key column of an indexed view.
The setting of the large_value_types_out_of_row option of columns in an indexed view is inherited from the setting of the corresponding column in the base table. This value is set by using sp_tableoption. The default setting for columns formed from expressions is 0. This means that large value types are stored in-row. For more information, see Using Large-Value Data Types.
After the clustered index is created, any connection that tries to modify the base data for the view must also have the same option settings required to create the index. SQL Server generates an error and rolls back any INSERT, UPDATE, or DELETE statement that will affect the result set of the view if the connection executing the statement does not have the correct option settings. For more information, see SET Options That Affect Results.
All indexes on a view are dropped when the view is dropped. All nonclustered indexes and auto-created statistics on the view are dropped when the clustered index is dropped. User-created statistics on the view are maintained. Nonclustered indexes can be individually dropped. Dropping the clustered index on the view removes the stored result set, and the optimizer returns to processing the view like a standard view.
Although only the columns that make up the clustered index key are specified in the CREATE UNIQUE CLUSTERED INDEX statement, the complete result set of the view is stored in the database. As in a clustered index on a base table, the B-tree structure of the clustered index contains only the key columns, but the data rows contain all the columns in the view result set.
If you want to add indexes to views in an existing system, you must schema bind any view on which you want to place an index. You can perform the following operations:
  • Drop the view and re-create it specifying WITH SCHEMABINDING.
  • You can create a second view that has the same text as the existing view but a different name. The optimizer considers the indexes on the new view, even if it is not directly referenced in the FROM clause of queries.
    NoteNote
    Views or tables that participate in a view created with the SCHEMABINDING clause cannot be dropped, unless the view is dropped or changed so that it no longer has schema binding. Additionally, ALTER TABLE statements on tables that participate in views having schema binding will fail if these statements affect the view definition.
You must make sure that the new view meets all the requirements of an indexed view. This may require that you change the ownership of the view and all base tables it references so they are all owned by the same user.
Indexes on tables and views can be disabled. When a clustered index on a table is disabled, indexes on views associated with the table are also disabled. For more information, see Disabling Indexes.
The following example creates a view and an index on that view. Two queries are included that use the indexed view.
USE AdventureWorks2008R2;
GO
--Set the options to support indexed views.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
    QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO
--Create view with schemabinding.
IF OBJECT_ID ('Sales.vOrders', 'view') IS NOT NULL
DROP VIEW Sales.vOrders ;
GO
CREATE VIEW Sales.vOrders
WITH SCHEMABINDING
AS
    SELECT SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS Revenue,
        OrderDate, ProductID, COUNT_BIG(*) AS COUNT
    FROM Sales.SalesOrderDetail AS od, Sales.SalesOrderHeader AS o
    WHERE od.SalesOrderID = o.SalesOrderID
    GROUP BY OrderDate, ProductID;
GO
--Create an index on the view.
CREATE UNIQUE CLUSTERED INDEX IDX_V1 
    ON Sales.vOrders (OrderDate, ProductID);
GO
--This query can use the indexed view even though the view is 
--not specified in the FROM clause.
SELECT SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS Rev, 
    OrderDate, ProductID
FROM Sales.SalesOrderDetail AS od
    JOIN Sales.SalesOrderHeader AS o ON od.SalesOrderID=o.SalesOrderID
        AND ProductID BETWEEN 700 and 800
        AND OrderDate >= CONVERT(datetime,'05/01/2002',101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;
GO
--This query can use the above indexed view.
SELECT  OrderDate, SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS Rev
FROM Sales.SalesOrderDetail AS od
    JOIN Sales.SalesOrderHeader AS o ON od.SalesOrderID=o.SalesOrderID
        AND DATEPART(mm,OrderDate)= 3
        AND DATEPART(yy,OrderDate) = 2002
GROUP BY OrderDate
ORDER BY OrderDate ASC;
GO
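A brief aside (mine, not from the MSDN page): automatic matching of indexed views by the optimizer is an Enterprise Edition feature. On other editions, a query can still use the view's index by referencing the view directly with the NOEXPAND hint:
--On editions without automatic indexed-view matching, reference the view
--directly and force use of its index with NOEXPAND.
SELECT Revenue, OrderDate, ProductID
FROM Sales.vOrders WITH (NOEXPAND)
WHERE ProductID BETWEEN 700 AND 800;
GO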


ERRORFILE and MAXERRORS option with BULK INSERT


In the code below, the MAXERRORS argument defines the total number of records that can be rejected before the entire file is rejected by the system. It is a tolerance level and can be any integer, set at your discretion based on your requirements.
ERRORFILE defines the name and path of the error log file which will contain the erroneous records rejected during the Bulk Insert. It will contain a maximum of MAXERRORS + 1 records.
So in the script below, the complete file is rejected only if more than 500 records are rejected during the BULK INSERT, in which case 501 records are logged to the error log file at the path defined in the ERRORFILE argument.
If 500 or fewer records are rejected (or none at all), those records are logged to the error log file while the rest of the records are successfully loaded into the staging table.

Conclusion


Although these two arguments are rarely used, they can be very helpful when you want to know which records were rejected, and they let you set the tolerance level for the file being loaded by defining the MAXERRORS value.
DECLARE @SQL varchar(2000)
DECLARE @FileToLoad varchar(100)
DECLARE @DestinationTableName varchar(50)
DECLARE @StartingRow int
DECLARE @FormatFile varchar(50)
DECLARE @ErrorLogFile varchar(50)

--Name and path of the file to be BULK INSERTED into the staging table
SET @FileToLoad = '\\servername\foldername\test.txt'

--Name of the staging table into which the data from the .txt file is loaded
SET @DestinationTableName = 'STAGING_TABLE'

--Starting row in the text file to load; row 1 is skipped as the header
SET @StartingRow = 2

--Path and name of the format file. A format file (extension .fmt) defines the
--mapping between text file columns and staging table columns
SET @FormatFile = '\\servername\Format_file\test.fmt'

--Path and name of the error log file. Records rejected during the Bulk Insert
--are saved in this file
SET @ErrorLogFile = '\\servername\TestErrorLog.txt'

SELECT @SQL = 'BULK INSERT DATABASENAME.dbo.' + @DestinationTableName + ' FROM ''' + @FileToLoad + '''
   WITH ( FIRSTROW = ' + RTRIM(STR(@StartingRow)) + ',
    MAXERRORS = 500,
    FORMATFILE = ''' + @FormatFile + ''',
    ERRORFILE = ''' + @ErrorLogFile + '''
    ) '
EXECUTE (@SQL)

Error logging with Bulk Insert


The bulk insert is used to bulk load data into staging tables from .txt or .csv files. It is very fast compared to normal insert statements because by default, CHECK and FOREIGN KEY constraints are disabled. Although this behavior can be controlled by using CHECK_CONSTRAINTS, it is highly recommended for better performance not to use this argument. For more information on the Bulk Insert statement please refer to the Microsoft documentation: http://msdn.microsoft.com/en-us/library/ms188365.aspx
The Bulk Insert statement rejects all the records that do not match the column data type or size such as:
  1. We have a column defined as an Integer and a value present in that column is actually composed of characters.
  2. The size defined is varchar(10) and actually the length of the value is more than 10 characters.
Only the records that violate the above rules are rejected, not the entire file. Again, this behavior can be controlled by using the MAXERRORS argument, which defines how many records can be rejected before the complete file is rejected.
Let's examine a couple of scenarios to show how this works.
Scenario 1: Suppose we have defined the value of MAXERRORS to be 10 and 5 records are rejected during the Bulk Insert process. The rest of the records will be loaded successfully, and only those 5 records that don't meet the criteria are not loaded.
Scenario 2: Suppose we have defined the value of MAXERRORS to be 10 and 11 records are rejected during the Bulk Insert process. Then the entire file will be rejected and no records will be loaded into staging table.
Now the question arises about the 5 records rejected during Scenario 1. We can save those records in a different file if we enable error logging features. Use the argument ERRORFILE and give the path and name of the file where you want to log such records that are rejected during BULK INSERT.
Remember, if the entire file is rejected due to exceeding MAXERRORS, only MAXERRORS + 1 records will be logged in the error log file. This is because the system checks the file only up to the point where it reaches MAXERRORS + 1 rejected records and not beyond.
Along with the error log file, another file is created by default by the system, with the extension .txt.error. It contains the row number and offset of the erroneous records. Note that there are some issues with this file (.txt.error) in SQL Server 2005 SP2, which were fixed in the SP4 patch. The issue is that SQL Server creates this file twice during the execution of BULK INSERT; when it is created the second time, it raises an error that the file already exists, which causes the BULK INSERT statement to fail.
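To make the arguments concrete, here is a minimal static sketch (table name and paths are hypothetical):
BULK INSERT dbo.STAGING_TABLE
FROM '\\servername\foldername\test.txt'
WITH (
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    MAXERRORS = 10,
    ERRORFILE = '\\servername\TestErrorLog.txt'
)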

Conclusion

Using the ERRORFILE argument in the BULK INSERT statement helps you identify the rejected records so you can fix them and reload them into the system. The MAXERRORS argument also lets you control whether a file that contains up to a specified number of errors is still allowed to load.

Wednesday, April 3, 2013

Importing CSV file into SQLServer and create table dynamically
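In outline: the SSIS Script Task below builds its connection string from SSIS variables, scans a source folder, and for every CSV whose name contains "Actual_Participation" it creates a staging table from the file's header row and BULK INSERTs the file into it. (TrialBalanceLog appears to be a logging helper defined elsewhere in the package.)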

public void Main()
{
    // Requires: using System.IO; using System.Data.SqlClient;
    try
    {
        SqlConnection myADONETConnection = new SqlConnection();
        string sDatabase = Dts.Variables["mDatabase"].Value.ToString();

        // Build the connection string from the database-name variable
        Dts.Connections["Con"].ConnectionString =
            "Data source=86.54.116.59;initial catalog=" + sDatabase + ";Integrated Security=SSPI;";
        myADONETConnection =
            (SqlConnection)(Dts.Connections["Con"].AcquireConnection(Dts.Transaction) as SqlConnection);

        string line1 = "";
        string filenameonly1 = "TempCSVParticipation"; // name of the staging table to (re)create
        string SourceDirectory = Dts.Variables["mFilePath"].Value.ToString();

        TrialBalanceLog("database-" + sDatabase, myADONETConnection);
        TrialBalanceLog("directory-" + SourceDirectory, myADONETConnection);

        // Read the file names one by one
        string[] fileEntries = Directory.GetFiles(SourceDirectory);
        TrialBalanceLog(SourceDirectory, myADONETConnection);

        foreach (string fileName in fileEntries)
        {
            TrialBalanceLog(fileName, myADONETConnection);

            if (fileName.ToString().Contains("Actual_Participation"))
            {
                TrialBalanceLog("1-" + fileName, myADONETConnection);

                // Derive a bare file name (computed but unused below; kept from the original)
                System.IO.StreamReader file2 = new System.IO.StreamReader(fileName);
                string filenameonly = (((fileName.Replace(SourceDirectory, "")).Replace(".csv", "")).Replace("\\", ""));

                // Build a DROP/CREATE TABLE statement from the CSV header row:
                // every comma-separated header becomes an NVARCHAR(4000) column
                line1 = (" IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo]." + filenameonly1 + "') AND type in (N'U')) DROP TABLE [dbo]." + filenameonly1 +
                         " Create Table dbo." + filenameonly1 + "([" + file2.ReadLine().Replace(",", "] NVARCHAR(4000), [") + "] NVARCHAR(4000))").Replace(".txt", "");
                file2.Close();

                SqlCommand myCommand = new SqlCommand(line1, myADONETConnection);
                myCommand.ExecuteNonQuery();

                // Bulk load the CSV into the newly created table
                line1 = "BULK INSERT " + filenameonly1 + " FROM '" + fileName + "' WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')";
                SqlCommand myCommand1 = new SqlCommand(line1, myADONETConnection);
                myCommand1.CommandTimeout = 0;
                myCommand1.ExecuteNonQuery();
            }
        }

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception e)
    {
        // The original swallows errors; consider logging e.Message and
        // setting Dts.TaskResult to ScriptResults.Failure here
    }
}