Wednesday, March 24, 2010

Removing full duplicates by using row_number() function and common table expression

This blog article is inspired by another very informative one which explains really well the meaning of the new row_number() function introduced since SQL2005. Since the original article did not give a full answer to how to delete multiple fully duplicated rows in a table, I am putting my code here for the test.

The good thing about row_number() function is that it can be based on partition of columns. New row number count starts for each partition of full duplicates and those rows with row number bigger than 1 can be easily spotted and removed. Here is the example that explains everything I am talking about here.

First of all, create a table in a test db

USE [test_db]

GO

CREATE TABLE [dbo].[Employee](
[EMPID] [int] NULL,
[FNAME] [varchar](50) NULL,
[LNAME] [varchar](50) NULL
) ON [PRIMARY]


Then let's insert many duplicated rows:

INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021110, 'MICHAEL', 'POLAND')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021110, 'MICHAEL', 'POLAND')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, 'JIM', 'KENNEDY')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, 'JIM', 'KENNEDY')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, 'JIM', 'KENNEDY')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2121000, 'JAMES', 'SMITH')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2121000, 'JAMES', 'SMITH')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2011111, 'ADAM', 'ACKERMAN')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (1021710, 'MARIAH', 'MANDEZ')
GO

If you selete everything from this table now, the following should be there:

EMPID FNAME LNAME
1021710 MARIAH MANDEZ
2011111 ADAM ACKERMAN
2021110 MICHAEL POLAND
2021110 MICHAEL POLAND
2021115 JIM KENNEDY
2021115 JIM KENNEDY
2021115 JIM KENNEDY
2121000 JAMES SMITH
2121000 JAMES SMITH
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER

And here comes the de-dup script with the help of common table expression:

with temp_employee as

( select row_number() over (partition by EMPID, FNAME, LNAME order by EMPID) as rowid, * from EMPLOYEE)
delete temp_employee where rowid > 1


Hope this article is illustrative enough.



Thursday, March 18, 2010

Several tips about query performance optimization

The original article is here.

The 4 following tips are particularly helpful and can be immediately put into daily use:

Views and Stored Procedures

We can avoid using heavy duty queries by using stored procedures and views. Stored procedures are commonly used flexible database programming objects. Stored procedures are compiled once, so it will not create the execution plan every time, thus reducing your network traffic. Views give more security. For example, you have a table containing sensitive data; you might wish to hide those columns from certain users, you can use views.

Table Variables

Avoid temporary tables if you don’t need transactions on that table. In that case you can use Table variables. When you create the temporary table, it will create the physical table in the tempdb database. However, if you create a table variable, it only resides in the memory. It will be much faster than temporary tables, and table variables reduce the recompilation of the code.

Notes: a good practice is to use table variable where you need a small table to operate on, loading big table variable into memory simply erodes its performance gain by interfering with other in-memory activities of SQL Server.

Row Count

If you’re in the situation to use the count(*), then use the following query. This is way faster than function count(), especially for huge tables with millions of rows

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND (indid < 2)

Set No Count On

If the No Count is off in your stored procedures, then the number of rows affected value will be returned. It will increase the network traffic. Set No Count should be On in your stored procedures. Hence the number of rows affected will not return.


Monday, March 15, 2010

How to move tempdb?

Today I got a call from our Window system admin that our production server is running low on C: drive. The cause turns out that the tempdb was bloated up and ate a huge chunk of disk space on our system drive. Since we have a separate drive F with plenty of free space, the task became how to move our tempdb from C: to F:. Here is an article that sheds much light on this solution. I am giving a more detailed action path here.

First of all, locate tempdb's files:

use tempdb
go

select * from sys.sysfiles
go

Then you'll have something like this:

fileid groupid size maxsize growth status perf name filename
1 1 1024 -1 10 1048578 0 tempdev c:\temp\tempdb.mdf
2 0 64 -1 10 1048642 0 templog c:\temp\templog.ldf

And the following script will move the database files to the designated locations:

alter database tempdb
modify file (name = tempdev, filename = 'F:\SQLDATA\tempdb.mdf')
go

alter database tempdb
modify file (name = templog, filename = 'F:\SQLDATA\templog.ldf')
go

But you'll also get a message saying:

The file "tempdev" has been modified in the system catalog. The new path will be used the next time the database is started.
The file "templog" has been modified in the system catalog. The new path will be used the next time the database is started.

The final step will be to restart SQL Sever service either from the SSMS directly or through windows services. Probably it is a better idea to do it outside working hours.

Wednesday, March 10, 2010

DAC connection problem from SSMS

Here is a common error when using SQL Server's Dedicated Administrator Connection:

'Dedicated administrator connections are not supported. (ObjectExplorer)'

A quick answer is to NOT use 'new query' button, but instead use 'new database engine query' button.

Here is a more detailed article regarding this issue.

Tuesday, March 2, 2010

How to grant viewing privileges of database diagrams to a user?

This is a biting question that I just bumped into today, it turns out MS has not supplied with established mechanisms to support distributing viewing privileges for database diagrams at least for SQL 2005 or 2008. According to BOL:

"Although any user with access to a database can create a diagram once it
has been created, the only users who can see it are the diagram's creator and
any member of the db_owner role."