Wednesday, March 24, 2010
Removing full duplicates by using row_number() function and common table expression
The good thing about row_number() function is that it can be based on partition of columns. New row number count starts for each partition of full duplicates and those rows with row number bigger than 1 can be easily spotted and removed. Here is the example that explains everything I am talking about here.
First of all, create a table in a test db
USE [test_db]
GO
CREATE TABLE [dbo].[Employee](
[EMPID] [int] NULL,
[FNAME] [varchar](50) NULL,
[LNAME] [varchar](50) NULL
) ON [PRIMARY]
Then let's insert many duplicated rows:
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021110, 'MICHAEL', 'POLAND')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021110, 'MICHAEL', 'POLAND')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, 'JIM', 'KENNEDY')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, 'JIM', 'KENNEDY')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, 'JIM', 'KENNEDY')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2121000, 'JAMES', 'SMITH')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2121000, 'JAMES', 'SMITH')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2011111, 'ADAM', 'ACKERMAN')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (1021710, 'MARIAH', 'MANDEZ')
GO
If you selete everything from this table now, the following should be there:
EMPID FNAME LNAME
1021710 MARIAH MANDEZ
2011111 ADAM ACKERMAN
2021110 MICHAEL POLAND
2021110 MICHAEL POLAND
2021115 JIM KENNEDY
2021115 JIM KENNEDY
2021115 JIM KENNEDY
2121000 JAMES SMITH
2121000 JAMES SMITH
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER
And here comes the de-dup script with the help of common table expression:
with temp_employee as
( select row_number() over (partition by EMPID, FNAME, LNAME order by EMPID) as rowid, * from EMPLOYEE)
delete temp_employee where rowid > 1
Hope this article is illustrative enough.
Thursday, March 18, 2010
Several tips about query performance optimization
The 4 following tips are particularly helpful and can be immediately put into daily use:
Views and Stored Procedures
We can avoid using heavy duty queries by using stored procedures and views. Stored procedures are commonly used flexible database programming objects. Stored procedures are compiled once, so it will not create the execution plan every time, thus reducing your network traffic. Views give more security. For example, you have a table containing sensitive data; you might wish to hide those columns from certain users, you can use views.
Avoid temporary tables if you don’t need transactions on that table. In that case you can use Table variables. When you create the temporary table, it will create the physical table in the tempdb database. However, if you create a table variable, it only resides in the memory. It will be much faster than temporary tables, and table variables reduce the recompilation of the code.Table Variables
Notes: a good practice is to use table variable where you need a small table to operate on, loading big table variable into memory simply erodes its performance gain by interfering with other in-memory activities of SQL Server.
If you’re in the situation to use the count(*), then use the following query. This is way faster than function count(), especially for huge tables with millions of rows SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND (indid < 2) Row Count
If the No Count is off in your stored procedures, then the number of rows affected value will be returned. It will increase the network traffic. Set No Count should be On in your stored procedures. Hence the number of rows affected will not return.Set No Count On
Monday, March 15, 2010
How to move tempdb?
First of all, locate tempdb's files:
use tempdb
go
select * from sys.sysfiles
go
Then you'll have something like this:
fileid groupid size maxsize growth status perf name filename
1 1 1024 -1 10 1048578 0 tempdev c:\temp\tempdb.mdf
2 0 64 -1 10 1048642 0 templog c:\temp\templog.ldf
And the following script will move the database files to the designated locations:
alter database tempdb
modify file (name = tempdev, filename = 'F:\SQLDATA\tempdb.mdf')
go
alter database tempdb
modify file (name = templog, filename = 'F:\SQLDATA\templog.ldf')
go
But you'll also get a message saying:
The file "tempdev" has been modified in the system catalog. The new path will be used the next time the database is started.
The file "templog" has been modified in the system catalog. The new path will be used the next time the database is started.
The final step will be to restart SQL Sever service either from the SSMS directly or through windows services. Probably it is a better idea to do it outside working hours.
Wednesday, March 10, 2010
DAC connection problem from SSMS
'Dedicated administrator connections are not supported. (ObjectExplorer)'
A quick answer is to NOT use 'new query' button, but instead use 'new database engine query' button.
Here is a more detailed article regarding this issue.
Tuesday, March 2, 2010
How to grant viewing privileges of database diagrams to a user?
"Although any user with access to a database can create a diagram once it
has been created, the only users who can see it are the diagram's creator and
any member of the db_owner role."