Delete duplicate data and load it into SQL Server

Time: 2015-12-23 03:03:22

Tags: sql-server sql-server-2008

I have a question about SQL Server.

Table: emp

empid   |  name |sal
1       |  abc  |100
2       |  def  |200
3       |  test |300
2       |  har  |500
3       |  jai  |600
4       | kali  |240

As shown above, this table contains duplicate data. I want to delete the duplicate data from emp and load it into the empduplicate table.

Here empid is supposed to be unique. If an empid appears more than once, every record with that empid is treated as a duplicate.

The empduplicate table has the following structure:

Empid   |  name  | sal

Finally, after the duplicate data is deleted, I want the emp table to look like this:

empid  |  name  | sal 
1      |  abc   | 100
4      | kali   | 240

To delete the duplicates, I tried this code:

;with duplicate as 
(
    select 
        *,
        row_number()over (partition by empid order by empid) as rn
    from emp
)
delete from duplicate 
where rn > 1

But this does not delete all of the records for a duplicated empid; one row per empid is left behind.

Example: empid=2 has duplicate data

empid|name |sal
2    |def  |200
2    |har  |500

I need to delete every record for empid=2. Because empid=2 has duplicates, all of its rows must be removed from the emp table.

The empduplicate table needs to be loaded with the duplicate data, like this:

empid    | name   |sal
2        |def     |200
2        |har     |500
3        |test    |300
3        |jai     |600

To insert the duplicate data, I tried this code:

insert into empduplicate 
    select 
        id, name, sal 
    from 
         emp  
    group by 
         id 
    having 
         count(*) > 1

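That query raises an error:

Column 'duplicate.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.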

Please tell me how to write a query that accomplishes this task in SQL Server.

3 Answers:

Answer 0: (score: 2)

You're almost there. Instead of ROW_NUMBER, use COUNT:

WITH CteInsert AS(
    SELECT *,
        cnt = COUNT(empid) OVER(PARTITION BY empid)
    FROM emp
)
INSERT INTO empduplicate(empid, name, sal)
SELECT
    empid, name, sal
FROM CteInsert
WHERE cnt > 1;

WITH CteDelete AS(
    SELECT *,
        cnt = COUNT(empid) OVER(PARTITION BY empid)
    FROM emp
)
DELETE FROM CteDelete WHERE cnt > 1;

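You need to run the INSERT before the DELETE; otherwise the duplicate rows are already gone from emp by the time you try to copy them. You may also want to wrap both statements in a single transaction.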
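If the copy and the delete should succeed or fail together, one possible alternative is to do both in a single statement with the OUTPUT clause of DELETE. The following is only a sketch: the CTE name CteDup is made up for illustration, and it assumes empduplicate has no enabled triggers, foreign keys, or CHECK constraints, since OUTPUT ... INTO does not allow those on its target table.

```sql
-- Sketch: move every row whose empid occurs more than once
-- from emp into empduplicate in one statement.
-- Any preceding statement must be terminated with a semicolon.
WITH CteDup AS (
    SELECT empid, name, sal,
           cnt = COUNT(*) OVER (PARTITION BY empid)  -- rows sharing this empid
    FROM emp
)
DELETE FROM CteDup
OUTPUT deleted.empid, deleted.name, deleted.sal      -- the rows being removed
INTO empduplicate (empid, name, sal)
WHERE cnt > 1;
```

Because this is a single statement, the rows are removed from emp and written to empduplicate together, so the order of INSERT and DELETE is no longer a concern.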

Answer 1: (score: 0)

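BEGIN TRAN;

-- Copy every row whose empid occurs more than once.
-- INSERT INTO is used because empduplicate already exists;
-- SELECT ... INTO would try to create the table and fail.
INSERT INTO empduplicate (empid, name, sal)
SELECT empid, name, sal
FROM emp
WHERE empid IN (
    SELECT empid FROM emp
    GROUP BY empid
    HAVING COUNT(empid) > 1
);

-- Remove those same rows from emp.
DELETE FROM emp
WHERE empid IN (
    SELECT empid FROM emp
    GROUP BY empid
    HAVING COUNT(empid) > 1
);

COMMIT TRAN;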

Answer 2: (score: 0)

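-- Step 1 (optional): collapse rows that are identical in every column.
SELECT DISTINCT * INTO #distinct_rows FROM emp;
DELETE FROM emp;
INSERT INTO emp SELECT * FROM #distinct_rows;
DROP TABLE #distinct_rows;

SELECT * FROM emp;  -- all distinct rows

-- Step 2: copy every row whose empid occurs more than once.
-- This must run before those rows are removed from emp.
INSERT INTO empduplicate
SELECT * FROM emp
WHERE empid IN (
    SELECT empid FROM emp
    GROUP BY empid HAVING COUNT(*) > 1
);

SELECT * FROM empduplicate;  -- all duplicate rows

-- Step 3: keep only the rows whose empid occurs exactly once.
-- A different temp table name is used because the same name
-- cannot be created twice in one batch.
SELECT * INTO #unique_ids FROM emp
WHERE empid IN (
    SELECT empid FROM emp
    GROUP BY empid HAVING COUNT(*) = 1
);
DELETE FROM emp;
INSERT INTO emp SELECT * FROM #unique_ids;
DROP TABLE #unique_ids;

SELECT * FROM emp;  -- only the empids that are not duplicated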