Deleting duplicates... with NULLs

Date: 2013-05-14 14:57:27

Tags: sql-server sql-server-2008 null duplicates duplicate-removal

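In MS SQL Server, I'm trying to remove duplicates from a table that contains NULLs. Groan. Lots of NULLs. Most importantly, I need to keep one copy of every duplicated record, with or without NULLs. Basically, I want NULL to behave like an ordinary value of "NULL" during the operation and then go back to being a real NULL. Is this possible? Is there a simpler solution?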

Table1 looks like this:

UID        Data1    Data2   
1           A        NULL        
2           A        NULL       
3           B        abc     
4           B        abc       
5           C        NULL      
6           D        ghj

I want the command to discard rows 2 and 4 and keep the rest. (The SELECT is for testing.)

;SELECT UID, Data1, Data2
 FROM Table1 AS T
 WHERE NOT EXISTS (
    SELECT 1
    FROM table1 AS T2
    WHERE 
      T2.Data1 = T.Data1
      AND T2.Data2 = T.Data2
      AND T2.UID >= T.UID
      )
    AND Data1 IS NOT NULL

Note: SELECT DISTINCT won't work, because the duplicates have different timestamps.
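As written, the test query above returns UIDs 1, 2 and 5 rather than the rows to keep: `T2.Data2 = T.Data2` is never true when either side is NULL, and `T2.UID >= T.UID` lets every non-NULL row match itself. A minimal sketch of a NULL-safe version follows, assuming UID is the key and using a hypothetical table variable `@Table1` loaded with the sample data. Each column is compared with `a = b OR (a IS NULL AND b IS NULL)`, and only rows with a lower UID count as "earlier" copies. Changing `DELETE T` to `SELECT T.*` and `EXISTS` to `NOT EXISTS` turns it back into a preview of the rows that would be kept.

```sql
-- Hypothetical table variable standing in for Table1 (the real table also has a timestamp column)
DECLARE @Table1 TABLE (UID int PRIMARY KEY, Data1 char(1), Data2 char(3));

INSERT INTO @Table1 (UID, Data1, Data2)
VALUES (1,'A',NULL),(2,'A',NULL),(3,'B','abc'),(4,'B','abc'),(5,'C',NULL),(6,'D','ghj');

-- Delete a row when an EARLIER row (lower UID) has the same Data1 and Data2.
-- Each column is compared NULL-safely: the values are equal, or both are NULL.
DELETE T
FROM @Table1 AS T
WHERE EXISTS (
    SELECT 1
    FROM @Table1 AS T2
    WHERE (T2.Data1 = T.Data1 OR (T2.Data1 IS NULL AND T.Data1 IS NULL))
      AND (T2.Data2 = T.Data2 OR (T2.Data2 IS NULL AND T.Data2 IS NULL))
      AND T2.UID < T.UID
);

-- Remaining rows: UID 1, 3, 5 and 6
SELECT UID, Data1, Data2 FROM @Table1 ORDER BY UID;
```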

4 Answers:

Answer 0 (score: 3)

This should do it:

;WITH CTE AS
(
    SELECT  *,
            RN = ROW_NUMBER() OVER(PARTITION BY Data1,Data2 ORDER BY UID)
    FROM table1
)
DELETE
--SELECT *
FROM CTE
WHERE RN > 1
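This approach copes with the NULLs because comparisons with `=` treat NULL as unknown, while `PARTITION BY` (like `GROUP BY`) places all NULLs of a column in the same partition. A small sketch on the question's sample data, using a hypothetical table variable `@Table1`, shows both behaviours. Running the numbering on its own also shows which rows `RN > 1` would remove before anything is deleted.

```sql
-- Sample data from the question in a hypothetical table variable
DECLARE @Table1 TABLE (UID int PRIMARY KEY, Data1 char(1), Data2 char(3));

INSERT INTO @Table1 (UID, Data1, Data2)
VALUES (1,'A',NULL),(2,'A',NULL),(3,'B','abc'),(4,'B','abc'),(5,'C',NULL),(6,'D','ghj');

-- Under the default ANSI_NULLS ON setting, NULL = NULL is UNKNOWN,
-- so the CASE falls through to the ELSE branch
SELECT CASE WHEN NULL = NULL THEN 'TRUE' ELSE 'not TRUE' END AS NullEqualsNull;

-- PARTITION BY puts all NULLs of a column into one partition,
-- so UIDs 1 and 2 (A, NULL) share a partition, as do UIDs 3 and 4:
-- RN is 2 for UIDs 2 and 4 and 1 for every other row
SELECT UID, Data1, Data2,
       ROW_NUMBER() OVER (PARTITION BY Data1, Data2 ORDER BY UID) AS RN
FROM @Table1
ORDER BY UID;
```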

Update (following the comments)

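OK, if you're having trouble deleting that many rows, you could try creating a lookup table containing the IDs you want to delete and then deleting in batches (you will, however, have to test the batch size). Here's an idea (assuming UID is the PK):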

;WITH CTE AS
(
    SELECT  *,
            RN = ROW_NUMBER() OVER(PARTITION BY Data1,Data2 ORDER BY UID)
    FROM table1
)
SELECT [UID]
INTO RowsToDelete
FROM CTE
WHERE RN > 1;

CREATE INDEX I_UID ON RowsToDelete([UID]);

WHILE 1=1
BEGIN
    DELETE TOP (10000) T
    FROM table1 AS T
    INNER JOIN RowsToDelete AS L
          ON T.[UID] = L.[UID];
    IF @@ROWCOUNT < 10000 BREAK;
END
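If the lookup table is unwanted, one alternative worth testing is to batch directly through the CTE, since `DELETE ... FROM CTE` already works in the first query. A minimal sketch, using a hypothetical table variable `@Table1` and a hypothetical `@BatchSize` variable (10000 in the answer, 1 here so the loop runs several times on six rows). The trade-off is that `ROW_NUMBER` is recomputed over the remaining rows on every pass, which on a large table may cost more than building the lookup table once.

```sql
-- Sample data from the question in a hypothetical table variable
DECLARE @Table1 TABLE (UID int PRIMARY KEY, Data1 char(1), Data2 char(3));

INSERT INTO @Table1 (UID, Data1, Data2)
VALUES (1,'A',NULL),(2,'A',NULL),(3,'B','abc'),(4,'B','abc'),(5,'C',NULL),(6,'D','ghj');

-- Hypothetical batch size; tune it by testing, as the answer suggests
DECLARE @BatchSize int = 1;

WHILE 1 = 1
BEGIN
    -- ROW_NUMBER is recomputed on each pass over the rows that remain.
    -- The lowest UID in each group is never deleted, so it always keeps RN = 1.
    ;WITH CTE AS
    (
        SELECT RN = ROW_NUMBER() OVER (PARTITION BY Data1, Data2 ORDER BY UID)
        FROM @Table1
    )
    DELETE TOP (@BatchSize)
    FROM CTE
    WHERE RN > 1;

    -- A short batch means no duplicates are left
    IF @@ROWCOUNT < @BatchSize BREAK;
END

SELECT UID, Data1, Data2 FROM @Table1 ORDER BY UID;
```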

Answer 1 (score: 0)

Wouldn't SELECT DISTINCT Data1, Data2 FROM Table1 be enough?

Answer 2 (score: 0)

Try this:

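;WITH uTable AS (
    SELECT UID, Data1, Data2,
           -- ORDER BY UID DESC numbers the highest UID 1, so this keeps rows 2 and 4;
           -- ORDER BY UID (ascending) would keep rows 1 and 3, as the question asks
           ROW_NUMBER() OVER (PARTITION BY Data1, Data2 ORDER BY UID DESC) AS rownum
    FROM Table1 AS T
)
SELECT UID, Data1, Data2
FROM uTable
WHERE rownum = 1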

Answer 3 (score: 0)

My solution:

declare @data TABLE (UID int, Data1 char(1), Data2 Char(3))

-- Your example data
INSERT INTO @data (UID, Data1, Data2)
VALUES (1,'A',NULL),(2,'A',NULL),(3,'B','abc'),(4,'B','abc'),(5,'C',NULL),(6,'D','ghj')

DELETE FROM @data WHERE UID in (
  SELECT UID FROM (
    SELECT UID, ROW_NUMBER() OVER(PARTITION BY Data1,Data2 ORDER BY UID) as RowNo FROM @data
  ) d WHERE d.rowNo>1
)

SELECT UID, Data1, Data2 FROM @data