Pairs of identical rows: update only one of them

Time: 2017-06-08 08:14:54

Tags: sql-server common-table-expression row-number

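I'm working with a fairly large table in SQL Server. The table contains some identical rows, and I need to remove the duplicates. The problem is that I cannot alter the table, i.e. add an ID column.

I could update one column value on one row of each duplicate pair, and then delete using that value.

How can I update only one of those rows? For example: first/last inserted, first occurrence, newest/oldest...

Thanks!

Table structure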

NrValue |   Comment |   Value1  |   Value2  |   Value3      |
--------|-----------|-----------|-----------|---------------|
00000   |   data0   |   zz      |   top     |   vivalasvegas|
00100   |   NULL    |   N/A     |   sex     |   no          |
00100   |   NULL    |   N/A     |   sex     |   no          |
00200   |   NULL    |   female  |   sex     |   yes         |
00200   |   NULL    |   female  |   sex     |   yes         |
00300   |   NULL    |   male    |   sex     |   yesplease   |
00300   |   NULL    |   male    |   sex     |   yesplease   |
00400   |   data21  |   M       |   --      |   na          |
00500   |   NULL    |   F       |   ezig    |   na          |

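So I can use the 'Comment' column for the update, but I must not touch anything except the duplicate rows. I know the NrValue of the rows that may be updated. The result would be: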

NrValue |   Comment |   Value1  |   Value2  |   Value3      |
--------|-----------|-----------|-----------|---------------|
00000   |   data0   |   zz      |   top     |   vivalasvegas|
00100   |   1       |   N/A     |   sex     |   no          |
00100   |   2       |   N/A     |   sex     |   no          |
00200   |   3       |   female  |   sex     |   yes         |
00200   |   4       |   female  |   sex     |   yes         |
00300   |   5       |   male    |   sex     |   yesplease   |
00300   |   6       |   male    |   sex     |   yesplease   |
00400   |   data21  |   M       |   --      |   na          |
00500   |   NULL    |   F       |   ezig    |   na          |

Finally, I delete the rows where NrValue is 00100, 00200 or 00300 AND Comment is 2, 4 or 6.

2 answers:

Answer 0 (score: 2)

Use something like

    ROW_NUMBER() OVER(PARTITION BY AllRelevantColumns ORDER BY SomeOrderCriteria)

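This numbers the rows within each partition: the first row of every group gets 1, and any duplicate gets 2 (or 3, ...).

You can write this value into a new column, or use it directly for the clean-up...

Update: with your test data...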
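The first option is close to what the question describes: marking one row of each pair through the Comment column. A minimal sketch, assuming a table variable `@demo` (a hypothetical stand-in for the real table) and numbering within each NrValue, so every pair gets 1 and 2 rather than the running 1 to 6 shown in the question:

```sql
-- Hypothetical stand-in for the real table, with a few sample rows only.
DECLARE @demo TABLE(NrValue VARCHAR(5),Comment VARCHAR(100),Value1 VARCHAR(100),Value2 VARCHAR(100),Value3 VARCHAR(100));
INSERT INTO @demo VALUES
 ('00000','data0','zz','top','vivalasvegas')
,('00100',NULL,'N/A','sex','no')
,('00100',NULL,'N/A','sex','no')
,('00200',NULL,'female','sex','yes')
,('00200',NULL,'female','sex','yes');

-- Write the row number into Comment, but only for the groups known to hold duplicates.
WITH Numbered AS
(
    SELECT Comment
          ,ROW_NUMBER() OVER(PARTITION BY NrValue ORDER BY (SELECT NULL)) AS DupNr
    FROM @demo
    WHERE NrValue IN ('00100','00200')
)
UPDATE Numbered
SET Comment = CAST(DupNr AS VARCHAR(100));

-- The marked copies can now be addressed by value.
DELETE FROM @demo
WHERE NrValue IN ('00100','00200') AND Comment <> '1';

SELECT * FROM @demo;
```

Note that the surviving row of each pair keeps Comment = '1' in this sketch; it would have to be set back to NULL if the original value matters.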

    -- Mock-up of the table from the question
    DECLARE @mockup TABLE(NrValue INT,Comment VARCHAR(100),Value1 VARCHAR(100),Value2 VARCHAR(100),Value3 VARCHAR(100));
    INSERT INTO @mockup VALUES
     (00000,'data0','zz','top','vivalasvegas')
    ,(00100,NULL,'N/A','sex','no')
    ,(00100,NULL,'N/A','sex','no')
    ,(00200,NULL,'female','sex','yes')
    ,(00200,NULL,'female','sex','yes')
    ,(00300,NULL,'male','sex','yesplease')
    ,(00300,NULL,'male','sex','yesplease')
    ,(00400,'data21','M','--','na')
    ,(00500,NULL,'F','ezig','na');

    -- Number the rows within each NrValue; ORDER BY (SELECT NULL) means "any order"
    WITH Numbered AS
    (
        SELECT ROW_NUMBER() OVER(PARTITION BY NrValue ORDER BY (SELECT NULL)) AS DupNr
              ,*
        FROM @mockup
    )
    DELETE FROM Numbered
    WHERE DupNr>1;

    -- One row per NrValue is left
    SELECT * FROM @mockup;

This concept is called an updatable CTE: DELETE FROM Numbered ... actually affects the underlying table.
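Because the DELETE really hits the base table, it may be worth checking which rows it removes before making the change permanent. A cautious sketch, assuming a temp table `#demo` (a hypothetical stand-in with only two of the columns; a temp table is used here because table variables are not rolled back):

```sql
-- Hypothetical temp table standing in for the real one.
CREATE TABLE #demo (NrValue VARCHAR(5), Comment VARCHAR(100));
INSERT INTO #demo VALUES ('00100',NULL),('00100',NULL),('00400','data21');

BEGIN TRANSACTION;

WITH Numbered AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY NrValue ORDER BY (SELECT NULL)) AS DupNr
          ,*
    FROM #demo
)
DELETE FROM Numbered
OUTPUT deleted.NrValue, deleted.Comment   -- returns the rows that were removed
WHERE DupNr > 1;

SELECT * FROM #demo;      -- inspect what is left

ROLLBACK TRANSACTION;     -- or COMMIT TRANSACTION once the result looks right

DROP TABLE #demo;
```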

If NrValue alone is not enough to identify rows as duplicates, simply add more columns to the PARTITION BY.

Answer 1 (score: 2)
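As an illustration of the wider PARTITION BY, here is a sketch that treats two rows as duplicates only when every column matches. The table variable `@demo` and its third row are made up for the example. PARTITION BY puts NULLs into the same group, so rows with a NULL Comment still count as equal to each other.

```sql
-- Hypothetical sample data: two exact copies plus one row that differs in Comment.
DECLARE @demo TABLE(NrValue VARCHAR(5),Comment VARCHAR(100),Value1 VARCHAR(100),Value2 VARCHAR(100),Value3 VARCHAR(100));
INSERT INTO @demo VALUES
 ('00100',NULL,'N/A','sex','no')
,('00100',NULL,'N/A','sex','no')      -- exact copy: removed
,('00100','keep','N/A','sex','no');   -- same NrValue, different Comment: kept

WITH Numbered AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY NrValue,Comment,Value1,Value2,Value3
                             ORDER BY (SELECT NULL)) AS DupNr
          ,*
    FROM @demo
)
DELETE FROM Numbered
WHERE DupNr>1;

SELECT * FROM @demo;
```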

You don't need an update. You want to delete the duplicates, so why take the intermediate step?

Your code should look like this:

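    -- sample data with duplicated (col1, col2) pairs
    declare @t table (col1 int, col2 int);
    insert into @t values
    (1, 1), (1, 1),
    (1, 2), (1, 2), (1, 2), (1, 2),
    (3, 2), (3, 2), (3, 2);

    -- "order by 1/0" is a trick for an arbitrary order; the expression is not evaluated
    with cte as
    (
        select *, row_number() over (partition by col1, col2 order by 1/0) rn
        from @t
    )
    delete cte
    where rn > 1;

    -- one row per distinct (col1, col2) is left
    select *
    from @t;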

Sorry for not posting this as a comment (line limit, and the code formatting would be lost).