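I am working with a fairly large table in SQL Server. The table contains some identical rows, and I need to delete the duplicates. The problem is that I cannot alter the table, i.e. I cannot add an ID column.
What I can do is update one column value on one row of each duplicate pair and then delete using that value.
How can I update only those rows? For example: first/last inserted, first occurrence, newest/oldest...
Thanks!
Table structure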
NrValue | Comment | Value1 | Value2 | Value3 |
--------|-----------|-----------|-----------|---------------|
00000 | data0 | zz | top | vivalasvegas|
00100 | NULL | N/A | sex | no |
00100 | NULL | N/A | sex | no |
00200 | NULL | female | sex | yes |
00200 | NULL | female | sex | yes |
00300 | NULL | male | sex | yesplease |
00300 | NULL | male | sex | yesplease |
00400 | data21 | M | -- | na |
00500 | NULL | F | ezig | na |
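So I can use the 'Comment' column for the update, but I must not touch anything other than the duplicate rows. I know the NrValue of the rows that may be updated.
The result would be: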
NrValue | Comment | Value1 | Value2 | Value3 |
--------|-----------|-----------|-----------|---------------|
00000 | data0 | zz | top | vivalasvegas|
00100 | 1 | N/A | sex | no |
00100 | 2 | N/A | sex | no |
00200 | 3 | female | sex | yes |
00200 | 4 | female | sex | yes |
00300 | 5 | male | sex | yesplease |
00300 | 6 | male | sex | yesplease |
00400 | data21 | M | -- | na |
00500 | NULL | F | ezig | na |
Finally, I delete the rows where NrValue = 00100, 00200 or 00300 AND Comment = 2, 4 or 6.
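Answer 0 (score: 2)
Use something like
ROW_NUMBER() OVER(PARTITION BY AllRelevantColumns ORDER BY SomeOrderCriteria)
This returns 1 for the first row of each group, while duplicates get 2 (or 3 ...).
You can store this value in a new column, or use it directly for the cleanup: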
-- Mock-up table standing in for the real one
DECLARE @mockup TABLE(NrValue INT,Comment VARCHAR(100),Value1 VARCHAR(100),Value2 VARCHAR(100),Value3 VARCHAR(100));
-- NULL is inserted as a real NULL here, not as the string 'NULL'
INSERT INTO @mockup VALUES
 (00000,'data0','zz','top','vivalasvegas')
,(00100,NULL,'N/A','sex','no')
,(00100,NULL,'N/A','sex','no')
,(00200,NULL,'female','sex','yes')
,(00200,NULL,'female','sex','yes')
,(00300,NULL,'male','sex','yesplease')
,(00300,NULL,'male','sex','yesplease')
,(00400,'data21','M','--','na')
,(00500,NULL,'F','ezig','na');
-- Number the rows within each NrValue group; ORDER BY (SELECT NULL) means "no particular order"
WITH Numbered AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY NrValue ORDER BY (SELECT NULL)) AS DupNr
          ,*
    FROM @mockup
)
-- Keep row number 1 of each group and delete the rest
DELETE FROM Numbered
WHERE DupNr>1;
-- Check the result
SELECT * FROM @mockup;
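This concept is called an updatable CTE. The DELETE FROM Numbered ... actually affects the underlying table.
If NrValue alone is not enough to identify a row as a duplicate, just add more columns to the PARTITION BY.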
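As a minimal sketch of that last point, the variant below partitions by every column, so only rows that are identical in all of them count as duplicates. The table variable @demo and its sample rows are made up for illustration, and NrValue is declared as VARCHAR here (an assumption) so the leading zeros survive.

```sql
-- Hypothetical mock-up; column names mirror the question's table
DECLARE @demo TABLE(NrValue VARCHAR(5),Comment VARCHAR(100),Value1 VARCHAR(100),Value2 VARCHAR(100),Value3 VARCHAR(100));

INSERT INTO @demo VALUES
 ('00100',NULL,'N/A','sex','no')
,('00100',NULL,'N/A','sex','no')      -- exact copy of the row above
,('00100',NULL,'N/A','sex','yes')     -- same NrValue, different Value3: not a duplicate
,('00200',NULL,'female','sex','yes');

WITH Numbered AS
(
    -- NULLs in a PARTITION BY column fall into the same group
    SELECT ROW_NUMBER() OVER(PARTITION BY NrValue,Comment,Value1,Value2,Value3
                             ORDER BY (SELECT NULL)) AS DupNr
          ,*
    FROM @demo
)
DELETE FROM Numbered
WHERE DupNr>1;

-- Only the exact copy is gone; three rows remain
SELECT * FROM @demo;
```

Partitioning by NrValue alone would have removed two of the three '00100' rows here, including the one that differs in Value3.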
Answer 1 (score: 2)
You don't need an update. You want to delete the duplicates, so why take an intermediate step?
Your code should look like this:
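declare @t table (col1 int, col2 int);
insert into @t values
(1, 1), (1, 1),
(1, 2), (1, 2),(1, 2), (1, 2),
(3, 2), (3, 2),(3, 2);
-- number the rows within each (col1, col2) group;
-- "order by 1/0" is a placeholder for "no particular order"
with cte as
(
    select *, row_number() over (partition by col1, col2 order by 1/0) rn
    from @t
)
-- deleting through the CTE removes the rows from @t itself
delete cte
where rn > 1;
-- one row per (col1, col2) combination is left
select *
from @t;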
Sorry for not posting this as a comment (line limit and loss of code formatting).
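If the marking step described in the question is still wanted, the same updatable-CTE idea also works with UPDATE. The following is only a sketch under assumptions: the table variable @demo, the VARCHAR type of NrValue and the marker value 'DUP' are hypothetical, and unlike the question's expected result (which numbers the rows 1 to 6) it touches only the surplus copies.

```sql
-- Hypothetical mock-up; column names mirror the question's table
DECLARE @demo TABLE(NrValue VARCHAR(5),Comment VARCHAR(100),Value1 VARCHAR(100),Value2 VARCHAR(100),Value3 VARCHAR(100));

INSERT INTO @demo VALUES
 ('00000','data0','zz','top','vivalasvegas')
,('00100',NULL,'N/A','sex','no')
,('00100',NULL,'N/A','sex','no')
,('00200',NULL,'female','sex','yes')
,('00200',NULL,'female','sex','yes')
,('00300',NULL,'male','sex','yesplease')
,('00300',NULL,'male','sex','yesplease')
,('00400','data21','M','--','na');

-- Step 1: mark the surplus copies, restricted to the NrValues known to be duplicated
WITH Numbered AS
(
    SELECT Comment
          ,ROW_NUMBER() OVER(PARTITION BY NrValue,Comment,Value1,Value2,Value3
                             ORDER BY (SELECT NULL)) AS DupNr
    FROM @demo
    WHERE NrValue IN ('00100','00200','00300')
)
UPDATE Numbered
SET Comment='DUP'          -- 'DUP' is an arbitrary marker chosen for this sketch
WHERE DupNr>1;

-- Step 2: delete exactly the marked rows
DELETE FROM @demo
WHERE NrValue IN ('00100','00200','00300')
  AND Comment='DUP';

-- One row per NrValue is left; '00000' and '00400' were never touched
SELECT * FROM @demo;
```

Since the rows are identical, which copy of a pair gets marked is arbitrary. Running step 1 alone also makes it possible to inspect the marked rows before anything is deleted.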