I created a table with the following fields:
Record:
Id int Primary Key, Auto Increment
ForeignId int
IsDuplicateRecord bit NULL
Then I inserted some data:
INSERT INTO Record (ForeignId)
VALUES (5), (5), (1), (2), (3)
After that, I ran the following update statement (found at http://archive.msdn.microsoft.com/SQLExamples/Wiki/View.aspx?title=DuplicateRows):
UPDATE Record
SET IsDuplicateRecord = 1
WHERE Id IN (
SELECT MAX(Id)
FROM Record
GROUP BY ForeignId
HAVING COUNT(*) > 1
)
So far so good: the query affected one row, and the table now looks like this:
Id ForeignId IsDuplicateRecord
0 5 NULL
1 5 1
2 1 NULL
3 2 NULL
4 3 NULL
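I was happy, because for a moment I thought everything was going to be fine. But then a doubt as dark as the clouds outside crept into my mind. Discouraged, I typed: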
INSERT INTO Record (ForeignId)
VALUES (1), (1)
and ran the query above again, which this time produced:
Id ForeignId IsDuplicateRecord
0 5 NULL
1 5 1
2 1 NULL
3 2 NULL
4 3 NULL
5 1 NULL
6 1 1
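So I figured I would head over to StackOverflow and see who could explain to me why the IsDuplicateRecord field in the row with Id 5 was not updated to 1. Are you the one?
Answer 0 (score: 5)
Because the SQL you ran marks only the last duplicate in each group (the row with MAX(Id)) as a duplicate. Try this: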
UPDATE Record
SET IsDuplicateRecord = 1
WHERE Id NOT IN (
SELECT MIN(Id)
FROM Record
GROUP BY ForeignId
)
This marks the second and every subsequent occurrence of each ForeignId as a duplicate, which I think is what you want.
Answer 1 (score: 1)
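To compare the two statements side by side, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration, not the asker's setup: the question appears to target SQL Server, the helper name flagged_ids is made up for this example, and the Ids 0..6 are inserted explicitly to match the tables shown in the question.

```python
import sqlite3


def flagged_ids(update_sql):
    """Build the question's seven rows in memory, run one UPDATE,
    and return the Ids whose IsDuplicateRecord flag ended up as 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Record ("
        " Id INTEGER PRIMARY KEY,"
        " ForeignId INTEGER,"
        " IsDuplicateRecord INTEGER NULL)"
    )
    # Explicit Ids 0..6 (assumption: chosen to mirror the question's tables).
    rows = [(0, 5), (1, 5), (2, 1), (3, 2), (4, 3), (5, 1), (6, 1)]
    conn.executemany("INSERT INTO Record (Id, ForeignId) VALUES (?, ?)", rows)
    conn.execute(update_sql)
    result = [
        row[0]
        for row in conn.execute(
            "SELECT Id FROM Record WHERE IsDuplicateRecord = 1 ORDER BY Id"
        )
    ]
    conn.close()
    return result


# The statement from the question: flags only the highest Id per duplicated group.
MAX_VERSION = """
UPDATE Record SET IsDuplicateRecord = 1
WHERE Id IN (
    SELECT MAX(Id) FROM Record GROUP BY ForeignId HAVING COUNT(*) > 1
)
"""

# The statement from this answer: flags everything except the lowest Id per group.
MIN_VERSION = """
UPDATE Record SET IsDuplicateRecord = 1
WHERE Id NOT IN (
    SELECT MIN(Id) FROM Record GROUP BY ForeignId
)
"""

max_flagged = flagged_ids(MAX_VERSION)
min_flagged = flagged_ids(MIN_VERSION)
print("MAX(Id) version flags:", max_flagged)
print("MIN(Id) version flags:", min_flagged)
```

On this data the MAX(Id) version flags Ids 1 and 6, as in the question, while the MIN(Id) version also flags Id 5.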
UPDATE Record uu
SET IsDuplicateRecord = 1
-- if there exists a record with the same foreignid
-- but a lower id
-- this (uu) is a duplicate
WHERE EXISTS (
SELECT *
FROM Record ex
WHERE ex.ForeignId = uu.ForeignId
AND ex.Id < uu.Id
);
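There is a subtle (but nasty) difference between this EXISTS (...) subquery and @DavidM's NOT IN (...) subquery. The EXISTS version never flags a row whose ForeignId is NULL, because ex.ForeignId = uu.ForeignId is never true when either side is NULL. The NOT IN version behaves differently: GROUP BY puts all NULL ForeignId values into a single group, so every row with ForeignId IS NULL except the one with the lowest Id gets its IsDuplicateRecord flag set. (I suspect ForeignId is an FK, so it is quite likely NULLable.)
For a non-nullable ForeignId, the two versions are essentially equivalent.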
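The NULL behaviour can be checked with a small sketch, again using Python's sqlite3 module as a stand-in engine (an assumption; other databases should group NULLs the same way, but that is not verified here). The helper name flagged_ids is hypothetical, and the EXISTS statement refers to the outer table by name instead of the alias uu to stay portable across SQLite versions.

```python
import sqlite3


def flagged_ids(update_sql):
    """Load rows with a nullable ForeignId, run one UPDATE,
    and return the Ids whose IsDuplicateRecord flag ended up as 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Record ("
        " Id INTEGER PRIMARY KEY,"
        " ForeignId INTEGER,"
        " IsDuplicateRecord INTEGER NULL)"
    )
    # Four rows with a NULL ForeignId, two rows sharing ForeignId 1, one row with 2.
    rows = [(1, None), (2, 1), (3, None), (4, 1), (5, None), (6, 2), (7, None)]
    conn.executemany("INSERT INTO Record (Id, ForeignId) VALUES (?, ?)", rows)
    conn.execute(update_sql)
    result = [
        row[0]
        for row in conn.execute(
            "SELECT Id FROM Record WHERE IsDuplicateRecord = 1 ORDER BY Id"
        )
    ]
    conn.close()
    return result


# NOT IN version: GROUP BY collapses all NULL ForeignIds into one group.
NOT_IN_VERSION = """
UPDATE Record SET IsDuplicateRecord = 1
WHERE Id NOT IN (
    SELECT MIN(Id) FROM Record GROUP BY ForeignId
)
"""

# EXISTS version: the equality test is never true for a NULL ForeignId.
EXISTS_VERSION = """
UPDATE Record SET IsDuplicateRecord = 1
WHERE EXISTS (
    SELECT 1 FROM Record ex
    WHERE ex.ForeignId = Record.ForeignId
      AND ex.Id < Record.Id
)
"""

not_in_flagged = flagged_ids(NOT_IN_VERSION)
exists_flagged = flagged_ids(EXISTS_VERSION)
print("NOT IN flags:", not_in_flagged)
print("EXISTS flags:", exists_flagged)
```

With this data the NOT IN version flags Ids 3, 4, 5 and 7 (three of them NULL rows), while the EXISTS version flags only Id 4.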
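UPDATE: as @MartinSmith pointed out, some implementations don't like UPDATE ... WHERE without a FROM clause. We can work around that with a dummy self-join. (I also fixed up the first query.)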
-- DROP SCHEMA tmp CASCADE;
-- CREATE SCHEMA tmp ;
-- SET search_path='tmp';
DROP TABLE IF EXISTS zrecord CASCADE;
CREATE TABLE zrecord
( id SERIAL NOT NULL PRIMARY KEY
, foreign_id INTEGER -- REFERENCES zrecord(id)
, is_duplicate boolean DEFAULT False
);
SELECT * FROM zrecord;
INSERT INTO zrecord(foreign_id) VALUES(NULL),(1),(NULL),(1),(NULL),(2),(NULL);
SELECT * FROM zrecord;
EXPLAIN ANALYZE
UPDATE zrecord uu
SET is_duplicate = True
--
-- This selfjoin is needed if UPDATE ... WHERE needs a FROM TABLE
--
FROM zrecord dum
WHERE dum.id = uu.id
AND EXISTS (
SELECT *
FROM zrecord ex
WHERE ex.foreign_id = uu.foreign_id
AND ex.Id < uu.Id
);
SELECT * FROM zrecord;
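UPDATE2: it seems that PARTITION BY suffers from the same nullability issue as the IN clause, since rows with a NULL foreign_id all land in one partition: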
WITH zcte AS (
SELECT *
, row_number() OVER (PARTITION BY foreign_id ORDER BY id) AS rn
FROM zrecord
)
SELECT * FROM zcte;
Result: (original test set, before any updates)
id | foreign_id | is_duplicate | rn
----+------------+--------------+----
2 | 1 | f | 1
4 | 1 | t | 2
6 | 2 | f | 1
1 | | f | 1
3 | | f | 2
5 | | f | 3
7 | | f | 4
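The partitioning can be reproduced with a short sketch in Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (window functions are not available before that), and it uses SQLite in place of the PostgreSQL session shown above. The extra NULL filter at the end is an illustrative workaround, not something this answer proposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zrecord (id INTEGER PRIMARY KEY, foreign_id INTEGER)")
# Same foreign_id pattern as the test set above: NULL, 1, NULL, 1, NULL, 2, NULL.
conn.executemany(
    "INSERT INTO zrecord (id, foreign_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, None), (4, 1), (5, None), (6, 2), (7, None)],
)

# Number the rows inside each foreign_id partition, lowest id first.
rows = conn.execute(
    """
    SELECT id, foreign_id,
           row_number() OVER (PARTITION BY foreign_id ORDER BY id) AS rn
    FROM zrecord
    ORDER BY id
    """
).fetchall()
conn.close()

rn_by_id = {row_id: rn for row_id, _foreign_id, rn in rows}

# Hypothetical workaround: treat rn > 1 as a duplicate only for non-NULL keys.
duplicate_ids = [
    row_id for row_id, foreign_id, rn in rows
    if rn > 1 and foreign_id is not None
]

print(rn_by_id)
print(duplicate_ids)
```

The NULL rows (ids 1, 3, 5, 7) are numbered 1 to 4 within a single partition, matching the rn column above, so a plain rn > 1 test would flag three of them. With the added foreign_id IS NOT NULL condition only id 4 remains.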
Answer 2 (score: 0)