Using SQL Server 2008. I am trying to delete some duplicate rows from a table. The relevant tables and columns are listed below:
ItemTable
----------
Id - autoincrement, PK
ItemLabel - the actual identifier of the items
Linktable
----------
Id - autoincrement, PK
ItemId - the Id from ItemTable
RelatedItemId - the Id from RelatedItemTable
RelatedItemTable
------
no need to touch this with the query..
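So the link table does not contain the actual identifiers of the items, but the running row numbers (the Id values) from the two tables.
What needs to be achieved: ItemTable contains rows with duplicate ItemLabel values, where one of the rows is referenced in the link table (by its Id column value) and the other is not. The unlinked ones must be deleted from ItemTable. I know how to select the duplicate rows using COUNT and GROUP BY, but I can't figure out how to delete only the rows that do not exist in the link table. ItemTable also contains duplicates of items that have no relations at all; one of those must be kept (it doesn't matter which).
http://www.sqlfiddle.com/#!3/9d181 Here is a SQL Fiddle with dummy data.
P.S. Don't ask why the link table uses the running id instead of the actual identifier (which could have been the PK)... it's a legacy system.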
Answer 0 (score: 0)
Join the two tables using LEFT JOIN. An ItemTable.ID that does not exist in the link table will have a NULL value in Linktable.ItemID, and you filter on that in your WHERE clause.
DELETE a
FROM ItemTable a
LEFT JOIN Linktable b
    ON a.ID = b.ItemID
WHERE b.ItemID IS NULL
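As written, the statement above removes every ItemTable row that has no link, including rows whose ItemLabel is not duplicated at all. Here is a hedged sketch of one way to narrow it, assuming the intent is to delete an unlinked row only when another row with the same ItemLabel is linked (the aliases a2 and b2 are introduced here for illustration and are not part of the original answer):

```sql
-- Delete unlinked rows only when a linked row with the same ItemLabel exists.
DELETE a
FROM ItemTable a
LEFT JOIN Linktable b
    ON a.ID = b.ItemID
WHERE b.ItemID IS NULL              -- this row itself has no link
  AND EXISTS
  (
      SELECT 1
      FROM ItemTable a2
      JOIN Linktable b2
          ON b2.ItemID = a2.ID      -- a2 is a linked row ...
      WHERE a2.ItemLabel = a.ItemLabel  -- ... with the same label
  );
```

This leaves duplicates where no copy is linked untouched, so those would still need separate handling.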
Answer 1 (score: 0)
Try this:
DELETE t
OUTPUT deleted.*
FROM ItemTable t
JOIN (
    SELECT DENSE_RANK() OVER (PARTITION BY ItemLabel ORDER BY lt.ItemID DESC, it.id) num
         , it.Id
    FROM ItemTable it
    LEFT JOIN
        LinkTable lt ON
        lt.ItemId = it.id
) t2 ON t2.Id = t.Id
WHERE num > 1
SQL Fiddle
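To see why num > 1 picks the rows to delete, it can help to run the ranking as a plain SELECT first. This is only a sketch, not part of the original answer, and the column alias LinkedItemId is chosen for illustration. It relies on SQL Server sorting NULL as the lowest value, so with ORDER BY lt.ItemID DESC the linked rows come first and the unlinked rows (NULL) come last within each ItemLabel:

```sql
-- Preview the ranking used by the DELETE; nothing is modified here.
SELECT it.Id
     , it.ItemLabel
     , lt.ItemID AS LinkedItemId   -- NULL when the row has no link
     , DENSE_RANK() OVER (PARTITION BY it.ItemLabel
                          ORDER BY lt.ItemID DESC, it.Id) AS num
FROM ItemTable it
LEFT JOIN
    LinkTable lt ON
    lt.ItemId = it.Id
ORDER BY
    it.ItemLabel
  , num;
```

Rows with num = 1 are the ones that are kept. If two rows with the same ItemLabel are both linked, only the one with the higher Id gets num = 1, so the other linked row would be deleted as well.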
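While the above works in your case, I suggest a more readable approach that gives you more control and a better overview. It is a multi-step approach in which each step can be analyzed and tested: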
-- get ItemLabels of duplicate records
SELECT ItemLabel
INTO #Duplicate_ItemLabels
FROM ItemTable it
GROUP BY
    it.ItemLabel
HAVING COUNT(*) > 1
-- get ItemLabels of duplicate records that have at least one record related to LinkTable
SELECT *
INTO #Duplicate_ItemLabels_Related_To_LinkTable
FROM #Duplicate_ItemLabels d1
WHERE EXISTS
(
    SELECT *
    FROM ItemTable it
    JOIN Linktable lt ON
        lt.ItemID = it.ID
    WHERE it.ItemLabel = d1.ItemLabel
)
-- get ItemLabels of duplicate records that don't have any records related to LinkTable
SELECT ItemLabel
INTO #Duplicate_ItemLabels_NOT_Related_To_LinkTable
FROM #Duplicate_ItemLabels
EXCEPT
SELECT ItemLabel
FROM #Duplicate_ItemLabels_Related_To_LinkTable
-- delete unwanted records for ItemLabels that have records related to linkTable
DELETE it
OUTPUT deleted.*
FROM ItemTable it
JOIN #Duplicate_ItemLabels_Related_To_LinkTable dup ON
    dup.ItemLabel = it.ItemLabel
WHERE NOT EXISTS
(
    SELECT *
    FROM Linktable lt
    WHERE lt.ItemID = it.ID
)
-- delete unwanted records for ItemLabels that don't have any records related to linkTable
DELETE it
OUTPUT deleted.*
FROM ItemTable it
JOIN #Duplicate_ItemLabels_NOT_Related_To_LinkTable dup ON
    dup.ItemLabel = it.ItemLabel
JOIN
(
    -- records deleted will be all those that have ID greater than the smallest ID for this ItemLabel
    -- (inner alias is m so that it does not shadow the outer alias dup)
    SELECT ItemLabel
         , MIN(ID) ID
    FROM ItemTable m
    GROUP BY
        m.ItemLabel
) gr ON
    gr.ID < it.ID
    AND gr.ItemLabel = dup.ItemLabel
-- if after these DELETEs there are still duplicate records, it
-- means that there are records for same ItemLabel with
-- different ID and all of them are related to LinkTable
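The closing comment above can be checked directly. A minimal sketch (the column alias Copies is illustrative) that lists any ItemLabel still duplicated after the DELETEs have run:

```sql
-- list ItemLabels that are still duplicated after the DELETEs
SELECT it.ItemLabel
     , COUNT(*) AS Copies
FROM ItemTable it
GROUP BY
    it.ItemLabel
HAVING COUNT(*) > 1;
```

An empty result means no duplicates remain. Any rows returned should be labels whose copies are all related to LinkTable.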
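You can easily modify it, test the results, and control which records get deleted. I created a SQL Fiddle with different data samples so you can see how they are handled.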
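One possible way to test a single step without committing to it is to wrap the DELETE in a transaction and roll it back. This is a sketch rather than part of the original answer, and it assumes the temp table #Duplicate_ItemLabels_Related_To_LinkTable from the earlier steps already exists in the session. The OUTPUT clause shows which rows would be removed:

```sql
BEGIN TRANSACTION;

-- same DELETE as the "related to linkTable" step;
-- OUTPUT deleted.* returns the rows that would be removed
DELETE it
OUTPUT deleted.*
FROM ItemTable it
JOIN #Duplicate_ItemLabels_Related_To_LinkTable dup ON
    dup.ItemLabel = it.ItemLabel
WHERE NOT EXISTS
(
    SELECT *
    FROM Linktable lt
    WHERE lt.ItemID = it.ID
);

-- undo the delete; use COMMIT TRANSACTION instead once the output looks right
ROLLBACK TRANSACTION;
```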
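For the second approach's sample data, I also added records to ItemTable that share the same ItemLabel but have different ID values, more than one of which is related to LinkTable (none of these is deleted arbitrarily).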