Delete duplicate rows that are not in a link table

Time: 2013-03-11 08:40:05

Tags: sql sql-server sql-delete

I am using SQL Server 2008 and trying to delete some duplicate rows from a table. The relevant tables and columns are listed below:

ItemTable
----------
Id - autoincrement, PK
ItemLabel - the actual identifier of the items


Linktable
----------
Id - autoincrement, PK
ItemId - the Id from ItemTable
RelatedItemId - the Id from RelatedItemTable


RelatedItemTable
------
no need to touch this with the query..

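So the link table does not contain the actual identifiers of the items, but the running row numbers (the Id values) from the two tables.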

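What needs to be achieved: ItemTable contains rows with duplicate ItemLabel values, where one row is referenced in the link table (by the value of its Id column) and the other is not. The unlinked rows must be deleted from ItemTable. I know how to select duplicate rows using COUNT and GROUP BY, but I can't figure out how to delete only the rows that do not exist in the link table. ItemTable also contains duplicates of items that have no relations at all; one of each must be kept (it doesn't matter which).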
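For reference, the duplicate-finding query mentioned above might look like the following minimal sketch. It assumes only the ItemTable columns listed in the question; the alias DuplicateCount is illustrative.

```sql
-- List each ItemLabel that occurs more than once in ItemTable,
-- together with the number of rows that share it.
SELECT   ItemLabel,
         COUNT(*) AS DuplicateCount   -- illustrative alias, not from the question
FROM     ItemTable
GROUP BY ItemLabel
HAVING   COUNT(*) > 1;
```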

Here is a SQL Fiddle with dummy data: http://www.sqlfiddle.com/#!3/9d181

P.S. Don't ask why the link table uses the running ids instead of the actual ids (which could be PK'd)... this is a legacy system.

2 Answers:

Answer 0 (score: 0)

Join the two tables with a LEFT JOIN. Any ItemTable.ID that has no match will have a NULL value in Linktable.ItemID, and the WHERE clause filters on exactly that.

DELETE  a
FROM    ItemTable a
        LEFT JOIN Linktable b
            ON a.ID = b.ItemID
WHERE   b.ItemID IS NULL
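Note that the statement above removes every ItemTable row without a match in Linktable, whether or not its ItemLabel is duplicated. A possible refinement, offered as a sketch and not as part of the original answer, deletes an unlinked row only when another row with the same ItemLabel is linked. It assumes only the tables and columns from the question; the aliases s and ls are illustrative.

```sql
-- Delete an unlinked ItemTable row only when a sibling row with the
-- same ItemLabel is referenced by Linktable.
DELETE  a
FROM    ItemTable a
WHERE   NOT EXISTS (            -- this row itself is not linked
            SELECT 1
            FROM   Linktable b
            WHERE  b.ItemId = a.Id
        )
AND     EXISTS (                -- but another row with the same label is linked
            SELECT 1
            FROM   ItemTable s
            JOIN   Linktable ls ON ls.ItemId = s.Id
            WHERE  s.ItemLabel = a.ItemLabel
            AND    s.Id <> a.Id
        );
```

Duplicate groups in which no row is linked are left untouched by this statement and would need a separate step.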

Answer 1 (score: 0)

Try this:

DELETE t
OUTPUT deleted.*
FROM    ItemTable t
JOIN    (
 SELECT DENSE_RANK() OVER (PARTITION BY ItemLabel ORDER BY lt.ItemID DESC, it.id) num
        , it.Id
 FROM   ItemTable it
 LEFT JOIN 
        LinkTable lt ON
        lt.ItemId = it.id
) t2 ON t2.Id = t.Id
WHERE num > 1

SQL Fiddle
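Before running the DELETE, it can help to preview which rows it would remove. The sketch below reuses the same ranking expression in a plain SELECT; the alias LinkedItemId is illustrative.

```sql
-- Preview the rank each row receives. Rows with num > 1 are the ones
-- the DELETE above would remove.
SELECT  it.Id,
        it.ItemLabel,
        lt.ItemId AS LinkedItemId,   -- NULL when the row is not linked
        DENSE_RANK() OVER (PARTITION BY it.ItemLabel
                           ORDER BY lt.ItemId DESC, it.Id) AS num
FROM    ItemTable it
LEFT JOIN LinkTable lt ON lt.ItemId = it.Id
ORDER BY it.ItemLabel, num;
```

SQL Server sorts NULLs last in descending order, so linked rows rank ahead of unlinked ones. One thing to watch for in the preview: if two rows with the same ItemLabel are both linked, only the one with the highest Id gets num = 1, so the other linked row would be deleted as well.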

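Although the above works for your case, I would suggest a more readable approach that gives you more control and a better overview. It is a multi-step approach in which every step can be analyzed and tested: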

-- get ItemLabels of duplicate records
SELECT  ItemLabel
INTO    #Duplicate_ItemLabels
FROM    ItemTable it
GROUP BY
        it.ItemLabel
HAVING  COUNT(*) > 1

-- get ItemLabels of duplicate records that have at least one record related to LinkTable
SELECT  *
INTO    #Duplicate_ItemLabels_Related_To_LinkTable
FROM    #Duplicate_ItemLabels d1
WHERE   EXISTS
(
        SELECT  *
        FROM    ItemTable it
        JOIN    Linktable lt ON 
                lt.ItemID = it.ID
        WHERE   it.ItemLabel = d1.ItemLabel
)

-- get ItemLabels of duplicate records that don't have any records related to LinkTable
SELECT  ItemLabel
INTO    #Duplicate_ItemLabels_NOT_Related_To_LinkTable
FROM    #Duplicate_ItemLabels
EXCEPT
SELECT  ItemLabel
FROM    #Duplicate_ItemLabels_Related_To_LinkTable

-- delete unwanted records for ItemLabels that have records related to linkTable
DELETE  it
OUTPUT  deleted.*
FROM    ItemTable it
JOIN    #Duplicate_ItemLabels_Related_To_LinkTable dup ON
        dup.ItemLabel = it.ItemLabel
WHERE   NOT EXISTS
(
        SELECT  *
        FROM    Linktable lt
        WHERE   lt.ItemID = it.ID
)

-- delete unwanted records for ItemLabels that don't have any records related to linkTable
DELETE  it
OUTPUT  deleted.*
FROM    ItemTable it
JOIN    #Duplicate_ItemLabels_NOT_Related_To_LinkTable dup ON
        dup.ItemLabel = it.ItemLabel
JOIN    
(
        -- records deleted will be all those that have ID greater than the smallest ID for this ItemLabel
        SELECT  ItemLabel
                , MIN(ID) ID
        FROM    ItemTable dup
        GROUP BY
                dup.ItemLabel
)       gr ON
        gr.ID < it.ID
AND     gr.ItemLabel = dup.ItemLabel

-- if after these DELETEs there are still duplicate records, it 
-- means that there are records for same ItemLabel with 
-- different ID and all of them are related to LinkTable
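The check described in the closing comment above could be sketched as follows. It is an illustrative query, not part of the original answer; the aliases RowsPerLabel and LinkedRows are assumptions.

```sql
-- List ItemLabels that still occur more than once, with the number of
-- rows per label and how many of those rows are referenced by Linktable.
SELECT   it.ItemLabel,
         COUNT(*)        AS RowsPerLabel,
         COUNT(l.ItemId) AS LinkedRows    -- counts only rows with a link
FROM     ItemTable it
LEFT JOIN (SELECT DISTINCT ItemId FROM Linktable) l
         ON l.ItemId = it.Id
GROUP BY it.ItemLabel
HAVING   COUNT(*) > 1;
```

If the steps behaved as intended, every row returned here should show RowsPerLabel equal to LinkedRows, meaning the remaining duplicates are all linked.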

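You can easily modify it, test the results, and control which records get deleted. I created a SQL Fiddle with several different data samples so you can see how each case is handled.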

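For the data sample of the second approach, I also added records to ItemTable that share the same ItemLabel but have different IDs, more than one of which is related to LinkTable (none of them is deleted arbitrarily).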