Question

Using SQL Server 2008. I'm trying to delete some duplicate rows from a table. The relevant tables and columns are listed below:

ItemTable
----------
Id - autoincrement, PK
ItemLabel - the actual identifier of the items


Linktable
----------
Id - autoincrement, PK
ItemId - the Id from ItemTable
RelatedItemId - the Id from RelatedItemTable


RelatedItemTable
------
no need to touch this with the query..

So the link table does not contain the items' actual identifiers, but rather the running row numbers (Id) from the two tables.
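For anyone following along without the fiddle, the layout described above might be sketched roughly as follows. The column types, lengths, and sample rows are assumptions made up for illustration; only the table and column names come from the description.

```sql
-- Hypothetical schema: types and constraints are assumptions, not the real system's DDL
CREATE TABLE ItemTable (
    Id        INT IDENTITY(1,1) PRIMARY KEY,
    ItemLabel VARCHAR(50) NOT NULL          -- the actual identifier of the item
);

CREATE TABLE Linktable (
    Id            INT IDENTITY(1,1) PRIMARY KEY,
    ItemId        INT NOT NULL,             -- ItemTable.Id
    RelatedItemId INT NOT NULL              -- RelatedItemTable.Id
);

-- Made-up sample rows; on freshly created tables they receive Id 1..5
INSERT INTO ItemTable (ItemLabel) VALUES ('A'), ('A'), ('B'), ('B'), ('C');

-- Only the first 'A' row (Id 1) is linked
INSERT INTO Linktable (ItemId, RelatedItemId) VALUES (1, 10);
```

With this data the desired result would be: the unlinked 'A' row (Id 2) is deleted, one of the two 'B' rows is kept, and 'C' is left alone.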

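What needs to be achieved: ItemTable contains rows with duplicate ItemLabel values, where one of the duplicates is referenced in the link table (the link table holds its Id value) and the other is not. The unlinked ones must be deleted from ItemTable. I know how to select the duplicate rows using COUNT and GROUP BY, but I can't figure out how to delete only the rows that don't exist in the link table. ItemTable also contains duplicate items that have no relations at all; one of those must be kept (it doesn't matter which).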

Here is an SQL Fiddle with dummy data: http://www.sqlfiddle.com/#!3/9d181

P.S. Don't ask why the link table uses the running ids instead of the actual identifiers (which could be PKs)... it's a legacy system.

Answer 1

Join the two tables using a LEFT JOIN. Any ItemTable.ID with no match will have a NULL value in Linktable.ItemID, and that is what the WHERE clause filters on.

DELETE  a
FROM    ItemTable a
        LEFT JOIN Linktable b
            ON a.ID = b.ItemID
WHERE   b.ItemID IS NULL
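As written, the join matches every ItemTable row that has no link, whether or not its ItemLabel is duplicated, so it may be worth previewing the affected rows first. The sketch below shows such a preview, followed by one possible narrower variant that only removes an unlinked row when another row with the same ItemLabel is linked. The variant is an illustration, not part of the original answer, and it leaves duplicates with no links at all untouched.

```sql
-- Preview: the rows the DELETE above would remove
SELECT  a.Id, a.ItemLabel
FROM    ItemTable a
        LEFT JOIN Linktable b
            ON a.Id = b.ItemId
WHERE   b.ItemId IS NULL;

-- Possible narrower variant (an assumption about the intent):
-- delete an unlinked row only if a linked row with the same ItemLabel exists
DELETE  a
FROM    ItemTable a
WHERE   NOT EXISTS (SELECT 1
                    FROM   Linktable b
                    WHERE  b.ItemId = a.Id)
AND     EXISTS     (SELECT 1
                    FROM   ItemTable a2
                           JOIN Linktable b2
                               ON b2.ItemId = a2.Id
                    WHERE  a2.ItemLabel = a.ItemLabel
                    AND    a2.Id <> a.Id);
```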

Answer 2

Try this:

DELETE t
OUTPUT deleted.*
FROM    ItemTable t
JOIN    (
 SELECT DENSE_RANK() OVER (PARTITION BY ItemLabel ORDER BY lt.ItemID DESC, it.id) num
        , it.Id
 FROM   ItemTable it
 LEFT JOIN 
        LinkTable lt ON
        lt.ItemId = it.id
) t2 ON t2.Id = t.Id
WHERE num > 1

SQL Fiddle
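To see which rows the DELETE above would target, the ranking can be run on its own as a SELECT. This is only a sketch: the IsLinked column is a hypothetical helper added for readability. SQL Server sorts NULL as the lowest value, so ordering by lt.ItemId DESC puts unlinked rows last within each ItemLabel, and every row with num > 1 is a deletion candidate.

```sql
-- Preview of the ranking used by the DELETE; nothing is modified
SELECT  it.Id
        , it.ItemLabel
        , CASE WHEN lt.ItemId IS NULL THEN 0 ELSE 1 END AS IsLinked  -- helper column, not in the original
        , DENSE_RANK() OVER (PARTITION BY it.ItemLabel
                             ORDER BY lt.ItemId DESC, it.Id) AS num
FROM    ItemTable it
LEFT JOIN
        LinkTable lt ON
        lt.ItemId = it.Id
ORDER BY
        it.ItemLabel, num;
```

If an ItemLabel has more than one linked row with different Ids, only the one with the highest Id gets num = 1, so the preview is a quick way to check whether any linked rows show up with num > 1 in your data.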

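While the above works for your case, I would suggest a more readable approach that gives you more control and a better overview. It is a multi-step approach in which each step can be analyzed and tested: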

-- get ItemLabels of duplicate records
SELECT  ItemLabel
INTO    #Duplicate_ItemLabels
FROM    ItemTable it
GROUP BY
        it.ItemLabel
HAVING  COUNT(*) > 1

-- get ItemLabels of duplicate records that have at least one record related to LinkTable
SELECT  *
INTO    #Duplicate_ItemLabels_Related_To_LinkTable
FROM    #Duplicate_ItemLabels d1
WHERE   EXISTS
(
        SELECT  *
        FROM    ItemTable it
        JOIN    Linktable lt ON 
                lt.ItemID = it.ID
        WHERE   it.ItemLabel = d1.ItemLabel
)

-- get ItemLabels of duplicate records that don't have any records related to LinkTable
SELECT  ItemLabel
INTO    #Duplicate_ItemLabels_NOT_Related_To_LinkTable
FROM    #Duplicate_ItemLabels
EXCEPT
SELECT  ItemLabel
FROM    #Duplicate_ItemLabels_Related_To_LinkTable

-- delete unwanted records for ItemLabels that have records related to linkTable
DELETE  it
OUTPUT  deleted.*
FROM    ItemTable it
JOIN    #Duplicate_ItemLabels_Related_To_LinkTable dup ON
        dup.ItemLabel = it.ItemLabel
WHERE   NOT EXISTS
(
        SELECT  *
        FROM    Linktable lt
        WHERE   lt.ItemID = it.ID
)

-- delete unwanted records for ItemLabels that don't have any records related to linkTable
DELETE  it
OUTPUT  deleted.*
FROM    ItemTable it
JOIN    #Duplicate_ItemLabels_NOT_Related_To_LinkTable dup ON
        dup.ItemLabel = it.ItemLabel
JOIN    
(
        -- records deleted will be all those that have ID greater than the smallest ID for this ItemLabel
        SELECT  ItemLabel
                , MIN(ID) ID
        FROM    ItemTable dup
        GROUP BY
                dup.ItemLabel
)       gr ON
        gr.ID < it.ID
AND     gr.ItemLabel = dup.ItemLabel

-- if after these DELETEs there are still duplicate records, it 
-- means that there are records for same ItemLabel with 
-- different ID and all of them are related to LinkTable
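The closing comment can be checked with a query along these lines. It is a sketch rather than part of the original script, and the Copies, LinkedCopies, and IsLinked names are made up for illustration. For every ItemLabel it returns, Copies should equal LinkedCopies if the steps above behaved as described.

```sql
-- List ItemLabels that are still duplicated, with how many copies are linked
SELECT  x.ItemLabel
        , COUNT(*)        AS Copies
        , SUM(x.IsLinked) AS LinkedCopies
FROM
(
        SELECT  it.ItemLabel
                , CASE WHEN EXISTS (SELECT 1
                                    FROM   Linktable lt
                                    WHERE  lt.ItemId = it.Id)
                       THEN 1 ELSE 0 END AS IsLinked
        FROM    ItemTable it
)       x
GROUP BY
        x.ItemLabel
HAVING  COUNT(*) > 1;
```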

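You can easily modify it, test the results, and control which records get deleted. I created an SQL Fiddle with different data samples so you can see how each case is handled.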
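One way to test the steps without committing to anything is to wrap them in a transaction and roll it back. The following is a minimal sketch under that assumption; the placeholder comment marks where the SELECT INTO and DELETE statements would go, and RemainingRows is just an illustrative column name.

```sql
BEGIN TRANSACTION;

-- Run the SELECT ... INTO #... and DELETE ... OUTPUT deleted.* steps here

-- Inspect the state of the table after the deletes
SELECT  COUNT(*) AS RemainingRows
FROM    ItemTable;

-- Undo everything while testing; switch to COMMIT TRANSACTION once satisfied
ROLLBACK TRANSACTION;

-- Clean up the temp tables if they were created outside the transaction
IF OBJECT_ID('tempdb..#Duplicate_ItemLabels') IS NOT NULL
    DROP TABLE #Duplicate_ItemLabels;
IF OBJECT_ID('tempdb..#Duplicate_ItemLabels_Related_To_LinkTable') IS NOT NULL
    DROP TABLE #Duplicate_ItemLabels_Related_To_LinkTable;
IF OBJECT_ID('tempdb..#Duplicate_ItemLabels_NOT_Related_To_LinkTable') IS NOT NULL
    DROP TABLE #Duplicate_ItemLabels_NOT_Related_To_LinkTable;
```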

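To exercise the second approach, I also added records to ItemTable that share the same ItemLabel but have different IDs, more than one of which is related to LinkTable (none of these is deleted arbitrarily).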

Delete duplicate rows that are not in the link table

2 answers: