Deleting duplicate rows with a subquery

Date: 2016-09-19 04:40:56

Tags: sql sql-server duplicates subquery

I am using SQL Server 2014 with the AdventureWorks2012 sample database provided by Microsoft.

I am trying to delete duplicate rows with the subquery below (Option #2):

/*** Option #2: SUBQUERY ***/

--SELECT * FROM
DELETE SQLPractice.[dbo].[CURRENCY]
WHERE EXISTS (SELECT * 
              FROM
                  (SELECT 
                       NAME,
                       ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
                   FROM  
                       SQLPractice.[dbo].[CURRENCY]) AS T
              WHERE Flag > 1) 
GO

But it deletes all the rows in the table.

The other option (a CTE), however, deletes only the duplicate rows.

/*** Option #3: CTE ***/ 
;WITH RepFlag AS
(
    SELECT 
        NAME,
        ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
    FROM 
        SQLPractice.[dbo].[CURRENCY]
)
--SELECT * FROM RepFlag
DELETE RepFlag
WHERE Flag > 1

SELECT * 
FROM SQLPractice.[dbo].[CURRENCY]

Use the following code to create your own test table.

/*** REMOVING DUPLICATE ROWS OPTION ***/
-- Creating a table 
SELECT TOP 0 *
INTO [dbo].[CURRENCY]
FROM AdventureWorks2012.Sales.Currency
WHERE NAME LIKE  '%A';

-- inserting duplicate rows 
INSERT [dbo].[CURRENCY]
SELECT * FROM AdventureWorks2012.Sales.Currency
WHERE NAME LIKE  '%A';

/***** SELECTING COUNT OF DUPLICATED ROWS *****/ 

/*** Option #1: "GROUP BY" with "HAVING" ***/ 
SELECT 
    NAME, COUNT(*) AS Qty   
FROM 
    SQLPractice.[dbo].[CURRENCY]
GROUP BY 
    NAME
HAVING 
    COUNT(*) >1
GO

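6 answers:

Answer 0 (score: 1)

Option #2 deletes every row because the subquery inside EXISTS is not tied to the outer query: it returns the same rows no matter which table row is being checked, so EXISTS is true for all of them. There must be some relationship between the subquery inside EXISTS and the parent query, so that the subquery yields a different result for each row of the table. When the table has an identity column, one way to delete duplicate rows with a subquery is: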

DELETE from SQLPractice.[dbo].[CURRENCY]
where identityCol not in ( select min(identityCol) FROM SQLPractice.[dbo].[CURRENCY] GROUP BY NAME)
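
To make the difference concrete, here is a minimal sketch comparing an uncorrelated and a correlated EXISTS. The temporary table #CurrencyDemo, its Id column and the sample values are assumptions made for illustration; they are not part of the question's table.

```sql
-- Hypothetical demo table; Id and the sample values are illustrative assumptions.
CREATE TABLE #CurrencyDemo (Id INT IDENTITY(1,1) PRIMARY KEY, Name NVARCHAR(50));
INSERT INTO #CurrencyDemo (Name) VALUES (N'Afghani'), (N'Afghani'), (N'Kwanza');

-- Uncorrelated: the subquery never looks at the outer row, so EXISTS is true
-- for every row as long as one duplicate exists anywhere in the table.
SELECT COUNT(*) AS RowsThatWouldBeDeleted
FROM #CurrencyDemo AS c
WHERE EXISTS (SELECT 1
              FROM (SELECT Name,
                           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS Flag
                    FROM #CurrencyDemo) AS t
              WHERE t.Flag > 1);

-- Correlated on the key: only the extra copies qualify.
SELECT COUNT(*) AS RowsThatWouldBeDeleted
FROM #CurrencyDemo AS c
WHERE EXISTS (SELECT 1
              FROM (SELECT Id,
                           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id) AS Flag
                    FROM #CurrencyDemo) AS t
              WHERE t.Flag > 1 AND t.Id = c.Id);

DROP TABLE #CurrencyDemo;
```

On this sample data the first query counts 3 rows and the second counts 1, which mirrors what the two forms of DELETE would remove.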

Answer 1 (score: 1)

One possible approach:

-- PK stands for the table's key column
DELETE tt
FROM [your table] tt
    INNER JOIN
    (SELECT NAME, MIN(PK) AS MIN_KEY
     FROM [your table]
     GROUP BY NAME
     HAVING COUNT(*) > 1) dup
        ON dup.NAME = tt.NAME AND tt.PK <> dup.MIN_KEY

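Answer 2 (score: 1)

In your example, ROW_NUMBER() will not help you solve the problem, because the rows are identical even in the primary key (candidate key) column, CurrencyCode.

Since you simply inserted the same rows into the target table again, the ModifiedDate column is identical as well.

For this sample case, you can apply the solution described in "delete duplicate rows where no primary key exists".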

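As an alternative, the tutorial suggests a cursor-based approach. You can test and see that the following DELETE command deletes all rows in the table, because every copy of a duplicated row matches the join on CurrencyCode and ModifiedDate: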

delete [dbo].[CURRENCY]
from [dbo].[CURRENCY]
inner join (
    select ROW_NUMBER() over (partition by CurrencyCode order by ModifiedDate) rn, CurrencyCode, ModifiedDate from [dbo].[CURRENCY]
) dublicates
    on dublicates.CurrencyCode = [dbo].[CURRENCY].CurrencyCode and
       dublicates.ModifiedDate = [dbo].[CURRENCY].ModifiedDate
where dublicates.rn > 1
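
When rows are identical in every column, as in this test table, one common alternative (not taken from this answer or the tutorial it mentions) is to rebuild the table from its distinct rows. The following is only a sketch under that assumption; the temporary table name #CurrencyDistinct is made up for illustration.

```sql
BEGIN TRANSACTION;

-- Keep exactly one copy of every distinct row in a temporary table
-- (#CurrencyDistinct is a hypothetical name).
SELECT DISTINCT *
INTO #CurrencyDistinct
FROM [dbo].[CURRENCY];

-- Empty the original table, then reload the de-duplicated rows.
DELETE FROM [dbo].[CURRENCY];

INSERT INTO [dbo].[CURRENCY]
SELECT * FROM #CurrencyDistinct;

DROP TABLE #CurrencyDistinct;

COMMIT TRANSACTION;
```

This avoids joining the table to itself, which cannot tell identical copies apart, at the cost of rewriting the whole table.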

Answer 3 (score: 1)

If you want to delete the duplicate names with a subquery, use the following approach.

DELETE t
FROM  (SELECT  NAME,ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
              FROM  SQLPractice.[dbo].[CURRENCY]
            ) t
WHERE t.Flag > 1
GO

You can also achieve this with a common table expression (CTE).

;WITH cte_1
AS (SELECT  NAME,ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
              FROM  SQLPractice.[dbo].[CURRENCY]
            ) 
DELETE FROM cte_1
WHERE Flag > 1

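Answer 4 (score: 0)

The WITH statement (the CTE) deletes only the duplicate rows because it first collects all the duplicate records and then performs the delete on them.

Your subquery does not specify a condition for which records to delete. It should be written as follows (ID here stands for a column that uniquely identifies each row):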

DELETE SQLPractice.[dbo].[CURRENCY]
WHERE EXISTS  
(
    SELECT * FROM 
    (
        SELECT 
        NAME,
        ID,
        ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
        FROM SQLPractice.[dbo].[CURRENCY] 
    )   AS T
    WHERE Flag > 1 AND T.ID=[CURRENCY].ID
) 
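
The test table in the question is copied from Sales.Currency and has no ID column, so the query above needs one to be added first. A minimal sketch, assuming a temporary identity column is acceptable; the column name ID is chosen only to match the query above.

```sql
-- ID is a hypothetical helper column; IDENTITY fills it with a unique
-- number for every existing row.
ALTER TABLE SQLPractice.[dbo].[CURRENCY]
    ADD ID INT IDENTITY(1,1) NOT NULL;
GO

-- ... run the correlated DELETE here ...

-- Optionally remove the helper column afterwards.
ALTER TABLE SQLPractice.[dbo].[CURRENCY]
    DROP COLUMN ID;
GO
```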

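Answer 5 (score: 0)

You can try the query below. I wrote it against the duplicated currency values; note that it deletes every row whose currency value occurs more than once, not just the extra copies.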

delete from test where currency in(select currency from test group by currency having count(*) >1)