How do I find duplicates in a table that has no primary key or ID field?

Posted: 2016-06-16 20:07:38

Tags: sql sql-server sql-server-2008 sql-server-2005 sql-server-2012

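I've inherited a SQL Server database that contains duplicate data. I need to find and delete the duplicate rows, but there is no id field, so I'm not sure how to identify them.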

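Normally I would LEFT JOIN the table to itself, check that all the fields are equal, and add table1.id <> table2.id so that a row doesn't match itself. Without an ID field I can't do that. I know how to find rows that match, but not how to stop each row from also matching itself.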

TABLE:

productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null

Sample data:

1, 3, "started", "2016-06-15 04:23:12.000"
2, 3, "started", "2016-06-15 04:21:12.000"
1, 3, "started", "2016-06-15 04:23:12.000"
1, 3, "done", "2016-06-15 04:23:12.000"

In this sample, only rows 1 and 3 are duplicates.

How do I find the duplicates?

4 answers:

Answer 0 (score: 6)

Use GROUP BY together with HAVING:

select 
    productId 
  , categoryId 
  , state
  , dateDone
  , count(*)
from your_table 
group by productId ,categoryId ,state, dateDone
having count(*) >1
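To try the query above against the question's sample data, here is a minimal sketch. The temporary table name `#Sample` and the alias `dupCount` are illustrative assumptions and not part of the original answer. The multi-row `VALUES` list needs SQL Server 2008 or later. The dates use the ISO 8601 `T` form so that they parse the same way regardless of the session's language or DATEFORMAT setting.

```sql
-- Hypothetical scratch table holding the question's sample rows
IF OBJECT_ID('tempdb..#Sample') IS NOT NULL
    DROP TABLE #Sample;

CREATE TABLE #Sample (
    productId  int          NOT NULL,
    categoryId int          NOT NULL,
    state      varchar(255) NOT NULL,
    dateDone   datetime     NOT NULL
);

INSERT INTO #Sample (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (2, 3, 'started', '2016-06-15T04:21:12.000'),
       (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (1, 3, 'done',    '2016-06-15T04:23:12.000');

-- One output row per group of identical rows that occurs more than once
SELECT productId, categoryId, state, dateDone, COUNT(*) AS dupCount
FROM #Sample
GROUP BY productId, categoryId, state, dateDone
HAVING COUNT(*) > 1;
```

On this data only the group formed by rows 1 and 3 is returned, with dupCount = 2. Note that the query returns one row per duplicated group, not one row per copy.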

Answer 1 (score: 1)

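For some reason I thought you wanted to delete them. I think I misread, but if you switch DELETE to SELECT in my statement, you get all of the duplicates instead of the originals. With DELETE, the statement removes every duplicate but still leaves one copy of each row, which I suspect is what you want.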

IF OBJECT_ID('tempdb..#TT') IS NOT NULL
    BEGIN
        DROP TABLE #TT
    END

CREATE TABLE #TT (
    productId int not null,
    categoryId int not null,
    state varchar(255) not null,
    dateDone DATETIME not null
)

INSERT INTO #TT (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15 04:23:12.000')
,(2, 3, 'started', '2016-06-15 04:21:12.000')
,(1, 3, 'started', '2016-06-15 04:23:12.000')
,(1, 3, 'done', '2016-06-15 04:23:12.000')


SELECT *
FROM
    #TT

;WITH cte AS (
    SELECT
        *
        ,RowNum = ROW_NUMBER() OVER (PARTITION BY productId, categoryId, state, dateDone ORDER BY productId) --note: what you order by doesn't matter, since the rows in each partition are identical
    FROM
        #TT
)
--if you want to delete them, just do this; otherwise change DELETE to SELECT *
DELETE
FROM
    cte
WHERE
    RowNum > 1

SELECT *
FROM
    #TT
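The comment in the answer suggests swapping DELETE for SELECT to see what would be removed. Here is a minimal sketch of that preview step. It recreates the answer's `#TT` table so that it runs on its own, and it uses `ORDER BY (SELECT NULL)` to make explicit that the order inside each partition is arbitrary. That ordering choice is mine, not the original author's.

```sql
IF OBJECT_ID('tempdb..#TT') IS NOT NULL
    DROP TABLE #TT;

CREATE TABLE #TT (
    productId  int          NOT NULL,
    categoryId int          NOT NULL,
    state      varchar(255) NOT NULL,
    dateDone   datetime     NOT NULL
);

INSERT INTO #TT (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (2, 3, 'started', '2016-06-15T04:21:12.000'),
       (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (1, 3, 'done',    '2016-06-15T04:23:12.000');

-- Preview: every copy after the first in each group of identical rows,
-- i.e. exactly the rows the DELETE version would remove
WITH cte AS (
    SELECT *,
           RowNum = ROW_NUMBER() OVER (
               PARTITION BY productId, categoryId, state, dateDone
               ORDER BY (SELECT NULL))  -- rows in a partition are identical, so any order works
    FROM #TT
)
SELECT productId, categoryId, state, dateDone, RowNum
FROM cte
WHERE RowNum > 1;
```

On the sample data this returns a single row (1, 3, 'started', 2016-06-15 04:23:12.000) with RowNum = 2, and the table itself is left unchanged.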

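If you want to, and you are able to change the schema, you can also add an identity column after the fact. It will be populated for the existing records: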

ALTER TABLE #TT
ADD Id INTEGER IDENTITY(1,1) NOT NULL
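Once an Id column exists, the self-join described in the question becomes possible again. Here is a minimal sketch that recreates the answer's `#TT` table so the script stands on its own. `GO` is the batch separator understood by SSMS and sqlcmd. It keeps the ALTER TABLE apart from the query that uses the new column, because SQL Server may refuse to compile a reference to a column that is added earlier in the same batch.

```sql
IF OBJECT_ID('tempdb..#TT') IS NOT NULL
    DROP TABLE #TT;

CREATE TABLE #TT (
    productId  int          NOT NULL,
    categoryId int          NOT NULL,
    state      varchar(255) NOT NULL,
    dateDone   datetime     NOT NULL
);

INSERT INTO #TT (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (2, 3, 'started', '2016-06-15T04:21:12.000'),
       (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (1, 3, 'done',    '2016-06-15T04:23:12.000');

-- Add a surrogate key; the existing rows are numbered automatically
ALTER TABLE #TT
ADD Id int IDENTITY(1,1) NOT NULL;
GO

-- Every row that has an identical twin; a.Id <> b.Id stops a row matching itself
SELECT a.Id, a.productId, a.categoryId, a.state, a.dateDone
FROM #TT AS a
JOIN #TT AS b
  ON  a.productId  = b.productId
  AND a.categoryId = b.categoryId
  AND a.state      = b.state
  AND a.dateDone   = b.dateDone
  AND a.Id <> b.Id;
```

On the sample data this returns both copies of the duplicated row. If you change `a.Id <> b.Id` to `a.Id > b.Id`, the lowest Id in each group no longer appears. Using that condition in `DELETE a FROM #TT AS a JOIN #TT AS b ON ...` would therefore remove only the extra copies.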

Answer 2 (score: 1)

You can do this with window functions. For example:

create table #tmp
(
    Id INT
);

insert into #tmp
VALUES (1), (1), (2); --so now we have duplicated rows; the statement before WITH must end with a semicolon

WITH CTE AS
(
    SELECT
        ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id) AS [DuplicateCounter],
        Id
    FROM #tmp
)
DELETE FROM CTE
WHERE DuplicateCounter > 1; --duplicated rows have DuplicateCounter > 1

Answer 3 (score: 0)

You can try numbering each group of identical rows in a CTE (for example with ROW_NUMBER()), and then restrict the actual selection from the CTE to RN = 1.
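The query that originally accompanied this answer did not survive. Below is a minimal sketch of the approach it describes. It assumes the question's columns, the alias `RN` mentioned in the answer, and a hypothetical `#Sample` table. Keeping only `RN = 1` returns exactly one copy of every distinct row.

```sql
IF OBJECT_ID('tempdb..#Sample') IS NOT NULL
    DROP TABLE #Sample;

CREATE TABLE #Sample (
    productId  int          NOT NULL,
    categoryId int          NOT NULL,
    state      varchar(255) NOT NULL,
    dateDone   datetime     NOT NULL
);

INSERT INTO #Sample (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (2, 3, 'started', '2016-06-15T04:21:12.000'),
       (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (1, 3, 'done',    '2016-06-15T04:23:12.000');

-- Number the copies inside each group of identical rows
WITH cte AS (
    SELECT productId, categoryId, state, dateDone,
           RN = ROW_NUMBER() OVER (
               PARTITION BY productId, categoryId, state, dateDone
               ORDER BY (SELECT NULL))  -- copies are identical, so any order works
    FROM #Sample
)
-- Keep only the first copy of each row: the de-duplicated data set
SELECT productId, categoryId, state, dateDone
FROM cte
WHERE RN = 1;
```

On the sample data this yields three rows. Every column is in the PARTITION BY, so the result matches `SELECT DISTINCT * FROM #Sample`. The ROW_NUMBER form only behaves differently when you partition on a subset of the columns.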