How do I get the unique identifier of duplicate records?

Posted: 2013-06-05 20:46:35

Tags: sql sql-server

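I have duplicate records in a table. For each set of duplicates I need to identify just one unique identifier so that I can delete that record from the table.

The only way I know a record is a duplicate is through the subject and description columns: if at least two records have the same subject and the same description, I need to delete one and keep one.

I can get a list of the duplicate records, but I can't get the unique identifier I need to delete them.

Here is what I did to identify the duplicate records: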

SELECT 
    p.accountid, p.subject, p.description, count(*) AS total
FROM
    activities AS p 
WHERE     
    (p.StateCode = 1) AND p.createdon >= getdate()-6
GROUP BY 
    p.accountid, p.subject, p.description
HAVING 
    count(*) > 1
ORDER BY 
    p.accountid

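There is a column record_id that contains the unique identifier of each record. But if I add record_id to my select statement, I get no results, because no two records can share a unique identifier.

How can I get the record_id with SQL Server?

Note: record_id is not an integer; it is a GUID-style value such as "D32B275B-0B2F-4FF6-8089-00000FDA9E8E".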
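A minimal reproduction of the problem, as a sketch: the temp table #activities, its column types (record_id as uniqueidentifier, judging by the value format above) and the sample rows are all made up for illustration. Grouping by accountid, subject and description finds the duplicate pair, but once record_id is added to the grouping every group contains a single row, so HAVING count(*) > 1 filters everything out.

```sql
-- Hypothetical repro table with only the columns the question's query uses.
IF OBJECT_ID('tempdb..#activities') IS NOT NULL DROP TABLE #activities;
CREATE TABLE #activities (
    record_id   uniqueidentifier NOT NULL PRIMARY KEY DEFAULT NEWID(),
    accountid   int           NOT NULL,
    subject     nvarchar(100) NOT NULL,
    description nvarchar(400) NOT NULL,
    StateCode   int           NOT NULL,
    createdon   datetime      NOT NULL
);

-- Made-up sample data: account 1 has two identical activities.
INSERT INTO #activities (accountid, subject, description, StateCode, createdon)
VALUES (1, N'Call',  N'Follow up',  1, GETDATE()),
       (1, N'Call',  N'Follow up',  1, GETDATE()),
       (2, N'Email', N'Quote sent', 1, GETDATE());

-- The question's query: one row (account 1) with total = 2.
SELECT p.accountid, p.subject, p.description, COUNT(*) AS total
FROM #activities AS p
WHERE p.StateCode = 1 AND p.createdon >= GETDATE() - 6
GROUP BY p.accountid, p.subject, p.description
HAVING COUNT(*) > 1;

-- With record_id in the grouping, every group has exactly one row,
-- so HAVING COUNT(*) > 1 removes all of them and nothing is returned.
SELECT p.record_id, p.accountid, p.subject, p.description, COUNT(*) AS total
FROM #activities AS p
WHERE p.StateCode = 1 AND p.createdon >= GETDATE() - 6
GROUP BY p.record_id, p.accountid, p.subject, p.description
HAVING COUNT(*) > 1;
```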

Thanks

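4 answers:

Answer 0 (score: 4)

One nice feature of SQL Server that I like is that you can use CTEs with update and delete statements.

You are looking for duplicate records and probably want to keep the lowest or highest record_id. You can get both the count and the id to keep with a CTE and window functions: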

with todelete as (
    -- record_id must be in the select list so the outer where can compare it to IdToKeep
    SELECT p.record_id, p.accountid, p.subject, p.description,
           COUNT(*) over (partition by p.accountid, p.subject, p.description) as total,
           MIN(p.record_id) over (partition by p.accountid, p.subject, p.description) as IdToKeep
    FROM activities AS p 
    WHERE (p.StateCode = 1) AND p.createdon >= getdate()-6
   )
delete from todelete
    where total > 1 and record_id <> IdToKeep;

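The final where clause keeps, in each group of duplicates, the row whose record_id equals IdToKeep and deletes the others.

I should add that if you only want the list of rows to delete, you can use a similar query: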

with todelete as (
    -- record_id must be in the select list so the outer where can compare it to IdToKeep
    SELECT p.record_id, p.accountid, p.subject, p.description,
           COUNT(*) over (partition by p.accountid, p.subject, p.description) as total,
           MIN(p.record_id) over (partition by p.accountid, p.subject, p.description) as IdToKeep
    FROM activities AS p 
    WHERE (p.StateCode = 1) AND p.createdon >= getdate()-6
   )
select *
from todelete
 where total > 1 and record_id <> IdToKeep;

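The over keyword means the function is being used as a window function. The idea is simple: count(*) over returns the number of records that share the same values in the partition by columns. It works much like an aggregate function, except that the value is returned on every row instead of once per group. Functions like this are very powerful, and I recommend learning more about them.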
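A small sketch of the window-function idea on made-up data, assuming a hypothetical temp table #activities with record_id stored as uniqueidentifier and fixed GUID literals so the outcome is predictable. The first SELECT shows that COUNT(*) OVER and MIN(...) OVER repeat the group's total and the id to keep on every row instead of collapsing the group; the DELETE then keeps only the row whose record_id equals IdToKeep. MIN on a uniqueidentifier follows SQL Server's GUID ordering, which is deterministic but does not mean "oldest", and this assumes a SQL Server version whose MIN accepts uniqueidentifier (on older versions that reject it, converting to char(36) is a common workaround).

```sql
-- Hypothetical demo table; fixed GUIDs make the surviving rows predictable.
IF OBJECT_ID('tempdb..#activities') IS NOT NULL DROP TABLE #activities;
CREATE TABLE #activities (
    record_id   uniqueidentifier NOT NULL PRIMARY KEY,
    accountid   int           NOT NULL,
    subject     nvarchar(100) NOT NULL,
    description nvarchar(400) NOT NULL,
    StateCode   int           NOT NULL,
    createdon   datetime      NOT NULL
);

INSERT INTO #activities VALUES
    ('00000000-0000-0000-0000-000000000001', 1, N'Call',  N'Follow up',  1, GETDATE()),
    ('00000000-0000-0000-0000-000000000002', 1, N'Call',  N'Follow up',  1, GETDATE()),
    ('00000000-0000-0000-0000-000000000003', 1, N'Call',  N'Follow up',  1, GETDATE()),
    ('00000000-0000-0000-0000-000000000004', 2, N'Email', N'Quote sent', 1, GETDATE());

-- Unlike GROUP BY, the windowed aggregates keep all four rows and repeat
-- the group's total and the id to keep on each of them.
SELECT record_id, accountid, subject, description,
       COUNT(*)       OVER (PARTITION BY accountid, subject, description) AS total,
       MIN(record_id) OVER (PARTITION BY accountid, subject, description) AS IdToKeep
FROM #activities;

-- Delete through the CTE: rows ...002 and ...003 go, ...001 and ...004 stay.
WITH todelete AS (
    SELECT record_id,
           COUNT(*)       OVER (PARTITION BY accountid, subject, description) AS total,
           MIN(record_id) OVER (PARTITION BY accountid, subject, description) AS IdToKeep
    FROM #activities
    WHERE StateCode = 1 AND createdon >= GETDATE() - 6
)
DELETE FROM todelete
WHERE total > 1 AND record_id <> IdToKeep;

SELECT record_id, accountid, subject FROM #activities;
```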

Answer 1 (score: 0)

Maybe something like this?

SELECT max(p.record_id), p.accountid, p.subject, p.description, count(*) AS total
FROM activities AS p 
WHERE (p.StateCode = 1) AND p.createdon >= getdate()-6
GROUP BY p.accountid, p.subject, p.description
HAVING count(*) > 1
ORDER BY p.accountid
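max(p.record_id) returns exactly one id per duplicate group, so deleting those ids removes only one extra copy per group; a group with three identical rows would still have two. A minimal sketch of one way around that, assuming a hypothetical temp table #activities with made-up rows: repeat the delete until a pass removes nothing.

```sql
-- Hypothetical demo table: account 1 has three identical activities.
IF OBJECT_ID('tempdb..#activities') IS NOT NULL DROP TABLE #activities;
CREATE TABLE #activities (
    record_id   uniqueidentifier NOT NULL PRIMARY KEY DEFAULT NEWID(),
    accountid   int           NOT NULL,
    subject     nvarchar(100) NOT NULL,
    description nvarchar(400) NOT NULL,
    StateCode   int           NOT NULL,
    createdon   datetime      NOT NULL
);

INSERT INTO #activities (accountid, subject, description, StateCode, createdon)
VALUES (1, N'Call',  N'Follow up',  1, GETDATE()),
       (1, N'Call',  N'Follow up',  1, GETDATE()),
       (1, N'Call',  N'Follow up',  1, GETDATE()),
       (2, N'Email', N'Quote sent', 1, GETDATE());

-- Each pass deletes the MAX(record_id) of every duplicate group, i.e. one
-- row per group. Loop until a pass deletes nothing.
DECLARE @deleted int = 1;
WHILE @deleted > 0
BEGIN
    DELETE FROM #activities
    WHERE record_id IN (
        SELECT MAX(p.record_id)
        FROM #activities AS p
        WHERE p.StateCode = 1 AND p.createdon >= GETDATE() - 6
        GROUP BY p.accountid, p.subject, p.description
        HAVING COUNT(*) > 1
    );
    SET @deleted = @@ROWCOUNT;  -- rows removed by this pass
END;

SELECT accountid, subject, description FROM #activities;
```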

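Answer 2 (score: 0)

As I see it, you first need an inner query that finds the duplicated groups, and then you join it back to the full table to get the individual rows, including their record_id: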

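SELECT a.*
-- x: one row per duplicated (accountid, subject, description) combination
FROM (SELECT p.accountid, p.subject, p.description
  FROM activities AS p
  WHERE p.statecode = 1 AND p.createdon >= getdate()-6
  GROUP BY p.accountid, p.subject, p.description
  HAVING count(*) > 1) AS x
-- join back to the full table to get every matching row, record_id included
JOIN activities AS a
  ON  a.accountid = x.accountid
  AND a.subject = x.subject
  AND a.description = x.description
WHERE a.statecode = 1 AND a.createdon >= getdate()-6
ORDER BY a.accountid, a.subject, a.description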

Answer 3 (score: 0)

Try this:

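;with recordsToDelete as (
SELECT 
     p.record_id
     -- rowNum = 1 marks the row to keep in each (subject, description) group;
     -- SQL Server requires an order by inside ROW_NUMBER() OVER (...)
    ,Row_Number() OVER(partition by p.subject, p.description order by p.record_id) as rowNum
FROM activities AS p 
)

select
*
from recordsToDelete
where rowNum > 1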

If the result looks correct, keep the CTE and replace the final select with:

delete from recordsToDelete
    where rowNum > 1
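An end-to-end sketch of the ROW_NUMBER approach on made-up data, assuming a hypothetical temp table #activities with fixed GUID literals. Ordering by record_id inside ROW_NUMBER() is an assumption that simply makes the surviving row deterministic (ordering by createdon would keep the oldest instead). Because a CTE is visible only to the single statement that follows it, the preview and the delete each repeat the WITH clause.

```sql
-- Hypothetical demo table with fixed GUIDs.
IF OBJECT_ID('tempdb..#activities') IS NOT NULL DROP TABLE #activities;
CREATE TABLE #activities (
    record_id   uniqueidentifier NOT NULL PRIMARY KEY,
    subject     nvarchar(100) NOT NULL,
    description nvarchar(400) NOT NULL
);

INSERT INTO #activities VALUES
    ('00000000-0000-0000-0000-000000000001', N'Call',  N'Follow up'),
    ('00000000-0000-0000-0000-000000000002', N'Call',  N'Follow up'),
    ('00000000-0000-0000-0000-000000000003', N'Call',  N'Follow up'),
    ('00000000-0000-0000-0000-000000000004', N'Email', N'Quote sent');

-- Preview: rows with rowNum > 1 are the ones that would be deleted.
WITH recordsToDelete AS (
    SELECT p.record_id,
           ROW_NUMBER() OVER (PARTITION BY p.subject, p.description
                              ORDER BY p.record_id) AS rowNum
    FROM #activities AS p
)
SELECT * FROM recordsToDelete WHERE rowNum > 1;

-- Same CTE, with the final statement swapped for DELETE.
WITH recordsToDelete AS (
    SELECT p.record_id,
           ROW_NUMBER() OVER (PARTITION BY p.subject, p.description
                              ORDER BY p.record_id) AS rowNum
    FROM #activities AS p
)
DELETE FROM recordsToDelete WHERE rowNum > 1;

SELECT record_id, subject, description FROM #activities;
```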