Checking the status of duplicate records

Posted: 2013-07-19 08:13:39

Tags: mysql select duplicates

Suppose we have a table named record with 4 fields:

id    (INT(11), AUTO_INCREMENT)

email (VARCHAR(50))

timestamp (INT(11))

status (INT(1))

The table contains the following data:

[Image: sample rows of the record table]

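Now we can see that the email address test@example.com is repeated 4 times (the record with the lowest timestamp is the original, and every copy after it is a duplicate). I can easily count the number of unique records using

SELECT COUNT(DISTINCT email) FROM record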
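To try the queries in this question without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are hypothetical (the original data was only shown as an image) and were chosen to match the per-email tallies listed in the question; `SAMPLE_ROWS` and `make_db` are illustrative names, not part of the post.

```python
import sqlite3

# Hypothetical sample rows (email, timestamp, status); NOT the original post's data.
SAMPLE_ROWS = [
    ("test@example.com", 100, 1),
    ("test@example.com", 200, 0),
    ("test@example.com", 300, 0),
    ("test@example.com", 400, 0),
    ("second@example.com", 110, 0),
    ("second@example.com", 210, 1),
    ("third@example.com", 120, 1),
    ("third@example.com", 220, 1),
    ("four@example.com", 130, 1),
    ("five@example.com", 140, 0),
    ("five@example.com", 240, 1),
    ("five@example.com", 340, 1),
]


def make_db():
    """Create an in-memory SQLite stand-in for the MySQL `record` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email VARCHAR(50), "
        "timestamp INTEGER, "
        "status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO record (email, timestamp, status) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


conn = make_db()
# Number of distinct email addresses, i.e. the number of "unique records".
unique_emails = conn.execute("SELECT COUNT(DISTINCT email) FROM record").fetchone()[0]
print(unique_emails)
```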

I can also easily find out how many times each email address is repeated using

SELECT email, COUNT(id) FROM record GROUP BY email HAVING COUNT(id) > 1
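A sketch of the same GROUP BY / HAVING query run against hypothetical rows in SQLite (standing in for MySQL). It shows that COUNT(id) counts the original row too, so the number reported is "copies including the original", not just the duplicates. `SAMPLE_ROWS` and `make_db` are illustrative names.

```python
import sqlite3

# Hypothetical sample rows (email, timestamp, status); NOT the original post's data.
SAMPLE_ROWS = [
    ("test@example.com", 100, 1),
    ("test@example.com", 200, 0),
    ("test@example.com", 300, 0),
    ("test@example.com", 400, 0),
    ("second@example.com", 110, 0),
    ("second@example.com", 210, 1),
    ("third@example.com", 120, 1),
    ("third@example.com", 220, 1),
    ("four@example.com", 130, 1),
    ("five@example.com", 140, 0),
    ("five@example.com", 240, 1),
    ("five@example.com", 340, 1),
]


def make_db():
    """Create an in-memory SQLite stand-in for the MySQL `record` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email VARCHAR(50), timestamp INTEGER, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO record (email, timestamp, status) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


conn = make_db()
rows = conn.execute(
    "SELECT email, COUNT(id) FROM record GROUP BY email HAVING COUNT(id) > 1"
).fetchall()
# COUNT(id) includes the original row, so test@example.com reports 4, not 3;
# four@example.com has a single row and is filtered out by HAVING.
repeat_counts = dict(rows)
print(repeat_counts)
```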

But now the business question is:

How many times does STATUS = 1 occur across all the duplicate records?

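For example:

  • for test@example.com, there are no duplicate records with status 1
  • for second@example.com, there is 1 duplicate record with status 1
  • for third@example.com, there is 1 duplicate record with status 1
  • for four@example.com, there are no duplicate records with status 1
  • for five@example.com, there are 2 duplicate records with status 1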

So the sum of all these numbers is 0 + 1 + 1 + 0 + 2 = 4

This means there are 4 duplicate records in the table with status = 1.
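The tally above can be written as a plain-Python reference check, independent of any SQL: group rows by email, treat the row with the earliest timestamp as the original, and count status = 1 among the rest. This is a sketch over hypothetical rows chosen to reproduce the 0 + 1 + 1 + 0 + 2 breakdown; `duplicate_status1_counts` is an illustrative name, not from the post.

```python
from collections import defaultdict

# Hypothetical sample rows (email, timestamp, status); NOT the original post's data.
SAMPLE_ROWS = [
    ("test@example.com", 100, 1),
    ("test@example.com", 200, 0),
    ("test@example.com", 300, 0),
    ("test@example.com", 400, 0),
    ("second@example.com", 110, 0),
    ("second@example.com", 210, 1),
    ("third@example.com", 120, 1),
    ("third@example.com", 220, 1),
    ("four@example.com", 130, 1),
    ("five@example.com", 140, 0),
    ("five@example.com", 240, 1),
    ("five@example.com", 340, 1),
]


def duplicate_status1_counts(rows):
    """Per email: how many duplicates (every row except the earliest) have status 1."""
    by_email = defaultdict(list)
    for email, timestamp, status in rows:
        by_email[email].append((timestamp, status))
    counts = {}
    for email, entries in by_email.items():
        entries.sort()  # earliest timestamp first; that row is treated as the original
        counts[email] = sum(1 for _, status in entries[1:] if status == 1)
    return counts


per_email = duplicate_status1_counts(SAMPLE_ROWS)
total = sum(per_email.values())
print(per_email, total)
```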

Question

How many duplicate records have status = 1?

3 answers:

Answer 0 (score: 1)

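Here is a new, better solution. It drops the first entry for each email and then counts the rest. It is not easy to read, and if possible I would write it as a stored procedure, but it works.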

select sum(status)
  from dude d1
  join (select email, 
               min(ts) as ts 
          from dude 
         group by email) mins 
 using (email)
 where d1.ts != mins.ts;

sqlfiddle
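A sketch of this answer's query run in SQLite against hypothetical rows, assuming the fiddle's `dude` table and `ts` column correspond to the question's `record` table and `timestamp` column. On rows built to match the question's tally it returns 4. Note that SUM(status) only works as a count because status is either 0 or 1.

```python
import sqlite3

# Hypothetical sample rows (email, timestamp, status); NOT the original post's data.
SAMPLE_ROWS = [
    ("test@example.com", 100, 1),
    ("test@example.com", 200, 0),
    ("test@example.com", 300, 0),
    ("test@example.com", 400, 0),
    ("second@example.com", 110, 0),
    ("second@example.com", 210, 1),
    ("third@example.com", 120, 1),
    ("third@example.com", 220, 1),
    ("four@example.com", 130, 1),
    ("five@example.com", 140, 0),
    ("five@example.com", 240, 1),
    ("five@example.com", 340, 1),
]


def make_db():
    """Create an in-memory SQLite stand-in for the MySQL `record` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email VARCHAR(50), timestamp INTEGER, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO record (email, timestamp, status) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


conn = make_db()
# Answer 0's query with dude -> record and ts -> timestamp (assumed mapping).
# Rows whose timestamp equals the per-email minimum (the originals) are excluded;
# SUM(d1.status) counts the rest only because status is 0 or 1.
total = conn.execute(
    """
    SELECT SUM(d1.status)
      FROM record d1
      JOIN (SELECT email, MIN(timestamp) AS ts
              FROM record
             GROUP BY email) mins
     USING (email)
     WHERE d1.timestamp != mins.ts
    """
).fetchone()[0]
print(total)
```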

Original answer below

Your own query for finding "how many times an email address is repeated"
SELECT email, 
       count(id) as duplicates 
  FROM record 
 GROUP BY email 
HAVING COUNT(id)>1

can easily be modified to answer "how many duplicate records have status = 1":

SELECT email,
       count(id) as duplicates_status_sum
  FROM record
 WHERE status = 1
 GROUP BY email
HAVING COUNT(id)>1

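Both of these queries include the original row, so what they actually report is "duplicates including the original row". If the original always has status 1, you can subtract 1 from the total.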
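A sketch that runs the status-filtered query (with WHERE placed before GROUP BY) in SQLite on hypothetical rows, with and without the "subtract 1" adjustment. For these rows the question's definition gives second@example.com 1, third@example.com 1 and five@example.com 2. The subtract-1 version matches only third@example.com, whose original has status 1; it reports 1 for five@example.com (original has status 0) and drops second@example.com entirely (it has only one status-1 row).

```python
import sqlite3

# Hypothetical sample rows (email, timestamp, status); NOT the original post's data.
SAMPLE_ROWS = [
    ("test@example.com", 100, 1),
    ("test@example.com", 200, 0),
    ("test@example.com", 300, 0),
    ("test@example.com", 400, 0),
    ("second@example.com", 110, 0),
    ("second@example.com", 210, 1),
    ("third@example.com", 120, 1),
    ("third@example.com", 220, 1),
    ("four@example.com", 130, 1),
    ("five@example.com", 140, 0),
    ("five@example.com", 240, 1),
    ("five@example.com", 340, 1),
]


def make_db():
    """Create an in-memory SQLite stand-in for the MySQL `record` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email VARCHAR(50), timestamp INTEGER, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO record (email, timestamp, status) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


conn = make_db()
# Status-1 rows per email, original row included.
including_original = dict(conn.execute(
    """
    SELECT email, COUNT(id)
      FROM record
     WHERE status = 1
     GROUP BY email
    HAVING COUNT(id) > 1
    """
).fetchall())
# The "subtract 1" variant assumes the original row always has status 1.
minus_one = dict(conn.execute(
    """
    SELECT email, COUNT(id) - 1
      FROM record
     WHERE status = 1
     GROUP BY email
    HAVING COUNT(id) > 1
    """
).fetchall())
print(including_original)
print(minus_one)
```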

SELECT email, 
       count(id) -1 as true_duplicates 
  FROM record 
 GROUP BY email 
HAVING COUNT(id)>1

SELECT email,
       count(id) -1 as true_duplicates_status_sum
  FROM record
 WHERE status = 1
 GROUP BY email
HAVING COUNT(id)>1

Answer 1 (score: 0)

If I understand you correctly, your query should be

SELECT  `email` , COUNT(  `id` ) AS  `tot` 
FROM  `record` , (
SELECT  `email` AS  `emt` , MIN(  `timestamp` ) AS  `mtm` 
FROM  `record` 
GROUP BY  `email`
) AS  `temp` 
WHERE  `email` =  `emt` 
AND  `timestamp` >  `mtm` 
AND  `status` =1
GROUP BY  `email` 
HAVING COUNT(  `id` ) >=1

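First we get the minimum timestamp for each email, then we find the duplicate records that were inserted after that timestamp and have status 1.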
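A sketch of this answer's per-email query run in SQLite on hypothetical rows (backticks dropped; otherwise the same comma join against the per-email minimum timestamp). Emails with no status-1 duplicates simply produce no row, and the per-email counts add up to 4, matching the question's tally.

```python
import sqlite3

# Hypothetical sample rows (email, timestamp, status); NOT the original post's data.
SAMPLE_ROWS = [
    ("test@example.com", 100, 1),
    ("test@example.com", 200, 0),
    ("test@example.com", 300, 0),
    ("test@example.com", 400, 0),
    ("second@example.com", 110, 0),
    ("second@example.com", 210, 1),
    ("third@example.com", 120, 1),
    ("third@example.com", 220, 1),
    ("four@example.com", 130, 1),
    ("five@example.com", 140, 0),
    ("five@example.com", 240, 1),
    ("five@example.com", 340, 1),
]


def make_db():
    """Create an in-memory SQLite stand-in for the MySQL `record` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email VARCHAR(50), timestamp INTEGER, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO record (email, timestamp, status) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


conn = make_db()
# Keep rows newer than the per-email minimum timestamp that have status 1,
# then count them per email.
per_email = dict(conn.execute(
    """
    SELECT email, COUNT(id) AS tot
      FROM record,
           (SELECT email AS emt, MIN(timestamp) AS mtm
              FROM record
             GROUP BY email) AS temp
     WHERE email = emt
       AND timestamp > mtm
       AND status = 1
     GROUP BY email
    HAVING COUNT(id) >= 1
    """
).fetchall())
print(per_email, sum(per_email.values()))
```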

If you want the total, the query is

SELECT SUM(  `tot` ) AS  `duplicatesWithStatus1` 
FROM (
SELECT  `email` , COUNT(  `id` ) AS  `tot` 
FROM  `record` , (
SELECT  `email` AS  `emt` , MIN(  `timestamp` ) AS  `mtm` 
FROM  `record` 
GROUP BY  `email`
) AS  `temp` 
WHERE  `email` =  `emt` 
AND  `timestamp` >  `mtm` 
AND  `status` =1
GROUP BY  `email` 
HAVING COUNT(  `id` ) >=1
) AS t

Hope this is what you want.

Answer 2 (score: 0)

You can get the count of duplicate records with status = 1 with

select count(*) as Duplicate_Record_Count
from (select r.email
from record r
where r.status=1
group by r.email,r.status
having count(r.email)>1 ) t1
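A sketch of this query run in SQLite on hypothetical rows (selecting r.email rather than * inside the grouped subquery). Because the outer COUNT(*) counts groups, the result is the number of email addresses with more than one status-1 row: 2 for these rows, whereas the question's definition gives 4 duplicate records with status 1.

```python
import sqlite3

# Hypothetical sample rows (email, timestamp, status); NOT the original post's data.
SAMPLE_ROWS = [
    ("test@example.com", 100, 1),
    ("test@example.com", 200, 0),
    ("test@example.com", 300, 0),
    ("test@example.com", 400, 0),
    ("second@example.com", 110, 0),
    ("second@example.com", 210, 1),
    ("third@example.com", 120, 1),
    ("third@example.com", 220, 1),
    ("four@example.com", 130, 1),
    ("five@example.com", 140, 0),
    ("five@example.com", 240, 1),
    ("five@example.com", 340, 1),
]


def make_db():
    """Create an in-memory SQLite stand-in for the MySQL `record` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email VARCHAR(50), timestamp INTEGER, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO record (email, timestamp, status) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


conn = make_db()
# The inner query yields one row per email having more than one status-1 row;
# the outer COUNT(*) counts those emails, not individual duplicate records.
duplicate_record_count = conn.execute(
    """
    SELECT COUNT(*) AS Duplicate_Record_Count
      FROM (SELECT r.email
              FROM record r
             WHERE r.status = 1
             GROUP BY r.email, r.status
            HAVING COUNT(r.email) > 1) t1
    """
).fetchone()[0]
print(duplicate_record_count)
```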

The following query returns the duplicate emails with status 1, along with their count and timestamp

select  r.email,count(*)-1 as Duplicate_Count,min(r.timestamp) as timestamp
from record r
where r.status=1
group by r.email
having count(r.email)>1
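A sketch of this last query run in SQLite on hypothetical rows. Since WHERE r.status = 1 is applied first, MIN(r.timestamp) is the earliest status-1 row rather than necessarily the original: for five@example.com the original (timestamp 140) has status 0, so the query reports 240 and a duplicate count of 1, although two of its duplicates have status 1.

```python
import sqlite3

# Hypothetical sample rows (email, timestamp, status); NOT the original post's data.
SAMPLE_ROWS = [
    ("test@example.com", 100, 1),
    ("test@example.com", 200, 0),
    ("test@example.com", 300, 0),
    ("test@example.com", 400, 0),
    ("second@example.com", 110, 0),
    ("second@example.com", 210, 1),
    ("third@example.com", 120, 1),
    ("third@example.com", 220, 1),
    ("four@example.com", 130, 1),
    ("five@example.com", 140, 0),
    ("five@example.com", 240, 1),
    ("five@example.com", 340, 1),
]


def make_db():
    """Create an in-memory SQLite stand-in for the MySQL `record` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email VARCHAR(50), timestamp INTEGER, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO record (email, timestamp, status) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


conn = make_db()
rows = conn.execute(
    """
    SELECT r.email, COUNT(*) - 1 AS Duplicate_Count, MIN(r.timestamp) AS timestamp
      FROM record r
     WHERE r.status = 1
     GROUP BY r.email
    HAVING COUNT(r.email) > 1
    """
).fetchall()
# Map each email to (Duplicate_Count, earliest status-1 timestamp).
summary = {email: (dup_count, ts) for email, dup_count, ts in rows}
print(summary)
```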