Suppose we have a table named record
with 4 fields:
id (INT 11 AUTO_INC)
email (VAR 50)
timestamp (INT 11)
status (INT 1)
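The table contains the following data:
Now we can see that the email address test@xample.com is repeated 4 times (the record with the lowest timestamp is the original, and every later copy is a duplicate). I can easily count the number of unique records using
SELECT COUNT(DISTINCT email) FROM record
I can also easily find out how many times each email address is repeated using
SELECT email, count(id) FROM record GROUP BY email HAVING COUNT(id)>1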
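As a quick check of the two queries above, here is a minimal sketch that runs them with Python's built-in sqlite3 module. The sample rows and the example.com addresses are hypothetical stand-ins, not the asker's data, and SQLite is assumed to behave like MySQL for these simple statements.

```python
import sqlite3

# Hypothetical sample rows (id, email, timestamp, status); not the asker's data.
ROWS = [
    (1, "a@example.com", 100, 1),  # original for a@
    (2, "a@example.com", 110, 0),
    (3, "a@example.com", 120, 1),
    (4, "b@example.com", 100, 0),  # original for b@
    (5, "b@example.com", 130, 1),
    (6, "c@example.com", 100, 1),  # never duplicated
    (7, "a@example.com", 140, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record ("
    "id INTEGER PRIMARY KEY, email TEXT, timestamp INTEGER, status INTEGER)"
)
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", ROWS)

# Number of distinct email addresses.
unique_emails = conn.execute(
    "SELECT COUNT(DISTINCT email) FROM record"
).fetchone()[0]

# Addresses that appear more than once, with their row counts.
# ORDER BY is added only to make the output order predictable.
repeated = conn.execute(
    "SELECT email, COUNT(id) FROM record "
    "GROUP BY email HAVING COUNT(id) > 1 ORDER BY email"
).fetchall()

print(unique_emails)
print(repeated)
```

Note that the second query counts every row for a repeated address, including the original one.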
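But now the business question is:
How many times does STATUS = 1 occur across all the duplicate records?
For example:
So the sum of all the numbers is 0 + 1 + 1 + 0 + 2 = 4
This means 4 duplicate records in the table have status = 1.
Question
How many duplicate records have status = 1?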
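To pin down what is being asked, the following sketch computes the target number in plain Python: for each email, skip the row with the lowest timestamp and count the remaining rows whose status is 1. The function name and the sample rows are hypothetical, and it assumes timestamps are unique within one email address.

```python
from collections import defaultdict

# Hypothetical sample rows (id, email, timestamp, status); not the asker's data.
ROWS = [
    (1, "a@example.com", 100, 1),
    (2, "a@example.com", 110, 0),
    (3, "a@example.com", 120, 1),
    (4, "b@example.com", 100, 0),
    (5, "b@example.com", 130, 1),
    (6, "c@example.com", 100, 1),
    (7, "a@example.com", 140, 1),
]


def status1_duplicates(rows):
    """Return {email: number of non-original rows with status == 1}."""
    by_email = defaultdict(list)
    for _id, email, ts, status in rows:
        by_email[email].append((ts, status))

    per_email = {}
    for email, entries in by_email.items():
        entries.sort()  # lowest timestamp first, which is the original row
        per_email[email] = sum(
            1 for _ts, status in entries[1:] if status == 1
        )
    return per_email


per_email = status1_duplicates(ROWS)
total = sum(per_email.values())
print(per_email, total)
```

The per-email values play the role of the individual numbers in the sum above, and `total` is the single figure the business question wants.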
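Answer 0 (score: 1)
Here is a newer, better solution. It drops the first entry for each email and then counts the rest. It isn't easy to read, and if possible I would write this in a stored procedure, but it works.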
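select sum(status)                      -- counts status = 1 rows, assuming status is 0 or 1
from record r1
join (select email,
             min(`timestamp`) as min_ts -- earliest row per email is the original
      from record
      group by email) mins
using (email)
where r1.`timestamp` != mins.min_ts;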
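The query above can be tried out with the sketch below, written against the question's record table and timestamp column and run through Python's sqlite3 module. The sample rows are hypothetical, and the approach assumes status only ever holds 0 or 1, so that summing it equals counting the rows with status = 1.

```python
import sqlite3

# Hypothetical sample rows (id, email, timestamp, status); not the asker's data.
ROWS = [
    (1, "a@example.com", 100, 1),
    (2, "a@example.com", 110, 0),
    (3, "a@example.com", 120, 1),
    (4, "b@example.com", 100, 0),
    (5, "b@example.com", 130, 1),
    (6, "c@example.com", 100, 1),
    (7, "a@example.com", 140, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record ("
    "id INTEGER PRIMARY KEY, email TEXT, timestamp INTEGER, status INTEGER)"
)
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", ROWS)

# Join every row to its email's earliest timestamp, keep only the later
# rows (the duplicates), and add up their status values.
QUERY = """
SELECT SUM(r.status)
FROM record r
JOIN (SELECT email, MIN(timestamp) AS min_ts
      FROM record
      GROUP BY email) mins
USING (email)
WHERE r.timestamp != mins.min_ts
"""

total = conn.execute(QUERY).fetchone()[0]
print(total)
```

If no email has any duplicate, SUM returns NULL (None in Python) rather than 0, so callers may want to wrap it in COALESCE.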
Original answer below
Your own query for finding how many times each email address is repeated
SELECT email,
count(id) as duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
can easily be modified to answer "how many duplicate records have status = 1":
SELECT email,
count(id) as duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
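Both queries count the original row as well, so what they really report is "duplicates including the original row". If the original always has status 1, you can subtract 1 from each count.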
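-- duplicates per email, excluding the original row
SELECT email,
count(id) -1 as true_duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1;
-- the same, restricted to rows with status = 1 (WHERE must come before GROUP BY)
SELECT email,
count(id) -1 as true_duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1;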
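The caveat about the original row's status matters in practice. The sketch below runs both subtract-one queries (with WHERE placed before GROUP BY) on hypothetical sample rows in which one original row has status 0; the data and the example.com addresses are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows (id, email, timestamp, status); not the asker's data.
ROWS = [
    (1, "a@example.com", 100, 1),  # original with status 1
    (2, "a@example.com", 110, 0),
    (3, "a@example.com", 120, 1),
    (4, "b@example.com", 100, 0),  # original with status 0
    (5, "b@example.com", 130, 1),
    (6, "c@example.com", 100, 1),
    (7, "a@example.com", 140, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record ("
    "id INTEGER PRIMARY KEY, email TEXT, timestamp INTEGER, status INTEGER)"
)
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", ROWS)

# Duplicates per email, not counting the original row.
true_duplicates = conn.execute(
    "SELECT email, COUNT(id) - 1 FROM record "
    "GROUP BY email HAVING COUNT(id) > 1 ORDER BY email"
).fetchall()

# Same idea restricted to status = 1 rows.
status_duplicates = conn.execute(
    "SELECT email, COUNT(id) - 1 FROM record "
    "WHERE status = 1 "
    "GROUP BY email HAVING COUNT(id) > 1 ORDER BY email"
).fetchall()

print(true_duplicates)
print(status_duplicates)
```

On this sample, b@example.com has one duplicate with status 1, yet it is missing from the second result: its original row has status 0, so only one row survives the WHERE filter and the HAVING clause drops the group.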
Answer 1 (score: 0)
If I have understood you correctly, your query should be
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
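First we need to get the minimum timestamp for each email, then find the duplicate records that were inserted after that timestamp and have status 1.
If you want the total, the query is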
SELECT SUM( `tot` ) AS `duplicatesWithStatus1`
FROM (
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
) AS t
Hope this is what you wanted.
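Both queries from this answer can be exercised with the sketch below, which uses Python's sqlite3 module and hypothetical sample rows. Backticks are omitted and an ORDER BY is added for a predictable output order; otherwise the SQL follows the answer.

```python
import sqlite3

# Hypothetical sample rows (id, email, timestamp, status); not the asker's data.
ROWS = [
    (1, "a@example.com", 100, 1),
    (2, "a@example.com", 110, 0),
    (3, "a@example.com", 120, 1),
    (4, "b@example.com", 100, 0),
    (5, "b@example.com", 130, 1),
    (6, "c@example.com", 100, 1),
    (7, "a@example.com", 140, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record ("
    "id INTEGER PRIMARY KEY, email TEXT, timestamp INTEGER, status INTEGER)"
)
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", ROWS)

# Per-email count of rows that are later than the email's first row
# and have status = 1.
PER_EMAIL = """
SELECT email, COUNT(id) AS tot
FROM record, (
    SELECT email AS emt, MIN(timestamp) AS mtm
    FROM record
    GROUP BY email
) AS temp
WHERE email = emt
  AND timestamp > mtm
  AND status = 1
GROUP BY email
HAVING COUNT(id) >= 1
"""

per_email = conn.execute(PER_EMAIL + " ORDER BY email").fetchall()

# Wrap the per-email query to get one grand total.
total = conn.execute(
    "SELECT SUM(tot) FROM (" + PER_EMAIL + ") AS t"
).fetchone()[0]

print(per_email)
print(total)
```

Because the comparison is against each email's own minimum timestamp, the result does not depend on whether the original row has status 0 or 1.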
Answer 2 (score: 0)
You can get the number of email addresses that have duplicate records with status = 1 using
select count(*) as Duplicate_Record_Count
from (select r.email
from record r
where r.status=1
group by r.email,r.status
having count(r.email)>1 ) t1
The following query returns each duplicated email with status 1, along with its duplicate count and earliest timestamp:
select r.email,count(*)-1 as Duplicate_Count,min(r.timestamp) as timestamp
from record r
where r.status=1
group by r.email
having count(r.email)>1
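To see what these two queries return, here is a sketch using Python's sqlite3 module and hypothetical sample rows. The inner select of the first query lists r.email instead of * so that it also runs under strict GROUP BY rules; that change is an assumption about the intent, not part of the original answer.

```python
import sqlite3

# Hypothetical sample rows (id, email, timestamp, status); not the asker's data.
ROWS = [
    (1, "a@example.com", 100, 1),
    (2, "a@example.com", 110, 0),
    (3, "a@example.com", 120, 1),
    (4, "b@example.com", 100, 0),
    (5, "b@example.com", 130, 1),
    (6, "c@example.com", 100, 1),
    (7, "a@example.com", 140, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record ("
    "id INTEGER PRIMARY KEY, email TEXT, timestamp INTEGER, status INTEGER)"
)
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", ROWS)

# First query: one inner row per email that has more than one
# status = 1 record, then a count of those inner rows.
address_count = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT r.email FROM record r"
    "  WHERE r.status = 1"
    "  GROUP BY r.email, r.status"
    "  HAVING COUNT(r.email) > 1"
    ") t1"
).fetchone()[0]

# Second query: per-email duplicate count and earliest timestamp,
# looking only at status = 1 rows.
details = conn.execute(
    "SELECT r.email, COUNT(*) - 1, MIN(r.timestamp)"
    " FROM record r"
    " WHERE r.status = 1"
    " GROUP BY r.email"
    " HAVING COUNT(r.email) > 1"
).fetchall()

print(address_count)
print(details)
```

On this sample the first query yields a count of email addresses rather than a per-record total, and both queries only consider rows that already have status = 1, so an email whose original row has status 0 can be left out.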