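How do I query duplicate records with the same email address and status = 1 that were inserted more than 2 times within 5 minutes?

Time: 2016-02-05 12:58:50

Tags: php mysql sql duplicates

I have the following sample data in my database table.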

id   email            created_at              status
1    e@mail.com       2016-01-01 01:01:30      1
2    e@mail.com       2016-01-01 01:02:20     -1
3    e@mail.com       2016-01-01 01:03:30      1
4    new@mail.com     2016-01-01 01:04:00      1
5    e@mail.com       2016-01-01 01:04:30      1
6    new@mail.com     2016-01-01 02:59:08      1
7    new@mail.com     2016-01-01 03:01:24      1
8    iii@mail.com     2016-12-24 04:20:30      1
9    iii@mail.com     2016-12-24 04:23:29     -2
10   new@mail.com     2016-12-24 04:24:08      1
11   iii@mail.com     2016-12-24 04:25:29      1
12   new@mail.com     2016-12-24 04:32:08      1
13   e@mail.com       2016-12-24 05:16:30      1
14   iii@mail.com     2016-12-24 06:00:00      1
15   aa@email.com     2017-07-17 15:03:00      1
16   aa@email.com     2017-07-17 15:04:00      1
17   aa@email.com     2017-07-17 15:08:01      1

My requirements are:

a. The records are duplicates by email.
b. There are more than 2 duplicates, i.e. 3 or more.
c. Those 3 or more duplicate records were inserted within a 5-minute interval.
d. status = 1

Below is my SQL query, provided by @Strawberry.
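
To make requirements a to d concrete, the rule can be sketched outside the database. The Python below is only an illustration of my reading of the requirements, not a replacement for the SQL: the names `ROWS` and `burst_ids` are made up for this example, and a "burst" is assumed to mean 3 or more status = 1 rows for one email within 5 minutes of the earliest row in the group.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample data from the table above: (id, email, created_at, status)
ROWS = [
    (1, "e@mail.com", "2016-01-01 01:01:30", 1),
    (2, "e@mail.com", "2016-01-01 01:02:20", -1),
    (3, "e@mail.com", "2016-01-01 01:03:30", 1),
    (4, "new@mail.com", "2016-01-01 01:04:00", 1),
    (5, "e@mail.com", "2016-01-01 01:04:30", 1),
    (6, "new@mail.com", "2016-01-01 02:59:08", 1),
    (7, "new@mail.com", "2016-01-01 03:01:24", 1),
    (8, "iii@mail.com", "2016-12-24 04:20:30", 1),
    (9, "iii@mail.com", "2016-12-24 04:23:29", -2),
    (10, "new@mail.com", "2016-12-24 04:24:08", 1),
    (11, "iii@mail.com", "2016-12-24 04:25:29", 1),
    (12, "new@mail.com", "2016-12-24 04:32:08", 1),
    (13, "e@mail.com", "2016-12-24 05:16:30", 1),
    (14, "iii@mail.com", "2016-12-24 06:00:00", 1),
    (15, "aa@email.com", "2017-07-17 15:03:00", 1),
    (16, "aa@email.com", "2017-07-17 15:04:00", 1),
    (17, "aa@email.com", "2017-07-17 15:08:01", 1),
]


def burst_ids(rows, min_count=3, window=timedelta(minutes=5)):
    """Return the ids of status = 1 rows that belong to a same-email burst."""
    by_email = defaultdict(list)
    for row_id, email, created_at, status in rows:
        if status == 1:  # requirement d
            ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
            by_email[email].append((ts, row_id))  # requirement a: group by email

    hits = set()
    for entries in by_email.values():
        entries.sort()
        for i, (start, _) in enumerate(entries):
            # requirement c: rows from this one onward inside the 5-minute window
            in_window = [rid for ts, rid in entries[i:] if ts <= start + window]
            if len(in_window) >= min_count:  # requirement b
                hits.update(in_window)
    return sorted(hits)


print(burst_ids(ROWS))
```

On the sample data this yields ids 1, 3 and 5. The aa@email.com rows do not qualify because id 17 falls one second outside the 5-minute window that starts at id 15.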

SELECT DISTINCT a.*
       FROM my_table a
       JOIN 
          ( SELECT x.* 
                 , MAX(y.created_at) AS range_end
              FROM my_table x
              JOIN my_table y
                ON y.email = x.email
               AND y.id >= x.id 
               AND y.created_at <= x.created_at + INTERVAL 5 MINUTE
             GROUP
                BY x.id HAVING COUNT(*) >= 3
          ) b
         ON b.email = a.email 
        AND a.created_at BETWEEN b.created_at AND b.range_end;

The query above returns the following records.

id   email            created_at              status
1    e@mail.com       2016-01-01 01:01:30      1
2    e@mail.com       2016-01-01 01:02:20     -1
3    e@mail.com       2016-01-01 01:03:30      1
5    e@mail.com       2016-01-01 01:04:30      1
8    iii@mail.com     2016-12-24 04:20:30      1
9    iii@mail.com     2016-12-24 04:23:29     -2
11   iii@mail.com     2016-12-24 04:25:29      1

I tried adding "WHERE status = 1" so that I get only the following records, because they are the ones that meet my requirements.

id   email            created_at             status
1    e@mail.com       2016-01-01 01:01:30     1
3    e@mail.com       2016-01-01 01:03:30     1
5    e@mail.com       2016-01-01 01:04:30     1

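What I want to retrieve is the records that share the same email address, were inserted more than 2 times within 5 minutes, and have status 1. Where should I put "WHERE status = 1" to get the result I want?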

2 answers:

Answer 0: (score: 1)

I think your query is a bit over-complicated for MySQL:

select t.*
from my_table t join
     my_table t2
     on t.email = t2.email and
        t2.created_at >= t.created_at and  -- >= so that each row counts itself
        t2.created_at <= date_add(t.created_at, interval 5 minute) and
        t2.status = 1
where t.status = 1  -- filter on status, not on a single id
group by t.id
having count(*) >= 3;

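Because id is unique in your table, you can group by that column and still select the other columns from the table. In fact, this use of the MySQL extension is even consistent with ANSI standard SQL, because the other columns are functionally dependent on id.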
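
To check what a self-join of this shape returns, here is a small sketch that runs it against the question's sample data in an in-memory SQLite database. Several things here are assumptions of the sketch rather than part of the answer: SQLite stands in for MySQL, so `datetime(ts, '+5 minutes')` replaces `date_add(ts, interval 5 minute)`; the query filters on `t.status = 1` and uses `>=` so that each row counts itself; and the names `ROWS`, `QUERY` and `window_starts` are made up for illustration.

```python
import sqlite3

# Sample data from the question: (id, email, created_at, status)
ROWS = [
    (1, "e@mail.com", "2016-01-01 01:01:30", 1),
    (2, "e@mail.com", "2016-01-01 01:02:20", -1),
    (3, "e@mail.com", "2016-01-01 01:03:30", 1),
    (4, "new@mail.com", "2016-01-01 01:04:00", 1),
    (5, "e@mail.com", "2016-01-01 01:04:30", 1),
    (6, "new@mail.com", "2016-01-01 02:59:08", 1),
    (7, "new@mail.com", "2016-01-01 03:01:24", 1),
    (8, "iii@mail.com", "2016-12-24 04:20:30", 1),
    (9, "iii@mail.com", "2016-12-24 04:23:29", -2),
    (10, "new@mail.com", "2016-12-24 04:24:08", 1),
    (11, "iii@mail.com", "2016-12-24 04:25:29", 1),
    (12, "new@mail.com", "2016-12-24 04:32:08", 1),
    (13, "e@mail.com", "2016-12-24 05:16:30", 1),
    (14, "iii@mail.com", "2016-12-24 06:00:00", 1),
    (15, "aa@email.com", "2017-07-17 15:03:00", 1),
    (16, "aa@email.com", "2017-07-17 15:04:00", 1),
    (17, "aa@email.com", "2017-07-17 15:08:01", 1),
]

# SQLite adaptation of the self-join: one output row per window start.
QUERY = """
SELECT t.id, COUNT(*) AS rows_in_window
FROM my_table t
JOIN my_table t2
  ON t2.email = t.email
 AND t2.status = 1
 AND t2.created_at >= t.created_at
 AND t2.created_at <= datetime(t.created_at, '+5 minutes')
WHERE t.status = 1
GROUP BY t.id
HAVING COUNT(*) >= 3
ORDER BY t.id
"""


def window_starts(rows):
    """Load the rows into an in-memory table and run the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table "
        "(id INTEGER PRIMARY KEY, email TEXT, created_at TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(window_starts(ROWS))
```

On the sample data this returns a single row, id 1 with a count of 3. Note that a query of this shape reports only the first row of each 5-minute burst; to list every row in the burst (ids 1, 3 and 5), the result still has to be joined back to the table.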

Answer 1: (score: 0)

SELECT DISTINCT a.*
       FROM service_request a
       JOIN 
          ( SELECT x.* 
                 , MAX(y.created_at) AS range_end
              FROM service_request x
              JOIN service_request y
                ON y.email = x.email
               AND y.id >= x.id 
               AND y.status = x.status
               AND y.created_at <= x.created_at + INTERVAL 5 MINUTE
             WHERE x.status = 1 
             GROUP
                BY x.id HAVING COUNT(*) >= 3
          ) b
         ON b.email = a.email 
        AND a.status = b.status  -- keep only status = 1 rows in the final result
        AND a.created_at BETWEEN b.created_at AND b.range_end;
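
As a quick check, the sketch below runs a query of this shape against the question's sample data in an in-memory SQLite database, once with the outer rows unfiltered and once with the outer rows restricted to status = 1. The assumptions: SQLite stands in for MySQL, so `datetime(ts, '+5 minutes')` replaces `+ INTERVAL 5 MINUTE`; and `ROWS`, `QUERY`, `matching_ids` and the `outer_filter` parameter are hypothetical names used only for this illustration.

```python
import sqlite3

# Sample data from the question: (id, email, created_at, status)
ROWS = [
    (1, "e@mail.com", "2016-01-01 01:01:30", 1),
    (2, "e@mail.com", "2016-01-01 01:02:20", -1),
    (3, "e@mail.com", "2016-01-01 01:03:30", 1),
    (4, "new@mail.com", "2016-01-01 01:04:00", 1),
    (5, "e@mail.com", "2016-01-01 01:04:30", 1),
    (6, "new@mail.com", "2016-01-01 02:59:08", 1),
    (7, "new@mail.com", "2016-01-01 03:01:24", 1),
    (8, "iii@mail.com", "2016-12-24 04:20:30", 1),
    (9, "iii@mail.com", "2016-12-24 04:23:29", -2),
    (10, "new@mail.com", "2016-12-24 04:24:08", 1),
    (11, "iii@mail.com", "2016-12-24 04:25:29", 1),
    (12, "new@mail.com", "2016-12-24 04:32:08", 1),
    (13, "e@mail.com", "2016-12-24 05:16:30", 1),
    (14, "iii@mail.com", "2016-12-24 06:00:00", 1),
    (15, "aa@email.com", "2017-07-17 15:03:00", 1),
    (16, "aa@email.com", "2017-07-17 15:04:00", 1),
    (17, "aa@email.com", "2017-07-17 15:08:01", 1),
]

# The subquery finds window starts among status = 1 rows; the outer join
# then pulls in every row of that email between window start and range_end.
QUERY = """
SELECT DISTINCT a.id
FROM service_request a
JOIN (
    SELECT x.id, x.email, x.created_at, MAX(y.created_at) AS range_end
    FROM service_request x
    JOIN service_request y
      ON y.email = x.email
     AND y.id >= x.id
     AND y.status = x.status
     AND y.created_at <= datetime(x.created_at, '+5 minutes')
    WHERE x.status = 1
    GROUP BY x.id
    HAVING COUNT(*) >= 3
) b
  ON b.email = a.email
 AND a.created_at BETWEEN b.created_at AND b.range_end
{outer_filter}
ORDER BY a.id
"""


def matching_ids(outer_filter=""):
    """Run the query, optionally with an extra WHERE clause on the outer rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE service_request "
        "(id INTEGER PRIMARY KEY, email TEXT, created_at TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO service_request VALUES (?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(QUERY.format(outer_filter=outer_filter))]
    conn.close()
    return ids


print(matching_ids())                      # outer rows unfiltered
print(matching_ids("WHERE a.status = 1"))  # outer rows restricted to status = 1
```

Without the outer filter, id 2 (status -1) is still returned because it falls between the start and end of the window; with the outer rows restricted to status = 1, only ids 1, 3 and 5 remain, which is the result the question asks for.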