Select all users who wrote a minimum number of messages within a fixed time range

Time: 2014-11-03 10:59:06

Tags: sql

user_message

+----+---------+-------+------------+
| id | from_id | to_id | time_stamp |
+----+---------+-------+------------+
|  1 |    1    |   2   | 1414700000 |
|  2 |    2    |   1   | 1414700100 |
|  3 |    3    |   1   | 1414701000 |
|  4 |    3    |   2   | 1414701001 |
|  5 |    3    |   4   | 1414701002 |
|  6 |    1    |   3   | 1414701100 |
+----+---------+-------+------------+

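I'm currently trying to get all users who wrote at least a minimum number of messages (say 3) to other users within a fixed time range, say 5 seconds. For this example, I'd like to get a result similar to this: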

+---------+-------+
| from_id | count |
+---------+-------+
|    3    |   3   |
+---------+-------+

The idea behind this is to check for spam. A nice bonus would be to count only messages that share the same content.
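For anyone who wants to reproduce the example, a minimal setup sketch for the sample table might look like the following. The column types are assumptions (the question only shows the values), and time_stamp is assumed to hold Unix time in seconds.

```sql
-- Assumed schema: the question shows only the data, not the column types.
create table user_message (
    id         int primary key,
    from_id    int not null,
    to_id      int not null,
    time_stamp int not null   -- assumed: Unix time in seconds
);

-- The six sample rows from the table above.
insert into user_message (id, from_id, to_id, time_stamp) values
    (1, 1, 2, 1414700000),
    (2, 2, 1, 1414700100),
    (3, 3, 1, 1414701000),
    (4, 3, 2, 1414701001),
    (5, 3, 4, 1414701002),
    (6, 1, 3, 1414701100);
```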

2 answers:

Answer 0: (score: 2)

The following uses a join for this purpose:

select um.*, count(*) as cnt
from user_message um join
     user_message um2
     on um.from_id = um2.from_id and
        -- inclusive 5-second window: this second plus the next four
        um2.time_stamp between um.time_stamp and um.time_stamp + 4
group by um.id
having count(*) >= 3;

For performance, you want an index on user_message(from_id, time_stamp). Even with the index, performance might not be great if the table is large.
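As a sketch, the suggested composite index could be created like this. The index name idx_user_message_from_ts is hypothetical; any valid identifier works.

```sql
-- Composite index: from_id first (equality match in the join),
-- then time_stamp (range match in the BETWEEN condition).
create index idx_user_message_from_ts
    on user_message (from_id, time_stamp);
```

Putting from_id first lets the database narrow the search to one sender and then scan only that sender's rows inside the time range.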

EDIT:

Actually, another way to write this that might be more efficient is:

select um.*,
       (select count(*)
        from user_message um2
        where um.from_id = um2.from_id and
              -- inclusive 5-second window: this second plus the next four
              um2.time_stamp between um.time_stamp and um.time_stamp + 4
       ) as cnt
from user_message um
having cnt >= 3;

This uses a MySQL extension that allows having in a non-aggregation query.
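If the having extension is not available, for example on another database, the same filter can be sketched by moving the correlated count into a derived table and filtering with where. The alias t is just an illustrative name, and the bounds assume an inclusive 5-second window (time_stamp through time_stamp + 4).

```sql
-- Portable variant: compute cnt in a derived table, then filter on it with WHERE.
select t.*
from (
    select um.*,
           (select count(*)
            from user_message um2
            where um2.from_id = um.from_id
              and um2.time_stamp between um.time_stamp and um.time_stamp + 4
           ) as cnt
    from user_message um
) t
where t.cnt >= 3;
```

Against the sample data in the question, only the message with id 3 qualifies, because it is the first of three messages that user 3 sent within the window.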

Answer 1: (score: 1)

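For each message (u1), find all messages (u2) sent by the same user in the same second or the preceding four seconds; u1 itself is counted. Keep those u1 that have at least 3 u2. Finally, group by from_id to show one record per from_id, containing the maximum number of messages sent within any such window.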

select from_id, max(cnt) as max_count
from
(
  select u1.id, u1.from_id, count(*) as cnt
  from user_message u1
  join user_message u2 
    on u2.from_id = u1.from_id 
    -- and u2.content = u1.content
    and u2.time_stamp between u1.time_stamp - 4 and u1.time_stamp
  group by u1.id, u1.from_id
  having count(*) >= 3
) as init
group by from_id;
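The commented-out condition in the query above points at the "same content" bonus from the question. A sketch with that condition enabled follows. It assumes a hypothetical content column on user_message, which the sample table in the question does not have.

```sql
-- Assumption: user_message has a column named content (not shown in the question).
select from_id, max(cnt) as max_count
from
(
  select u1.id, u1.from_id, count(*) as cnt
  from user_message u1
  join user_message u2
    on u2.from_id = u1.from_id
    and u2.content = u1.content   -- count only messages with identical content
    and u2.time_stamp between u1.time_stamp - 4 and u1.time_stamp
  group by u1.id, u1.from_id
  having count(*) >= 3
) as init
group by from_id;
```

Note that rows whose content is NULL never match each other under this equality test, so they would not be counted.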