I have a table, posts, that looks like this:
id | user_id | created_at
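I want a query that returns, for each day, the number of distinct users who posted during the 60 days leading up to that day. The result would look like: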
date | count
2017-12-9 | 28
2017-12-10 | 25 (25 distinct users posted during the 60 days up to 2017-12-10)
I tried a correlated subquery in MySQL, but performance is terrible no matter which indexes I try. My current query is:
SELECT
    DATE(orig_posts.created_at) AS date,
    (
        SELECT
            COUNT(DISTINCT(posts.user_id))
        FROM
            posts
        WHERE
            posts.created_at BETWEEN (orig_posts.created_at - INTERVAL 60 DAY) AND orig_posts.created_at
    ) AS count
FROM
    posts AS orig_posts
WHERE
    orig_posts.created_at > NOW() - INTERVAL 5 DAY
GROUP BY
    DATE(orig_posts.created_at)
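After running EXPLAIN on the query, the culprit appears to be COUNT(DISTINCT(user_id)), but I can't come up with a better alternative for this query. The EXPLAIN output is: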
+----+--------------------+---------------+-------+------------------------------------------------------------------------------------+----------------+---------+------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+-------+------------------------------------------------------------------------------------+----------------+---------+------+--------+-----------------------------------------------------------+
| 1 | PRIMARY | orig_posts | range | created_at_idx,created_at_plus_type_idx,created_at_plus_type_plus_user_id_idx | created_at_idx | 5 | NULL | 12179 | Using where; Using index; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | posts | ALL | created_at_idx,created_at_plus_type_idx,created_at_plus_type_plus_user_id_idx | NULL | NULL | NULL | 548653 | Range checked for each record (index map: 0x580) |
+----+--------------------+---------------+-------+------------------------------------------------------------------------------------+----------------+---------+------+--------+-----------------------------------------------------------+
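
For comparison, here is a rough sketch of the per-day formulation described above. It evaluates the 60-day window once per calendar day instead of once per post row. It is untested against this table and makes several assumptions: the derived table `days` and its column `day` are hypothetical names, the five report dates are hard-coded to match the 5-day filter in the current query, and an index covering (created_at, user_id) is assumed to exist. It may or may not perform better.

```sql
-- Sketch only: not benchmarked; "days" and "day" are illustrative names.
-- Assumes an index that covers (created_at, user_id).
SELECT
    days.day AS date,
    COUNT(DISTINCT p.user_id) AS count
FROM (
    -- One row per report date (today and the 4 days before it).
    SELECT CURDATE() AS day
    UNION ALL SELECT CURDATE() - INTERVAL 1 DAY
    UNION ALL SELECT CURDATE() - INTERVAL 2 DAY
    UNION ALL SELECT CURDATE() - INTERVAL 3 DAY
    UNION ALL SELECT CURDATE() - INTERVAL 4 DAY
) AS days
JOIN posts AS p
    -- Window: the 60 days before the report date, through the end of that date.
    ON p.created_at >= days.day - INTERVAL 60 DAY
   AND p.created_at <  days.day + INTERVAL 1 DAY
GROUP BY days.day
ORDER BY days.day;
```

Note that this differs from the current query in two ways: the window ends at the end of each calendar day rather than at an individual post's timestamp, and a day with no posts of its own still appears as long as some post falls inside its 60-day window.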