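Edit: After looking at some of the answers here and hours of research, my team concluded that there is most likely no way to optimize this beyond the 4.5 seconds we were able to reach (except perhaps by partitioning offers_clicks, but that would have some ugly side effects). In the end, after a lot of brainstorming, we decided to split the two queries, build two sets of user IDs (one from the users table, one from offers_clicks), and compare them with set in Python. The set of IDs from the users table is still pulled from SQL, but we moved offers_clicks to Lucene and added some caching on top of it, so the other set of IDs now comes from there. The end result is 0.9 seconds with the cache and 0.9 seconds without it.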
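The set comparison described in the edit might look roughly like the sketch below. It is only an illustration: the two fetch functions and their hard-coded IDs are hypothetical stand-ins for the SQL query against users and the Lucene lookup for offers_clicks, neither of which is shown in the post.

```python
def fetch_active_user_ids():
    """Hypothetical stand-in for the SQL query on users
    (country = 'US' AND last_active > '2015-02-26')."""
    return {1, 2, 3, 5, 8}


def fetch_clicking_user_ids():
    """Hypothetical stand-in for the Lucene lookup on offers_clicks
    (date and ranking_score filters applied on that side)."""
    return {2, 3, 4, 8, 13}


def count_matching_users(user_ids, click_user_ids):
    # Set intersection keeps only IDs present in both sets. A set holds
    # each ID once, so this plays the role of COUNT(DISTINCT users.id)
    # over the join.
    return len(user_ids & click_user_ids)


count_1 = count_matching_users(fetch_active_user_ids(), fetch_clicking_user_ids())
print(count_1)
```

Because both sides are already deduplicated, the cost of the comparison depends on the size of the two ID sets rather than on the 53 million click rows.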
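Original post starts: I can't get a query optimized. The first version of the query is fine, but as soon as offers_clicks is joined in the second query, it becomes quite slow. The users table contains 10 million rows and offers_clicks contains 53 million rows.
Acceptable performance: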
SELECT count(distinct(users.id)) AS count_1
FROM users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26';
1 row in set (0.35 sec)
Too slow (with offers_clicks joined in):
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks USE index (user_id_3), users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (7.39 sec)
Here is what it looks like without specifying any indexes (even worse):
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks, users
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (17.72 sec)
Explain:
explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | country_2 | country_2 | 14 | NULL | 245014 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id_3 | user_id_3 | 4 | dejong_pointstoshop.users.id | 270153 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
Explain without specifying any indexes:
mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | PRIMARY,last_active,country,last_active_2,country_2 | country_2 | 14 | NULL | 221606 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4 | dejong_pointstoshop.users.id | 3 | Using where |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
Here is a bunch of indexes I have tried, without much success:
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| offers_clicks | 1 | user_id_3 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 2 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 3 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 1 | user_id | A | 17838712 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 2 | date | A | 53516137 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 2 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 3 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 1 | country | A | 14 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 2 | last_active | A | 8048529 | NULL | NULL | | BTREE | | |
Simplified users schema:
+---------------------------------+---------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------------+---------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| country | char(2) | NO | MUL | | |
| last_active | datetime | NO | MUL | 2000-01-01 00:00:00 | |
Simplified offers_clicks schema:
+-----------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | 0 | |
| offer_id | int(11) unsigned | NO | MUL | NULL | |
| date | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| ranking_score | decimal(5,2) | NO | MUL | 0.00 | |
Answer 0 (score: 5)
This is your query:
SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc JOIN
users u
ON oc.user_id = u.id
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
oc.date > '2015-02-14' AND
oc.ranking_score > 0.24 AND oc.ranking_score < 3.49;
First, instead of count(distinct), you might consider writing the query with EXISTS:
SELECT count(*) AS count_1
FROM users u
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
EXISTS (SELECT 1
FROM offers_clicks oc
WHERE oc.user_id = u.id AND
oc.date > '2015-02-14' AND
oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
)
Then, the best indexes for this query are users(country, last_active, id) and either offers_clicks(user_id, date, ranking_score) or offers_clicks(user_id, ranking_score, date).
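As a quick sanity check that the EXISTS form counts the same users as the count(distinct) join, here is a small sketch. It runs against an in-memory SQLite database from Python rather than MySQL, so it says nothing about MySQL's execution plan or timing; the sample rows and the index names users_cla and oc_udr are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (id INTEGER PRIMARY KEY, user_id INTEGER,
                            date TEXT, ranking_score REAL);
-- Composite indexes in the column order suggested in this answer.
CREATE INDEX users_cla ON users (country, last_active, id);
CREATE INDEX oc_udr ON offers_clicks (user_id, date, ranking_score);

INSERT INTO users VALUES
  (1, 'US', '2015-03-01 00:00:00'),  -- active, two qualifying clicks
  (2, 'US', '2015-03-02 00:00:00'),  -- active, click score out of range
  (3, 'US', '2015-01-01 00:00:00'),  -- not active recently
  (4, 'CA', '2015-03-01 00:00:00'),  -- wrong country
  (5, 'US', '2015-02-27 00:00:00');  -- active, one qualifying click
INSERT INTO offers_clicks (user_id, date, ranking_score) VALUES
  (1, '2015-02-20 00:00:00', 1.00),
  (1, '2015-02-21 00:00:00', 2.00),
  (2, '2015-02-20 00:00:00', 9.99),
  (3, '2015-02-20 00:00:00', 1.00),
  (4, '2015-02-20 00:00:00', 1.00),
  (5, '2015-02-15 00:00:00', 0.25);
""")

JOIN_QUERY = """
SELECT count(DISTINCT u.id)
FROM offers_clicks oc JOIN users u ON oc.user_id = u.id
WHERE u.country IN ('US') AND u.last_active > '2015-02-26'
  AND oc.date > '2015-02-14'
  AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
"""

EXISTS_QUERY = """
SELECT count(*)
FROM users u
WHERE u.country IN ('US') AND u.last_active > '2015-02-26'
  AND EXISTS (SELECT 1
              FROM offers_clicks oc
              WHERE oc.user_id = u.id
                AND oc.date > '2015-02-14'
                AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49)
"""

join_count = conn.execute(JOIN_QUERY).fetchone()[0]
exists_count = conn.execute(EXISTS_QUERY).fetchone()[0]
print(join_count, exists_count)
```

User 1 has two qualifying clicks but is counted once in both forms: the join relies on DISTINCT to collapse the duplicate rows, while EXISTS only needs to find one matching click per user.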
Answer 1 (score: 1)
SELECT count(distinct u.id) AS count_1
FROM users u
STRAIGHT_JOIN offers_clicks oc
ON oc.user_id = u.id
WHERE
u.country IN ('US')
AND u.last_active > '2015-02-26'
AND oc.date > '2015-02-14'
AND oc.ranking_score > 0.24
AND oc.ranking_score < 3.49;
Make sure you have an index on users covering the (id, last_active, country) columns and one on offers_clicks covering (user_id, date, ranking_score).
Or you can reverse the join order:
SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc
STRAIGHT_JOIN users u
ON oc.user_id = u.id
WHERE
u.country IN ('US')
AND u.last_active > '2015-02-26'
AND oc.date > '2015-02-14'
AND oc.ranking_score > 0.24
AND oc.ranking_score < 3.49;
Make sure you have an index on offers_clicks covering the (user_id) column and one on users covering (id, last_active, country).
Answer 2 (score: 0)
SELECT count(users.id) AS count_1
FROM users
INNER JOIN
(SELECT
DISTINCT user_id
FROM
offers_clicks
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
) as clicks
ON clicks.user_id = users.id
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
Can you provide some data on sqlfiddle?
Can you tell me what the execution time of this query is:
SELECT
DISTINCT user_id
FROM
offers_clicks
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
Edit: how long does this one take?
SELECT
DISTINCT user_id
FROM
offers_clicks USE INDEX (user_id_4)
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
Answer 3 (score: 0)
Try one more thing:
SELECT COUNT(users.id)
FROM users, offers_clicks
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
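One thing worth checking before relying on this version: without DISTINCT, the join counts one row per matching click, so a user with several qualifying clicks is counted several times. The sketch below shows the difference on an in-memory SQLite database driven from Python; the sample rows are invented for the illustration and SQLite is only a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (user_id INTEGER, date TEXT, ranking_score REAL);
INSERT INTO users VALUES (1, 'US', '2015-03-01 00:00:00');
-- A single user with three qualifying clicks.
INSERT INTO offers_clicks VALUES
  (1, '2015-02-20 00:00:00', 1.0),
  (1, '2015-02-21 00:00:00', 2.0),
  (1, '2015-02-22 00:00:00', 3.0);
""")

# Same FROM/WHERE as the query above; only the counting expression varies.
QUERY = """
SELECT {count_expr}
FROM users, offers_clicks
WHERE users.country = 'US'
  AND users.last_active > '2015-02-26'
  AND offers_clicks.user_id = users.id
  AND offers_clicks.date > '2015-02-14'
  AND offers_clicks.ranking_score < 3.49
  AND offers_clicks.ranking_score > 0.24
"""

plain = conn.execute(QUERY.format(count_expr="COUNT(users.id)")).fetchone()[0]
distinct = conn.execute(
    QUERY.format(count_expr="COUNT(DISTINCT users.id)")
).fetchone()[0]
print(plain, distinct)
```

The plain count reports the number of click rows and the distinct count reports the number of users, so the two queries only agree when each user has at most one matching click.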
Answer 4 (score: 0)
Try this:
SELECT count(distinct users.id) AS count_1
FROM users USE index (<see below>)
JOIN offers_clicks USE index (<see below>)
ON offers_clicks.user_id = users.id
AND offers_clicks.date BETWEEN '2015-02-14' AND CURRENT_DATE
AND offers_clicks.ranking_score BETWEEN 0.24 AND 3.49
WHERE users.country = 'US'
AND users.last_active BETWEEN '2015-02-26' AND CURRENT_DATE
Make sure you have indexes on users(country, last_active, id) and offers_clicks(user_id, ranking_score, date), and USE them.
Let me know how it performs; if it works, I'll explain why.
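Note that BETWEEN includes both endpoints, whereas the original query uses strict > and < comparisons, so the two forms can return different counts when a value sits exactly on a boundary. A minimal sketch of that difference, using an in-memory SQLite database from Python as a stand-in for MySQL and three invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offers_clicks (user_id INTEGER, ranking_score REAL);
-- Two rows sit exactly on the boundaries, one lies strictly inside.
INSERT INTO offers_clicks VALUES (1, 0.24), (2, 1.00), (3, 3.49);
""")

strict = conn.execute("""
    SELECT count(*) FROM offers_clicks
    WHERE ranking_score > 0.24 AND ranking_score < 3.49
""").fetchone()[0]

inclusive = conn.execute("""
    SELECT count(*) FROM offers_clicks
    WHERE ranking_score BETWEEN 0.24 AND 3.49
""").fetchone()[0]

print(strict, inclusive)
```

If the boundary rows must stay excluded, the strict comparisons from the question can be kept while still using the suggested indexes.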
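Answer 5 (score: 0)
First, I also think you should use a join, and try to join only the rows you really need in the result.
As for the offers_clicks table, I think you should use index user_id_2 rather than user_id_3, because the cardinality of user_id_2 is higher than that of user_id_3 (according to your index listing), so it should be faster.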
SELECT
count(distinct(users.id)) AS count_1
FROM users USE INDEX (country_2)
JOIN offers_clicks USE INDEX (user_id_2)
ON offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
WHERE users.country = 'US' AND users.last_active > '2015-02-26'
;
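This query does not require altering the tables, which is why I think it is worth a try.
It may also help to narrow the date range: that reduces the number of rows in the result, so it should be faster.
Not sure this will help...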