Optimizing a slow MySQL SELECT query

Time: 2015-03-08 21:29:56

Tags: mysql sql query-optimization

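Edit: After going through some of the answers here and hours of research, my team concluded that there is most likely no way to optimize this beyond the 4.5 seconds we were able to achieve (except perhaps by partitioning offers_clicks, but that would have some ugly side effects). In the end, after a lot of brainstorming, we decided to split the query in two, build two sets of user IDs (one from the users table, one from offers_clicks), and compare them using a set in Python. The set of IDs from the users table is still pulled from SQL, but we moved offers_clicks to Lucene and added some caching on top of it, so the other set of IDs now comes from there. The end result is 0.9 seconds with the cache and 0.9 seconds without it.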
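The set comparison described in the edit can be sketched roughly as follows. This is only an illustration: the two fetch functions are hypothetical stand-ins (the post does not show the real SQL or Lucene code), and the IDs are made up.

```python
# Hypothetical stand-ins for the two ID sources described in the edit.
def fetch_active_user_ids():
    """Pretend result of the SQL query on users (country + last_active filter)."""
    return {1, 2, 3, 5, 8}


def fetch_clicking_user_ids():
    """Pretend result of the offers_clicks lookup (date + ranking_score filter)."""
    return {2, 3, 4, 8, 13}


def count_matching_users(user_ids, click_user_ids):
    # Set intersection keeps only the IDs present in both sources,
    # which plays the role of COUNT(DISTINCT users.id) over the join.
    return len(user_ids & click_user_ids)


count = count_matching_users(fetch_active_user_ids(), fetch_clicking_user_ids())
print(count)
```

Because both inputs are sets, duplicate clicks by the same user cannot inflate the count, so no DISTINCT step is needed on the Python side.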

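Original post begins: I am having trouble optimizing a query. The first version of the query is fine, but once offers_clicks is joined in the second version, the query becomes quite slow. The users table contains 10 million rows and offers_clicks contains 53 million rows.

Acceptable performance: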

SELECT count(distinct(users.id)) AS count_1
FROM users USE index (country_2)
WHERE users.country = 'US'
  AND users.last_active > '2015-02-26';
1 row in set (0.35 sec)

Slow:

SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks USE index (user_id_3), users USE index (country_2)
WHERE users.country = 'US'
  AND users.last_active > '2015-02-26'
  AND offers_clicks.user_id = users.id
  AND offers_clicks.date > '2015-02-14'
  AND offers_clicks.ranking_score < 3.49
  AND offers_clicks.ranking_score > 0.24;
1 row in set (7.39 sec)

Here is how it looks without specifying any indexes (even worse):

SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks, users
WHERE users.country IN ('US')
  AND users.last_active > '2015-02-26'
  AND offers_clicks.user_id = users.id
  AND offers_clicks.date > '2015-02-14'
  AND offers_clicks.ranking_score < 3.49
  AND offers_clicks.ranking_score > 0.24;
1 row in set (17.72 sec)

Explain:

explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table         | type  | possible_keys | key       | key_len | ref                          | rows   | Extra                    |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
|  1 | SIMPLE      | users         | range | country_2     | country_2 | 14      | NULL                         | 245014 | Using where; Using index |
|  1 | SIMPLE      | offers_clicks | ref   | user_id_3     | user_id_3 | 4       | dejong_pointstoshop.users.id | 270153 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+

Explain without specifying any indexes:

mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table         | type  | possible_keys                                                          | key       | key_len | ref                          | rows   | Extra                    |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
|  1 | SIMPLE      | users         | range | PRIMARY,last_active,country,last_active_2,country_2                    | country_2 | 14      | NULL                         | 221606 | Using where; Using index |
|  1 | SIMPLE      | offers_clicks | ref   | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4       | dejong_pointstoshop.users.id |      3 | Using where              |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+

Here are a bunch of indexes I tried, without much success:

+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name                    | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| offers_clicks |          1 | user_id_3                   |            1 | user_id         | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_3                   |            2 | ranking_score   | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_3                   |            3 | date            | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_2                   |            1 | user_id         | A         |    17838712 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_2                   |            2 | date            | A         |    53516137 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4                   |            1 | user_id         | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4                   |            2 | date            | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4                   |            3 | ranking_score   | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| users         |          1 | country_2                   |            1 | country         | A         |          14 |     NULL | NULL   |      | BTREE      |         |               |
| users         |          1 | country_2                   |            2 | last_active     | A         |     8048529 |     NULL | NULL   |      | BTREE      |         |               |

Simplified users schema:

+---------------------------------+---------------+------+-----+---------------------+----------------+
| Field                           | Type          | Null | Key | Default             | Extra          |
+---------------------------------+---------------+------+-----+---------------------+----------------+
| id                              | int(11)       | NO   | PRI | NULL                | auto_increment |
| country                         | char(2)       | NO   | MUL |                     |                |
| last_active                     | datetime      | NO   | MUL | 2000-01-01 00:00:00 |                |

Simplified offers_clicks schema:

+-----------------+------------------+------+-----+---------------------+----------------+
| Field           | Type             | Null | Key | Default             | Extra          |
+-----------------+------------------+------+-----+---------------------+----------------+
| id              | int(11)          | NO   | PRI | NULL                | auto_increment |
| user_id         | int(11)          | NO   | MUL | 0                   |                |
| offer_id        | int(11) unsigned | NO   | MUL | NULL                |                |
| date            | datetime         | NO   | MUL | 0000-00-00 00:00:00 |                |
| ranking_score   | decimal(5,2)     | NO   | MUL | 0.00                |                |

6 answers:

Answer 0 (score: 5)

This is your query:

SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc JOIN
     users u
     ON oc.user_id = u.id
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
      oc.date > '2015-02-14' AND
      oc.ranking_score > 0.24 AND oc.ranking_score < 3.49;

First, instead of count(distinct), you might consider writing the query with EXISTS:

SELECT count(*) AS count_1
FROM users u
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
      EXISTS (SELECT 1
              FROM offers_clicks oc
              WHERE oc.user_id = u.id AND
                    oc.date > '2015-02-14' AND
                    oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
             )

Then, the best indexes for this query are users(country, last_active, id) and either offers_clicks(user_id, date, ranking_score) or offers_clicks(user_id, ranking_score, date).
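A small sanity check of the rewrite, as a sketch only: it uses Python's built-in sqlite3 module rather than MySQL, a handful of made-up rows, and hypothetical index names, so it says nothing about timing. It only shows that the COUNT(DISTINCT) join and the EXISTS form return the same number on this toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (
    id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT, ranking_score REAL);
-- Index names are hypothetical; column order follows the answer's suggestion.
CREATE INDEX users_country_active ON users (country, last_active, id);
CREATE INDEX clicks_user_date_score ON offers_clicks (user_id, date, ranking_score);
INSERT INTO users VALUES
  (1, 'US', '2015-03-01 00:00:00'),  -- active, two qualifying clicks
  (2, 'US', '2015-03-02 00:00:00'),  -- active, click out of score range
  (3, 'US', '2015-01-01 00:00:00'),  -- not recently active
  (4, 'DE', '2015-03-03 00:00:00'),  -- wrong country
  (5, 'US', '2015-03-04 00:00:00');  -- active, one qualifying click
INSERT INTO offers_clicks (user_id, date, ranking_score) VALUES
  (1, '2015-02-20 00:00:00', 1.00),
  (1, '2015-02-21 00:00:00', 2.00),
  (2, '2015-02-20 00:00:00', 9.99),
  (3, '2015-02-20 00:00:00', 1.00),
  (4, '2015-02-20 00:00:00', 1.00),
  (5, '2015-02-22 00:00:00', 0.50);
""")

JOIN_DISTINCT = """
SELECT count(DISTINCT u.id)
FROM offers_clicks oc JOIN users u ON oc.user_id = u.id
WHERE u.country = 'US' AND u.last_active > '2015-02-26'
  AND oc.date > '2015-02-14'
  AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
"""

WITH_EXISTS = """
SELECT count(*)
FROM users u
WHERE u.country = 'US' AND u.last_active > '2015-02-26'
  AND EXISTS (SELECT 1
              FROM offers_clicks oc
              WHERE oc.user_id = u.id
                AND oc.date > '2015-02-14'
                AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49)
"""

distinct_count = conn.execute(JOIN_DISTINCT).fetchone()[0]
exists_count = conn.execute(WITH_EXISTS).fetchone()[0]
print(distinct_count, exists_count)
```

The EXISTS form can stop at the first matching click per user, whereas the join produces one row per matching click and then has to deduplicate them.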

Answer 1 (score: 1)

SELECT count(distinct u.id) AS count_1
FROM users u
STRAIGHT_JOIN offers_clicks oc
     ON oc.user_id = u.id
WHERE 
    u.country IN ('US') 
    AND u.last_active > '2015-02-26' 
    AND oc.date > '2015-02-14' 
    AND oc.ranking_score > 0.24 
    AND oc.ranking_score < 3.49;

Make sure you have an index on users over the columns (id, last_active, country) and one on offers_clicks over (user_id, date, ranking_score).

Or you can reverse the join order:

SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc 
STRAIGHT_JOIN users u
     ON oc.user_id = u.id
WHERE 
    u.country IN ('US') 
    AND u.last_active > '2015-02-26' 
    AND oc.date > '2015-02-14' 
    AND oc.ranking_score > 0.24 
    AND oc.ranking_score < 3.49;

Make sure you have an index on offers_clicks over the column (user_id) and one on users over (id, last_active, country).

Answer 2 (score: 0)

SELECT count(users.id) AS count_1 
FROM users 
INNER JOIN
  (SELECT
    DISTINCT user_id
  FROM
    offers_clicks
  WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24
  ) as clicks
ON clicks.user_id  = users.id
WHERE users.country IN ('US') 
    AND users.last_active > '2015-02-26' 

Can you provide some sample data on sqlfiddle?

Can you tell me the execution time of this query:

SELECT
    DISTINCT user_id
  FROM
    offers_clicks
  WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24

Edit: how long does this one take?

SELECT
    DISTINCT user_id
  FROM
    offers_clicks USE INDEX (user_id_4)
  WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24

Answer 3 (score: 0)

Try one more step:

SELECT COUNT(users.id)
    FROM users, offers_clicks
    WHERE users.country = 'US'
        AND users.last_active > '2015-02-26'
        AND offers_clicks.user_id = users.id
        AND offers_clicks.date > '2015-02-14'
        AND offers_clicks.ranking_score < 3.49
        AND offers_clicks.ranking_score > 0.24;
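One thing to keep in mind with this variant: without DISTINCT, COUNT(users.id) counts one row per matching click, not one per user. The sketch below illustrates that on made-up data, using Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (
    id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT, ranking_score REAL);
INSERT INTO users VALUES
  (1, 'US', '2015-03-01 00:00:00'),
  (2, 'US', '2015-03-02 00:00:00');
-- User 1 has two qualifying clicks, user 2 has one.
INSERT INTO offers_clicks (user_id, date, ranking_score) VALUES
  (1, '2015-02-20 00:00:00', 1.00),
  (1, '2015-02-21 00:00:00', 2.00),
  (2, '2015-02-20 00:00:00', 1.50);
""")

# The same join and filters, with the counting expression swapped in.
QUERY = """
SELECT {expr}
FROM users, offers_clicks
WHERE users.country = 'US'
  AND users.last_active > '2015-02-26'
  AND offers_clicks.user_id = users.id
  AND offers_clicks.date > '2015-02-14'
  AND offers_clicks.ranking_score < 3.49
  AND offers_clicks.ranking_score > 0.24
"""

plain_count = conn.execute(QUERY.format(expr="COUNT(users.id)")).fetchone()[0]
distinct_count = conn.execute(
    QUERY.format(expr="COUNT(DISTINCT users.id)")).fetchone()[0]
print(plain_count, distinct_count)
```

So the two forms only agree when every user has at most one matching click, which is unlikely with 53 million click rows.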

Answer 4 (score: 0)

Try this:

SELECT count(distinct users.id) AS count_1
FROM users USE index (<see below>)
JOIN offers_clicks USE index (<see below>)
    ON offers_clicks.user_id = users.id
    AND offers_clicks.date BETWEEN '2015-02-14' AND CURRENT_DATE
    AND offers_clicks.ranking_score BETWEEN 0.24 AND 3.49
WHERE users.country = 'US'
AND users.last_active BETWEEN '2015-02-26' AND CURRENT_DATE

Make sure you have indexes on users(country, last_active, id) and offers_clicks(user_id, ranking_score, date), and USE them.

Let me know how it performs; if it works, I'll explain why.
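Note that BETWEEN is inclusive at both ends, while the original query uses strict comparisons, so the two can differ for rows that sit exactly on a boundary (for example a ranking_score of exactly 0.24 or 3.49). A minimal sketch on made-up rows, again using Python's sqlite3 module in place of MySQL as an assumption for runnability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offers_clicks (user_id INTEGER, ranking_score REAL);
-- Two rows sit exactly on the boundaries, one is strictly inside.
INSERT INTO offers_clicks VALUES (1, 0.24), (2, 1.00), (3, 3.49);
""")

strict_count = conn.execute("""
    SELECT count(DISTINCT user_id) FROM offers_clicks
    WHERE ranking_score > 0.24 AND ranking_score < 3.49
""").fetchone()[0]

between_count = conn.execute("""
    SELECT count(DISTINCT user_id) FROM offers_clicks
    WHERE ranking_score BETWEEN 0.24 AND 3.49
""").fetchone()[0]

print(strict_count, between_count)
```

The same applies to the date columns: BETWEEN '2015-02-14' AND CURRENT_DATE includes a row stamped exactly 2015-02-14 00:00:00 and adds an upper bound that the original query does not have.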

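Answer 5 (score: 0)

First, I also think you should use a JOIN, and try to join only the rows you really need in the result.
As for the offers_clicks table, I think you should use index user_id_2 rather than user_id_3, because the cardinality of user_id_2 is higher than that of user_id_3 (according to your index listing), so it should be faster.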

SELECT
    count(distinct(users.id)) AS count_1
FROM users USE INDEX (country_2)
JOIN offers_clicks USE INDEX (user_id_2)
    ON  offers_clicks.user_id = users.id
    AND offers_clicks.date > '2015-02-14'
    AND offers_clicks.ranking_score < 3.49
    AND offers_clicks.ranking_score > 0.24
WHERE users.country = 'US' AND users.last_active > '2015-02-26'
;

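This query requires no changes to the tables, which is why I think you can try it.
It may also help to narrow the date range: that reduces the number of rows in the result, so it should be faster.

Not sure this will help...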