How can I optimize or correctly write this MySQL query?

Date: 2011-06-08 14:58:36

Tags: mysql

Today I found a very long-running SQL query in my mysql-slow.log.

I'd like to ask the SQL experts how to format and run this SQL properly.

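The idea behind the SQL: return all emails from the 2 tables (users and subscribers) that are not in the mailchimp table, and return only DISTINCT values (the same email may appear in both users and subscribers). Also include the city and language in the result.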

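As you can see, the Query_time is monstrously long, and the Rows_examined figure is just WTF for combining 2 tables that together should only hold about 20k rows.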

Query_time: 113.216544  Lock_time: 0.000180 Rows_sent: 43  Rows_examined: 208280841

SELECT * FROM 
    ( SELECT u.email AS email, u.city, u.language FROM users AS u 
        LEFT JOIN mailchimp AS m ON u.email = m.email WHERE m.email IS NULL GROUP BY u.email 
        UNION SELECT s.email AS email, s.city, s.language FROM subscribers AS s 
        LEFT JOIN mailchimp AS m ON s.email = m.email WHERE m.email IS NULL GROUP BY s.email ) 
    AS sync GROUP BY sync.email ORDER BY sync.email ASC;

EXPLAIN output for the query:

+------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
| id   | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra                           |
+------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
|    1 | PRIMARY      | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    23 | Using temporary; Using filesort |
|    2 | DERIVED      | u          | ALL  | NULL          | NULL | NULL    | NULL | 10482 | Using temporary; Using filesort |
|    2 | DERIVED      | m          | ALL  | NULL          | NULL | NULL    | NULL | 11411 | Using where; Not exists         |
|    3 | UNION        | s          | ALL  | NULL          | NULL | NULL    | NULL |  2709 | Using temporary; Using filesort |
|    3 | UNION        | m          | ALL  | NULL          | NULL | NULL    | NULL | 11411 | Using where; Not exists         |
| NULL | UNION RESULT | <union2,3> | ALL  | NULL          | NULL | NULL    | NULL |  NULL |                                 |
+------+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
6 rows in set (2 min 1.65 sec)

3 answers:

Answer 0: (score: 3)

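Note that in the EXPLAIN plan no keys are available (possible_keys and key are NULL on every row). That makes performance terrible. For every users row you have to scan the entire mailchimp table, and then for every subscribers row you scan the entire mailchimp table again. You are doing roughly 10482 * 11411 + 2709 * 11411 reads, which is about 150 million and the same order of magnitude as the Rows_examined value in your slow log.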

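Maybe a MySQL expert can weigh in here, but as I understand the MySQL documentation, it does not do hash matching the way other database engines do. Everything is loop and match (nested loops).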

You can improve performance significantly by creating an index on mailchimp.email.
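
As a minimal sketch of that suggestion: the index name idx_mailchimp_email is made up for illustration, and the statement assumes email is a VARCHAR column (a TEXT column would need a prefix length). The EXPLAIN afterwards re-checks only the users half of the original query.

```sql
-- Hypothetical index name; assumes mailchimp.email is a VARCHAR column.
CREATE INDEX idx_mailchimp_email ON mailchimp (email);

-- Re-check the plan for one half of the original query.
EXPLAIN
SELECT u.email, u.city, u.language
FROM users AS u
  LEFT JOIN mailchimp AS m
    ON u.email = m.email
WHERE m.email IS NULL;
```

If the optimizer picks up the index, the row for m should no longer show type ALL with key NULL. If it still does, one thing worth checking is whether the email columns in the two tables use the same character set and collation.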

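Answer 1: (score: 1)

Does this help with your result? I switched to UNION ALL; a plain UNION is a wasted deduplication pass, because you are already grouping in the outer query.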

SELECT * FROM 
    ( SELECT u.email AS email, u.city, u.language FROM users AS u 
        LEFT JOIN mailchimp AS m ON u.email = m.email WHERE m.email IS NULL GROUP BY u.email 
        UNION ALL
      SELECT s.email AS email, s.city, s.language FROM subscribers AS s 
        LEFT JOIN mailchimp AS m ON s.email = m.email WHERE m.email IS NULL GROUP BY s.email ) 
    AS sync GROUP BY sync.email ORDER BY sync.email ASC;

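Answer 2: (score: 1)

I'm guessing you have no indexes on these three tables. Add an index on the email field in all 3 tables (users, subscribers and mailchimp), then run the query, and EXPLAIN, again.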
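
A sketch of what that could look like, assuming email is a VARCHAR column in each table and that none of these indexes exist yet. The index names are invented for illustration.

```sql
-- Hypothetical index names; skip any index that already exists.
ALTER TABLE users       ADD INDEX idx_users_email (email);
ALTER TABLE subscribers ADD INDEX idx_subscribers_email (email);
ALTER TABLE mailchimp   ADD INDEX idx_mailchimp_email (email);

-- Confirm the index is in place before re-running EXPLAIN.
SHOW INDEX FROM mailchimp;
```

The mailchimp index is the one the anti-join lookup depends on; the other two mainly give the optimizer more options for the grouping and ordering on email.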

Your query:

SELECT *
FROM 
  ( SELECT u.email AS email, u.city, u.language
    FROM users AS u 
      LEFT JOIN mailchimp AS m
        ON u.email = m.email
      WHERE m.email IS NULL
      GROUP BY u.email 
  UNION
    SELECT s.email AS email, s.city, s.language
    FROM subscribers AS s 
    LEFT JOIN mailchimp AS m
      ON s.email = m.email
    WHERE m.email IS NULL
    GROUP BY s.email
  ) 
  AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;

can be written like this (removing the two inner GROUP BY clauses and turning UNION into UNION ALL):

SELECT *
FROM 
  ( SELECT u.email AS email, u.city, u.language
    FROM users AS u 
      LEFT JOIN mailchimp AS m
        ON u.email = m.email
      WHERE m.email IS NULL
  UNION ALL
    SELECT s.email AS email, s.city, s.language
    FROM subscribers AS s 
    LEFT JOIN mailchimp AS m
      ON s.email = m.email
    WHERE m.email IS NULL
  ) 
  AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;

Or like this (converting the LEFT JOIN / IS NULL check into NOT EXISTS), which is sometimes faster:

SELECT *
FROM 
  ( SELECT u.email AS email, u.city, u.language
    FROM users AS u 
    WHERE NOT EXISTS
      ( SELECT * 
        FROM mailchimp AS m
        WHERE u.email = m.email
      )
  UNION ALL
    SELECT s.email AS email, s.city, s.language
    FROM subscribers AS s 
    WHERE NOT EXISTS
      ( SELECT * 
        FROM mailchimp AS m
        WHERE s.email = m.email
      )
  ) 
  AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;
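
One caveat on the outer GROUP BY: SELECT * with GROUP BY sync.email relies on MySQL accepting non-aggregated columns, which is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5). A possible variant that also runs under that mode is sketched below. It assumes any one city and language per email is acceptable; note that MIN() picks each column independently, so the two values may come from different rows.

```sql
-- Sketch only: same intent as the queries above, but every selected column
-- is either grouped or aggregated.
SELECT sync.email,
       MIN(sync.city)     AS city,      -- assumption: any one city per email is fine
       MIN(sync.language) AS language   -- assumption: any one language per email is fine
FROM
  ( SELECT u.email, u.city, u.language
    FROM users AS u
    WHERE NOT EXISTS
      ( SELECT 1
        FROM mailchimp AS m
        WHERE m.email = u.email
      )
  UNION ALL
    SELECT s.email, s.city, s.language
    FROM subscribers AS s
    WHERE NOT EXISTS
      ( SELECT 1
        FROM mailchimp AS m
        WHERE m.email = s.email
      )
  )
  AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;
```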

Either way, add the indexes on the email fields!