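Today I found a very long-running SQL query in my mysql-slow.log, and I would like to ask the SQL experts how to write and run this query properly.
The idea behind the SQL: return all emails from two tables that are not in the mailchimp table, and return only DISTINCT values (user and subscriber emails may be duplicated). The result should also include city and language.
As you can see, the query_time is monstrously long, and the rows-examined count is just WTF for combining two tables that together should hold only about 20k rows.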
Query_time: 113.216544 Lock_time: 0.000180 Rows_sent: 43 Rows_examined: 208280841
SELECT * FROM
( SELECT u.email AS email, u.city, u.language FROM users AS u
LEFT JOIN mailchimp AS m ON u.email = m.email WHERE m.email IS NULL GROUP BY u.email
UNION SELECT s.email AS email, s.city, s.language FROM subscribers AS s
LEFT JOIN mailchimp AS m ON s.email = m.email WHERE m.email IS NULL GROUP BY s.email )
AS sync GROUP BY sync.email ORDER BY sync.email ASC;
EXPLAIN for the query:
+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 23 | Using temporary; Using filesort |
| 2 | DERIVED | u | ALL | NULL | NULL | NULL | NULL | 10482 | Using temporary; Using filesort |
| 2 | DERIVED | m | ALL | NULL | NULL | NULL | NULL | 11411 | Using where; Not exists |
| 3 | UNION | s | ALL | NULL | NULL | NULL | NULL | 2709 | Using temporary; Using filesort |
| 3 | UNION | m | ALL | NULL | NULL | NULL | NULL | 11411 | Using where; Not exists |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
6 rows in set (2 min 1.65 sec)
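Answer 0 (score: 3)
Notice in the EXPLAIN plan that there are no keys available. That is going to make performance terrible. For every users record, you have to scan the entire mailchimp table. Then, for every subscribers record, you scan the entire mailchimp table again. You are doing roughly 10482 * 11411 + 2709 * 11411 reads.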
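That estimate can be checked directly in the MySQL client. The figures are the row counts from the EXPLAIN output in the question; the column alias is just an illustrative name:

```sql
-- Nested-loop estimate: each users row and each subscribers row
-- triggers a full scan of mailchimp (row counts taken from EXPLAIN).
SELECT 10482 * 11411 + 2709 * 11411 AS estimated_row_reads;
```

This comes to about 150 million, the same order of magnitude as the Rows_examined value of 208280841 in the slow log.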
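Maybe a MySQL expert can chime in here, but as I understand the MySQL documentation, it does not do hash joins the way other database engines do. Everything is loop and match.
You can improve performance significantly by creating an index on mailchimp.email.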
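A minimal sketch of that index, assuming email is a VARCHAR column named as in the query. The index name idx_mailchimp_email is made up for illustration:

```sql
-- Index the join column so each lookup into mailchimp becomes an
-- index probe instead of a full table scan.
-- The index name is hypothetical; any name will do.
CREATE INDEX idx_mailchimp_email ON mailchimp (email);

-- Re-check the plan for one branch of the original query.
EXPLAIN
SELECT u.email, u.city, u.language
FROM users AS u
LEFT JOIN mailchimp AS m ON u.email = m.email
WHERE m.email IS NULL;
```

If the optimizer picks up the index, the EXPLAIN row for m should list it in the key column instead of showing type ALL.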
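Answer 1 (score: 1)
Does this help with your results? I changed it to UNION ALL; a plain UNION wastes cycles removing duplicates, because you are already grouping in the outer query.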
SELECT * FROM
( SELECT u.email AS email, u.city, u.language FROM users AS u
LEFT JOIN mailchimp AS m ON u.email = m.email WHERE m.email IS NULL GROUP BY u.email
UNION ALL
SELECT s.email AS email, s.city, s.language FROM subscribers AS s
LEFT JOIN mailchimp AS m ON s.email = m.email WHERE m.email IS NULL GROUP BY s.email )
AS sync GROUP BY sync.email ORDER BY sync.email ASC;
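To see the difference between the two operators in isolation, here is a tiny self-contained illustration with literal values (the address is a made-up example, not data from the question):

```sql
-- UNION removes duplicate rows, which costs an extra de-duplication pass.
SELECT 'a@example.com' AS email
UNION
SELECT 'a@example.com';      -- one row

-- UNION ALL simply concatenates both result sets.
SELECT 'a@example.com' AS email
UNION ALL
SELECT 'a@example.com';      -- two rows
```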
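Answer 2 (score: 1)
I guess you have no indexes on these three tables. Add an index on the email field in all three tables (users, subscribers and mailchimp), then run the query, and the EXPLAIN, again.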
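A sketch of those steps, assuming email is a VARCHAR column in each table. The index names are made up for illustration:

```sql
-- First check which indexes already exist.
SHOW INDEX FROM users;
SHOW INDEX FROM subscribers;
SHOW INDEX FROM mailchimp;

-- Then add an index on email where one is missing
-- (skip any table that already has one; index names are hypothetical).
ALTER TABLE users       ADD INDEX idx_users_email (email);
ALTER TABLE subscribers ADD INDEX idx_subscribers_email (email);
ALTER TABLE mailchimp   ADD INDEX idx_mailchimp_email (email);
```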
Your query:
SELECT *
FROM
( SELECT u.email AS email, u.city, u.language
FROM users AS u
LEFT JOIN mailchimp AS m
ON u.email = m.email
WHERE m.email IS NULL
GROUP BY u.email
UNION
SELECT s.email AS email, s.city, s.language
FROM subscribers AS s
LEFT JOIN mailchimp AS m
ON s.email = m.email
WHERE m.email IS NULL
GROUP BY s.email
)
AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;
can be written like this (removing the two inner GROUP BY clauses and changing UNION to UNION ALL):
SELECT *
FROM
( SELECT u.email AS email, u.city, u.language
FROM users AS u
LEFT JOIN mailchimp AS m
ON u.email = m.email
WHERE m.email IS NULL
UNION ALL
SELECT s.email AS email, s.city, s.language
FROM subscribers AS s
LEFT JOIN mailchimp AS m
ON s.email = m.email
WHERE m.email IS NULL
)
AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;
Or like this (converting the LEFT JOIN / IS NULL check to NOT EXISTS), which is sometimes faster:
SELECT *
FROM
( SELECT u.email AS email, u.city, u.language
FROM users AS u
WHERE NOT EXISTS
( SELECT *
FROM mailchimp AS m
WHERE u.email = m.email
)
UNION ALL
SELECT s.email AS email, s.city, s.language
FROM subscribers AS s
WHERE NOT EXISTS
( SELECT *
FROM mailchimp AS m
WHERE s.email = m.email
)
)
AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;
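One caveat applies to all of the variants above: they select city and language without aggregating them while grouping by email alone. MySQL accepts that only when the ONLY_FULL_GROUP_BY SQL mode is off, and that mode is on by default from MySQL 5.7.5. The following is a hedged sketch of a variant that should run under that mode, using ANY_VALUE() (available from MySQL 5.7.5) to pick an arbitrary city and language per email. It is an adaptation of the NOT EXISTS version, not a tested replacement:

```sql
-- Same NOT EXISTS logic as above, but every selected column is either
-- the GROUP BY key or wrapped in ANY_VALUE(), so ONLY_FULL_GROUP_BY
-- does not reject the query.
SELECT sync.email,
       ANY_VALUE(sync.city)     AS city,
       ANY_VALUE(sync.language) AS language
FROM
( SELECT u.email AS email, u.city, u.language
  FROM users AS u
  WHERE NOT EXISTS
        ( SELECT 1 FROM mailchimp AS m WHERE m.email = u.email )
  UNION ALL
  SELECT s.email AS email, s.city, s.language
  FROM subscribers AS s
  WHERE NOT EXISTS
        ( SELECT 1 FROM mailchimp AS m WHERE m.email = s.email )
) AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;
```

When the same email appears in both users and subscribers with different city or language values, which values are returned is not defined.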
Either way, add the indexes on the email fields!