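I recently started moving my application from one host to another: from my home computer to a virtual machine in the cloud. While testing performance on the new node I found a severe degradation. I compared the results of the same query, on the same data, with the same version of MySQL.
On my home computer: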
mysql> SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+------+
| id |
+------+
| 8238 |
| 8369 |
+------+
2 rows in set (0,57 sec)
And on the new machine:
mysql> SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+------+
| id |
+------+
| 8369 |
+------+
1 row in set (26.70 sec)
That makes it about 46 times slower, which is not acceptable. I ran EXPLAIN to find out why it is so slow. On my home computer:
mysql> explain SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
| 1 | SIMPLE | events | ALL | PRIMARY | NULL | NULL | NULL | 5370 | Using where |
| 1 | SIMPLE | <subquery2> | eq_ref | <auto_key> | <auto_key> | 5 | eventor.events.id | 1 | NULL |
| 2 | MATERIALIZED | results | ALL | idx_event | NULL | NULL | NULL | 1319428 | Using where |
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
3 rows in set (0,00 sec)
On my virtual node:
mysql> explain SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
| 1 | PRIMARY | events | ALL | NULL | NULL | NULL | NULL | 7297 | Using where |
| 2 | DEPENDENT SUBQUERY | results | index_subquery | idx_event | idx_event | 5 | func | 199 | Using where |
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
2 rows in set (0.00 sec)
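As you can see, the two plans differ. I cannot work out what causes the difference. In every other respect the two systems appear to be set up alike.
Answer 0: (score: 2)
The most likely problem in this case is how the subquery is handled. This changed between recent versions of MySQL (older versions optimize subqueries poorly; the newest versions do much better).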
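To check whether the two servers really handle subqueries differently, one could compare the server version and the optimizer switches on both hosts. This is only a minimal sketch: the `materialization` and `semijoin` flags appear in `optimizer_switch` from MySQL 5.6 onward, so their absence on one host would suggest an older optimizer there.

```sql
-- Run on both hosts and compare the output.
SELECT VERSION();

-- On MySQL 5.6+ the value includes flags such as semijoin=on and materialization=on.
SELECT @@optimizer_switch;
```

The MATERIALIZED row in the home computer's plan and the DEPENDENT SUBQUERY row in the virtual node's plan are consistent with such a version difference.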
A simple fix is to replace in with exists and a correlated subquery:
SELECT id
FROM events
WHERE exists (SELECT 1
FROM results
WHERE status='Inactive' and results.event = events.id
) AND
(DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
This should perform well on both versions, especially if you have an index on results(status, event).
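The suggested index could be created along these lines. The index name is made up for illustration; only the table and column names come from the question.

```sql
-- Hypothetical index name. Column order follows the answer: status first, then event,
-- so the correlated lookup can match both columns with equality.
CREATE INDEX idx_results_status_event ON results (status, event);

-- Re-check the plan for the EXISTS version afterwards.
EXPLAIN
SELECT id
FROM events
WHERE EXISTS (SELECT 1
              FROM results
              WHERE results.status = 'Inactive'
                AND results.event = events.id)
  AND DATEDIFF(NOW(), startdate) < 30
  AND DATEDIFF(NOW(), startdate) > -1
  AND status <> 10
  AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
```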
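Answer 1: (score: 0)
The difference between 5.5 and 5.6, caused by the new subquery optimizations, explains the performance gap (as discussed in the comments), but that conclusion also hides the fact that the original query was not written optimally in the first place. There does not seem to be any need for a subquery here at all.
The "events" table needs an index on (status, form, startdate), and the "results" table needs an index on (status) and another on (event).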
SELECT DISTINCT e.id
FROM events e
JOIN results r ON r.event = e.id AND r.status = 'Inactive'
WHERE (e.form = 'IndSingleDay' OR e.form = 'IndMultiDay')
AND e.status != 10
AND e.startdate > DATE_SUB(DATE(NOW()), INTERVAL 30 DAY)
AND e.startdate < DATE_SUB(DATE(NOW()), INTERVAL 2 DAY);
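You may have to adjust the values "30" and "2" to get exactly the same logic, but the important principle here is that you never want to use a column as an argument to a function in the WHERE clause if you can avoid it by rewriting the expression another way. The optimizer cannot look "backwards" through the function to find the actual range of raw values you want it to find. Instead, it has to evaluate the function against every row that could not be eliminated by other means.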
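Using functions to derive constant values that are then compared against the column, as shown above, lets the optimizer realize that it is actually looking for a range of start date values and narrow down the candidate rows accordingly, assuming an index exists on the column in question.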
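To illustrate that principle, the two DATEDIFF conditions from the question can be turned into a plain range on the bare column. The sketch below assumes startdate is a DATE or DATETIME column; the bounds are derived from the question's conditions and are not the ones used in the query above.

```sql
-- Original form: the column is wrapped in a function, so an index on
-- startdate cannot be used to narrow the range.
--   DATEDIFF(NOW(), startdate) < 30 AND DATEDIFF(NOW(), startdate) > -1

-- Equivalent range on the bare column:
--   DATEDIFF(...) < 30  means startdate falls on or after (today - 29 days)
--   DATEDIFF(...) > -1  means startdate falls on or before today
SELECT id
FROM events
WHERE startdate >= CURDATE() - INTERVAL 29 DAY
  AND startdate <  CURDATE() + INTERVAL 1 DAY;
```

Using a half-open range (`>=` on the lower bound, `<` on the upper bound) keeps the comparison correct whether or not the column carries a time part.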
If I have decoded your query correctly, this version should be faster than any subquery, provided the indexes are in place.
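The indexes this answer asks for could be created roughly as follows. The index names are hypothetical, and it is an assumption that the existing idx_event seen in the EXPLAIN output already covers results(event).

```sql
-- Hypothetical index names; tables and columns are taken from the question.
CREATE INDEX idx_events_status_form_startdate ON events (status, form, startdate);
CREATE INDEX idx_results_status ON results (status);

-- Only needed if no index on results(event) exists yet; the EXPLAIN output
-- in the question lists idx_event, which presumably covers this column.
-- CREATE INDEX idx_results_event ON results (event);

-- List the indexes that are actually present.
SHOW INDEX FROM events;
SHOW INDEX FROM results;
```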