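The following query hangs in the "Sending data" state for a very long time. It is a big query, but I'm hoping to get some help from indexes, and perhaps to learn more about how MySQL actually chooses which index it will use.
Here is the query together with the output of DESCRIBE.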
mysql> DESCRIBE SELECT e.employee_number, s.current_status_start_date, e.company_code, e.location_code, s.last_suffix_first_mi, s.job_title, SUBSTRING(e.job_code,1,1) tt_jobCode,
-> SUM(e.current_amount) tt_grossWages,
-> IFNULL((SUM(e.current_amount) - IF(tt1.tt_reduction = '','0',tt1.tt_reduction)),SUM(e.current_amount)) tt_taxableWages,
-> t.new_code, STR_TO_DATE(s.last_hire_date, '%Y-%m-%d') tt_hireDate,
-> IF(s.current_status_code = 'T',STR_TO_DATE(s.current_status_start_date, '%Y-%m-%d'),'') tt_terminationDate,
-> IFNULL(tt_totalHours,'0') tt_totalHours
-> FROM check_earnings e
-> LEFT JOIN (
-> SELECT * FROM summary
-> GROUP BY employee_no
-> ORDER BY current_status_start_date DESC
-> ) s
-> ON e.employee_number = s.employee_no
-> LEFT JOIN (
-> SELECT employee_no, SUM(current_amount__employee) tt_reduction
-> FROM check_deductions
-> WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND (
-> deduction_code IN ('DECMP','FSAM','FSAC','DCMAK','DCMAT','401KD')
-> OR deduction_code LIKE 'IM%'
-> OR deduction_code LIKE 'ID%'
-> OR deduction_code LIKE 'IV%'
-> )
-> GROUP BY employee_no
-> ORDER BY employee_no ASC
-> ) tt1
-> ON e.employee_number = tt1.employee_no
-> LEFT JOIN translation t
-> ON e.location_code = t.old_code
-> LEFT JOIN (
-> SELECT employee_number, SUM(current_hours) tt_totalHours
-> FROM check_earnings
-> WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND earnings_code IN ('REG1','REG2','REG3','REG4')
-> GROUP BY employee_number
-> ) tt2
-> ON e.employee_number = tt2.employee_number
-> WHERE STR_TO_DATE(e.pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(e.pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND SUBSTRING(e.job_code,1,1) != 'E'
-> AND e.location_code != '639'
-> AND t.field = 'location_state'
-> GROUP BY e.employee_number
-> ORDER BY s.current_status_start_date DESC, e.location_code ASC, s.last_suffix_first_mi ASC;
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
| 1 | PRIMARY | e | ALL | location_code | NULL | NULL | NULL | 3498603 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | t | ref | field,old_code | old_code | 303 | historical.e.location_code | 1 | Using where |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 16741 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 2530 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 2919 | |
| 4 | DERIVED | check_earnings | index | NULL | employee_number | 303 | NULL | 3498603 | Using where |
| 3 | DERIVED | check_deductions | index | deduction_code | employee_no | 303 | NULL | 6387048 | Using where |
| 2 | DERIVED | summary | index | NULL | employee_no | 303 | NULL | 17608 | Using temporary; Using filesort |
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
8 rows in set, 65535 warnings (32.77 sec)
Edit: After playing around with some indexes, the query now spends most of its time in the "Copying to tmp table" state.
Answer 0 (score: 1)
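You can't avoid a temporary table in this query. One reason is that you GROUP BY different columns than you ORDER BY.
Another reason is the use of derived tables (subqueries in the FROM / JOIN clauses).
One way to speed this up is to create summary tables that store the results of those subqueries, so you don't have to recompute them every time you run the query.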
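As a rough sketch of the summary-table idea, the deductions subquery could be materialized once and then joined like an ordinary indexed table. The table name deduction_summary and the index name idx_employee_no are made up for illustration, and the date window is simply copied from the query in the question; a real summary table would need to be refreshed whenever check_deductions changes or the reporting window moves.

```sql
-- Hypothetical summary table: deduction_summary is an illustrative name,
-- not something that exists in the original schema.
CREATE TABLE deduction_summary AS
SELECT employee_no,
       SUM(current_amount__employee) AS tt_reduction
FROM check_deductions
WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
  AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
  AND (
       deduction_code IN ('DECMP','FSAM','FSAC','DCMAK','DCMAT','401KD')
    OR deduction_code LIKE 'IM%'
    OR deduction_code LIKE 'ID%'
    OR deduction_code LIKE 'IV%'
  )
GROUP BY employee_no;

-- Index the join column so the outer query can look rows up
-- instead of scanning the whole summary for every employee.
ALTER TABLE deduction_summary ADD INDEX idx_employee_no (employee_no);

-- The main query would then join the stored result instead of a derived table:
--   LEFT JOIN deduction_summary tt1
--     ON e.employee_number = tt1.employee_no
```

The expensive scan of check_deductions is then paid once, when the summary is built, rather than on every run of the report.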
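You are also forcing table scans by searching on the results of functions such as STR_TO_DATE() and SUBSTR(). An ordinary index on the column cannot be used to optimize conditions written this way.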
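To illustrate the difference, here is a hedged comparison of the two predicate styles on the hours subquery. It assumes that pay_date holds zero-padded 'YYYY-MM-DD' values (so plain comparison sorts the same way as date order) and that an index on pay_date already exists; neither is confirmed by the question, and whether the optimizer actually picks a range scan also depends on how selective the date range is.

```sql
-- Function applied to the column: every row must be read and parsed
-- before it can be compared, so an index on pay_date cannot narrow the search.
EXPLAIN
SELECT employee_number, SUM(current_hours) AS tt_totalHours
FROM check_earnings
WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
  AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
GROUP BY employee_number;

-- Bare column compared with constants: the optimizer is free to consider
-- a range scan on an index over pay_date (assumed to exist).
EXPLAIN
SELECT employee_number, SUM(current_hours) AS tt_totalHours
FROM check_earnings
WHERE pay_date >= '2012-06-01'
  AND pay_date <= '2013-06-01'
GROUP BY employee_number;
```

Comparing the type, key and rows columns of the two EXPLAIN outputs shows whether the rewrite changes the access path on your data.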
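Re your comment:
I can write an SQL query against a much smaller table and have it run for 72 hours, if the query is poorly optimized.
Note, for example, that the output of your DESCRIBE shows "ALL" for several of the tables involved in the join. This means it has to do a table scan of all the rows (the number shown in the 'rows' column).
Rule of thumb: how many row comparisons does it take to resolve the join? Multiply together the 'rows' values of all the tables that are joined under the same 'id'.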
+----+-------------+------------------+-------+---------+
| id | select_type | table | type | rows |
+----+-------------+------------------+-------+---------+
| 1 | PRIMARY | e | ALL | 3498603 |
| 1 | PRIMARY | t | ref | 1 |
| 1 | PRIMARY | <derived2> | ALL | 16741 |
| 1 | PRIMARY | <derived3> | ALL | 2530 |
| 1 | PRIMARY | <derived4> | ALL | 2919 |
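So it may be evaluating the join conditions 432,544,383,105,752,610 times (the row counts are estimates, so it may not be quite that bad). It's actually a wonder that it takes only 5 hours!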
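The figure quoted above is just the product of the 'rows' estimates for the five tables with id 1, and it can be reproduced in the mysql client. The result still fits in a signed BIGINT, so no overflow occurs; the column alias is only illustrative.

```sql
-- rows for e, t, <derived2>, <derived3>, <derived4> from the DESCRIBE output
SELECT 3498603 * 1 * 16741 * 2530 * 2919 AS estimated_row_comparisons;
-- returns 432544383105752610
```

Turning any one of the "ALL" scans into an indexed lookup that matches a single row removes that table's factor from the product, which is why indexing the join columns pays off so dramatically.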
What you need to do is use indexes to help the query reduce the number of rows it has to examine.
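For example, why are you using STR_TO_DATE(), given that the dates you are parsing are already in MySQL's native date format? Why not store those columns with the DATE data type? Then the search could use an index.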
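A minimal sketch of that conversion for check_earnings follows. It assumes pay_date is currently a string column; the index name idx_earnings_pay_date is hypothetical. The 65535 warnings in the question hint that some values may not parse as dates, so it seems prudent to look for those first, since the ALTER can fail or silently alter such values depending on the SQL mode.

```sql
-- Count values that are present but do not parse as a date;
-- STR_TO_DATE() returns NULL for strings it cannot parse.
SELECT COUNT(*) AS unparseable_dates
FROM check_earnings
WHERE pay_date IS NOT NULL
  AND STR_TO_DATE(pay_date, '%Y-%m-%d') IS NULL;

-- Once the data is clean, store the column as a real DATE.
ALTER TABLE check_earnings MODIFY pay_date DATE;

-- Hypothetical index name; lets range conditions on pay_date use an index.
ALTER TABLE check_earnings ADD INDEX idx_earnings_pay_date (pay_date);

-- The filter then needs no function calls at all:
--   WHERE e.pay_date >= '2012-06-01'
--     AND e.pay_date <= '2013-06-01'
```

The same steps would apply to check_deductions.pay_date, which the query filters in the same way.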
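You don't need to "play around" with indexes. It's not as if indexing is a mystery or has random effects. For an introduction, see my presentation How to Design Indexes, Really.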