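Speeding up a large MySQL query

Time: 2014-03-12 16:00:16

Tags: mysql

The following query hangs in the "Sending data" phase for an incredibly long time. It is a large query, but I'm hoping to get some help with my indexes and possibly learn a bit more about how MySQL actually chooses which index it is going to use.

Below is the query along with the output of a DESCRIBE statement.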

mysql> DESCRIBE SELECT e.employee_number, s.current_status_start_date, e.company_code, e.location_code, s.last_suffix_first_mi, s.job_title, SUBSTRING(e.job_code,1,1) tt_jobCode,
->                      SUM(e.current_amount) tt_grossWages,
->                      IFNULL((SUM(e.current_amount) - IF(tt1.tt_reduction = '','0',tt1.tt_reduction)),SUM(e.current_amount)) tt_taxableWages,
->                      t.new_code, STR_TO_DATE(s.last_hire_date, '%Y-%m-%d') tt_hireDate,
->                      IF(s.current_status_code = 'T',STR_TO_DATE(s.current_status_start_date, '%Y-%m-%d'),'') tt_terminationDate,
->                      IFNULL(tt_totalHours,'0') tt_totalHours
->               FROM check_earnings e
->               LEFT JOIN (
->                          SELECT * FROM summary
->                          GROUP BY employee_no
->                          ORDER BY current_status_start_date DESC
->                         ) s
->               ON e.employee_number = s.employee_no
->               LEFT JOIN (
->                          SELECT employee_no, SUM(current_amount__employee) tt_reduction
->                          FROM check_deductions
->                          WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
->                          AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
->                          AND (
->                               deduction_code IN ('DECMP','FSAM','FSAC','DCMAK','DCMAT','401KD')
->                               OR deduction_code LIKE 'IM%'
->                               OR deduction_code LIKE 'ID%'
->                               OR deduction_code LIKE 'IV%'
->                               )
->                          GROUP BY employee_no
->                          ORDER BY employee_no ASC
->                          ) tt1
->               ON e.employee_number = tt1.employee_no
->               LEFT JOIN translation t
->               ON e.location_code = t.old_code
->               LEFT JOIN (
->                          SELECT employee_number, SUM(current_hours) tt_totalHours
->                          FROM check_earnings
->                          WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
->                          AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
->                          AND earnings_code IN ('REG1','REG2','REG3','REG4')
->                          GROUP BY employee_number
->                         ) tt2
->               ON e.employee_number = tt2.employee_number
->               WHERE STR_TO_DATE(e.pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
->               AND STR_TO_DATE(e.pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
->               AND SUBSTRING(e.job_code,1,1) != 'E'
->               AND e.location_code != '639'
->               AND t.field = 'location_state'
->               GROUP BY e.employee_number
->               ORDER BY s.current_status_start_date DESC, e.location_code ASC, s.last_suffix_first_mi ASC;

+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
| id | select_type | table            | type  | possible_keys  | key             | key_len | ref                        | rows    | Extra                                        |
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
|  1 | PRIMARY     | e                | ALL   | location_code  | NULL            | NULL    | NULL                       | 3498603 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | t                | ref   | field,old_code | old_code        | 303     | historical.e.location_code |       1 | Using where                                  |
|  1 | PRIMARY     | <derived2>       | ALL   | NULL           | NULL            | NULL    | NULL                       |   16741 |                                              |
|  1 | PRIMARY     | <derived3>       | ALL   | NULL           | NULL            | NULL    | NULL                       |    2530 |                                              |
|  1 | PRIMARY     | <derived4>       | ALL   | NULL           | NULL            | NULL    | NULL                       |    2919 |                                              |
|  4 | DERIVED     | check_earnings   | index | NULL           | employee_number | 303     | NULL                       | 3498603 | Using where                                  |
|  3 | DERIVED     | check_deductions | index | deduction_code | employee_no     | 303     | NULL                       | 6387048 | Using where                                  |
|  2 | DERIVED     | summary          | index | NULL           | employee_no     | 303     | NULL                       |   17608 | Using temporary; Using filesort              |
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
8 rows in set, 65535 warnings (32.77 sec)

Edit: After playing with some of the indexes, it now spends most of its time in the "Copying to tmp table" state.

1 answer:

Answer 0 (score: 1)

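You can't avoid using a temporary table in that query. One reason is that you are grouping by different columns than you are sorting by.

Another reason is the use of derived tables (subqueries in the FROM / JOIN clauses).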

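One way you could speed this up is to create summary tables that store the results of those subqueries, so you don't have to recompute them during every query.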
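
As a rough sketch of the summary-table idea, the deductions subquery could be materialized once and then joined like an ordinary table. The table name deduction_totals and the index name are made up for illustration; the column names and the filter are taken from the query in the question.

```sql
-- Hypothetical summary table; the name deduction_totals is an assumption.
-- Build it once (or refresh it on a schedule) instead of on every query.
-- If pay_date contains text that does not parse as a date, strict SQL mode
-- may reject this statement until those values are cleaned up.
CREATE TABLE deduction_totals AS
SELECT employee_no,
       SUM(current_amount__employee) AS tt_reduction
FROM check_deductions
WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
  AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
  AND (
       deduction_code IN ('DECMP','FSAM','FSAC','DCMAK','DCMAT','401KD')
       OR deduction_code LIKE 'IM%'
       OR deduction_code LIKE 'ID%'
       OR deduction_code LIKE 'IV%'
      )
GROUP BY employee_no;

-- Index the join column so the outer query can look rows up by employee.
ALTER TABLE deduction_totals ADD INDEX idx_employee_no (employee_no);

-- In the main query, the derived table tt1 then becomes a plain join:
--   LEFT JOIN deduction_totals tt1 ON e.employee_number = tt1.employee_no
```

The date range is baked into this table, so it only helps reports that reuse the same period; a table keyed by employee and pay period would be more general.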

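You are also forcing table scans by searching on the results of functions like STR_TO_DATE() and SUBSTR(). Those searches cannot be optimized with an index.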
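
To illustrate the difference, compare a predicate that wraps the column in a function with one that tests the bare column. This is only a sketch: it assumes pay_date holds consistently formatted 'YYYY-MM-DD' text and that an index on pay_date exists, neither of which is shown in the question.

```sql
-- Function applied to the column: every row must be read and converted,
-- so an index on pay_date cannot narrow the search.
EXPLAIN SELECT employee_number, SUM(current_hours)
FROM check_earnings
WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
  AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
GROUP BY employee_number;

-- Bare column compared with constants: the optimizer can consider a
-- range scan on an index over pay_date (assumed to exist).
EXPLAIN SELECT employee_number, SUM(current_hours)
FROM check_earnings
WHERE pay_date >= '2012-06-01'
  AND pay_date <= '2013-06-01'
GROUP BY employee_number;
```

Comparing the type and rows columns of the two EXPLAIN outputs shows whether the rewrite actually helps on your data.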


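Re your comment:

With a poorly optimized query, I can make SQL against a far smaller table run for 72 hours.

Note, for example, that the output of your DESCRIBE shows "ALL" for several of the tables involved in the join. This means it has to do a table scan of all the rows (the number shown in the 'rows' column).

A rule of thumb: how many row comparisons does it take to resolve the join? Multiply together the 'rows' values of all the tables that are joined under the same 'id'.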

+----+-------------+------------------+-------+---------+
| id | select_type | table            | type  | rows    |
+----+-------------+------------------+-------+---------+
|  1 | PRIMARY     | e                | ALL   | 3498603 |
|  1 | PRIMARY     | t                | ref   |       1 |
|  1 | PRIMARY     | <derived2>       | ALL   |   16741 |
|  1 | PRIMARY     | <derived3>       | ALL   |    2530 |
|  1 | PRIMARY     | <derived4>       | ALL   |    2919 |
+----+-------------+------------------+-------+---------+

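So it is potentially evaluating the join conditions 432,544,383,105,752,610 times (those row counts are estimates, so it may not really be as bad as that). It's actually a miracle that it takes only 5 hours!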
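
That figure is simply the product of the 'rows' estimates on the five id = 1 lines, which can be checked in MySQL itself (the column alias is just an illustrative name):

```sql
-- Product of the 'rows' estimates for e, t, <derived2>, <derived3>, <derived4>
SELECT 3498603 * 1 * 16741 * 2530 * 2919 AS estimated_row_comparisons;
-- 432544383105752610
```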

What you need to do is use indexes to help the query reduce the number of rows it has to examine.

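For example, why are you using STR_TO_DATE(), given that the dates you are parsing are already in MySQL's native date format? Why not store those columns as the DATE data type? Then the search could use an index.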
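
One possible migration path, sketched under assumptions: add a real DATE column next to the text one, backfill it, and index it. The column name pay_date_d and the index name are hypothetical, and any rows whose text does not parse as a date would need cleaning up first.

```sql
-- Hypothetical new column; pay_date_d is not part of the original schema.
ALTER TABLE check_earnings ADD COLUMN pay_date_d DATE NULL;

-- One-time backfill from the existing text column.
UPDATE check_earnings
SET pay_date_d = STR_TO_DATE(pay_date, '%Y-%m-%d');

-- Index the native DATE column so range predicates can use it.
ALTER TABLE check_earnings ADD INDEX idx_pay_date_d (pay_date_d);

-- The filter in the main query can then compare the column directly:
--   WHERE e.pay_date_d >= '2012-06-01' AND e.pay_date_d <= '2013-06-01'
```

The same treatment would apply to pay_date in check_deductions, since the deductions subquery filters on it in the same way.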

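You don't need to be "playing with indexes." It's not as if indexing is a mystery or has random effects. For an introduction, see my presentation How to Design Indexes, Really.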