Question

Sorry, I couldn't make the title more specific.

So I have this query:

CREATE TABLE RecordPoints AS (
SELECT competitionId, personId, personCountryId, eventId, year, date,
if(regionalAverageRecord = 'WR',
(SELECT COUNT(DISTINCT personId) FROM ResultDates rd
WHERE rd.eventId=rd2.eventId AND rd.date <= rd2.date AND rd.average > 0), 0) wrAveragePoints,
if(regionalSingleRecord = 'WR',
(SELECT COUNT(DISTINCT personId) FROM ResultDates rd
WHERE rd.eventId=rd2.eventId AND rd.date <= rd2.date), 0) wrSinglePoints,
if(NOT regionalAverageRecord in('WR', 'NR'),
(SELECT COUNT(DISTINCT personId) FROM ResultDates rd
WHERE rd.eventId=rd2.eventId AND rd.date <= rd2.date AND average > 0 AND rd.personCountryId in
(SELECT Countries.id FROM Countries JOIN Continents on Countries.continentId=Continents.id where recordName = rd2.regionalAverageRecord)), 0) crAveragePoints,
if(NOT regionalSingleRecord in('WR', 'NR'),
(SELECT COUNT(DISTINCT personId) FROM ResultDates rd
WHERE rd.eventId=rd2.eventId AND rd.date <= rd2.date AND rd.personCountryId in
(SELECT Countries.id FROM Countries JOIN Continents on Countries.continentId=Continents.id where recordName = rd2.regionalSingleRecord)), 0) crSinglePoints,
if(regionalAverageRecord = 'NR',
(SELECT COUNT(DISTINCT personId) FROM ResultDates rd
WHERE rd.eventId=rd2.eventId AND rd.date <= rd2.date AND rd.personCountryId=rd2.personCountryId AND rd.average > 0 ), 0) nrAveragePoints,
if(regionalSingleRecord = 'NR',
(SELECT COUNT(DISTINCT personId) FROM ResultDates rd
WHERE rd.eventId=rd2.eventId AND rd.date <= rd2.date AND rd.personCountryId=rd2.personCountryId), 0) nrSinglePoints
FROM ResultDates rd2 WHERE (NOT regionalAverageRecord='' OR NOT regionalSingleRecord = ''));

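It took 9 hours to finish. To break it down: I'm creating a table in which 6 of the columns are full subqueries. Each one counts how many distinct personIds appear in the same table for the same event up to a given date, filtered by a few other columns. Creating an index on date with CREATE INDEX date ON ResultDates (date) sped it up a little, I think, but it still takes an enormous amount of time.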
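To make the cost concrete, here is a plain-Python sketch of what the wrSinglePoints column computes for one record row. The snake_case field names and the sample rows are hypothetical. Each record row rescans the whole table, so the work grows roughly with (record rows × table rows).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    # Hypothetical subset of the ResultDates columns
    person_id: str
    event_id: str
    date: date
    average: int
    regional_single_record: str


def wr_single_points(rec: Result, rows: list[Result]) -> int:
    """Mirror of the wrSinglePoints subquery for a single record row."""
    if rec.regional_single_record != "WR":
        return 0
    # Distinct people with any result in the same event on or before the record date
    return len({r.person_id for r in rows
                if r.event_id == rec.event_id and r.date <= rec.date})


rows = [
    Result("A", "333", date(1982, 6, 5), 0, "WR"),
    Result("B", "333", date(1982, 6, 5), 0, ""),
    Result("C", "333", date(1983, 1, 1), 0, ""),
    Result("A", "444", date(1982, 1, 1), 0, ""),
]
# One full pass over rows per record row: quadratic overall
points = [wr_single_points(r, rows) for r in rows]
print(points)  # [2, 0, 0, 0]
```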

A row in ResultDates looks like this:

+------------+-----------------+---------------+---------+---------+-----+---------+----------------------+-----------------------+-------+-----+------+------------+
| personId   | personCountryId | competitionId | eventId | roundId | pos | average | regionalSingleRecord | regionalAverageRecord | month | day | year | date       |
+------------+-----------------+---------------+---------+---------+-----+---------+----------------------+-----------------------+-------+-----+------+------------+
| 1982THAI01 | USA             | WC1982        | 333     | f       |   1 |       0 | WR                   |                       |     6 |   5 | 1982 | 1982-06-05 |
+------------+-----------------+---------------+---------+---------+-----+---------+----------------------+-----------------------+-------+-----+------+------------+

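where regionalSingleRecord and regionalAverageRecord can each be one of these "recordNames": WR, NR, empty (most of the time), or AfR, AsR, ER, NAR, OcR or SAR. I then use the continental recordNames to look up which continent they belong to, and from that the countryIds in that continent.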
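The continental-record columns rely on a two-step lookup: recordName to continent, then continent to countries. A minimal Python sketch of that lookup, using a hypothetical three-country sample rather than the real Continents/Countries data. Because the result depends only on the recordName, this sketch precomputes it once per name instead of once per row.

```python
# Hypothetical sample rows: continent id -> recordName, country id -> continentId
continents = {"EU": "ER", "NA": "NAR"}
countries = {"France": "EU", "Germany": "EU", "USA": "NA"}


def countries_for_record(record_name: str) -> set[str]:
    """Country ids whose continent uses record_name, like the IN (SELECT Countries.id ...) subquery."""
    continent_ids = {cid for cid, name in continents.items() if name == record_name}
    return {country for country, cont in countries.items() if cont in continent_ids}


# The lookup depends only on the recordName, so compute it once per name
record_to_countries = {name: countries_for_record(name) for name in continents.values()}

print(sorted(countries_for_record("ER")))  # ['France', 'Germany']
print(countries_for_record("WR"))          # set(): WR, NR and '' name no continent
```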

I created indexes for joining these recordNames to continents and continent ids to countryIds, but I'm not sure how much that improved the speed.

Running EXPLAIN gives:

+----+--------------------+------------+------------+------+-------------------+--------------+---------+----------------------------------+--------+----------+---------------------------------------------------------------+
| id | select_type        | table      | partitions | type | possible_keys     | key          | key_len | ref                              | rows   | filtered | Extra                                                         |
+----+--------------------+------------+------------+------+-------------------+--------------+---------+----------------------------------+--------+----------+---------------------------------------------------------------+
|  1 | PRIMARY            | rd2        | NULL       | ref  | idx_personId      | idx_personId | 32      | const                            |    567 |    99.00 | Using where                                                   |
|  9 | DEPENDENT SUBQUERY | rd         | NULL       | ALL  | date,idx_personId | NULL         | NULL    | NULL                             | 992294 |     0.33 | Range checked for each record (index map: 0x3)                |
|  8 | DEPENDENT SUBQUERY | rd         | NULL       | ALL  | date,idx_personId | NULL         | NULL    | NULL                             | 992294 |     0.11 | Range checked for each record (index map: 0x3)                |
|  6 | DEPENDENT SUBQUERY | Continents | NULL       | ref  | P_id,recordIndex  | recordIndex  | 9       | cubing.rd2.regionalSingleRecord  |      1 |   100.00 | Using index; Start temporary                                  |
|  6 | DEPENDENT SUBQUERY | Countries  | NULL       | ALL  | NULL              | NULL         | NULL    | NULL                             |    203 |    10.00 | Using where; Using join buffer (Block Nested Loop)            |
|  6 | DEPENDENT SUBQUERY | rd         | NULL       | ALL  | date              | NULL         | NULL    | NULL                             | 992294 |     0.33 | Range checked for each record (index map: 0x1); End temporary |
|  4 | DEPENDENT SUBQUERY | Continents | NULL       | ref  | P_id,recordIndex  | recordIndex  | 9       | cubing.rd2.regionalAverageRecord |      1 |   100.00 | Using index; Start temporary                                  |
|  4 | DEPENDENT SUBQUERY | Countries  | NULL       | ALL  | NULL              | NULL         | NULL    | NULL                             |    203 |    10.00 | Using where; Using join buffer (Block Nested Loop)            |
|  4 | DEPENDENT SUBQUERY | rd         | NULL       | ALL  | date              | NULL         | NULL    | NULL                             | 992294 |     0.11 | Range checked for each record (index map: 0x1); End temporary |
|  3 | DEPENDENT SUBQUERY | rd         | NULL       | ALL  | date,idx_personId | NULL         | NULL    | NULL                             | 992294 |     3.33 | Range checked for each record (index map: 0x3)                |
|  2 | DEPENDENT SUBQUERY | rd         | NULL       | ALL  | date,idx_personId | NULL         | NULL    | NULL                             | 992294 |     1.11 | Range checked for each record (index map: 0x3)                |
+----+--------------------+------------+------------+------+-------------------+--------------+---------+----------------------------------+--------+----------+---------------------------------------------------------------+

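I've been googling how to speed it up. From what I've found, I know this output doesn't look good, especially the 992294 rows being scanned in the initial table.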
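A rough worst-case reading of the EXPLAIN output: the outer query (id 1) is estimated at 567 rows, and each of the six DEPENDENT SUBQUERY entries on rd (ids 2, 3, 4, 6, 8, 9) shows type ALL over 992294 rows. If every subquery did a full scan for every outer row, that would be about:

```latex
\underbrace{567}_{\text{outer rows}} \times \underbrace{6}_{\text{subqueries}} \times \underbrace{992\,294}_{\text{rows per scan}} = 3\,375\,784\,188 \approx 3.4 \times 10^{9} \text{ row reads}
```

This is only an upper-bound estimate under that assumption: IF() skips a subquery whose condition is false, and "Range checked for each record" means MySQL may still choose the date index per outer row at run time. Row 1 also shows ref: const on idx_personId, which suggests the EXPLAIN may have been run with an extra personId filter, in which case the full CREATE TABLE would have more outer rows.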

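My problem is that I don't know how to optimize this to make it all faster. I've read that well-designed indexes can improve speed, so I'm curious which indexes could be used here.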

Answer 1

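Subqueries in the SELECT clause can be very expensive. Correlated subqueries generally perform poorly, and there are usually better alternatives.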

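I don't have time for a thorough answer, but my general impression from skimming the query is that you could restructure it to JOIN ResultDates to itself once in the main query, and then use conditional aggregation in the SELECT clause. Something like this...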

SELECT rd.competitionId, rd.personId, rd.personCountryId, rd.eventId
   , rd.year, rd.date
   , COUNT(DISTINCT IF(rd.regionalAverageRecord = 'WR' AND rdPrev.average > 0, rdPrev.personId, NULL)) AS wrAveragePoints
   , COUNT(DISTINCT IF(rd.regionalSingleRecord = 'WR', rdPrev.personId, NULL)) AS wrSinglePoints
   -- , [etc....]
FROM ResultDates AS rd 
LEFT JOIN ResultDates AS rdPrev 
   ON rd.eventId=rdPrev.eventId 
   AND rdPrev.date <= rd.date
WHERE (NOT rd.regionalAverageRecord='' OR NOT rd.regionalSingleRecord = '')
-- one output row per record row; assumes these columns identify a single result
GROUP BY rd.competitionId, rd.personId, rd.personCountryId, rd.eventId, rd.roundId
   , rd.year, rd.date, rd.regionalAverageRecord, rd.regionalSingleRecord
;
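A small, self-contained check of this idea, using Python's built-in sqlite3 module as a stand-in for MySQL. CASE WHEN replaces MySQL's IF(), and SQLite's rowid stands in for a per-row key in the GROUP BY. The sample rows are made up; the point is only that the correlated version and the single self-join with conditional aggregation return the same counts for the two WR columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ResultDates (
    personId TEXT, personCountryId TEXT, competitionId TEXT, eventId TEXT,
    roundId TEXT, average INTEGER, regionalSingleRecord TEXT,
    regionalAverageRecord TEXT, date TEXT
);
INSERT INTO ResultDates VALUES
    ('A', 'USA', 'C1', '333', 'f', 0,  'WR', '',   '1982-06-05'),
    ('B', 'USA', 'C1', '333', 'f', 50, '',   '',   '1982-06-05'),
    ('C', 'GER', 'C2', '333', 'f', 40, '',   'WR', '1983-01-01'),
    ('D', 'GER', 'C3', '333', 'f', 30, '',   '',   '1984-01-01');
""")

# Question's style: one correlated subquery per points column
correlated = """
SELECT rd2.personId,
  CASE WHEN rd2.regionalSingleRecord = 'WR' THEN
    (SELECT COUNT(DISTINCT rd.personId) FROM ResultDates rd
     WHERE rd.eventId = rd2.eventId AND rd.date <= rd2.date) ELSE 0 END,
  CASE WHEN rd2.regionalAverageRecord = 'WR' THEN
    (SELECT COUNT(DISTINCT rd.personId) FROM ResultDates rd
     WHERE rd.eventId = rd2.eventId AND rd.date <= rd2.date AND rd.average > 0) ELSE 0 END
FROM ResultDates rd2
WHERE rd2.regionalSingleRecord <> '' OR rd2.regionalAverageRecord <> ''
ORDER BY rd2.personId
"""

# Answer's style: join once, then conditional aggregation (CASE without ELSE yields NULL)
joined = """
SELECT rd.personId,
  COUNT(DISTINCT CASE WHEN rd.regionalSingleRecord = 'WR' THEN rdPrev.personId END),
  COUNT(DISTINCT CASE WHEN rd.regionalAverageRecord = 'WR' AND rdPrev.average > 0
                      THEN rdPrev.personId END)
FROM ResultDates AS rd
LEFT JOIN ResultDates AS rdPrev
  ON rd.eventId = rdPrev.eventId AND rdPrev.date <= rd.date
WHERE rd.regionalSingleRecord <> '' OR rd.regionalAverageRecord <> ''
GROUP BY rd.rowid
ORDER BY rd.personId
"""

print(conn.execute(correlated).fetchall())  # [('A', 2, 0), ('C', 0, 2)]
print(conn.execute(joined).fetchall())      # [('A', 2, 0), ('C', 0, 2)]
```

In this sketch the self-join is still quadratic within each eventId; the gain is that the earlier rows are joined once and shared by every points column, instead of being rescanned by up to six separate subqueries.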

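Edit: For the subqueries/fields involving the Countries and Continents tables, you can also just LEFT JOIN those tables and use the joined values in the same way I used rdPrev.average in the wrAveragePoints calculation.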
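A sketch of that suggestion for crSinglePoints, again in SQLite via Python's sqlite3, with made-up rows and a trimmed ResultDates schema. The aliases recCont (continent named by the record) and prevCountry (country of each earlier competitor) are hypothetical names. Rows whose record is WR or NR find no continent, so their count comes out as 0, matching the IF() in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Continents (id TEXT PRIMARY KEY, recordName TEXT);
CREATE TABLE Countries (id TEXT PRIMARY KEY, continentId TEXT);
CREATE TABLE ResultDates (personId TEXT, personCountryId TEXT, eventId TEXT,
                          regionalSingleRecord TEXT, date TEXT);
INSERT INTO Continents VALUES ('EU', 'ER'), ('NA', 'NAR');
INSERT INTO Countries VALUES ('France', 'EU'), ('Germany', 'EU'), ('USA', 'NA');
INSERT INTO ResultDates VALUES
    ('A', 'France',  '333', '',   '2000-01-01'),
    ('B', 'USA',     '333', '',   '2000-01-01'),
    ('C', 'Germany', '333', 'ER', '2001-01-01'),
    ('D', 'France',  '333', '',   '2002-01-01');
""")

cr_single = """
SELECT rd.personId,
  COUNT(DISTINCT CASE WHEN prevCountry.continentId = recCont.id
                      THEN rdPrev.personId END) AS crSinglePoints
FROM ResultDates AS rd
-- continent named by the record (no match for WR, NR or empty)
LEFT JOIN Continents AS recCont ON recCont.recordName = rd.regionalSingleRecord
LEFT JOIN ResultDates AS rdPrev
  ON rdPrev.eventId = rd.eventId AND rdPrev.date <= rd.date
-- continent of each earlier competitor
LEFT JOIN Countries AS prevCountry ON prevCountry.id = rdPrev.personCountryId
WHERE rd.regionalSingleRecord <> ''
GROUP BY rd.rowid
"""
print(conn.execute(cr_single).fetchall())  # [('C', 2)]
```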

Note: COUNT() and most other aggregate functions ignore NULL values.
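A quick illustration of that NULL behavior, shown in SQLite through Python's sqlite3 (this is standard SQL, and MySQL's COUNT(expr) behaves the same way). COUNT(*) counts rows, while COUNT(expr) and COUNT(DISTINCT expr) skip NULLs, which is what lets the IF(..., personId, NULL) expressions act as per-column filters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (personId TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("A",), ("A",), (None,), ("B",)])

# COUNT(*) counts all rows; the other two ignore the NULL row
row = conn.execute(
    "SELECT COUNT(*), COUNT(personId), COUNT(DISTINCT personId) FROM t"
).fetchone()
print(row)  # (4, 3, 2)
```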

Indexes for optimizing a long MySQL query
