LEFT JOIN takes too long to execute

Date: 2019-12-24 14:07:27

Tags: mysql sql

I have two tables. The first is calendar and the second is final_registration, as shown below:

*--------------------------*
| calender_id |  datefield | 
*--------------------------*
|     1       | 2015-07-13 |
|     2       | 2015-07-14 |
|     3       | 2015-07-15 |
|     4       | 2015-07-16 |
|     -       | ---------- |
|     -       | ---------- |
|     -       | ---------- |
|     5647    | 2030-12-28 |
|     5648    | 2030-12-29 |
|     5649    | 2030-12-30 |
|     5650    | 2030-12-31 |
*--------------------------*

So my first table has about 5650 records.

The second table is my registration table, where I store user information along with the booking date:

*--------------------------------------------------*
| id |  name |     booking_date    | ticket_status | 
*--------------------------------------------------*
|  1 |  RAM  | 2018-12-24 12:54:53 |    active     |
|  2 |  RAO  | 2018-12-24 12:54:53 |    active     |
|  3 |  RAT  | 2018-12-24 12:54:53 |    active     |
|  4 |  PAL  | 2018-11-24 12:54:53 |    active     |
|  5 |  TOM  | 2018-10-24 12:54:53 |    active     |
|  6 |  SAM  | 2018-10-24 12:54:53 |    active     |
|  7 |  RAT  | 2018-09-24 12:54:53 |    active     |
|  8 |  MAT  | 2019-12-24 12:54:53 |    active     |
|  9 |  NOT  | 2019-12-24 12:54:53 |    active     |
| 10 |  RAM  | 2019-12-24 12:54:53 |    active     |
*--------------------------------------------------*

Now I want a count of the registrations booked in 2018, broken down by month:

| booking_date | countT |
|   2018-01    |   0    |
|   2018-02    |   0    |
|   2018-03    |   0    |
|   2018-04    |   0    |
|   2018-05    |   0    |
|   2018-06    |   0    |
|   2018-07    |   0    |
|   2018-08    |   0    |
|   2018-09    |   1    |
|   2018-10    |   2    |
|   2018-11    |   1    |
|   2018-12    |   3    |

I am using the following query. It gives me the correct output, but the problem is the execution time: it takes at least 10 minutes to run.

SELECT 
  DATE_FORMAT(calendar.datefield, '%Y-%m') AS booking_date, 
  COUNT(final_registration.booking_date) AS countT 
FROM calendar 
LEFT JOIN final_registration ON DATE_FORMAT(final_registration.booking_date, '%Y-%m-%d') = 
    DATE_FORMAT(calendar.datefield, '%Y-%m-%d') 
  AND final_registration.ticket_status IN ('active', 'cancelled') 
WHERE DATE_FORMAT(calendar.datefield, '%Y') = $year 
GROUP BY DATE_FORMAT(calendar.datefield, '%Y-%m')

3 answers:

Answer 0 (score: 2)

I would suggest a correlated subquery and an index:

SELECT yyyymm, 
       -- Count the bookings that fall inside each month's half-open range
       (SELECT COUNT(*)
        FROM final_registration fr
        WHERE fr.ticket_status IN ('active', 'cancelled') AND 
              fr.booking_date >= c.month_start AND
              fr.booking_date < c.month_start + interval 1 month
       ) as countT
-- One row per month of the requested year, with the first calendar day of that month
FROM (SELECT DATE_FORMAT(c.datefield, '%Y-%m') as yyyymm,
             MIN(c.datefield) as month_start
      FROM calendar c
      WHERE YEAR(c.datefield) = ?  -- PASS IN AS PARAMETER!!!
      GROUP BY yyyymm
     ) c  
ORDER BY c.yyyymm;

The index you want is on final_registration(booking_date, ticket_status).
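As a rough sketch of what creating and checking that index might look like, assuming the column names booking_date and ticket_status from the table in the question (the index name idx_fr_booking_status is made up for illustration):

```sql
-- Hypothetical index name; columns follow the final_registration table in the question.
CREATE INDEX idx_fr_booking_status
    ON final_registration (booking_date, ticket_status);

-- Inspect the plan for a single month's range to see whether the index is picked up.
EXPLAIN
SELECT COUNT(*)
FROM final_registration fr
WHERE fr.ticket_status IN ('active', 'cancelled')
  AND fr.booking_date >= DATE '2018-12-01'
  AND fr.booking_date <  DATE '2018-12-01' + INTERVAL 1 MONTH;
```

Whether the optimizer actually chooses the index depends on the data distribution and the MySQL version, so the EXPLAIN output is worth checking on the real table.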

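This has a couple of advantages over your query:

  • It can use an index for the date comparison, because no function is applied to the date column of the second table.
  • It avoids the expensive outer GROUP BY.

Also note the use of a parameter, rather than munging the query string with a literal value.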
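To illustrate the parameter point directly in MySQL, here is a minimal sketch using a server-side prepared statement. The statement name monthly_months and the variable @year are hypothetical; in application code the same binding would normally go through the database driver's placeholder mechanism instead.

```sql
-- Prepare once with a ? placeholder instead of concatenating the year into the SQL text.
PREPARE monthly_months FROM
  'SELECT DATE_FORMAT(c.datefield, ''%Y-%m'') AS yyyymm,
          MIN(c.datefield) AS month_start
   FROM calendar c
   WHERE YEAR(c.datefield) = ?
   GROUP BY yyyymm';

SET @year = 2018;                      -- illustrative value
EXECUTE monthly_months USING @year;    -- the value is bound, not spliced into the query
DEALLOCATE PREPARE monthly_months;
```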

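Answer 1 (score: 0)

I think the problem lies in the indexes. Your query would only perform well if you had a function-based index on DATE_FORMAT(final_registration.booking_date, '%Y-%m-%d'). I'm not sure which version of MySQL you are using and whether it offers such an option...

But in any case, I bet you have a plain index on final_registration.booking_date. If so, your join clause is the problem, because the index will not be used. So you should not convert the dates to strings if you want the index to work: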

LEFT JOIN final_registration ON final_registration.booking_date = calendar.datefield
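One caveat: the sample rows in the question show booking_date with a time component, so a plain equality against a DATE value would only match rows stamped at midnight. A possible variant, sketched under the assumption that calendar.datefield is a DATE and booking_date is a DATETIME, joins on a one-day half-open range and still leaves the booking_date column bare:

```sql
SELECT
  DATE_FORMAT(c.datefield, '%Y-%m') AS booking_date,
  COUNT(fr.booking_date)            AS countT
FROM calendar c
LEFT JOIN final_registration fr
       ON fr.booking_date >= c.datefield                    -- start of the calendar day
      AND fr.booking_date <  c.datefield + INTERVAL 1 DAY   -- start of the next day
      AND fr.ticket_status IN ('active', 'cancelled')
WHERE c.datefield >= '2018-01-01'   -- year 2018 used as an example
  AND c.datefield <  '2019-01-01'
GROUP BY DATE_FORMAT(c.datefield, '%Y-%m');
```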

By the way, the WHERE clause has the same problem. Always prefer converting the parameter rather than the table column, for example:

WHERE calendar.datefield BETWEEN str_to_date(concat("01-01-", year(now())), "%d-%m-%Y") AND str_to_date(concat("31-12-", year(now())), "%d-%m-%Y")
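A shorter way to build the same year boundaries, shown here only as a sketch, is MySQL's MAKEDATE(year, dayofyear) combined with a half-open range. The user variable @year stands in for whatever parameter the application passes.

```sql
SET @year = 2018;  -- illustrative value; normally supplied as a bound parameter

SELECT c.calender_id, c.datefield
FROM calendar c
WHERE c.datefield >= MAKEDATE(@year, 1)       -- January 1 of the requested year
  AND c.datefield <  MAKEDATE(@year + 1, 1);  -- January 1 of the following year
```

Only the parameter is transformed here; calendar.datefield stays untouched, so an index on it remains usable.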

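Answer 2 (score: 0)

I would suggest performing the aggregation before the join, and actually computing the start and end of the range you want and using BETWEEN. Applying functions such as DATE_FORMAT() or even YEAR() to a column in your WHERE conditions kills performance, because an index on that date column can no longer be used. Also, make sure you have an index on booking_date.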

-- COALESCE turns the NULL from unmatched months into 0, as in the desired output
SELECT c.booking_year, c.booking_month, COALESCE(bookingSummary.countT, 0) AS countT
FROM (
    -- One row per (year, month) in the requested range
    SELECT DISTINCT YEAR(datefield) AS booking_year, MONTH(datefield) AS booking_month
    FROM calendar 
    WHERE datefield BETWEEN [firstdayofyear] AND [lastdayofyear]
) AS c 
LEFT JOIN (
    -- Aggregate the bookings per month before joining
    SELECT YEAR(booking_date) AS booking_year, MONTH(booking_date) AS booking_month
        , COUNT(*) AS countT 
    FROM final_registration AS fr
    WHERE fr.ticket_status IN ('active', 'cancelled') 
        AND fr.booking_date BETWEEN [firstdayofyear] AND [lastdayofyear]
    GROUP BY booking_year, booking_month
) AS bookingSummary
USING (booking_year, booking_month)
;

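If you have a version of MySQL that supports CTEs, you can even do without the calendar table. You can use a CTE that generates the numbers 1-12 as "booking_month" (and join on that field).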

-- A self-referencing CTE must be declared RECURSIVE in MySQL
WITH RECURSIVE calendar_months AS (
   SELECT 1 AS booking_month
   UNION ALL
   SELECT booking_month + 1 FROM calendar_months WHERE booking_month < 12
)
-- COALESCE turns the NULL from unmatched months into 0
SELECT [year] AS booking_year, cm.booking_month, COALESCE(bookingSummary.countT, 0) AS countT
FROM calendar_months AS cm 
LEFT JOIN (
    SELECT MONTH(booking_date) AS booking_month
        , COUNT(*) AS countT 
    FROM final_registration AS fr
    WHERE fr.ticket_status IN ('active', 'cancelled') 
        AND fr.booking_date BETWEEN [firstdayofyear] AND [lastdayofyear]
    GROUP BY booking_month
) AS bookingSummary
USING (booking_month)
;

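Note: treat my [field] notation as placeholders for parameters; one of the reasons I suggest the first version over the CTE version is that it has fewer parameters to maintain.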
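One thing to watch when filling in those placeholders: if booking_date carries a time component (as the sample rows in the question suggest) and [lastdayofyear] is a date-only value such as '2018-12-31', BETWEEN stops at midnight on that day and misses bookings made later on December 31. A hedged sketch of the inner aggregate using a half-open range derived from a single year value (the variable @year is an assumption, not part of the original answer):

```sql
SET @year = 2018;  -- illustrative value; normally a bound parameter

SELECT YEAR(fr.booking_date)  AS booking_year,
       MONTH(fr.booking_date) AS booking_month,
       COUNT(*)               AS countT
FROM final_registration AS fr
WHERE fr.ticket_status IN ('active', 'cancelled')
  AND fr.booking_date >= MAKEDATE(@year, 1)       -- first day of the year, inclusive
  AND fr.booking_date <  MAKEDATE(@year + 1, 1)   -- first day of next year, exclusive
GROUP BY booking_year, booking_month;
```

This also cuts the parameter list down to the year alone, since both boundaries are computed from it.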