MySQL subquery with GROUP BY in a LEFT JOIN - optimization

Time: 2012-12-22 16:54:42

Tags: mysql optimization group-by subquery left-join

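MySQL seems unable to optimize a SELECT that joins against a GROUP BY subquery, and ends up with very long execution times. There must be a known optimization for such a common scenario.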

Let's assume we're trying to return all orders from the database, with a flag indicating whether each one is the customer's first order.

CREATE TABLE orders (`order` int, customer int, `date` date);

Retrieving the first order of each customer is super fast.

SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer;

However, it becomes very slow once we join that subquery against the full set of orders:

SELECT `order`, first_order FROM orders LEFT JOIN ( 
  SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order` = first_orders.first_order;
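One way to see where the time goes is to prefix the slow statement with EXPLAIN. The sketch below only reuses the query from the question. The assumption (not stated in the question) is that the server materializes the derived table `first_orders` without any index, in which case the plan row for that derived table would typically show no usable key, so it is scanned again for every row of `orders`.

```sql
-- Inspect the plan of the slow LEFT JOIN from the question.
-- Look at the row for the derived table: no key listed there suggests
-- the materialized subquery is being scanned for each row of `orders`.
EXPLAIN
SELECT `order`, first_order
FROM orders
LEFT JOIN (
  SELECT customer, MIN(`order`) AS first_order
  FROM orders
  GROUP BY customer
) AS first_orders ON orders.`order` = first_orders.first_order;
```

Comparing this plan with the plan of the temporary-table variant should show whether the explicit index on `first_order` is what makes the difference.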

I'm hoping there is a simple trick we're missing, because otherwise it is about 1000 times faster to do this:

CREATE TEMPORARY TABLE tmp_first_order AS 
  SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer;
CREATE INDEX tmp_boost ON tmp_first_order (first_order);

SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order 
  ON orders.`order` = tmp_first_order.first_order;

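EDIT
Inspired by option 3 proposed by @ruakh, there is indeed a less ugly workaround using INNER JOIN and UNION. It has acceptable performance and doesn't require a temporary table. However, it is rather specific to this case, and I'm wondering whether a more generic optimization exists.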

SELECT `order`, 'YES' AS first FROM orders INNER JOIN ( 
    SELECT MIN(`order`) AS first_order FROM orders GROUP BY customer
  ) AS first_orders_1 ON orders.`order` = first_orders_1.first_order
UNION
SELECT `order`, 'NO' AS first FROM orders INNER JOIN ( 
    SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer
  ) AS first_orders_2 ON first_orders_2.customer = orders.customer 
    AND orders.`order` > first_orders_2.first_order;
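The table definition in the question declares no keys at all, so it may be worth checking whether plain indexes change the picture for the INNER JOIN variants. The following is only a sketch: the index names `idx_order` and `idx_customer_order` are made up for illustration, and whether the optimizer actually uses them depends on the server version and the data.

```sql
-- Hypothetical index names; the original schema defines none.
ALTER TABLE orders
  ADD INDEX idx_order (`order`),                    -- lookups of orders by order number
  ADD INDEX idx_customer_order (customer, `order`); -- MIN(`order`) per customer
```

With an INNER JOIN the optimizer is free to read the small grouped result first and then look up matching rows in `orders` through an index, which a LEFT JOIN from `orders` does not allow.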

2 Answers:

Answer 0 (score: 3)

Here are a few things you could try:

  1. Remove customer from the subquery's field list, since it isn't doing anything there anyway:

    SELECT `order`,
           first_order
      FROM orders
      LEFT
      JOIN ( SELECT MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.`order` = first_orders.first_order
    ;
    
  2. Conversely, add customer to the ON clause, so that it actually does something for you:

    SELECT `order`,
           first_order
      FROM orders
      LEFT
      JOIN ( SELECT customer,
                    MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.customer = first_orders.customer
       AND orders.`order` = first_orders.first_order
    ;
    
  3. Same as the previous one, but using INNER JOIN instead of LEFT JOIN, and moving the original ON condition into a CASE expression:

    SELECT `order`,
           CASE WHEN first_order = `order` THEN first_order END AS first_order
      FROM orders
     INNER
      JOIN ( SELECT customer,
                    MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.customer = first_orders.customer
    ;
    
  4. Replace the whole JOIN approach with an uncorrelated IN subquery inside a CASE expression:

    SELECT `order`,
           CASE WHEN `order` IN
                      ( SELECT MIN(`order`)
                          FROM orders
                         GROUP
                            BY customer
                      )
                THEN `order`
            END AS first_order
      FROM orders
    ;
    
  5. Replace the whole JOIN approach with a correlated EXISTS subquery inside a CASE expression:

    SELECT `order`,
           CASE WHEN NOT EXISTS
                      ( SELECT 1
                          FROM orders AS o2
                         WHERE o2.customer = o1.customer
                           AND o2.`order` < o1.`order`
                      )
                THEN `order`
            END AS first_order
      FROM orders AS o1
    ;
    
(It is quite possible that some of the above will actually perform worse, but I think they are all worth trying.)

Answer 1 (score: 1)

I would expect this to be faster, since it uses a variable instead of a LEFT JOIN:

SELECT
  `order`,
  If(@previous_customer<>(@previous_customer:=`customer`),
    `order`,
    NULL
  ) AS first_order
FROM orders
JOIN ( SELECT @previous_customer := -1 ) x
ORDER BY customer, `order`;

This is what my example on SQL Fiddle returns:

CUSTOMER    ORDER    FIRST_ORDER
1           1        1
1           2        (null)
1           3        (null)
2           4        4
2           5        (null)
3           6        6
4           7        7
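On MySQL 8.0 or later, the same per-customer flag can probably be written with a window function instead of a user variable. This is a minimal sketch under that version assumption, not something either answer proposes. The sample rows are reconstructed from the output above (the original fiddle data is not shown), and the `date` column is simply left NULL.

```sql
-- Sample rows inferred from the result table above; `date` is left NULL.
INSERT INTO orders (`order`, customer) VALUES
  (1, 1), (2, 1), (3, 1),
  (4, 2), (5, 2),
  (6, 3),
  (7, 4);

-- Requires window function support (MySQL 8.0+).
-- MIN(...) OVER (PARTITION BY customer) computes each customer's first
-- order once per partition, with no join and no session variable.
SELECT customer,
       `order`,
       CASE WHEN `order` = MIN(`order`) OVER (PARTITION BY customer)
            THEN `order`
       END AS first_order
FROM orders
ORDER BY customer, `order`;
```

Unlike the variable-based query, this form does not depend on the order in which rows are evaluated, which is the main reason to consider it where the server version allows.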