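MySQL seems unable to optimize a SELECT that joins against a GROUP BY subquery, and the query ends up taking a very long time to execute. There must be a known optimization for such a common scenario.
Let's assume we are trying to return all orders from the database, with a flag indicating whether each one is the customer's first order.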
CREATE TABLE orders (`order` int, customer int, `date` date);
Retrieving the first order for each customer is super fast:
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
However, it becomes very slow as soon as we join that result against the full set of orders using a subquery:
SELECT `order`, first_order FROM orders LEFT JOIN (
  SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order`=first_orders.first_order;
I hope there is a simple trick that we are missing, because otherwise it is about 1000x faster to do this:
CREATE TEMPORARY TABLE tmp_first_order AS
  SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
CREATE INDEX tmp_boost ON tmp_first_order (first_order);
SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order
  ON orders.`order`=tmp_first_order.first_order;
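Edit:
Inspired by option 3 proposed by @ruakh, there is indeed a less ugly workaround using INNER JOIN and UNION, which has acceptable performance yet does not require a temporary table. However, it is somewhat specific to our case, and I am wondering whether a more generic optimization exists.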
SELECT `order`, 'YES' as first FROM orders INNER JOIN (
  SELECT min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_1 ON orders.`order`=first_orders_1.first_order
UNION
SELECT `order`, 'NO' as first FROM orders INNER JOIN (
  SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_2 ON first_orders_2.customer = orders.customer
  AND orders.`order` > first_orders_2.first_order;
Answer 0 (score: 3)
Here are a few things you can try:
Remove customer from the subquery's field list, since it isn't doing anything there anyway:
SELECT `order`,
       first_order
  FROM orders
  LEFT
  JOIN ( SELECT MIN(`order`) AS first_order
           FROM orders
          GROUP
             BY customer
       ) AS first_orders
    ON orders.`order` = first_orders.first_order
;
Conversely, add customer to the ON clause, so that it actually does something for you:
SELECT `order`,
       first_order
  FROM orders
  LEFT
  JOIN ( SELECT customer,
                MIN(`order`) AS first_order
           FROM orders
          GROUP
             BY customer
       ) AS first_orders
    ON orders.customer = first_orders.customer
   AND orders.`order` = first_orders.first_order
;
Same as the previous one, but use an INNER JOIN instead of a LEFT JOIN, and move the original ON condition into a CASE expression:
SELECT `order`,
       CASE WHEN first_order = `order` THEN first_order END AS first_order
  FROM orders
 INNER
  JOIN ( SELECT customer,
                MIN(`order`) AS first_order
           FROM orders
          GROUP
             BY customer
       ) AS first_orders
    ON orders.customer = first_orders.customer
;
Replace the whole JOIN approach with an uncorrelated IN-subquery inside a CASE expression:
SELECT `order`,
       CASE WHEN `order` IN
                  ( SELECT MIN(`order`)
                      FROM orders
                     GROUP
                        BY customer
                  )
            THEN `order`
        END AS first_order
  FROM orders
;
Replace the whole JOIN approach with a correlated EXISTS-subquery inside a CASE expression:
SELECT `order`,
       CASE WHEN NOT EXISTS
                  ( SELECT 1
                      FROM orders AS o2
                     WHERE o2.customer = o1.customer
                       AND o2.`order` < o1.`order`
                  )
            THEN `order`
        END AS first_order
  FROM orders AS o1
;
(It's quite likely that some of the above will actually perform worse, but I think they are all worth a try.)
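As a sanity check that these rewrites are interchangeable, here is a minimal sketch that runs four of them against a small made-up data set. It uses Python's built-in sqlite3 module rather than MySQL, an assumption made only so the example is self-contained, so it says nothing about how MySQL will plan or time them. The sample rows and the short aliases o and f are illustrative.

```python
import sqlite3

# Hypothetical sample data: (order, customer) pairs with unique order ids.
ROWS = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 4)]

# Four of the rewrites suggested above, each returning (order, first_order).
VARIANTS = {
    "join_on_customer": """
        SELECT o.`order`, f.first_order
          FROM orders AS o
          LEFT JOIN ( SELECT customer, MIN(`order`) AS first_order
                        FROM orders
                       GROUP BY customer
                    ) AS f
            ON o.customer = f.customer
           AND o.`order` = f.first_order
    """,
    "inner_join_case": """
        SELECT o.`order`,
               CASE WHEN f.first_order = o.`order` THEN f.first_order END
          FROM orders AS o
         INNER JOIN ( SELECT customer, MIN(`order`) AS first_order
                        FROM orders
                       GROUP BY customer
                    ) AS f
            ON o.customer = f.customer
    """,
    "in_subquery": """
        SELECT `order`,
               CASE WHEN `order` IN ( SELECT MIN(`order`)
                                        FROM orders
                                       GROUP BY customer )
                    THEN `order` END
          FROM orders
    """,
    "not_exists": """
        SELECT o1.`order`,
               CASE WHEN NOT EXISTS ( SELECT 1
                                        FROM orders AS o2
                                       WHERE o2.customer = o1.customer
                                         AND o2.`order` < o1.`order` )
                    THEN o1.`order` END
          FROM orders AS o1
    """,
}


def run_variants(rows=ROWS):
    """Run every variant on an in-memory table and return sorted result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (`order` int, customer int, `date` date)")
    conn.executemany("INSERT INTO orders (`order`, customer) VALUES (?, ?)", rows)
    # Sort by order id so the comparison does not depend on row order.
    results = {name: sorted(conn.execute(sql).fetchall())
               for name, sql in VARIANTS.items()}
    conn.close()
    return results


results = run_variants()
for name, rows in results.items():
    print(name, rows)
```

All four should print the same rows. Which one is fastest on MySQL still has to be measured against the real table, for example by comparing their EXPLAIN output.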
Answer 1 (score: 1)
I would expect this to be faster, since it uses a variable instead of the LEFT JOIN:
SELECT
    `order`,
    IF(@previous_customer <> (@previous_customer := `customer`),
       `order`,
       NULL
    ) AS first_order
FROM orders
JOIN ( SELECT @previous_customer := -1 ) x
ORDER BY customer, `order`;
This is what my example on SQL Fiddle returns:
CUSTOMER  ORDER  FIRST_ORDER
1         1      1
1         2      (null)
1         3      (null)
2         4      4
2         5      (null)
3         6      6
4         7      7
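If MySQL 8.0 or later is available, a window function is another way to express the same "first row per customer" idea without user variables. The sketch below is an alternative technique, not the query from this answer. It is run through Python's sqlite3 module (assuming a SQLite build of 3.25 or newer, which supports window functions) purely so that it is self-contained, and the sample rows are made up.

```python
import sqlite3

# ROW_NUMBER() restarts at 1 for each customer, so the row numbered 1
# is that customer's first order; every other row gets NULL.
QUERY = """
    SELECT `order`,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer
                                            ORDER BY `order`) = 1
                THEN `order` END AS first_order
      FROM orders
     ORDER BY customer, `order`
"""


def first_order_flags(rows):
    """Load (order, customer) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (`order` int, customer int, `date` date)")
    conn.executemany("INSERT INTO orders (`order`, customer) VALUES (?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


# Hypothetical sample data: (order, customer) pairs.
flags = first_order_flags([(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 4)])
for row in flags:
    print(row)
```

Whether this beats the variable-based query on a given MySQL installation is not something the sketch can show; it would need to be timed on the real data.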