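Optimizing SQL: customers who have not ordered for X days

Time: 2014-01-17 09:06:31

Tags: mysql sql auto-responder

I wrote this SQL to find customers who have not ordered for X days.

It returns a result set, so this post is mainly to get a second opinion and possible optimizations.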

SELECT o.order_id,
       o.order_status,
       o.order_created,
       o.user_id,
       i.identity_firstname,
       i.identity_email,

  (SELECT COUNT(*)
   FROM orders o2
   WHERE o2.user_id=o.user_id
     AND o2.order_status=1) AS order_count,

  (SELECT o4.order_created
   FROM orders o4
   WHERE o4.user_id=o.user_id
     AND o4.order_status=1
   ORDER BY o4.order_created DESC LIMIT 1) AS last_order
FROM orders o
INNER JOIN user_identities ui ON o.user_id=ui.user_id
INNER JOIN identities i ON ui.identity_id=i.identity_id
   AND i.identity_email!=''
INNER JOIN subscribers s ON i.identity_id=s.identity_id
  AND s.subscriber_status=1
  AND s.subsriber_type=e
  AND s.subscription_id=1
WHERE DATE(o.order_created) = "2013-12-14"
  AND o.order_status=1
  AND o.user_id NOT IN
    (SELECT o3.user_id
     FROM orders o3
     WHERE o3.user_id=o.user_id
       AND o3.order_status=1
       AND DATE(o3.order_created) > "2013-12-14")

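Can you spot any potential problems with this SQL? The date is inserted dynamically.

The final SQL that I put into production will basically only include o.order_id, i.identity_id and o.order_count - this order_count needs to be correct. The other selected fields and the 'last_order' subquery will not be included; they are only for testing.

This should give me a list of users who placed their last order on that particular day and who are newsletter subscribers. I am particularly in doubt about the correctness of the NOT IN part in the WHERE clause and of the order_count subquery.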

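2 answers:

Answer 0: (score: 2)

There are a few problems:

A. Using a function on an indexable column

You are searching for orders by comparing DATE(order_created) with a constant. This is a terrible idea, because a) the DATE() function is executed for every row (CPU), and b) the database can't use an index on the column (assuming one exists).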
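As a rough illustration of point A, here is a minimal sketch that compares the DATE() predicate with a range predicate on the bare column. It runs against an in-memory SQLite database as a stand-in for MySQL, and the table layout and rows are made up for the example. It uses a half-open range (>= start of day, < start of next day) rather than the BETWEEN form suggested further down; at one-second resolution both select the same rows.

```python
import sqlite3

# SQLite stands in for MySQL here; schema and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " user_id INTEGER,"
    " order_status INTEGER,"
    " order_created TEXT)"
)
conn.execute("CREATE INDEX idx_orders_created ON orders (order_created)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2013-12-13 23:59:59"),  # day before
        (2, 10, 1, "2013-12-14 00:00:00"),  # first second of the target day
        (3, 11, 1, "2013-12-14 23:59:59"),  # last second of the target day
        (4, 12, 1, "2013-12-15 00:00:00"),  # day after
    ],
)

# Function wrapped around the column: it has to be evaluated for every row.
with_function = conn.execute(
    "SELECT order_id FROM orders"
    " WHERE DATE(order_created) = ?"
    " ORDER BY order_id",
    ("2013-12-14",),
).fetchall()

# Range on the bare column: a form an index on order_created can serve.
with_range = conn.execute(
    "SELECT order_id FROM orders"
    " WHERE order_created >= ? AND order_created < ?"
    " ORDER BY order_id",
    ("2013-12-14 00:00:00", "2013-12-15 00:00:00"),
).fetchall()

print(with_function)
print(with_range)
```

Both queries return the same two orders; only the second leaves the column untouched in the predicate.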

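B. Using WHERE ID NOT IN (...)

Using NOT IN (...) is almost always a bad idea, because optimizers usually have trouble with this construct and often produce a bad plan. You can almost always express it as an outer join with a WHERE condition that filters for misses using an IS NULL condition on a joined column (with the side benefit of not needing DISTINCT, because only a single miss row ever comes back).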
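To make point B concrete, here is a small sketch that runs the NOT IN form and the LEFT JOIN ... IS NULL form side by side. Again this is SQLite standing in for MySQL, with an invented orders table and invented rows, so treat it as an illustration of the rewrite rather than a statement about how MySQL plans either query.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " user_id INTEGER,"
    " order_status INTEGER,"
    " order_created TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2013-12-14 09:00:00"),  # user 10: last order on the target day
        (2, 11, 1, "2013-12-14 10:00:00"),  # user 11: ordered again later
        (3, 11, 1, "2013-12-20 10:00:00"),
        (4, 12, 1, "2013-12-14 11:00:00"),  # user 12: later order has status 0
        (5, 12, 0, "2013-12-18 11:00:00"),
    ],
)

not_in_sql = """
SELECT o.user_id
FROM orders o
WHERE o.order_created >= '2013-12-14 00:00:00'
  AND o.order_created <  '2013-12-15 00:00:00'
  AND o.order_status = 1
  AND o.user_id NOT IN (
      SELECT o3.user_id
      FROM orders o3
      WHERE o3.user_id = o.user_id
        AND o3.order_status = 1
        AND o3.order_created >= '2013-12-15 00:00:00')
ORDER BY o.user_id
"""

anti_join_sql = """
SELECT o.user_id
FROM orders o
LEFT JOIN orders o3
       ON o3.user_id = o.user_id
      AND o3.order_status = 1
      AND o3.order_created >= '2013-12-15 00:00:00'
WHERE o.order_created >= '2013-12-14 00:00:00'
  AND o.order_created <  '2013-12-15 00:00:00'
  AND o.order_status = 1
  AND o3.order_id IS NULL  -- key column: NULL only when no later order matched
ORDER BY o.user_id
"""

not_in_rows = conn.execute(not_in_sql).fetchall()
anti_join_rows = conn.execute(anti_join_sql).fetchall()
print(not_in_rows)
print(anti_join_rows)
```

Both forms keep users 10 and 12 and drop user 11, who has a later completed order.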

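C. Leaving joins that filter out most of the rows too late

The earlier you can mask off rows by not joining to them, the better. You can do this by placing the tables that are less likely to match earlier in the list of joined tables, and by putting non-key conditions into the join rather than the where clause, so that rows are excluded as early as possible. Some optimizers do this anyway, but I have often found that they don't.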

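D. Avoid correlated subqueries like the plague!

You have several correlated subqueries - subqueries that are executed for every row of the main table. That is really an extremely bad idea. Sometimes optimizers can turn them into a join, but why rely on (hope for) that? Most correlated subqueries can be expressed as a join; your examples are no exception.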
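As a sketch of point D, the snippet below computes order_count and last_order once with correlated subqueries and once with a join against a pre-aggregated derived table, and checks that they agree. The derived-table form is just one way to express it as a join; the SQLite database, the table layout and the rows are assumptions made for the example.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " user_id INTEGER,"
    " order_status INTEGER,"
    " order_created TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2013-12-01 09:00:00"),
        (2, 10, 1, "2013-12-14 09:00:00"),
        (3, 10, 0, "2013-12-10 09:00:00"),  # status 0, must not be counted
        (4, 11, 1, "2013-12-14 10:00:00"),
    ],
)

# One subquery execution per outer row and per selected expression.
correlated_sql = """
SELECT o.order_id,
       (SELECT COUNT(*)
        FROM orders o2
        WHERE o2.user_id = o.user_id
          AND o2.order_status = 1) AS order_count,
       (SELECT MAX(o4.order_created)
        FROM orders o4
        WHERE o4.user_id = o.user_id
          AND o4.order_status = 1) AS last_order
FROM orders o
WHERE o.order_status = 1
  AND o.order_created >= '2013-12-14 00:00:00'
  AND o.order_created <  '2013-12-15 00:00:00'
ORDER BY o.order_id
"""

# Aggregate once per user, then join the result to the outer rows.
joined_sql = """
SELECT o.order_id, t.order_count, t.last_order
FROM orders o
JOIN (SELECT user_id,
             COUNT(*) AS order_count,
             MAX(order_created) AS last_order
      FROM orders
      WHERE order_status = 1
      GROUP BY user_id) t ON t.user_id = o.user_id
WHERE o.order_status = 1
  AND o.order_created >= '2013-12-14 00:00:00'
  AND o.order_created <  '2013-12-15 00:00:00'
ORDER BY o.order_id
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
joined_rows = conn.execute(joined_sql).fetchall()
print(correlated_rows)
print(joined_rows)
```

Aggregating per user before joining also keeps the count independent of how many rows the other joins produce.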

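With the above in mind, here are some specific changes:

  • o2 and o4 are the same join, so o4 can be dispensed with entirely - just use o2 after converting to a join
  • DATE(order_created) = "2013-12-14" should be written as order_created between "2013-12-14 00:00:00" and "2013-12-14 23:59:59"

This query should be what you want: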

SELECT
    o.order_id,
    o.order_status,
    o.order_created,
    o.user_id,
    i.identity_firstname,
    i.identity_email,
    count(o2.user_id) AS order_count,
    max(o2.order_created) AS last_order
FROM orders o
LEFT JOIN orders o2 ON o2.user_id = o.user_id AND o2.order_status=1
LEFT JOIN orders o3 ON o3.user_id = o.user_id 
    AND o3.order_status=1
    AND o3.order_created >= "2013-12-15 00:00:00"
JOIN user_identities ui ON o.user_id=ui.user_id
JOIN identities i ON ui.identity_id=i.identity_id AND i.identity_email != ''
JOIN subscribers s ON i.identity_id=s.identity_id
  AND s.subscriber_status=1
  AND s.subsriber_type=e
  AND s.subscription_id=1
WHERE o.order_created between "2013-12-14 00:00:00" and "2013-12-14 23:59:59"
AND o.order_status=1
AND o3.order_created IS NULL -- This gets only missed joins on o3
GROUP BY
    o.order_id,
    o.order_status,
    o.order_created,
    o.user_id,
    i.identity_firstname,
    i.identity_email;

The last line of the where clause is how you achieve the same effect as NOT IN (...) using a LEFT JOIN.

Disclaimer: Untested.

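Answer 1: (score: 0)

I can't comment on the results, as you have not posted any table declarations or sample data, but your query has 3 correlated subqueries, which is likely to make it perform poorly (OK, one of those is for last_order and is only for testing).

Eliminating the correlated subqueries and replacing them with joins gives something like this: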

SELECT o.order_id,
        o.order_status,
        o.order_created,
        o.user_id,
        i.identity_firstname,
        i.identity_email,
        Sub1.order_count,
        Sub2.last_order
FROM orders o
INNER JOIN user_identities ui ON o.user_id=ui.user_id
INNER JOIN identities i ON ui.identity_id=i.identity_id
   AND i.identity_email!=''
INNER JOIN subscribers s ON i.identity_id=s.identity_id
  AND s.subscriber_status=1
  AND s.subsriber_type=e
  AND s.subscription_id=1
LEFT OUTER JOIN
(
    SELECT user_id, COUNT(*) AS order_count
    FROM orders 
    WHERE order_status=1
    GROUP BY user_id
) Sub1
ON o.user_id = Sub1.user_id
LEFT OUTER JOIN
(
    SELECT user_id, MAX(order_created) as last_order
    FROM orders 
    WHERE order_status=1
    GROUP BY user_id
) AS Sub2
ON o.user_id = Sub2.user_id
LEFT OUTER JOIN
(
    SELECT DISTINCT user_id
    FROM orders 
    WHERE order_status=1
    AND DATE(order_created) > "2013-12-14"
) Sub3
ON o.user_id = Sub3.user_id
WHERE DATE(o.order_created) = "2013-12-14"
  AND o.order_status=1
  AND Sub3.user_id IS NULL