Question

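How can the following query take more than 16 hours to run? (We stopped the execution to research optimizations, but none of us is a database expert.) It seems like performing a set-based exclusion should be fairly trivial, right?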

SELECT 
   field 
FROM
   (subquery that returns 1173126 rows in 20 seconds)
WHERE
   field NOT IN (subquery that returns 3927646 rows in 69 seconds)

What else should I include in this description to give you enough information to help?

(The actual query is below, since people asked for it.)

SELECT blob FROM (
      SELECT a.line1 + '|' + substring(a.zip,1,5) as blob
      FROM registrations r
      JOIN customers c ON r.custId = c.Id
      JOIN addresses a ON c.addressId = a.Id
      WHERE r.purchaseDate > DATEADD(year,-1,getdate())
      GROUP BY a.line1 + '|' + substring(a.zip,1,5)) sq
WHERE blob NOT IN (
      SELECT a.line1 + '|' + substring(a.zip,1,5) as blob
      FROM registrations r
      JOIN customers c ON r.custId = c.Id
      JOIN addresses a ON c.addressId = a.Id
      WHERE r.purchaseDate BETWEEN DATEADD(year,-5,getdate()) AND DATEADD(year,-1,getdate())
      GROUP BY a.line1 + '|' + substring(a.zip,1,5))

Answer 1

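You may not realize it, but the query engine effectively turns a NOT IN into a chain of IF-style comparisons. In your example that means one giant IF covering all 3.9M rows, and every condition has to be evaluated to see whether the value is present. No wonder it takes more than 16 hours to run.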

You would be better off finding a way to convert it to an EXISTS or a join.
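
A minimal sketch of one such rewrite, assuming the two result sets from the question can be materialized first. The temp table names #recent and #older are hypothetical and not from the question. NOT EXISTS is unaffected by NULLs in the inner set, whereas NOT IN returns no rows at all if the inner set contains a NULL; whether the plan actually improves on your data is something to check, not a guarantee.

```sql
-- Hypothetical temp tables holding the two result sets from the question.
SELECT a.line1 + '|' + SUBSTRING(a.zip, 1, 5) AS blob
INTO   #recent
FROM   registrations r
JOIN   customers c ON r.custId = c.Id
JOIN   addresses a ON c.addressId = a.Id
WHERE  r.purchaseDate > DATEADD(year, -1, GETDATE())
GROUP BY a.line1 + '|' + SUBSTRING(a.zip, 1, 5);

SELECT a.line1 + '|' + SUBSTRING(a.zip, 1, 5) AS blob
INTO   #older
FROM   registrations r
JOIN   customers c ON r.custId = c.Id
JOIN   addresses a ON c.addressId = a.Id
WHERE  r.purchaseDate BETWEEN DATEADD(year, -5, GETDATE())
                          AND DATEADD(year, -1, GETDATE())
GROUP BY a.line1 + '|' + SUBSTRING(a.zip, 1, 5);

-- Anti-join: keep recent blobs that have no match among the older ones.
-- Unlike NOT IN, a NULL blob in #recent is returned here rather than dropped.
SELECT rc.blob
FROM   #recent rc
WHERE  NOT EXISTS (SELECT 1 FROM #older o WHERE o.blob = rc.blob);
```

Materializing first also lets you time each step separately, which helps show whether the exclusion itself is the slow part.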

Answer 2

It looks like you are searching for addresses that had a purchase within the past year but none in the four years before that.

SELECT  DISTINCT a.line1, SUBSTRING(a.zip, 1, 5)
FROM    addresses a
WHERE   id IN
        (
        SELECT  c.addressId
        FROM    customers c
        JOIN    registrations r
        ON      r.custId = c.id
        AND     r.purchaseDate > DATEADD(year, -1 ,getdate())
        )
        AND NOT EXISTS
        (
        SELECT  NULL
        FROM    customers c
        JOIN    registrations r
        ON      r.custId = c.id
        JOIN    addresses ai
        ON      ai.id = c.addressId
        WHERE   r.purchaseDate BETWEEN DATEADD(year,-5,getdate()) AND DATEADD(year,-1,getdate())
                AND ai.line1 = a.line1
                AND SUBSTRING(ai.zip, 1, 5) = SUBSTRING(a.zip, 1, 5)
        )

This query accounts for duplicates on line1, zip across addresses with different ids. Do you have such duplicates?
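
One way to answer that is to count address rows per (line1, 5-digit zip) pair. This is a sketch that uses only the table and column names from the question; the aliases zip5 and address_rows are made up for illustration.

```sql
-- List every (line1, zip5) pair that appears on more than one address row.
SELECT   a.line1,
         SUBSTRING(a.zip, 1, 5) AS zip5,
         COUNT(*)               AS address_rows
FROM     addresses a
GROUP BY a.line1, SUBSTRING(a.zip, 1, 5)
HAVING   COUNT(*) > 1
ORDER BY address_rows DESC;
```

If this returns no rows, each address id maps to a unique line1/zip pair and the comparison could be done on ids alone.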

Answer 3

The second subquery runs once for every row in the first subquery.

That puts the estimated time to complete at around 1173126 * 69 = 80945694 seconds,

or about 937 days (roughly 2.6 years)...

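Now that the actual query has been added: your best bet is to optimize both queries by adding indexes to the tables. I can't tell you exactly which indexes to add, but there are plenty of good articles on choosing the right indexes for a table.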
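
As a starting point only, here is a sketch of indexes that match the filter and join columns in the posted query. The index names are hypothetical, and the sketch assumes customers.Id and addresses.Id are already primary keys; check the actual execution plan before keeping any of these.

```sql
-- Supports the purchaseDate range filters and carries custId for the join.
CREATE NONCLUSTERED INDEX IX_registrations_purchaseDate
    ON registrations (purchaseDate)
    INCLUDE (custId);

-- Supports the customers-to-addresses join on addressId.
CREATE NONCLUSTERED INDEX IX_customers_addressId
    ON customers (addressId);
```

Note that the grouping expression a.line1 + '|' + substring(a.zip,1,5) is computed per row, so these indexes help the joins and date filters rather than the blob comparison itself.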

Why does this query take so much longer to run than its subqueries combined?
