Question

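I'm using SQL Server 2005, and I've noticed something odd when I use a subquery in an IN clause to filter out some results. For example, this is my current query, which runs in about 70 seconds on average: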

select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales
from Archive 
where CustomerID = 20
and ReportDate = '2/3/2011'
and Phone in (
    select Phone
    from PlanDetails 
    where Phone is not null
    and Length is not null
    and PlannedImp > 0
    and CustomerID = 20
    and (StatusID <> 2 and StatusID <> 7)
    and SubcategoryID = 88
)
group by Phone, ZipCode

However, if I split it into 2 separate queries, each one runs in under 1 second.

select Phone
from PlanDetails 
where Phone is not null
and Length is not null
and PlannedImp > 0
and CustomerID = 20
and (StatusID <> 2 and StatusID <> 7)
and SubcategoryID = 88

and

select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales
from Archive 
where CustomerID = 20
and ReportDate = '2/3/2011'
group by Phone, ZipCode

Finally, if I do this, it returns the same results as the first query but takes about 2-3 seconds:

select Phone
into #tempTable
from PlanDetails
where Phone is not null
and Length is not null
and PlannedImp > 0
and CustomerID = 20
and (StatusID <> 2 and StatusID <> 7)
and SubcategoryID = 88

select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales
from Archive 
where CustomerID = 20
and ReportDate = '2/3/2011'
and Phone in (
    select Phone
    from #tempTable
)
group by Phone, ZipCode

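Over the past few weeks I've noticed that it's not just this query that is slow: any query that uses a (somewhat complex) subquery inside an IN clause wrecks performance. What is the reason for this?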

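The only indexes available to these queries are nonclustered indexes on CustomerID in both tables. I looked at the execution plans of both the slow and the fast queries and found that the nonclustered index seek on the Archive table accounts for by far the highest cost percentage (80-90%). The only difference, however, is that this step has a CPU cost of 7.1 in the slow query and 1.7 in the fast one.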

Answer 1

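It depends on the database system, version, settings and so on, but what usually ends up happening is that the database fails (or refuses) to cache that inner query, so it is executed on every single iteration of the outer query. You are changing your problem from an O(n) efficiency class to O(n^2).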
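
One way to check whether the inner query really is being evaluated repeatedly, assuming you can run the slow query from Management Studio, is to switch on I/O and timing statistics and compare the figures reported for PlanDetails against those of the temp-table version. This is only a diagnostic sketch; it does not confirm the explanation above by itself:

```sql
-- Report per-table scan counts and reads, plus CPU and elapsed time,
-- in the Messages tab for every statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The slow query from the question, unchanged.
select Phone, ZipCode, sum(Calls) as Calls, sum(Sales) as Sales
from Archive
where CustomerID = 20
and ReportDate = '2/3/2011'
and Phone in (
    select Phone
    from PlanDetails
    where Phone is not null
    and Length is not null
    and PlannedImp > 0
    and CustomerID = 20
    and (StatusID <> 2 and StatusID <> 7)
    and SubcategoryID = 88
)
group by Phone, ZipCode;

-- Turn the extra output off again.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

A scan count or logical-read figure for PlanDetails that is much higher here than in the temp-table version would be consistent with the subquery being run many times.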

Answer 2

Quoting IN vs. JOIN vs. EXISTS:

We see now that, contrary to popular opinion, IN / EXISTS queries are not less efficient than a JOIN query in SQL Server.

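In fact, JOIN queries are less efficient on non-indexed tables, since semi-join methods allow aggregation and matching against a single hash table, whereas a JOIN needs to do these two operations in two steps.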

Beyond that, indexes, and how current the table statistics are, play an important role in how the optimizer decides to execute the query.
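
As a hedged illustration of that last point, the statistics on both tables could be refreshed and a wider index tried on Archive. The index name below is hypothetical, and the choice of key and included columns is an assumption based only on the columns the question's query touches; whether it helps depends on the real data:

```sql
-- Rebuild optimizer statistics from every row of each table.
-- FULLSCAN can take a while on large tables.
UPDATE STATISTICS Archive WITH FULLSCAN;
UPDATE STATISTICS PlanDetails WITH FULLSCAN;

-- Hypothetical covering index for the outer query: seek on
-- CustomerID and ReportDate, match on Phone, and carry the
-- aggregated columns so the base table need not be visited.
CREATE NONCLUSTERED INDEX IX_Archive_CustomerID_ReportDate_Phone
    ON Archive (CustomerID, ReportDate, Phone)
    INCLUDE (ZipCode, Calls, Sales);
```

The INCLUDE clause is available from SQL Server 2005 onward. Comparing the execution plan before and after would show whether the optimizer actually picks the new index.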

Answer 3

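What if you rewrite the query using a join? (Note that if PlanDetails can hold more than one matching row per Phone, a plain join repeats the Archive rows and inflates the sums, which the IN version does not do.)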

select a.Phone, a.ZipCode, sum(a.Calls) as Calls, sum(a.Sales) as Sales
from Archive a
    inner join PlanDetails pd
        on a.CustomerID = pd.CustomerID
            and a.Phone = pd.Phone
where a.CustomerID = 20
    and a.ReportDate = '2/3/2011'
    and pd.Length is not null
    and pd.PlannedImp > 0
    and (pd.StatusID <> 2 and pd.StatusID <> 7)
    and pd.SubcategoryID = 88
group by a.Phone, a.ZipCode

Answer 4

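I suggest 2 solutions:
1. Try rewriting your query using EXISTS instead of IN. This might help if you are on an older SQL Server version (if my memory serves me well, before SQL Server 2005 EXISTS and IN generated different execution plans).
2. Try using an INNER JOIN (you could also use a CTE):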

select Archive.Phone, Archive.ZipCode, sum(Archive.Calls) as Calls, sum(Archive.Sales) as Sales
from Archive
INNER JOIN
(
  select DISTINCT Phone -- DISTINCT to avoid duplicates
  from PlanDetails
  where Phone is not null
  and Length is not null
  and PlannedImp > 0
  and CustomerID = 20
  and (StatusID <> 2 and StatusID <> 7)
  and SubcategoryID = 88
) XX ON (XX.Phone = Archive.Phone)
-- Phone exists in both Archive and XX, so columns are qualified to avoid an ambiguous-column error
where Archive.CustomerID = 20 and Archive.ReportDate = '2/3/2011'
group by Archive.Phone, Archive.ZipCode

Personally, I would expect the second approach to give you better results.
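
For completeness, the first suggestion (EXISTS instead of IN) could look roughly like the sketch below. The aliases a and pd are introduced here for illustration; everything else comes from the query in the question. The explicit "Phone is not null" filter is dropped because the equality test against a.Phone already excludes NULLs:

```sql
select a.Phone, a.ZipCode, sum(a.Calls) as Calls, sum(a.Sales) as Sales
from Archive a
where a.CustomerID = 20
and a.ReportDate = '2/3/2011'
and exists (
    -- Correlated subquery: true as soon as one matching plan row is found.
    select 1
    from PlanDetails pd
    where pd.Phone = a.Phone
    and pd.Length is not null
    and pd.PlannedImp > 0
    and pd.CustomerID = 20
    and (pd.StatusID <> 2 and pd.StatusID <> 7)
    and pd.SubcategoryID = 88
)
group by a.Phone, a.ZipCode
```

Because EXISTS only tests for the presence of a match, it cannot duplicate Archive rows, so no DISTINCT is needed.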

Huge performance hit when using IN (subquery). Why?
