SELECT行的最有效方法WHERE ID EXISTS IN第二个表

时间:2016-07-18 19:40:14

标签: sql sql-server performance sql-server-2012

我希望从第二个表格中存在ID的一个表中选择所有记录。

以下两个查询返回正确的结果:

查询1:

SELECT *
FROM Table1 t1
WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t1.ID = t2.ID)

查询2:

SELECT *
FROM Table1 t1
WHERE t1.ID IN (SELECT t2.ID FROM Table2 t2)

其中一个查询是否比另一个更有效?我应该使用一个吗?有没有第三种方法我没想到会更有效率?

3 个答案:

答案 0 :(得分:7)

摘要:

  

IN和EXISTS在所有情况下的表现相似。以下是用于验证的参数..

     

执行成本,时间:
  两者相同,优化器生成相同的计划。
   内存授予:
  两个查询都相同    Cpu时间,逻辑读取:
  尽管读取相同,但存在似乎在CPU时间方面略微超过IN。

我使用下面的测试数据集运行每个查询10次..

  1. 非常大的子查询结果集(100000行)
  2. 重复行
  3. 空行
  4. 对于上述所有情况,INEXISTS都以相同的方式执行。

    有关用于测试的Performance V3 database的一些信息。 20000个客户拥有1000000个订单,因此每个客户在订单表中随机复制(范围为10到100个)。

    执行成本,时间:
    下面是两个运行查询的屏幕截图。观察每个查询的相对成本。

    enter image description here

    内存成本:
    两个查询的内存授权也是相同的..我强制MDOP 1,以免将它们泄露给TEMPDB ..

    enter image description here

    CPU时间,读取:

    对于存在:

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
    (1 row(s) affected)
    
     SQL Server Execution Times:
       CPU time = 469 ms,  elapsed time = 595 ms.
    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    

    对于IN:

    (20000 row(s) affected)
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
    (1 row(s) affected)
    
     SQL Server Execution Times:
       CPU time = 547 ms,  elapsed time = 669 ms.
    SQL Server parse and compile time: 
       CPU time = 0 ms, elapsed time = 0 ms.
    

    在每种情况下,优化器都足够智能,可以重新排列查询。

    我倾向于使用EXISTS(我的意见)。使用EXISTS的一个用例是当您不想返回第二个表结果集时。

    根据Martin Smith的查询进行更新:

    我运行了以下查询,以找到从第一个表中获取第二个表中存在引用的行的最有效方法。

    SELECT DISTINCT c.*
    FROM Customers c
    JOIN Orders o ON o.custid = c.custid   
    
    SELECT c.*
    FROM Customers c
    INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid
    
    SELECT *
    FROM Customers C
    WHERE EXISTS(SELECT 1 FROM Orders o WHERE o.custid = c.custid)
    
    SELECT *
    FROM Customers c
    WHERE custid IN (SELECT custid FROM Orders)
    

    除了第二个INNER JOIN之外,上述所有查询共享相同的费用,其余部分的计划相同。

    enter image description here

    内存授予:
    此查询

    SELECT DISTINCT c.*
    FROM Customers c
    JOIN Orders o ON o.custid = c.custid 
    

    所需的内存授予量

    enter image description here

    此查询

    SELECT c.*
    FROM Customers c
    INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid 
    

    所需的内存授予..

    enter image description here

    CPU时间,读取:
    对于查询:

    SELECT DISTINCT c.*
    FROM Customers c
    JOIN Orders o ON o.custid = c.custid   
    
    (20000 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 48, logical reads 1344, physical reads 96, read-ahead reads 1248, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
     SQL Server Execution Times:
       CPU time = 1453 ms,  elapsed time = 781 ms.
    

    查询:

    SELECT c.*
    FROM Customers c
    INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid
    
    (20000 row(s) affected)
    Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
     SQL Server Execution Times:
       CPU time = 1499 ms,  elapsed time = 403 ms.
    

答案 1 :(得分:1)

它们在纯SQL语法中的含义完全相同,因此更“高效”取决于编译器如何构建计划,以及引擎如何获取数据。唯一可以确定的方法是双向运行并比较结果

即便如此,差异仅适用于在该上下文中,这意味着可能存在另一种情况,其他情况更快。

答案 2 :(得分:0)

一般情况相同但是当一个人做得更好时,存在赢了(对我而言)

如果t1.id是唯一的,那么你可以只做一个连接

SELECT t1.*
FROM Table1 t1
JOIN Table2 t2 
ON t1.ID = t2.ID 

即使t2.ID不是唯一的,你只需要唯一的行,那么你可以

SELECT distinct t1.*
FROM Table1 t1
JOIN Table2 t2 
ON t1.ID = t2.ID